Category Archives: Research

Test Validity

Validity is often seen as a close companion of reliability. Validity is the assessment of the evidence that indicates that an instrument is measuring what it claims to measure. An instrument can be highly reliable (consistent in measuring something) yet lack validity. For example, an instrument may reliably measure motivation but not be valid for measuring income. The problem is that an instrument designed to measure motivation cannot measure income appropriately, no matter how consistent its results are.

In general, there are several ways to assess validity, which include the following.

  • Content validity
  • Response process validity
  • Criterion-related evidence of validity
  • Consequence testing validity
  • Face validity

Content Validity

Content validity is perhaps the easiest way to assess validity. In this approach, the instrument is given to several experts who assess the appropriateness or validity of the instrument. Based on their feedback, a determination of the instrument’s validity is made.

Response Process Validity

In this approach, the respondents to an instrument are interviewed to see if they considered the instrument to be valid. Another approach is to compare the responses of different respondents for the same items on the instrument. High validity is determined by the consistency of the responses among the respondents.

Criterion-Related Evidence of Validity

This form of validity involves measuring the same variable with two different instruments. The instruments can be administered at different times (predictive validity) or simultaneously (concurrent validity). The results are then analyzed by finding the correlation between the two instruments. The stronger the correlation, the stronger the evidence of validity for both instruments.
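As a rough illustration of the correlation step, here is a minimal sketch in R. The two score vectors, ‘instrument_a’ and ‘instrument_b’, are made-up values for eight respondents and are not from any real study.

```r
# Hypothetical scores for eight respondents on two instruments
# intended to measure the same variable (values are made up)
instrument_a <- c(12, 15, 11, 18, 20, 14, 16, 19)
instrument_b <- c(30, 34, 29, 40, 43, 33, 37, 41)

# Pearson correlation between the two sets of scores
r <- cor(instrument_a, instrument_b)
r

# cor.test() adds a confidence interval and a p-value
cor.test(instrument_a, instrument_b)
```

A value of ‘r’ close to 1 would be read as stronger criterion-related evidence for the pair of instruments.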

Consequence Testing Validity

This form of validity looks at what happened to the environment after an instrument was administered. An example of this would be improved learning due to a test. If the students are studying harder, it can be inferred that this is due to the test they just experienced.

Face Validity

Face validity is the perception that the students have that a test measures what it is supposed to measure. This form of validity cannot be tested empirically. However, it should not be ignored. Students may dislike assessment but they know if a test is testing what the teacher tried to teach them.

Conclusion 

Validity plays an important role in the development of instruments in quantitative research. Which form of validity to use to assess the instrument depends on the researcher and the context that he or she is facing.

Logical Flow in R: If/Else Statements Part II

In a previous post, we looked at If/Else statements in R. We developed a function that calculated the amount of money James and the owner would give based on how many points the team scored. Below is a copy of this function.

CashDonate <- function(points, Dollars_per_point=40, HomeGame=TRUE){
 game.points<- points * Dollars_per_point
 if(points > 100) {game.points <-points * 30}
 if(HomeGame) {Total.Donation <- game.points * 1.5
 } else {Total.Donation <- game.points * 1.3}
 round(Total.Donation)
}

There is one small problem with this function. Currently, you have to input each game one at a time. It would be more useful if R could calculate the results of several games at once. For example, look at the results of the code below when we try to input more than one game at once.

> CashDonate(c(99,100,78))
[1] 5940 6000 4680
Warning message:
In if (points > 100) { :
  the condition has length > 1 and only the first element will be used

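As you can see, we get a warning message. The ‘if’ statement only checked the first value (99) against the condition and applied that result to every game. None of these games is above 100 points, so the values happen to be correct here, but any game above 100 points would have been calculated at the wrong rate unless it was the first value. In R 4.2.0 and later, a condition of length greater than one in ‘if’ is an error rather than a warning.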

In order to deal with this problem, R has the ‘ifelse’ function available. The ‘ifelse’ function tests a condition for every element of a vector and chooses, element by element, between two values. We need to be able to choose the appropriate action based on the following information:

  • points scored is less than or greater than 100
  • Home game or not a home game

Remember, R could do this if one value was put into the ‘CashDonate’ function. Now we need to be able to calculate what to do based on several values in each of the vectors above. Below is the modified code for doing this.

CashDonate <- function(points, HomeGame=TRUE){
 JamesDonate<- points * ifelse(points > 100, 30, 40)
 totalDonation<- JamesDonate * ifelse(HomeGame, 1.5, 1.3)
 round(totalDonation)
}

Here is what the modified function does

  1. It has the argument ‘points’ and the default argument of ‘HomeGame = TRUE’
  2. The first calculation is the number of points but with the ‘ifelse’ function. If the number of points is greater than 100, the points are multiplied by 30; otherwise, the points are multiplied by 40. All this is put in the variable ‘JamesDonate’
  3. Next, the amount from ‘JamesDonate’ is multiplied by 1.5 if it was a home game or 1.3 if it was not a home game. All this is put into the variable ‘totalDonation’
  4. The results are rounded
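
To see the element-by-element behavior on its own, here is a minimal sketch. The vector ‘pts’ is hypothetical and is not part of the original example.

```r
# Hypothetical points for three games (made up for illustration)
pts <- c(99, 105, 78)

# ifelse() tests every element and returns a vector of the same length
rate <- ifelse(pts > 100, 30, 40)
rate
# [1] 40 30 40

# Each game is multiplied by its own rate
pts * rate
# [1] 3960 3150 3120
```

A plain ‘if’ only accepts a single TRUE/FALSE value, which is why the earlier version of the function could not handle a vector of games.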

To use CashDonate to its full potential you need to make a dataframe. Below is the code for the ‘games’ data frame we will use.

games<- data.frame(game.points=c(88,100,99,111,96), HomeGame=c(TRUE, FALSE, FALSE, TRUE, FALSE))

In the ‘games’ data frame we have two columns, one for game points and another that tells us if it was a home game or not. Now we will use the ‘games’ data frame with the new ‘CashDonate’ and calculate the results. We need to use the ‘with’ function to do this. This function will be explained at a later date. Below are the results.

> with(games, CashDonate(game.points, HomeGame=HomeGame))
[1] 5280 5200 5148 4995 4992

You can calculate this manually if you would like. Now, we can calculate more than one value in our ‘CashDonate’ function, which makes it much more useful than before, all thanks to the use of the ‘ifelse’ function in the code.
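
One way to do the manual check is to repeat the steps of the function outside of it. This is only a sketch; the names ‘rate’, ‘multiplier’, and ‘manual’ are introduced here for illustration.

```r
# Recreate the 'games' data frame from the post
games <- data.frame(game.points = c(88, 100, 99, 111, 96),
                    HomeGame = c(TRUE, FALSE, FALSE, TRUE, FALSE))

# Step 1: James' rate per point (30 above 100 points, otherwise 40)
rate <- ifelse(games$game.points > 100, 30, 40)

# Step 2: the multiplier that adds the owner's share (1.5 at home, 1.3 away)
multiplier <- ifelse(games$HomeGame, 1.5, 1.3)

# Step 3: combine and round, as the function does
manual <- round(games$game.points * rate * multiplier)
manual
# [1] 5280 5200 5148 4995 4992
```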

Assessing Reliability

In quantitative research, reliability measures an instrument’s stability and consistency. In simpler terms, reliability is how well an instrument is able to measure something repeatedly. There are several factors that can influence reliability. Some of the factors include unclear questions/statements, poor test administration procedures, and even the participants in the study.

In this post, we will look at different ways that a researcher can assess the reliability of an instrument. In particular, we will look at the following ways of measuring reliability…

  • Test-retest reliability
  • Alternative forms reliability
  • Kuder-Richardson Split Half Test
  • Coefficient Alpha

Test-Retest Reliability

Test-retest reliability assesses the reliability of an instrument by comparing results from the same participants over time. A researcher will administer the instrument at two different times to the same participants. The researcher then analyzes the data and looks for a correlation between the results of the two different administrations of the instrument. In general, a correlation above about 0.6 is considered evidence of reasonable reliability of an instrument.

One major drawback of this approach is that giving the same instrument to the same people a second time often influences the results of the second administration. It is important that a researcher is aware of this, as it indicates that test-retest reliability is not foolproof.

Alternative Forms Reliability 

Alternative forms reliability involves the use of two different instruments that measure the same thing. The two different instruments are given to the same sample. The data from the two instruments are analyzed by calculating the correlation between them. Again, a correlation around 0.6 or higher is considered as an indication of reliability.

The major problem with this is that it is difficult to find two instruments that really measure the same thing. Often scales may claim to measure the same concept but they may both have different operational definitions of the concept.

Kuder-Richardson Split Half Test

The Kuder-Richardson test assesses the reliability of instruments with categorical items, typically items scored as right or wrong. In this approach, an instrument is cut in half and the correlation is found between the two halves of the instrument. This approach looks at the internal consistency of the items of an instrument.
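
The idea of cutting an instrument in half can be sketched in R as follows. The ‘answers’ matrix is made-up data, and splitting by odd- and even-numbered items is an assumption made here; other ways of halving the instrument are possible.

```r
# Hypothetical right/wrong (1/0) answers: 6 respondents x 6 items (made up)
answers <- matrix(c(1, 1, 1, 1, 1, 0,
                    1, 0, 1, 1, 0, 1,
                    0, 0, 1, 0, 0, 0,
                    1, 1, 1, 1, 1, 1,
                    0, 1, 0, 0, 1, 0,
                    1, 1, 0, 1, 1, 1),
                  nrow = 6, byrow = TRUE)

# Split the instrument into odd- and even-numbered items
odd_half  <- rowSums(answers[, c(1, 3, 5)])
even_half <- rowSums(answers[, c(2, 4, 6)])

# Correlation between the scores on the two halves
split_half_r <- cor(odd_half, even_half)
split_half_r
```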

Coefficient Alpha

Another approach that looks at internal consistency is the Coefficient Alpha. This approach involves administering an instrument and calculating Cronbach’s alpha. Most statistical programs can calculate this number. Normally, scores above 0.7 indicate adequate reliability. The coefficient alpha is intended for items treated as continuous variables, such as Likert scale items.
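
For readers who want to see what a statistical program is doing, below is a minimal sketch that computes alpha from its textbook formula in base R. The function name ‘cronbach_alpha’ and the ‘survey’ data are hypothetical and made up for illustration; in practice a dedicated package would normally be used.

```r
# Cronbach's alpha from its textbook formula:
# alpha = (k / (k - 1)) * (1 - sum of item variances / variance of total score)
cronbach_alpha <- function(items) {
  items <- as.matrix(items)
  k <- ncol(items)                    # number of items
  item_vars <- apply(items, 2, var)   # variance of each item
  total_var <- var(rowSums(items))    # variance of the summed score
  (k / (k - 1)) * (1 - sum(item_vars) / total_var)
}

# Hypothetical 5-point Likert responses: 5 respondents x 4 items (made up)
survey <- data.frame(q1 = c(4, 5, 2, 3, 4),
                     q2 = c(4, 4, 2, 3, 5),
                     q3 = c(5, 5, 1, 3, 4),
                     q4 = c(3, 5, 2, 2, 4))

cronbach_alpha(survey)
```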

Conclusion

Assessing reliability is important when conducting research. The approaches discussed here are among the most common. Which approach is best depends on the circumstances of the study that is being conducted.

Logical Flow in R: If/Else Statements

If statements in R are used to define choice in the script. For example, suppose there are two choices, ‘a’ and ‘b’, and an if statement is used. If a certain condition exists, ‘a’ happens; if not, ‘b’ happens.

Before we explore this more closely we need to set up a scenario and a function for it. Imagine that James wants to donate $40.00 for every point his team scores in a game. To calculate this we create the following function.

CashDonate <- function(points, Dollars_per_point=40){
 game.points<- points * Dollars_per_point
 round(game.points)
}

Here is what we did

  1. We created the function “CashDonate”
  2. The function has the arguments ‘points’ and ‘Dollars_per_point’ which has a default value of 40
  3. The variable ‘game.points’ is created to hold the value of ‘points’ times ‘Dollars_per_point’
  4. The output of ‘game.points’ is then rounded which is the total amount of money that should be donated

The function works perfectly

Below is an example of the function working when James’ team scores 95 points

> CashDonate(95)
[1] 3800

Later, James comes to you upset. His team is scoring so many points that it is starting to affect his budget. He now wants to change the amount he donates IF the team scores over 100 points. If his team scores more than 100 points, James wants to donate $30.00 per point instead of $40.00. Below is the modified function. Notice the use of the ‘if’ in the script.

CashDonate <- function(points, Dollars_per_point=40){
 game.points<- points * Dollars_per_point
 if(points > 100) {game.points <-points * 30
 }
 round(game.points)
}

Most of the code is the same except notice the following changes

  1. We added an ‘if’ statement, and after it we put the condition in parentheses
  2. If the number of points is greater than 100, ‘game.points’ is now recalculated as ‘points’ multiplied by 30 instead of the default value of 40

Below are two examples. Example 1 shows the results of the modified ‘CashDonate’ function when less than 100 points are scored. Example two will show the results when more than 100 points are scored.

EXAMPLE 1
> CashDonate(98)
[1] 3920
EXAMPLE 2
> CashDonate(103)
[1] 3090

Else Statements

Else statements specify what should happen when the condition of an ‘if’ statement is FALSE. For example, if ‘a’ is true do this; if not, do something else. Below is a scenario that requires the use of an ‘else’ statement.

The owner of James’ team decides that he wants to contribute to the donating as well. He states that he will give 50% of whatever James gives for home games and 30% of whatever James gives for away games. The two contributions will be added together as one amount. Below is the modified code.

CashDonate <- function(points, Dollars_per_point=40, HomeGame=TRUE){
 game.points<- points * Dollars_per_point
 if(points > 100) {game.points <-points * 30}
 if(HomeGame) {Total.Donation <- game.points * 1.5
 } else {Total.Donation <- game.points * 1.3}
 round(Total.Donation)
}

Here is an explanation

  1. At the top, we add the argument “HomeGame” and set the default to “TRUE” which means the other choice is “FALSE” which is for an away game
  2. The rule for points over 100 is the same as before
  3. The second ‘if’ statement handles the logical argument “HomeGame”. If it is a home game, ‘game.points’ is multiplied by 1.5, which adds the owner’s 50%. If it is not a home game (notice the ‘else’), ‘game.points’ is multiplied by 1.3, which adds the owner’s 30%. The result of either choice is stored in the variable ‘Total.Donation’
  4. Lastly, the results of ‘Total.Donation’ are rounded

Below are 4 examples

Example 1 is less than 100 points scored in a home game

> CashDonate(98)
[1] 5880

Example 2 is more than 100 points scored in a home game

> CashDonate(102)
[1] 4590

Example 3 is less than 100 points scored at an away game

> CashDonate(99, HomeGame = FALSE)
[1] 5148

Example 4 is more than 100 points scored at an away game

> CashDonate(104, HomeGame = FALSE)
[1] 4056
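
One way to confirm the four examples is to let R compare them. The sketch below repeats the function from this post with each statement on its own line and uses ‘stopifnot’, which stays silent when every comparison is TRUE and stops with an error otherwise.

```r
# The function from this post, with each statement on its own line
CashDonate <- function(points, Dollars_per_point = 40, HomeGame = TRUE) {
  game.points <- points * Dollars_per_point
  if (points > 100) {
    game.points <- points * 30
  }
  if (HomeGame) {
    Total.Donation <- game.points * 1.5
  } else {
    Total.Donation <- game.points * 1.3
  }
  round(Total.Donation)
}

# No output means all four examples match
stopifnot(
  CashDonate(98) == 5880,
  CashDonate(102) == 4590,
  CashDonate(99, HomeGame = FALSE) == 5148,
  CashDonate(104, HomeGame = FALSE) == 4056
)
```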

Conclusion

If statements provide choice. Else statements provide choice for logical arguments. Both can be used in R to provide several different actions in a script.

Measuring Variables

When conducting quantitative research, one of the earliest things a researcher does is determine what their variables are. This involves developing an operational definition of the variable, which is a description of how you define the variable as well as how you intend to measure it.

After developing an operational definition of the variable(s) of a study, it is now necessary to measure the variable in a way that is consistent with the operational definition. In general, there are five forms of measurement and they are…

  • Performance measures
  • Attitudinal measures
  • Behavioral observation
  • Factual Information
  • Web-based data collection

All forms of measurement involve an instrument which is a tool for actually recording what is measured.

Performance Measures

Performance measures assess a person’s ability to do something. Examples of instruments of this type include an aptitude test, intelligence test, or a rubric for assessing an essay. Often this form of measurement leads to “norms” that serve as a criterion for the progress of students.

Attitudinal Measures

Attitudinal measures assess people’s perceptions. They are commonly associated with Likert scales (strongly disagree to strongly agree). This form of measurement allows a researcher access to the attitudes of hundreds instead of the attitudes of a few, as would be found in qualitative research.

Behavioral Observation

Behavioral observation is the observation of behaviors of interest to the researcher. The instrument involved is normally some sort of checklist. When the behavior is seen it is notated using tick marks.

Factual Information

Data that has already been collected and is available to the public is often called factual information.  The researcher takes this information and analyzes it to answer their questions.

Web-Based Data Collection

Surveys or interviews conducted over the internet are examples of web-based data collection. This is still relatively new. There are still people who question this approach as there are concerns over the representativeness of the sample.

Which Measure Should I Choose?

There are several guidelines to keep in mind when deciding how to measure variables.

  • What form of measurement are you able to complete? Your personal expertise, as well as the context of your study, affects what you are able to do. Naturally, you want to avoid doing publication-quality research with a measurement form you are unfamiliar with or doing research in an uncooperative place.
  • What are your research questions? Research questions shape the entire study. A close look at research questions should reveal the most appropriate form of measurement.

The actual analysis of the data depends on the research questions. As such, almost any statistical technique can be applied for all of the forms of measurement. The only limitation is what the researcher wants to know.

Conclusion

Measuring variables is the heart of quantitative research. The approach taken depends on the skills of the researcher as well as the research questions. Every form of measurement has its place when conducting research.

Developing Functions in R Part III: Using Functions as Arguments

Previously, we learned how to add nameless arguments to a function using ellipses ‘. . .’. In this post, we will learn how to use functions as arguments in functions. An argument is a value supplied within the parentheses ( ) of a function in R programming. The reason for doing this is that it allows for many shortcuts in coding. Instead of having to retype the formula for something, you can pass the function as an argument in order to save a lot of time.

Below is the code that we have been working with for a while, before we add a function as an argument.

Percent_Divided <- function(x, divide = 2, ...) {
 ToPercent <- round(x/divide, ...)
 Output <- paste(ToPercent, "%", sep = "")
 return(Output)
 }

As a reminder, the ‘Percent_Divided’ function takes a number or variable ‘x’, divides it by two as a default, and adds a ‘%’ sign after the number(s). With the ‘. . .’ you can pass other arguments such as ‘digits’ and specify how many digits you want after the decimal. Below is an example of the ‘Percent_Divided’ function in action for a variable called ‘B’ and with the added argument of ‘digits = 3’

> B
[1] 23.35345 45.56456 32.12131
> Percent_Divided(B, digits = 3)
[1] "11.677%" "22.782%" "16.061%"

Functions as Arguments

We will now make a function that has a function for an argument. We will set a default function but remember that anything can be passed for the function argument. Below is the code for one way to do this.

Percent_Divided <- function(x, divide = 2,FUN_ARG = round, ...) {
 ToPercent <- FUN_ARG(x/divide, ...)
 Output <- paste(ToPercent, "%", sep = "")
 return(Output)
}

Here is an explanation

  1. Most of this script is the same. The main difference is that we added the argument ‘FUN_ARG’ to the first line of the script. This is the place where we can insert whatever function we want. The default function is ‘round’. If we do not specify any function, ‘round’ will be used.
  2. In the second line of the code you again see ‘FUN_ARG’. This function is applied to ‘x’ divided by ‘divide’, along with whatever arguments are passed through the ‘. . .’
  3. The rest of the code has already been explained and has not been changed
  4. Important note: if we do not change the default of ‘FUN_ARG’, which is the ‘round’ function, we will keep getting the same answers as always. The ‘FUN_ARG’ is only interesting if we do not use the default.

Below is an example of our modified function. The function we are going to pass through the ‘FUN_ARG’ argument is ‘signif’. ‘signif’ rounds the values in its first argument to the specified number of significant digits. We will also pass the argument ‘digits = 3’ through the ellipses ‘. . .’. The values of variable B (see above) will be used for the function.

> Percent_Divided(B, FUN_ARG = signif, digits = 3)
[1] "11.7%" "22.8%" "16.1%"

Here is what happened

  1. The ‘Percent_Divided’ function was run
  2. The function ‘signif’ was passed through the ‘FUN_ARG’ argument and the argument ‘digits = 3’ was passed through the ellipses ‘. . .’ argument.
  3. All the values for B were transformed based on the script in the ‘Percent_Divided’ function
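
Other functions can be passed in the same way. The sketch below tries ‘floor’ and ‘ceiling’ from base R, which take no ‘digits’ argument, so nothing is passed through the ellipses. The choice of these two functions is just for illustration.

```r
# The function from this post
Percent_Divided <- function(x, divide = 2, FUN_ARG = round, ...) {
  ToPercent <- FUN_ARG(x / divide, ...)
  Output <- paste(ToPercent, "%", sep = "")
  return(Output)
}

B <- c(23.35345, 45.56456, 32.12131)

# floor() always rounds down; no extra arguments are needed
Percent_Divided(B, FUN_ARG = floor)
# [1] "11%" "22%" "16%"

# ceiling() always rounds up
Percent_Divided(B, FUN_ARG = ceiling)
# [1] "12%" "23%" "17%"
```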

Conclusion

Using functions as arguments is mostly for saving time when developing code. Even though this seems complicated, this is actually rudimentary programming.

Developing Functions in R Part II: Adding Arguments

In this post, we will continue the discussion on working with functions in R. Functions serve the purpose of programming R to execute several operations at once. Here, we will look at adding additional arguments to a function.

Arguments are the various entries within the parentheses. For example, in our example below the only argument of the function is x.

MakePercent <- function(x) {
 ToPercent <- round(x, digits = 2)
 Output <- paste(ToPercent, "%", sep = "")
 return(Output)
}

In the example above there are many other arguments besides x. However, the only argument for the function ‘MakePercent’ is x. The arguments in the other parentheses belong to other functions called in the script, ‘round’ and ‘paste’. In this post, we are going to learn how to add additional arguments to the function.

Let’s say that we want to convert a number to a percentage like a previous function we made but we now want to be able to divide the number by whatever we want. Here is how it could be done.

Percent_Divided <- function(x, divide) {
 ToPercent <- round(x/divide, digits = 2)
 Output <- paste(ToPercent, "%", sep = "")
 return(Output)
}

Here is what we did

  1. We created the object ‘Percent_Divided’ and assigned it a function with the arguments ‘x’ and ‘divide’
  2. Next we use a { and create the variable ‘ToPercent’. We use the function ‘round’ to round ‘x’ divided by whatever value ‘divide’ takes, keeping two digits after the decimal.
  3. The results of ‘ToPercent’ are then passed to ‘paste’, which appends a ‘%’ sign, and the result is assigned to the variable ‘Output’
  4. Lastly, ‘Output’ is returned by the function, which prints it in the console.

Sounds simple. Below is the function in action dividing a number by 2 and then by 3

> source('~/.active-rstudio-document', echo=TRUE)

> Percent_Divided <- function(x, divide) {
+         ToPercent <- round(x/divide, digits = 2)
+         Output <- paste(ToPercent, "%", sep = "")
+    .... [TRUNCATED] 
> Percent_Divided(22.12234566, divide=2)
[1] "11.06%"
> Percent_Divided(22.12234566, divide=3)
[1] "7.37%"

Here is what happened

  1. You source the script from the source editor by pressing ctrl + shift + enter
  2. Next, I used the function ‘Percent_Divided’ with the number 22.12234566 and I decided to divide the number by two
  3. R returns the answer 11.06%
  4. Next I repeat the process but I divide by 3 this time
  5. R returns the answer 7.37%

There is one problem. The argument ‘divide’ has no default  value. What this means is that you have to tell R what the value of ‘divide’ is every single time. As an example see below

> Percent_Divided(22.12234566)
Error in Percent_Divided(22.12234566) : 
  argument "divide" is missing, with no default

Because I did not tell R what value ‘divide’ would be, R was not able to complete the process of the function. To solve this problem we will set the default value of ‘divide’ to 10 in the script as shown below.

Percent_Divided <- function(x, divide = 10) {
 ToPercent <- round(x/divide, digits = 2)
 Output <- paste(ToPercent, "%", sep = "")
 return(Output)
}

If you look closely you will see ‘divide = 10’. This is the default value for ‘divide’. If we do not set another number for ‘divide’, R will use 10. Below is an example using the default value of ‘divide’ and another example with ‘divide’ set to 5.

> Percent_Divided <- function(x, divide = 10) {
+         ToPercent <- round(x/divide, digits = 2)
+         Output <- paste(ToPercent, "%", sep = "") .... [TRUNCATED] 
> Percent_Divided(22.12234566)
[1] "2.21%"
> Percent_Divided(22.12234566, divide = 5)
[1] "4.42%"

First we sourced the script using ctrl + shift + enter. In the first example, the number is automatically divided by 10 because this is the default. In the second example, we specified that we wanted to divide by five by adding the argument ‘divide = 5’. You can see the difference in the results.
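
As a small aside, arguments can be supplied either by position or by name. The sketch below repeats the function with its default and shows both styles; ‘formals’ is a base R function that lists a function’s arguments and their defaults.

```r
Percent_Divided <- function(x, divide = 10) {
  ToPercent <- round(x / divide, digits = 2)
  Output <- paste(ToPercent, "%", sep = "")
  return(Output)
}

# Arguments matched by position: x first, divide second
Percent_Divided(22.12234566, 5)
# [1] "4.42%"

# Arguments matched by name can appear in any order
Percent_Divided(divide = 5, x = 22.12234566)
# [1] "4.42%"

# formals() lists the arguments and their default values
formals(Percent_Divided)
```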

In a future post, we will continue to examine the role of arguments in functions.

Reviewing the Literature: Part II

In the last post, we began a discussion on the steps involved in reviewing the literature and we looked at the first two steps, which are identifying key terms and locating literature. In this post, we will look at the last three steps of developing a review of literature, which are…

3. Evaluate and select literature to include in your review
4. Organize the literature
5. Write the literature review

Evaluating Literature

This step was alluded to when I wrote about using Google Scholar and Google Books in part I. For articles, you want to assess their quality by determining who publishes the journal. Reputable publishers usually publish respectable journals. This is not to say that other sources of articles are totally useless. The point is that you want to attract as few questions as possible when it comes to the quality of the sources you use to develop a literature review.

One other important concept in evaluating literature is the relevancy of the sources. You want sources that focus on a similar topic, population, and/or problem. It is easy for a review of literature to lose focus, so this is a critical criterion to consider.

Organizing the Literature 

There are many options for organizing sources. You can make an outline and group the sources together by heading, or you can construct some sort of visual of the information. The place to start is to examine the abstract of the articles that are going to be a part of your literature review. The abstract is a summary of the study and is a way to get an understanding of a study quickly.

If the abstract indicates that a study is beneficial you can look at the whole article to learn more. If the whole article is unavailable you can use the abstract as a potential source.

Writing a Review of Literature

Writing involves taking your outline or visual and converting it into paragraph format. There are at least three common ways to write a literature review: the thematic review, the study-by-study review, and the combo review.

The thematic review shares a theme in research and cites several sources. There is very little detail. The citations support the claim made by the theme. Below is an example using APA formatting.

Smoking is bad for you (James, 2013; Smith, 2012; Thomas, 2009)

The details of the studies above are never shared but it is assumed that these studies all support the claim that smoking is bad for you.

Another type of literature review is the study-by-study review. In this approach, a detailed summary is provided of several studies under a larger theme. Consider the example below

Thomas (2009) found in his study among middle class workers that smoking reduces lifespan by five years.

This example provides details about the dangers of smoking as found in one study.

A combo review is a mixture of the first two approaches. Sometimes you provide a thematic review other times you provide the details of a study-by-study review. This is the most common approach as it’s the easiest to read because it provides an overview with an occasional detail.

Conclusion

The ideas presented here are for providing support in writing a review of literature. There are many other ways to approach this but the concepts presented here will provide some guidance.

Reviewing the Literature: Part I

The research process often begins with a literature review. A review of literature is a systematic summary of books, journal articles, and other sources pertaining to a particular topic. The purpose of a literature review is to demonstrate how your study adds to the existing literature and also to show why your study is needed.

In general, there are five common steps to reviewing the literature and they are…

  1. Identify key terms
  2. Locate literature
  3. Evaluate and select literature to include in your review
  4. Organize the literature
  5. Write the literature review

In this post, we will discuss the first two

Identify Key Terms

The purpose of identifying key terms is that they give you words to “google” when you conduct a search. Below are some ways to develop key terms.

  • Create some sort of title, even if it is temporary, and conduct a search based on words in this title.
  • If you already have research questions, you can look for important words in these questions to conduct a search.
  • Find an article that is studying something similar to you and look at the keywords that they include. Many articles have a list of keywords on the first page that can be used for other studies.

Locating Literature

Locating literature is not as difficult as it was years ago thanks to the internet. Now, the search for high-quality sources doesn’t even require leaving home. There is a rough hierarchy in terms of the quality and age of the material available, and it is as follows. Each example below is rated on a scale of 1-5 for quality and newness; the higher the rating, the higher the quality and newness of the example.

  • Websites, newspapers, and blogs: Quality 1, Newness 5
  • Academic publications such as conference papers and theses: Quality 2, Newness 4
  • Peer-reviewed journal articles: Quality 3, Newness 3
  • Books: Quality 4, Newness 2
  • Summaries like encyclopedias: Quality 5, Newness 1

In this example, normally the lower the quality, the newer the information is. Keep in mind that there are many exceptions to the example above. Self-published books would obviously have a much lower quality rating, while some online sources are of much higher quality because of who is providing the information.

Once you have some keywords it is time to begin the search. Google Books is an excellent place to begin. When you get to this website, you type in your key term and Google returns a list of books that contain the key term. You click on the book and it takes you to the page where the term is. This is like holding the book in your hand at the library. You note whatever information you need and go to another book.

For Google Scholar, you go to the site and type in your key term. Google Scholar gives you several pages of articles. Before choosing, there are a few guidelines to keep in mind.

  • Depending on your field, you will probably be expected to cite recent literature in your review, often from the last 5-10 years. To do this you need to set a custom range for the articles you want to view. Focusing on the last 5-10 years actually helps you to focus and get things done quicker. You only cite older material if it was groundbreaking.
  • Google Scholar gives you any article without concern for quality. To protect yourself from citing poor research, one strategy is to consider who the publisher was. Below are a few examples of high-quality publishers and providers of academic journals. If the article was published by them, it is probably of decent quality.
    • Sage, JSTOR, Wiley, Elsevier

Conclusion

This provides some basic information on beginning the process. In a later post, we will go over the last few steps of conducting a literature review.

Data Frames in R: Part II

In this post, we will explore how to create data frames as well as look at other aspects of using data frames in R. The first example below is a data frame that contains information about fictional faculty members. Our job will be to put this information into a data frame and to rename the columns. Below is the example and it will be followed by an explanation.

> Faculty <- c('Darrin Thomas', 'Hank Smith', 'Sarah William')
> Salary <- c(60000, 50000, 53000)
> Hire_Date <- as.Date(c('2015-1-1', '2000-6-1', '2012-9-1'))
> Lecturers.data <- data.frame(Faculty, Salary, Hire_Date)
> str(Lecturers.data)
'data.frame':	3 obs. of  3 variables:
 $ Faculty  : Factor w/ 3 levels "Darrin Thomas",..: 1 2 3
 $ Salary   : num  60000 50000 53000
 $ Hire_Date: Date, format: "2015-01-01" "2000-06-01" "2012-09-01"

Here is what happened

  1. We started by making three different vectors and assigning a variable to each. The variables are ‘Faculty’, ‘Salary’, and ‘Hire_Date’.
  2. We then assigned all three variables to the data frame ‘Lecturers.data’. ‘Faculty’ is a factor vector, ‘Salary’ is a numeric vector, and ‘Hire_Date’ is a date vector. Again, the advantage of data frames is their ability to hold several different types of data.
  3. We then used the ‘str’ function to see the attributes of the ‘Lecturers.data’ data frame.

There is one small problem with the data frame above. ‘Faculty’ is a factor vector but our original vector for “Faculty’ was a character vector. We want ‘Faculty’ to continue to be a character vector instead of it becoming a factor. The example below shows one way to deal with this small problem.

> Lecturers.data <- data.frame(Faculty, Salary, Hire_Date, stringsAsFactors=FALSE)
> str(Lecturers.data)
'data.frame':	3 obs. of  3 variables:
 $ Faculty  : chr  "Darrin Thomas" "Hank Smith" "Sarah William"
 $ Salary   : num  60000 50000 53000
 $ Hire_Date: Date, format: "2015-01-01" "2000-06-01" "2012-09-01"

By adding the argument ‘stringsAsFactors=FALSE’, we force R to keep character vectors as they are instead of converting them to factors. If you look closely, you will see that ‘$ Faculty’ is no longer a Factor as in the previous example. Instead, it is now a ‘chr’ or character variable.
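If a data frame has already been built with a factor column, the column can also be converted afterwards. The lines below are only a minimal sketch of that idea using the same fictional data; the name ‘Lecturers.fixed’ is made up for illustration, and ‘stringsAsFactors = TRUE’ is set explicitly so the example behaves the same in any R version.

```r
Faculty <- c('Darrin Thomas', 'Hank Smith', 'Sarah William')
Salary <- c(60000, 50000, 53000)
Hire_Date <- as.Date(c('2015-1-1', '2000-6-1', '2012-9-1'))

# Build the data frame with 'Faculty' stored as a factor on purpose
Lecturers.fixed <- data.frame(Faculty, Salary, Hire_Date, stringsAsFactors = TRUE)

# Convert the factor column back to a character vector
Lecturers.fixed$Faculty <- as.character(Lecturers.fixed$Faculty)

# 'Faculty' should now be reported as chr
str(Lecturers.fixed)
```

This approach is handy when the data frame came from somewhere else and you cannot change how it was created.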

It is also possible to rename columns in a data frame just like in a matrix. For example, let’s say you made a mistake with the ‘Hire_Date’ variable. You did not mean the date the lecturers were hired but the date they resigned. Below is an example of how to fix this.

> Lecturers.data
        Faculty Salary  Hire_Date
1 Darrin Thomas  60000 2015-01-01
2    Hank Smith  50000 2000-06-01
3 Sarah William  53000 2012-09-01
> names(Lecturers.data) [3] <- 'Resign_Date'
> Lecturers.data
        Faculty Salary Resign_Date
1 Darrin Thomas  60000  2015-01-01
2    Hank Smith  50000  2000-06-01
3 Sarah William  53000  2012-09-01

Here is what is happening

  1. We displayed the data frame ‘Lecturers.data’ as a reference point
  2. We noticed that we did not want a column named ‘Hire_Date’ but want to change the name to ‘Resign_Date’
  3. To change the name we use the ‘names’ function to change the name of a column in ‘Lecturers.data’. We specifically tell R to rename the third column by using the subset brackets [3] and assign the name ‘Resign_Date’
  4. We then redisplay the ‘Lecturers.data’ data frame. If you compare this data frame with the first you can see that the third column has been renamed as desired.
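Renaming by position works, but it depends on remembering which column is third. As a hedged alternative, the sketch below looks the column up by its current name instead. It rebuilds the same fictional data frame so that it can be run on its own.

```r
Lecturers.data <- data.frame(
  Faculty = c('Darrin Thomas', 'Hank Smith', 'Sarah William'),
  Salary = c(60000, 50000, 53000),
  Hire_Date = as.Date(c('2015-01-01', '2000-06-01', '2012-09-01'))
)

# Find the column by its current name instead of by its position
names(Lecturers.data)[names(Lecturers.data) == 'Hire_Date'] <- 'Resign_Date'

names(Lecturers.data)
```

If the columns are later reordered, this version still renames the intended column.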

This post provided some basic information on developing data frames. We learned how to combine vectors into a data frame, how to change a factor to a character vector, and how to rename a column. Such skills as these are beneficial to anyone who needs to use data frames.

Data Frames in R: Part I

So far we have looked at vectors, matrices, and arrays. One thing these three objects have in common is that they consist of one type of data. In other words, vectors, matrices and arrays contain either numerical or character information but not both at the same time.

Data frames are different. They allow you to have a mixture of information all contained within one place. You can have character data, such as names, and numerical data, such as salaries all in one place. In this post, we will first look at how to convert a matrix to a data frame.

Converting a Matrix to a Data Frame

One of the benefits of converting a matrix to a data frame is that the columns in a matrix become variables in a data frame. This is useful for data analysis at times. In the example below, we will convert the matrix ‘points.team’ into a data frame. In order to do this we have to use two new functions: the ‘as.data.frame’ function, which converts the matrix into a data frame, and the ‘t’ function, which transposes the rows so that they become the columns. Below is the code for completing this

> points.team
      1st 2nd 3rd 4th 5th 6th
James  12  15  30  25  23  32
Kevin  20  19  25  30  31  22
> team.points.df <- as.data.frame(t(points.team))
> team.points.df
    James Kevin
1st    12    20
2nd    15    19
3rd    30    25
4th    25    30
5th    23    31
6th    32    22

This is what we did

  1. We displayed the matrix ‘points.team’ as a reference. The code for creating this is available here.
  2. We then created the variable ‘team.points.df’ and used two functions to convert the matrix ‘points.team’ to a data frame
    1. ‘as.data.frame’ was used to make the actual data frame
    2. ‘t’ was used to move or transpose the names of the rows in the matrix (James and Kevin) to be the names of columns in the data frame. This gives us two variables (James and Kevin) with six entries (1st-6th). If we had not done this, we would have made the example below
      > team.points.df
            1st 2nd 3rd 4th 5th 6th
      James  12  15  30  25  23  32
      Kevin  20  19  25  30  31  22

      In this example, we did not transpose ‘James’ and ‘Kevin’ to be columns. Instead, 1st-6th are the variables instead of ‘James’ and ‘Kevin’. For our purposes this does not make sense, but it may be appropriate in other situations.

  3. The last step involved displaying the new data frame ‘team.points.df’

Using the ‘str’ function allows you to learn some information about a data frame as in the example below

> str(team.points.df)
'data.frame':	6 obs. of  2 variables:
 $ James: num  12 15 30 25 23 32
 $ Kevin: num  20 19 25 30 31 22

Here is what we now know about our data frame

  • The variable is a data frame (data.frame)
  • It has six observations (6 obs.) which are the points scored by the players in six games
  • There are two variables ‘James’ and ‘Kevin’
  • The variables are numeric (num)
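Since the code that originally created ‘points.team’ is not shown in this post, here is a self-contained sketch of the whole conversion. The way the matrix is built here (naming the rows directly inside ‘rbind’) is an assumption made for brevity, not necessarily how the original matrix was created.

```r
# Rebuild the matrix with one row per player
points.team <- rbind(James = c(12, 15, 30, 25, 23, 32),
                     Kevin = c(20, 19, 25, 30, 31, 22))
colnames(points.team) <- c("1st", "2nd", "3rd", "4th", "5th", "6th")

# Transpose so the players become columns, then convert to a data frame
team.points.df <- as.data.frame(t(points.team))

# Each player is now a variable that can be accessed with $
team.points.df$James
mean(team.points.df$Kevin)
```

Being able to pull out a player with ‘$’ is one practical reason to prefer the transposed layout.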

This is just the beginning of our examination of data frames in R. In a future post we will look at making original data frames.

Beyond Vectors: Introduction to Matrices and Arrays in R Part IV

In this post, we will look at how to rename the rows and columns in a matrix. We will also examine how to do the following

  • Name rows and columns in matrices
  • Make an array
  • Basic math in matrices & arrays

Naming Rows and Columns

Renaming rows and columns has a practical use. It allows people to provide meaning and/or context to the data they are interpreting. In the example below, we will rename the rows and the columns of the basketball players’ matrix so that it contains their names as well as the game in which the points were scored.

> points.of.James
[1] 12 15 30 25 23 32
> points.of.Kevin
[1] 20 19 25 30 31 22
> points.team <- rbind(points.of.James, points.of.Kevin)
> points.team
                [,1] [,2] [,3] [,4] [,5] [,6]
points.of.James   12   15   30   25   23   32
points.of.Kevin   20   19   25   30   31   22
> rownames(points.team) <- c("James", "Kevin")
> points.team
      [,1] [,2] [,3] [,4] [,5] [,6]
James   12   15   30   25   23   32
Kevin   20   19   25   30   31   22
> colnames(points.team) <- c("1st", "2nd", "3rd", "4th", "5th", "6th")
> points.team
      1st 2nd 3rd 4th 5th 6th
James  12  15  30  25  23  32
Kevin  20  19  25  30  31  22

Here is what’s going on

  1. We make the variables ‘points.of.James’ and ‘points.of.Kevin’ and display them
  2. We combine ‘points.of.James’ and ‘points.of.Kevin’ into a matrix using the ‘rbind’ function and assign this to the variable ‘points.team’
  3. We then display ‘points.team’
  4. We then rename the rows using the ‘rownames’ function. We replace ‘points.of.James’ and ‘points.of.Kevin’ with ‘James’ and ‘Kevin’ in the rows. We then display this change in the matrix.
  5. Next we rename 1,2,3,4,5,6 with 1st, 2nd, 3rd, 4th, 5th, & 6th using the ‘colnames’ function.
  6. Lastly, we display the finished table.
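The names can also be supplied at the moment the matrix is created. The sketch below is one possible way to do that with the ‘dimnames’ argument of ‘matrix’; it is an alternative to the steps above, not a replacement for them.

```r
# Row names come first in the list, column names second
points.team <- matrix(
  c(12, 15, 30, 25, 23, 32,
    20, 19, 25, 30, 31, 22),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("James", "Kevin"),
                  c("1st", "2nd", "3rd", "4th", "5th", "6th"))
)

# Once named, values can be looked up by name instead of by number
points.team["Kevin", "3rd"]
```

Looking up a value by name is often easier to read than remembering that Kevin is row 2 and the third game is column 3.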

Making Arrays

A vector has one dimension (row), a matrix has two dimensions (row & column), and an array has three or more dimensions (length, width, & height for example). We are not going to cover arrays extensively because it is hard to envision data beyond 3 dimensions. In addition, please remember that all the rules and tricks for vectors and matrices apply towards an array. Below is an example of how to make an array

> array1 <-array(1:48, dim=c(6, 8, 4))
> array1
, , 1

     [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
[1,]    1    7   13   19   25   31   37   43
[2,]    2    8   14   20   26   32   38   44
[3,]    3    9   15   21   27   33   39   45
[4,]    4   10   16   22   28   34   40   46
[5,]    5   11   17   23   29   35   41   47
[6,]    6   12   18   24   30   36   42   48

, , 2

     [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
[1,]    1    7   13   19   25   31   37   43
[2,]    2    8   14   20   26   32   38   44
[3,]    3    9   15   21   27   33   39   45
[4,]    4   10   16   22   28   34   40   46
[5,]    5   11   17   23   29   35   41   47
[6,]    6   12   18   24   30   36   42   48

, , 3

     [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
[1,]    1    7   13   19   25   31   37   43
[2,]    2    8   14   20   26   32   38   44
[3,]    3    9   15   21   27   33   39   45
[4,]    4   10   16   22   28   34   40   46
[5,]    5   11   17   23   29   35   41   47
[6,]    6   12   18   24   30   36   42   48

, , 4

     [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
[1,]    1    7   13   19   25   31   37   43
[2,]    2    8   14   20   26   32   38   44
[3,]    3    9   15   21   27   33   39   45
[4,]    4   10   16   22   28   34   40   46
[5,]    5   11   17   23   29   35   41   47
[6,]    6   12   18   24   30   36   42   48

Here is what is happening

  1. We created ‘array1’ and assigned an array to it that contains the numbers 1 to 48 and has 6 rows, 8 columns, and 4 layers (the third dimension). Because 48 numbers only fill one 6 x 8 layer, R repeats them in each of the four layers.
  2. We then have R display the results

To locate values in specific indices you have to add an additional number to the address. The order is row, column, and then layer. For example, if you are looking for 1,1,1 you would go to the first row and first column of the first layer. Also do not forget how extracting rows and columns re-shuffles the numbering of the remaining rows and columns.
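To make the three-part address more concrete, here is a small sketch. It uses a smaller, hypothetical array called ‘small.array’ (2 rows, 3 columns, 4 layers) so that every value is different and easier to follow.

```r
# 24 values fill 2 rows x 3 columns x 4 layers exactly, so nothing is repeated
small.array <- array(1:24, dim = c(2, 3, 4))

small.array[1, 1, 1]   # row 1, column 1, layer 1
small.array[2, 3, 4]   # row 2, column 3, layer 4
small.array[, , 2]     # the whole second layer, returned as a 2 x 3 matrix
```

Leaving a position blank, as in the last line, means "take everything" in that dimension, just as with matrices.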

Basic Math in Matrices and Arrays

You can change all the values in a matrix and array. For example, let us say we want to add four points to every game that Kevin and James played. We can do this by using the following code.

> points.team
      1st 2nd 3rd 4th 5th 6th
James  12  15  30  25  23  32
Kevin  20  19  25  30  31  22
> new.points.team <- points.team+4
> new.points.team
      1st 2nd 3rd 4th 5th 6th
James  16  19  34  29  27  36
Kevin  24  23  29  34  35  26

Here is what is happening

  1. We redisplayed the variable ‘points.team’
  2. We then create a new variable called ‘new.points.team’ and we told R to add 4 points to every value in the variable ‘points.team’
  3. We display the new table. A closer look will show how every value is 4 points higher in the new table than in the original ‘points.team’
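The same idea can be applied to part of a matrix, and R also has functions that total rows and columns. The following is a rough sketch; ‘bonus.points’ is a made-up name, and the matrix is rebuilt here so the code can be run on its own.

```r
points.team <- rbind(James = c(12, 15, 30, 25, 23, 32),
                     Kevin = c(20, 19, 25, 30, 31, 22))

# Add four points only to Kevin's row, leaving James unchanged
bonus.points <- points.team
bonus.points["Kevin", ] <- bonus.points["Kevin", ] + 4
bonus.points

# Total points per player (rows) and per game (columns)
rowSums(points.team)
colSums(points.team)
```

Copying the matrix first keeps the original ‘points.team’ untouched for comparison.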

Conclusions

This concludes the examination of matrices and arrays.

Beyond Vectors: Introduction to Matrices and Arrays in R Part III

In this post, we will continue to explore the basic features of matrices. Specifically, we will look at how to replace values in a matrix and how to combine vectors into a matrix.

Replacing Values in a Matrix

There are times when we might input the wrong value into a matrix. The hard way to deal with this problem is to remake the entire matrix. The easy way is to only replace the incorrect value. You can change values by individual index, by row or column, or even by importing another matrix (we will not be covering replacing values in a matrix with another matrix). Below are examples

> matrix1<-matrix(1:20, ncol=4)
> matrix1
     [,1] [,2] [,3] [,4]
[1,]    1    6   11   16
[2,]    2    7   12   17
[3,]    3    8   13   18
[4,]    4    9   14   19
[5,]    5   10   15   20
> matrix1[1,1] <- 4
> matrix1
     [,1] [,2] [,3] [,4]
[1,]    4    6   11   16
[2,]    2    7   12   17
[3,]    3    8   13   18
[4,]    4    9   14   19
[5,]    5   10   15   20

In this first example, we wanted to replace the value in index 1,1 with the number 4. Here is what happened.

  1. We created the matrix ‘matrix1’ using numbers 1-20 (1:20) with 4 columns (ncol = 4).
  2. We type ‘matrix1’ so R displays it. The purpose of this is so that we can compare the original matrix with the modification.
  3. We then type ‘matrix1’ into R, subset row 1 column 1 using brackets, and assign the number 4 to row 1 column 1.
  4. We then type ‘matrix1’ to show the modified matrix

If you look carefully at 1,1 in the matrix, you will see that the value in this index has been changed from 1 to 4.

Below is an example of replacing an entire row. This technique can also be used for replacing a column

> matrix1<-matrix(1:20, ncol=4)
> matrix1
     [,1] [,2] [,3] [,4]
[1,]    1    6   11   16
[2,]    2    7   12   17
[3,]    3    8   13   18
[4,]    4    9   14   19
[5,]    5   10   15   20
> matrix1[2, ] <- c(2,3)
> matrix1
     [,1] [,2] [,3] [,4]
[1,]    1    6   11   16
[2,]    2    3    2    3
[3,]    3    8   13   18
[4,]    4    9   14   19
[5,]    5   10   15   20

Here is what happened

  1. We created the matrix ‘matrix1’ and displayed it as a reference
  2. We then subsetted the second row of ‘matrix1’ and told R to put in that row the numbers 2 and 3
  3. We then displayed the modified matrix

If you look carefully, you will see that in the second row we now have the pattern 2,3. When you tell R to replace values using a pattern, R will continue to repeat the pattern until the row is filled. Even though we only told R to use two numbers (2 and 3), R filled all four columns in the second row with the pattern 2,3.
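Replacing a column works the same way, with the comma moved to the other side of the number. The lines below are a small sketch of this, using arbitrary replacement values chosen only for illustration.

```r
matrix1 <- matrix(1:20, ncol = 4)

# A single value is repeated down the whole third column
matrix1[, 3] <- 0

# Five values, one for each row of the fourth column
matrix1[, 4] <- c(100, 200, 300, 400, 500)

matrix1
```

Leaving the row position empty before the comma tells R to use every row of that column.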

Combining Vectors into a Matrix

You can combine two or more vectors into a matrix by using the function ‘rbind’. Below is an example.

> points.of.James
[1] 12 15 30 25 23 32
> points.of.Kevin
[1] 20 19 25 30 31 22
> points.of.players <- rbind(points.of.James, points.of.Kevin)
> points.of.players
                [,1] [,2] [,3] [,4] [,5] [,6]
points.of.James   12   15   30   25   23   32
points.of.Kevin   20   19   25   30   31   22

Here is what we did

  1. We redisplayed the values of the variables ‘points.of.James’ and ‘points.of.Kevin’
  2. We then created the variable ‘points.of.players’ and used the ‘rbind’ function to combine the values in the vectors/variables of ‘points.of.James’ and ‘points.of.Kevin’
  3. We then display the results
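The ‘rbind’ function stacks vectors as rows. Its counterpart ‘cbind’ places them side by side as columns. Here is a brief sketch for comparison; the name ‘points.by.column’ is invented for this example.

```r
points.of.James <- c(12, 15, 30, 25, 23, 32)
points.of.Kevin <- c(20, 19, 25, 30, 31, 22)

# Each vector becomes a column, so the games run down the rows
points.by.column <- cbind(points.of.James, points.of.Kevin)
points.by.column

# Rows first, then columns
dim(points.by.column)
```

Which of the two to use depends on whether you want each player to be a row or a column.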

In a future post, we will examine how to rename the rows and columns so that they provide critical information for people who may be trying to interpret the information.

Beyond Vectors: Introduction to Matrices and Arrays in R Part II

In this post, we will continue to explore the different actions that can be performed when using matrices and arrays. In particular, we will look at how to manipulate the values within a matrix.

Manipulating Values in a Matrix

Before manipulating values, it is important to understand what an index is. An index (plural: indices) is the address of a value within a matrix. There are two numbers involved in an index. The first number represents the row that the value is in and the second number represents the column. Again, this is similar to Microsoft Excel and its system for giving every cell an address.

Below is an example to help make this clearer

> matrix1<- matrix(1:20, ncol=4)
> matrix1
     [,1] [,2] [,3] [,4]
[1,]    1    6   11   16
[2,]    2    7   12   17
[3,]    3    8   13   18
[4,]    4    9   14   19
[5,]    5   10   15   20

Here is what we did,

  1. We made a variable called ‘matrix1’ and this variable has a matrix of 20 consecutive numbers (1:20) with four columns (ncol = 4)
  2. We then print ‘matrix1’ in R

Now for a simple quiz

What value is in row 3 column 2?*

What value is in row 5 column 4?**

The answers are at the bottom of the post.

In R, we do not say ‘row 3’ and ‘column 2’. Instead we would simply say 3, 2 or 5, 4 in the code.

Extracting Values

With our knowledge of indices we can now extract or remove values from a matrix. For example, below is a way to extract and only look at the first two rows and columns two and three of our matrix. In each example, I will display the original matrix and then the extracted matrix so you can compare the differences.

> matrix1
     [,1] [,2] [,3] [,4]
[1,]    1    6   11   16
[2,]    2    7   12   17
[3,]    3    8   13   18
[4,]    4    9   14   19
[5,]    5   10   15   20
> matrix1[1:2, 2:3]
     [,1] [,2]
[1,]    6   11
[2,]    7   12

Here is what happened

  1. We printed matrix1 as a reference for comparison
  2. We then extracted the first two rows (1:2) and the second and third columns (2:3) of ‘matrix1’ using brackets
  3. R then prints the results

Notice how R renumbers the rows and columns. Rows 1 and 2 are still rows 1 and 2 because we extracted them. However, what used to be columns 2 and 3 have been renumbered as columns 1 and 2 in the extracted matrix. This can get really confusing, so be careful.

If you want to extract an entire row you do not specify the column as in the example below

> matrix1
     [,1] [,2] [,3] [,4]
[1,]    1    6   11   16
[2,]    2    7   12   17
[3,]    3    8   13   18
[4,]    4    9   14   19
[5,]    5   10   15   20
> matrix1[2:3,]
     [,1] [,2] [,3] [,4]
[1,]    2    7   12   17
[2,]    3    8   13   18

Here is what is happening.

  1. We print ‘matrix1’ as a reference for comparison
  2. We extract from ‘matrix1’ rows 2 and 3 (2:3) using brackets. Notice that we put a comma after 2:3. This tells R to take the whole row.
  3. R prints the results. Notice what was once rows 2 and 3 is now rows 1 and 2.

You can also remove rows and columns in a matrix as in the example that follows

> matrix1
     [,1] [,2] [,3] [,4]
[1,]    1    6   11   16
[2,]    2    7   12   17
[3,]    3    8   13   18
[4,]    4    9   14   19
[5,]    5   10   15   20
> matrix1[-2,-3]
     [,1] [,2] [,3]
[1,]    1    6   16
[2,]    3    8   18
[3,]    4    9   19
[4,]    5   10   20

Here is what is going on

  1. We print ‘matrix1’ as a reference.
  2. We then tell R to remove row 2 and column 3 (-2, -3). The removal of a row and/or column is indicated by using a negative (-) sign.
  3. R prints the results. Notice that there are now 4 rows and 3 columns because we removed one row and one column. In addition, don’t forget that the rows and columns have been renumbered
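One related detail is worth a short sketch: when only a single row (or column) is extracted, R returns a plain vector instead of a matrix unless the ‘drop = FALSE’ argument is added inside the brackets. The variable names below are hypothetical.

```r
matrix1 <- matrix(1:20, ncol = 4)

# Extracting one row gives back a vector
one.row <- matrix1[2, ]
is.matrix(one.row)

# Adding drop = FALSE keeps the result as a matrix with 1 row and 4 columns
one.row.matrix <- matrix1[2, , drop = FALSE]
dim(one.row.matrix)
```

This matters mostly when later code expects a matrix with rows and columns.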

ANSWERS TO QUIZ

*8

**20

Beyond Vectors: Introduction to Matrices and Arrays in R

Vectors, matrices, arrays, what is the difference? The difference has to do with the number of dimensions that they have. A vector has only one dimension. Vectors are one row of information. Matrices have two dimensions, which are rows and columns. Any Microsoft Excel spreadsheet is a matrix of rows and columns and thus contains two dimensions. An array is anything with three or more dimensions such as height, width and depth. Anything beyond three dimensions is hard for the typical mind to grasp.

Creating a Matrix

We will now create a simple matrix below. An explanation is provided after the code

> matrix.1 <- matrix(1:20, ncol=4)
> matrix.1
     [,1] [,2] [,3] [,4]
[1,]    1    6   11   16
[2,]    2    7   12   17
[3,]    3    8   13   18
[4,]    4    9   14   19
[5,]    5   10   15   20

Here is what we did

  1. We created the variable ‘matrix.1’
  2. Within this variable we put a matrix that contains the numbers 1 to 20 (1:20)
  3. We then specified that we want four columns.
  4. R creates the matrix
  5. We type ‘matrix.1’ into R and press enter so we can see the matrix

Once you specify the number of columns, R determines the number of rows itself. However, R expects the data to fill the rows and columns exactly. Therefore, your matrix should use all of the space within it or you will get the message below.

> matrix.1 <- matrix(1:19, ncol=4)
Warning message:
In matrix(1:19, ncol = 4) :
  data length [19] is not a sub-multiple or multiple of the number of rows [5]

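In the example above, I tried to put 19 numbers into a matrix that had 4 columns and 5 rows, which equals 20 spaces or indices. Since there was one empty space, R issued a warning. Note that this is a warning rather than an error: R still creates the matrix and fills the leftover space by starting again from the first value.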
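By default R fills a matrix column by column. As a hedged illustration, the sketch below sets the number of rows instead of columns and uses the ‘byrow’ argument to fill across the rows. The names ‘by.column’ and ‘by.row’ are made up for this comparison.

```r
# Default behaviour: values run down each column in turn
by.column <- matrix(1:20, ncol = 4)

# byrow = TRUE: values run across each row in turn
by.row <- matrix(1:20, nrow = 5, byrow = TRUE)

# Compare the first row of each matrix
by.column[1, ]
by.row[1, ]
```

Both matrices have 5 rows and 4 columns; only the order in which the numbers are placed differs.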

Examining the Properties of Matrices

If you type in the function “str” you get the following information

> str(matrix.1)
 int [1:5, 1:4] 1 2 3 4 5 6 7 8 9 10 ...

Here is what this information means

  • int means that the matrix contains integers
  • 1:5 means that there are 5 rows and all rows are included in this description
  • 1:4 means that there are 4 columns and all columns are included in this description
  • 1 2 3 4 5 6 7 8 9 10 … is just the first ten values in the matrix

The “dim” function will tell you the dimensions of the matrix and the “length” function tells you how many indices there are in the matrix. Both are below.

> dim(matrix.1)
[1] 5 4
> length(matrix.1)
[1] 20

The “dim” function indicates we have 5 rows and 4 columns. The “length” function indicates that we have 20 indices.

This serves only as an introduction to matrices. We haven’t really dealt with arrays yet, but many of the same ideas and concepts apply equally to them as well.

Introduction to Factors in R

Factors are used in R for data that is categorical. Categorical data is data that does not normally involve numbers but rather descriptions. For example, people can be right- or left-handed, and gender is often defined as male or female. Each of these descriptions is a category within a variable.

To make a factor in R you need to use the “factor” function. Within the “factor” function there are three important arguments which are explained below

  • x - This is the place where the name of the variable containing the vector is placed
  • levels - This is an optional vector of the values that x might have taken, listed in the order you want the levels to appear. If it is left out, R uses the sorted unique values of x.
  • labels - This optional argument allows you to rename your levels within the factor if you want

In the example below we are going to make a factor that contains several different car manufacturers.

> car.makers <- c("Ford", "Isuzu", "Honda", "Toyota")
> factor(car.makers)
[1] Ford   Isuzu  Honda  Toyota
Levels: Ford Honda Isuzu Toyota

This is what happened

  1. We created the variable ‘car.makers’ and stored the values or names of car makers in the variable using a vector
  2. We used the “factor” function on the variable ‘car.makers’.
  3. R prints the values in the factor as well as the levels. Notice that the levels are the same names as the values, but sorted alphabetically.

You can also add labels to a factor. For example, let’s say you wanted to abbreviate the names in the ‘car.makers’ variable by removing vowels. The labels must be listed in the same order as the levels (alphabetical here), not in the order of the original values. Here is how.

> car.makers <- factor(car.makers, labels=c("Frd", "Hnd", "Isz", "Tyt"))
> car.makers
[1] Frd Isz Hnd Tyt
Levels: Frd Hnd Isz Tyt

All that we did was add the ‘labels’ argument to the “factor” function, put in our substitute values, and assign the result back to ‘car.makers’ so that the variable now holds a factor.

str() Function

A unique thing about R is that when you look at the structure of the factor using the “str” function you get the following printout.

> str(car.makers)
 Factor w/ 4 levels "Frd","Hnd","Isz",..: 1 3 2 4

This is telling us that the factor ‘car.makers’ has four levels. R gives us three of them next. After this we get the numbers 1, 3, 2, 4. What do these numbers mean?

R assigns a number to each factor level based on alphabetical order. The numbers above are the level numbers of the four values, in the order the values were entered. Below is a translation.

  • Ford is first alphabetically, so it gets 1
  • Honda is second alphabetically and gets 2
  • Isuzu is next and receives a 3
  • Toyota, whose level was not listed because R only printed the first three levels, is last alphabetically and receives a 4

Since the values were entered as Ford, Isuzu, Honda, Toyota, R prints 1 3 2 4.

These numbers can be used to create subsets just as with vectors. Below is another example

> levels(car.makers) [3:4]
[1] "Isz" "Tyt"

In the example above, we told R that we want the levels of the factor ‘car.makers’. However, we specifically asked for levels 3 and 4 by using the brackets. R then prints the names of levels 3 and 4.
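Because labels are matched to levels by position, it is easy to attach a label to the wrong level when the levels are sorted alphabetically behind the scenes. One way to avoid this, sketched below, is to state the level order explicitly with the ‘levels’ argument. The name ‘car.factor’ is used only for this illustration.

```r
car.makers <- c("Ford", "Isuzu", "Honda", "Toyota")

# Give the levels in the order we want, then the labels in that same order
car.factor <- factor(car.makers,
                     levels = c("Ford", "Isuzu", "Honda", "Toyota"),
                     labels = c("Frd", "Isz", "Hnd", "Tyt"))

levels(car.factor)
as.integer(car.factor)   # the level number behind each value
```

With the levels written out, each label lines up with the name directly above it, so the pairing is visible in the code itself.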

This provides some basic understanding on factors.

Finding Text in a Vector in R

This post will cover how to search for text within a vector in R. There are times when you may be working with a lot of information and you want to find a specific piece of information. For example, let’s say you have a list of names that are not in alphabetical order and you want to know how many names start with the letter “E”. To solve this problem, you need to learn how to search text by searching for a pattern. Below is an example of how to do this followed by an explanation.

> Student.names
 [1] "Andy"    "Billy"   "Chris"   "Darrin"  "Ed"      "Frank"   "Gabe"    "Hank"
 [9] "Ivan"    "James"   "Karl"    "Larry"   "Matt"    "Norman"  "Oscar"   "Paul"
[17] "Quinton" "Alex"    "Andre"   "Aron"    "Bob"     "Rick"    "Simon"   "Steve"
[25] "Thomas"  "Tim"     "Victor"  "Vince"   "William" "Warren"  "Wilson"  "Ted"
[33] "Dan"     "Eric"    "Ernest"  "Fred"    "Jim"     "Ethan"   "Lance"   "Mitch"
[41] "Pete"    "John"
> grep("E",Student.names)
[1]  5 34 35 38

Here is what happened

  1. You have to create the variable ‘Student.names’ and type all the names above as a vector
  2. Next, you use the ‘grep’ function to determine which of the names start with “E” in the variable ‘Student.names’
  3. R tells by position or index which names start with ‘E’

Now you know where the names that start with ‘E’ are but you don’t know the actual names. Below is how you extract the names from the variable.

>  Student.names[grep("E", Student.names)]
[1] "Ed"     "Eric"   "Ernest" "Ethan"

Here is what happened

  1. You told the computer that you want a subset of all the names that start with “E” from the variable ‘Student.names’
  2. You used the ‘grep’ function to do this.
  3. R returned the names that start with ‘E’
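Strictly speaking, the pattern "E" matches a capital E anywhere in a name; it works above only because each name has its single capital letter at the start. As a hedged sketch, the pattern "^E" restricts the match to the beginning of the text, and ‘value = TRUE’ returns the names directly. The vector ‘student.sample’ is hypothetical and includes two all-capital names to show the difference.

```r
student.sample <- c("Andy", "Ed", "Steve", "Eric", "ELLEN", "DEAN")

# Matches a capital E anywhere, so "DEAN" is included
grep("E", student.sample, value = TRUE)

# The ^ anchors the pattern to the start of each name
grep("^E", student.sample, value = TRUE)
```

Using ‘value = TRUE’ saves the extra step of subsetting the vector with the positions.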

Substituting Text

You can also substitute text in a vector. For example, let’s say you want to replace the name ‘Ed’ in the ‘Student.names’ variable with the more formal name of ‘Edward’. Here is how it is done. Just so you know, ‘Ed’ was the 5th name in the list, but below it will be replaced with ‘Edward’.

> gsub("Ed", "Edward", Student.names)
 [1] "Andy"    "Billy"   "Chris"   "Darrin"  "Edward"  "Frank"   "Gabe"    "Hank"   
 [9] "Ivan"    "James"   "Karl"    "Larry"   "Matt"    "Norman"  "Oscar"   "Paul"   
[17] "Quinton" "Alex"    "Andre"   "Aron"    "Bob"     "Rick"    "Simon"   "Steve"  
[25] "Thomas"  "Tim"     "Victor"  "Vince"   "William" "Warren"  "Wilson"  "Ted"    
[33] "Dan"     "Eric"    "Ernest"  "Fred"    "Jim"     "Ethan"   "Lance"   "Mitch"  
[41] "Pete"    "John"

Here is what happened

  1. In this example, we used the ‘gsub’ function to replace the name ‘Ed’ with ‘Edward’
  2. Using ‘gsub’ we tell R to find ‘Ed’ and replace it with ‘Edward’ in the variable ‘Student.names’
  3. R completes the code and prints the list as seen above
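Two details are easy to miss here. First, ‘gsub’ replaces the pattern wherever it appears, even inside a longer name. Second, the result is only printed unless it is assigned back to a variable. The sketch below illustrates both with a short, hypothetical vector called ‘short.names’.

```r
short.names <- c("Ed", "Eddie", "Ted", "Fred")

# An unanchored pattern also changes "Eddie"
gsub("Ed", "Edward", short.names)

# ^ and $ limit the match to names that are exactly "Ed";
# assigning the result back saves the change
short.names <- gsub("^Ed$", "Edward", short.names)
short.names
```

In the class list above this problem does not arise, since no other name contains ‘Ed’ with a capital E, but it is worth checking for in larger data sets.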

Hopefully, the information provided will give you ideas into using text in R

Introduction to Character Vectors in R

Vectors in R can be used to store characters or text data. This post will go over some basic ways you can use character vectors.

You create character vectors in much the same way as numeric vectors. The only main difference is that you put quotes " " around the words. The quotes tell R that the information within the quotes is text and not numeric information. Below is an example of the creation of a character vector consisting of the names of students.

> student.names <- c("Andy", "Billy", "Chris", "Darrin", "Ed", "Frank", "Gabe", "Hank", "Ivan", "James", "Karl", "Larry", "Matt", "Norman", "Oscar", "Paul", "Quinton")
> student.names
 [1] "Andy"    "Billy"   "Chris"   "Darrin"  "Ed"      "Frank"   "Gabe"    "Hank"
 [9] "Ivan"    "James"   "Karl"    "Larry"   "Matt"    "Norman"  "Oscar"   "Paul"
[17] "Quinton"

In this example, we assigned the ‘student.names’ variable to the names and then we typed the variable named ‘student.names’ so that R would print the results.

Subsetting Values

A subset is the result of pulling some information from a larger vector. This is useful if you want to work with a piece of data from a larger data set. For example, let’s say you only want the fourth name from the ‘student.names’ variable. An example of how to do this is provided below.

> student.names[4]
[1] "Darrin"

All you did in the example above was type in the variable ‘student.names’. Next you placed in brackets [ ] the number four. This tells R you only want the information in the fourth index of the vector. R then prints the result with the fourth name of the vector.

The possibilities with subsets are endless. For example, you can ask for a sequence of indices using a colon as shown in the example below where we ask for the fourth through eighth names in the vector.

> student.names[4:8]
[1] "Darrin" "Ed"     "Frank"  "Gabe"   "Hank"

Creating and Assigning Named Vectors

Vectors can be assigned to other vectors. This is used when you need to combine information from different vectors in a way that certain information is linked with certain information. For example, what if you want to combine the students’ names with their test scores? Below is an example of how to do this.

> student.names <- c("Andy", "Billy", "Chris", "Darrin", "Ed", "Frank", "Gabe", "Hank", "Ivan", "James", "Karl", "Larry", "Matt", "Norman", "Oscar", "Paul", "Quinton")
> test.scores <- c(85, 90, 80, 95, 60, 55, 72, 88, 71, 82, 62, 58, 89, 76, 64, 79, 88) 
> names(test.scores) <- student.names 
> test.scores  
   Andy   Billy   Chris  Darrin      Ed   Frank    Gabe    Hank    Ivan   James
     85      90      80      95      60      55      72      88      71      82
   Karl   Larry    Matt  Norman   Oscar    Paul Quinton
     62      58      89      76      64      79      88

Here is what happened.

  1. We created our ‘student.names’ variable.
  2. We created our ‘test.scores’ variable.
  3. We then used the ‘names’ function to assign the values in ‘student.names’ as the names of the values in ‘test.scores’. This basically gives a name to every value in the ‘test.scores’ variable. In other words, the first value 85 now has the name Andy, the second value 90 now has the name Billy.
  4. You now type ‘test.scores’ again to print the new combined vector.

Now every score has a name associated with it. Think of the name as another form of identification. You can subset information using the index or the name as shown below.

> test.scores["Andy"]
Andy 
  85 
> test.scores[1]
Andy 
  85

Both examples above give you the score for the first index which is 85.
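Names can also be used to pull out several values at once, or to find out whose scores meet a condition. The following is a small sketch with a shortened version of the vector, created with the names written in directly (an alternative way of building a named vector).

```r
test.scores <- c(Andy = 85, Billy = 90, Chris = 80, Darrin = 95, Ed = 60)

# Subset by more than one name
test.scores[c("Billy", "Ed")]

# Names of the students who scored 85 or higher
names(test.scores)[test.scores >= 85]
```

The second line works because the comparison produces a TRUE or FALSE for each score, and only the names in the TRUE positions are kept.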

This is just an introduction to character vectors. Keep in mind that much of this information applies to numeric vectors as well.

Introduction to Vectors Part III: Logical Vectors and More

Logical vectors are vectors that hold the results of comparing values. The response R gives is either TRUE, which means that the comparison statement is correct, or FALSE, which means the comparison statement is incorrect.

Logical vectors use various operators that indicate ‘greater than’, ‘less than’, ‘equal to’, etc. As this is an abstract concept, it is better to work through several examples to understand.

You want to know how many times James scored more than 20 points in a game. To determine this, you develop a simple expression that R will answer with a logical vector

> points.of.James <- c(12, 15, 30, 25, 23, 32)
> points.of.James
[1] 12 15 30 25 23 32
> points.of.James > 20
[1] FALSE FALSE  TRUE  TRUE  TRUE  TRUE

Here is what we did

  1. We inputted the values for ‘points.of.James’
  2. We then entered the expression ‘points.of.James > 20’, which asks which values in the variable ‘points.of.James’ are greater than 20.
  3. R replied by stating that the first two values are not greater than 20, which is why they are FALSE, and that the last 4 values are greater than 20, which is why they are TRUE.
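The logical vector shows which games were above 20 points, but not the count that the question asked for. As a brief sketch, ‘sum()’ counts the TRUE values, and the logical vector can also be placed inside brackets to pull out the matching scores.

```r
points.of.James <- c(12, 15, 30, 25, 23, 32)

# TRUE counts as 1 and FALSE as 0, so sum() gives the number of games over 20
sum(points.of.James > 20)

# Using the comparison inside brackets keeps only the scores over 20
points.of.James[points.of.James > 20]
```

This combination of a comparison inside brackets is one of the most common ways to filter a vector in R.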

Logical vectors can also be used to compare values in different vectors. This often involves the function ‘which()’, which reports the positions at which a logical vector is TRUE. Below is an example

You want to know in which games James scored more points than Kevin. Below is the code for doing this

> points.of.James <- c(12, 15, 30, 25, 23, 32)
> points.of.Kevin <- c(20, 19, 25, 30, 31, 22)
> the.best <- points.of.Kevin < points.of.James
> which(the.best)
[1] 3 6

Here is what happened

  1. You set the values for the variables ‘points.of.James’ and ‘points.of.Kevin’.
  2. You created a new variable called ‘the.best’. This variable stores the result of comparing the values in ‘points.of.Kevin’ with those in ‘points.of.James’, which is TRUE for each game in which Kevin scored less than James.
  3. You then used the ‘which()’ function to find out in which games Kevin scored less than James.
  4. R responded by telling you that Kevin scored less than James in the 3rd and 6th games.

Instead of finding each game in which Kevin scored less than James, you can find out how many times this happened by using the ‘sum()’ function in place of the ‘which()’ function.

> points.of.James <- c(12, 15, 30, 25, 23, 32)
> points.of.Kevin <- c(20, 19, 25, 30, 31, 22)
> the.best <- points.of.Kevin < points.of.James 
> sum(the.best)
[1] 2

R explains that Kevin scored less than James two times.
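The reason sum() works here is that R counts TRUE as 1 and FALSE as 0 when it does math with a logical vector. The sketch below shows this, along with two related uses of ‘the.best’. The names games.won, winning.scores, and share.won are made up for this illustration.

```r
points.of.James <- c(12, 15, 30, 25, 23, 32)
points.of.Kevin <- c(20, 19, 25, 30, 31, 22)
the.best <- points.of.Kevin < points.of.James

# TRUE counts as 1 and FALSE as 0 when a logical vector is summed
games.won <- sum(the.best)

# A logical vector can also pick out the matching values directly
winning.scores <- points.of.James[the.best]

# The mean of a logical vector is the share of values that are TRUE
share.won <- mean(the.best)

print(games.won)
print(winning.scores)
print(share.won)
```

Here James outscored Kevin in two of the six games, so the share is one third.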

Naturally, there is much more that can be done with vectors than what was covered here. This is just a glimpse at what is possible.

Introduction to Vectors in R: Part II

In a previous post, we took our first look at vectors and their use in R. In this post, we will build on this knowledge by learning more ways to use and analyze vectors.

Functions for Analyzing Vectors

An important function for analyzing vectors is the str() function. This function allows you to look at the structure or characteristics of any object, including vectors. Objects are the pieces of data you create and/or manipulate in R.

We are going to analyze the structure of the variable ‘points.of.James’, which contains a vector. Below is the code for this followed by an explanation.

  • > points.of.James <- c(12, 15, 30, 25, 23, 32)
    > str(points.of.James)
     num [1:6] 12 15 30 25 23 32

Here is what we did

  1. We created the variable ‘points.of.James’
  2. We assigned the values 12, 15, 30, 25, 23, 32 to the variable ‘points.of.James’ using a vector
  3. We used the str() function to analyze the variable ‘points.of.James’
  4. The output told us the following
    1. num means that the vector is numeric
    2. 1:6 is the range of indices in the vector. The 1 is the position of the first value and the 6 is the position of the last value, so the vector has one dimension and six values. In other words, there are six numbers in the variable. Remember the word ‘indices’, which refers to the locations of values within a vector, as it will be important in the future.
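The str() function reports a different label depending on what kind of values a vector holds. Below is a short sketch comparing three kinds of vectors. The variables player.names and over.20 are made up for this illustration.

```r
points.of.James <- c(12, 15, 30, 25, 23, 32)
player.names <- c("James", "Kevin")   # hypothetical character vector
over.20 <- points.of.James > 20       # logical vector

# str() labels each vector with its type and its range of indices
str(points.of.James)   # numeric vectors are labelled num
str(player.names)      # character vectors are labelled chr
str(over.20)           # logical vectors are labelled logi

# class() returns just the type as text
print(class(points.of.James))
print(class(player.names))
print(class(over.20))
```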

Let’s say you are curious to know how many indices a vector has. To figure this out, you use the length() function as demonstrated below.

  • > points.of.James <- c(12, 15, 30, 25, 23, 32)
    > length(points.of.James)
    [1] 6

You can probably see that the variable ‘points.of.James’ has six values within it.

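Another useful feature of vectors is the ability to combine them with the c() function. In the example below, we will combine the variables ‘points.of.James’ and ‘points.of.Kevin’.

  • > points.of.James <- c(12, 15, 30, 25, 23, 32)
    > points.of.Kevin <- c(20, 19, 25, 30, 31, 22)
    > all.points <- c(points.of.James, points.of.Kevin)
    > all.points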
    [1] 12 15 30 25 23 32 20 19 25 30 31 22

Here is what happened

  1. We made the variables ‘points.of.James’ and ‘points.of.Kevin’ and inputted the values
  2. We created a new variable called ‘all.points’ and assigned the variables ‘points.of.James’ and ‘points.of.Kevin’ to it.
    1. NOTE: Assigning a variable to another variable means assigning the values of the first variable to the second one. In other words, the values of ‘points.of.James’ and the values of ‘points.of.Kevin’ are now stored in the variable ‘all.points’
  3. We then read the ‘all.points’ variable and it showed us all of the values within it, which are the same values found in ‘points.of.James’ and ‘points.of.Kevin’.
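The c() function can combine vectors and single values in the same call. As a rough sketch, the code below joins the two vectors and adds one extra score at the end. The extra score of 40 and the name with.extra are made up for this illustration.

```r
points.of.James <- c(12, 15, 30, 25, 23, 32)
points.of.Kevin <- c(20, 19, 25, 30, 31, 22)

# Combine both vectors and append one extra, made-up score at the end
with.extra <- c(points.of.James, points.of.Kevin, 40)

# The combined length is six plus six plus one
print(length(with.extra))
print(with.extra)
```

The values keep the order in which they were given to c(), so the scores for James come first, then the scores for Kevin, then the extra value.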

One last useful tool when using vectors is the ability to extract values from a vector. This can be done by using the brackets or [ ]. Extracting values means taking a handful of values from a vector and looking at them alone. Below is an example.

  • > all.points
     [1] 12 15 30 25 23 32 20 19 25 30 31 22
    > all.points[10]
    [1] 30

In the example above we were looking at our ‘all.points’ variable. I typed all.points[10] into R and this told R to do the following

  • Extract the tenth value from the ‘all.points’ variable

R then replies by telling me the tenth value of the ‘all.points’ variable is 30.

Of course, you can extract multiple values as seen below

  • > all.points
     [1] 12 15 30 25 23 32 20 19 25 30 31 22
    > all.points[c(2, 4, 6, 8)]
    [1] 15 25 32 19

In this example, we extracted the second, fourth, sixth, and eighth values from the variable ‘all.points’.
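The brackets accept more than a list of positions. Below is a brief sketch of a few other common ways to extract values, using the same ‘all.points’ values. The names on the left are made up for this illustration.

```r
all.points <- c(12, 15, 30, 25, 23, 32, 20, 19, 25, 30, 31, 22)

first.three <- all.points[1:3]                 # a range of indices
without.first <- all.points[-1]                # a negative index drops that value
last.value <- all.points[length(all.points)]   # the last value, found with length()
over.25 <- all.points[all.points > 25]         # only the values that pass a test

print(first.three)
print(without.first)
print(last.value)
print(over.25)
```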

This is still just an introduction into how vectors can be used in R.

Doing Simple Math in R

As a mathematical program, R is able to calculate a wide range of mathematical functions. This post will cover some basic operations involving variables and vectors.

We will start with an example. Imagine that two basketball players, James and Kevin, have decided to donate money to a local charity for every point they score in a game. James agrees to give $150.00 per point and Kevin agrees to give $130.00 per point. This promise covers the last six games they have played. You want to know the following

  1. How much money did they give combined for each game?

Below is the code for this scenario.

  • > points.of.James <- c(12, 15, 30, 25, 23, 32)
    > points.of.Kevin <- c(20, 19, 25, 30, 31, 22)
    > James.money <- points.of.James * 150
    > Kevin.money <- points.of.Kevin * 130
    > James.money + Kevin.money
    [1] 4400 4720 7750 7650 7480 7660

Here is what we did

  1. We needed to know how many points James and Kevin scored in each game. Each set of values was assigned to its own variable, “points.of.James” and “points.of.Kevin” respectively. This information was provided for you.
  2. We then needed to figure out how much money James gave away by multiplying his points by the 150.00 per point he promised. This value was assigned to the variable “James.money”. We also did this for Kevin, but he only promised to pay 130.00 per point. Kevin’s amount was assigned to the variable “Kevin.money”.
  3. Finally, we combined the amount James and Kevin promised by adding together the variables “James.money” and “Kevin.money”
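If you also wanted the totals across all six games, the sum() function can be applied to the same variables. This is only a sketch that extends the scenario. The names per.game, grand.total, James.total, and Kevin.total are made up for this illustration.

```r
points.of.James <- c(12, 15, 30, 25, 23, 32)
points.of.Kevin <- c(20, 19, 25, 30, 31, 22)
James.money <- points.of.James * 150
Kevin.money <- points.of.Kevin * 130

# Combined donation for each game, then the totals over all six games
per.game <- James.money + Kevin.money
grand.total <- sum(per.game)
James.total <- sum(James.money)
Kevin.total <- sum(Kevin.money)

print(per.game)
print(grand.total)   # 39660
print(James.total)   # 20550
print(Kevin.total)   # 19110
```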

Rounding

Numbers can also be rounded in R. Going back to our previous example, let’s say we want to know how much James and Kevin gave, rounded to the nearest thousand. Below is the code.

> round(James.money + Kevin.money, digits = -3)
[1] 4000 5000 8000 8000 7000 8000

Here is what happened

  1. We used the “round” function to round the number.
  2. The variables “James.money” and “Kevin.money” were put inside the parentheses with the operator + between them so that R adds them together.
  3. Since we are rounding, we told R that we want to round to the nearest thousand by adding the argument “digits” with an equal sign followed by negative three. The negative sign tells R to round three places to the left of the decimal point. If we wanted to round to the nearest thousandth, we would have used positive 3.
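To see how the sign of the “digits” argument changes the result, here is a small sketch that rounds one number in several ways. The number 1234.5678 and the variable names are made up for this illustration.

```r
amount <- 1234.5678   # made-up number for illustration

nearest.hundred <- round(amount, digits = -2)     # two places left of the decimal point
nearest.whole <- round(amount, digits = 0)        # no decimal places
nearest.hundredth <- round(amount, digits = 2)    # two places right of the decimal point
nearest.thousandth <- round(amount, digits = 3)   # three places right of the decimal point

print(nearest.hundred)      # 1200
print(nearest.whole)        # 1235
print(nearest.hundredth)    # 1234.57
print(nearest.thousandth)   # 1234.568
```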

This was just an introduction to some of the basic mathematical functions of R

Basics of Vectors Functions in R

Vectors were discussed in a previous post. In short, vectors are a collection of information. Functions serve the purpose of performing several operations one after another. Now, we will combine these two concepts into what are known as vectorized functions. Vectorized functions are functions that operate on all the values of a vector at once. In other words, you can put several numbers into a vector and then have a function do something to all the numbers at the same time.

Here is an example

You have been asked to keep score during basketball season. Below are the number of points per game for a player named James for his first six games. The information is inputted in R as follows

  • > points.of.James <- c(24, 8, 8, 12, 18, 9)
    > points.of.James
    [1] 24  8  8 12 18  9

What we have done so far is create the variable “points.of.James” and assigned the vector of 24, 8, 8, 12, 18, 9. The points represent the number of points he scored in the six games.

The first function we are going to use is the sum() function. This function will add up how many points James scored in the six games. Below is the code for it.

  • > sum(points.of.James)
    [1] 79

Here is what we did

  1. We told R we want the sum of the values in the variable “points.of.James”
  2. The values in the variable “points.of.James” are the vector 24, 8, 8, 12, 18, 9.
  3. R adds the values up and produces the answer 79
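Other functions can be applied to the whole vector in the same way. The sketch below uses the same six scores. The names on the left are made up for this illustration.

```r
points.of.James <- c(24, 8, 8, 12, 18, 9)

average.points <- mean(points.of.James)   # the total divided by the number of games
best.game <- max(points.of.James)         # highest score
worst.game <- min(points.of.James)        # lowest score
running.total <- cumsum(points.of.James)  # total after each game

print(average.points)
print(best.game)       # 24
print(worst.game)      # 8
print(running.total)   # 24 32 40 52 70 79
```

Notice that mean(), max(), and min() each return a single value, while cumsum() returns one value per game. The last value of the running total matches the result of sum().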

Another, more complicated example has to do with how functions can combine vectors. For example, let’s say you have a list of people in the English department and you want to add after each name that they all hold the position of lecturer. To do this, you need to use the paste() function, which will combine the information. Below is an example of how to do this using R.

  • >  faculty.names <- c("Darrin Thomas", "John Williams", "Sarah Smith")
    >  Rank <- ", Lecturer"
    >  paste(faculty.names, Rank, sep = "")
    [1] "Darrin Thomas, Lecturer" "John Williams, Lecturer" "Sarah Smith, Lecturer"

Here is what we did.

  1. We created a variable called “faculty.names” and we assigned a vector to it that contain the following names
    1. Darrin Thomas
    2. John Williams
    3. Sarah Smith
  2. We then created a variable called “Rank” and assigned the value “, Lecturer”. We put the comma in front of the word Lecturer so that when we combine the “faculty.names” and “Rank” variables we have a comma after the name of the faculty person and before their rank, which is lecturer.
  3. Next, we combined all of the names of the faculty members with the rank of “Lecturer” using the paste() function.
    1. NOTE: In the paste function we included the argument sep = "". This prevents a space from appearing between the name of the person and the comma. Below is an example of what we do not want
      1. Darrin Thomas , Lecturer   (Remember, we do not want this space. That is why we use the sep = "" argument, because it removes the space in front of the comma)
  4. After setting up the paste function we get the “faculty.names” and “Rank” combined which shows the names of the faculty members with their corresponding rank.
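As a side note, R also has a function called paste0(), which behaves like paste() with sep = "". The sketch below compares the two, along with the default behaviour of paste(). The names with.sep, with.paste0, and default.sep are made up for this illustration.

```r
faculty.names <- c("Darrin Thomas", "John Williams", "Sarah Smith")
Rank <- ", Lecturer"

with.sep <- paste(faculty.names, Rank, sep = "")   # no separator
with.paste0 <- paste0(faculty.names, Rank)         # shortcut for sep = ""
default.sep <- paste(faculty.names, Rank)          # default separator is a space

print(with.sep)
print(with.paste0)
print(default.sep)   # shows the unwanted space before the comma
```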

Let’s pretend you are dealing with the same problem but now the faculty members have different rankings. Here is the code for how to do this.

  • > faculty.names <- c("Darrin Thomas", "John Williams", "Sarah Smith")  
    > faculty.ranks <- c(", Lecturer", ", Senior Lecturer", ", Principal Lecturer")
    > paste(faculty.names, faculty.ranks, sep="")
    [1] "Darrin Thomas, Lecturer" "John Williams, Senior Lecturer"  
    [3] "Sarah Smith, Principal Lecturer"

All that is new is that we created a variable called “faculty.ranks”, which included the new rankings. Then we used the paste function to combine the names with ranks. As you can see, each person gets their corresponding rank. The first name got the first rank, the second name got the second rank, etc. If there is a difference between the number of names and ranks, R will recycle the values of the shorter vector to complete the function.
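To see recycling in action, here is a small sketch in which the vector of ranks is deliberately one value short. The variable names two.ranks and recycled are made up for this illustration.

```r
faculty.names <- c("Darrin Thomas", "John Williams", "Sarah Smith")
two.ranks <- c(", Lecturer", ", Senior Lecturer")   # one rank short, on purpose

# R reuses the first rank for the third name
recycled <- paste(faculty.names, two.ranks, sep = "")
print(recycled)
```

The third name receives the first rank again because R starts over at the beginning of the shorter vector. This is convenient when it is intended, but it can hide mistakes, so it is worth checking that your vectors have the lengths you expect.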

Sourcing Script in R

Sourcing a script means having R run several commands one after another without stopping. To do this in RStudio, you need to type your code into the Source Editor, which is normally at the top left-hand side of the program. We are going to go through an example in which R will ask a question and you will answer it.

In the source editor, type the following information. At the end of each line, press enter to move to the next line. I will explain the meaning of the text after the example

h <- "Welcome to R"
yourname <- readline("What is your name?")
print(paste(h, yourname))

After typing this information, you should click on the source button near the top middle of the computer screen. If this is done correctly you should see the following in the console.

> source('~/.active-rstudio-document')
What is your name?

Type your name in response to the question and press enter. Then you should see the following.

> source('~/.active-rstudio-document')
What is your name?Darrin
[1] "Welcome to R Darrin"

So what exactly did we do?

  1. We made a variable named h and assigned the object “Welcome to R”
  2. Next, we made a variable called yourname and assigned it the result of the readline function. The readline function displays the prompt “What is your name?”, waits for you to type a response, and returns the text you typed.
  3. We then told R to print a combination (using the paste function) of the h variable first (Welcome to R) and the yourname variable second (the name you typed)
  4. Next, we clicked the source script button
  5. In the console, we were asked the question “What is your name?”
  6. We responded with our name.
  7. R takes your name and combines it with the h variable to create the following phrase
  8. “Welcome to R Darrin”
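One thing to keep in mind is that readline() only waits for an answer when R is running interactively, as it does in RStudio. Below is a rough sketch of the same script that falls back to a placeholder name when it is run without a person at the keyboard. The fallback name and the variable greeting are assumptions made for this illustration.

```r
h <- "Welcome to R"

# readline() only waits for input in an interactive session, so fall back
# to a placeholder name (an assumption for this sketch) otherwise
yourname <- if (interactive()) readline("What is your name? ") else "Darrin"

greeting <- paste(h, yourname)
print(greeting)
```

If this code were saved in a file, for example a hypothetical file called greeting.R, it could also be run from the console with source("greeting.R").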

Several steps of code were run at once through using the script editor. This same process could have been done in the console but using the script editor saves time. Many aspects of programming focus on saving time and improving efficiency. There are always several ways to do something but the goal is always to find the method that takes the least amount of effort and improves the readability of the code.

Making and Using Variables in R

Variables are used in R to store information for computational purposes. It seems that there is almost no limit to what can be stored in a variable. To make a variable, you need to know the following information.

  • The name you want to give the variable
  • The information you want to store in the variable

Here is an example,

You want to make a variable that will store the following test scores: 80, 81, 82, 83, 84, 85. You want to call the variable test_scores. Here is what we know

  • The name of the variable is test_scores
  • It will contain the values of 80, 81, 82, 83, 84, 85

Here is how this would look in R.

  • > test_scores <- 80:85

There are a few things to explain

  1. The <- sign means “assign”. In other words, the values 80:85 on the right of the <- sign are being assigned to the variable name on the left.
  2. The colon sign stands for a sequence. We wanted all whole numbers from 80 to 85 and the colon sign provides this information.
  3. Both the <- and : are known as operators in R. Operators are symbols you place between values in order to perform an operation.

Now, type test_scores into the R console and press enter. You should see the following.
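The colon is a shortcut for a simple sequence of whole numbers. As a small sketch, the seq() function builds the same sequence and also lets you choose a step size with the by argument. The names same_scores and by_five are made up for this illustration.

```r
test_scores <- 80:85

# seq() with a start and an end gives the same values as the colon
same_scores <- seq(80, 85)

# The by argument sets the step between values
by_five <- seq(80, 100, by = 5)

print(test_scores)
print(same_scores)
print(by_five)   # 80 85 90 95 100
```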

  • > test_scores
    [1] 80 81 82 83 84 85

R now shows you everything that is stored in the variable. We can also create other variables and perform calculations through the use of variables. For example, let’s say that you want to add 5 points of extra credit to the scores of the test. To do this, let’s make a variable called extra_credit and add it to test_scores.

  • > extra_credit <- 5
    > test_scores + extra_credit
    [1] 85 86 87 88 89 90

As you can see, R took the values of test_scores and added extra_credit to each value in test_scores. This is much faster than entering each value separately to calculate it. We can also make a new variable for the new scores, which we can call revised_test_scores. Let’s try

  • > revised_test_scores <- test_scores + extra_credit
    > revised_test_scores
    [1] 85 86 87 88 89 90

Variables can also be used for text. The only difference is that you must put quotes around the words. Otherwise, R will treat the word as the name of a variable and report an error if no such variable exists. Below is an example,

  • > h <- "Howdy"
    > h
    [1] "Howdy"

Lastly, variables can be used to store vectors. This is very useful in saving a lot of time in performing calculations. We will now make the variable student_names and assign a vector to it containing the names of students.

  • > student_names <- c("David", "Edward", "John")
    > student_names
    [1] "David"  "Edward" "John"
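Once the names are stored in a variable, a single function call can work on every name at once. Below is a brief sketch using functions that are part of base R. The names on the left are made up for this illustration.

```r
student_names <- c("David", "Edward", "John")

how_many <- length(student_names)      # number of students
name_lengths <- nchar(student_names)   # number of letters in each name
shouted <- toupper(student_names)      # each name in capital letters

print(how_many)       # 3
print(name_lengths)   # 5 6 4
print(shouted)
```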

This is only the beginning of some of the amazing features of R.

Introduction to Vectors in R

A key component of R is the use of vectors. A vector is a single piece of information that contains a collection of information. This probably sounds extremely confusing, so an example will be provided.

Think of an organization such as a school. We will call the school Asia International School. Asia International School consists of an administrator named Dr. T, teachers named Mr. Bob and Mrs. Smith, and students named Sam, David, and Mary. In this example, the school is considered a vector, or a single piece of information (Asia International School). The administrator, teachers, and students are the collection of information within the larger piece of information that is the school.

If we wanted to write this example using the R programming language it would look something like the following…

  • > Asia International School(Dr. T, Mr. Bob, Mrs. Smith, Sam, David, Mary)
    [1] Dr. T Mr. Bob Mrs. Smith Sam David Mary

To be fair, this is not exactly how it is done. It is strictly an imperfect illustration of a very abstract concept.

A Real Example

In order to make a vector in R, you need to use the c() function. A function is a piece of code that does something to the information that is within its parentheses. For example, let’s make a vector that contains the numbers 1, 2, 3, 4, 5.

  • > c(1,2,3,4,5)
    [1] 1 2 3 4 5

To get the second line, you need to press enter.

The c means combine. The information inside the parentheses is the information that is being combined into one vector. Going back to our school example, Dr. T, Mr. Bob, Mrs. Smith, Sam, David, and Mary were being combined into the school Asia International. In a function call, all of the information inside the parentheses is known as arguments.
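Going back to the school example, a version that R actually accepts puts each name in quotes. The sketch below also shows what happens when different kinds of values are combined: a vector can only hold one type, so R converts everything to text. The variable names school, numbers, and mixed are made up for this illustration.

```r
# The school example as a character vector
school <- c("Dr. T", "Mr. Bob", "Mrs. Smith", "Sam", "David", "Mary")
print(school)

# A numeric vector
numbers <- c(1, 2, 3, 4, 5)
print(class(numbers))   # "numeric"

# Mixing numbers, text, and logical values converts everything to text
mixed <- c(1, "Mr. Bob", TRUE)
print(mixed)
print(class(mixed))     # "character"
```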

Conclusion

These are just some of the most basic ideas about vectors. There is a great deal more to explore about the use of vectors, which are considered one of the most powerful features of R. The challenge of learning R is with the abstract nature of programming. You have to think of the things you want to do in terms of code that the computer can understand. This is very confusing for most people.

Using R

The R program can be used on its own to process many different mathematical functions. However, many people choose to use some sort of editing tool while using R. The editing tool provides a place for developing and saving code and functions.

There are many different editing tools available. For Windows, a popular choice is RGui. For Mac, R.app is a common choice. The choice that is quickly becoming the standard for R users is RStudio. RStudio works on Windows, Mac, and Linux. This provides a consistent interface for people regardless of the operating system they are using. Below are some additional benefits of RStudio.

  • Brackets are automatically setup when developing code
  • Different parts of a code have a corresponding color. This helps in reading the code.
  • Code completion. Saves time

Coding in RStudio

The first thing people often do when learning to code is create the message “Hello World.” To do this in RStudio requires the following

  1. In RStudio make sure the cursor is blinking in the console section (The console is normally in the lower left hand part of the window)
  2. Type the following and press enter
    1. print("Hello World!")
  3. After pressing enter you should see the following
    1. [1] "Hello World!"

Congrats, you have just written and run your first line of R code.

Doing Some Math

R can be used for performing math calculations as well. Consider the following example

  1. Type the following into the console and press enter
    1. 1 + 3 + 5 + 7
  2. You should get the following output
    1. [1] 16
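R follows the usual order of operations and has operators for more than addition. Below is a short sketch of a few of them. The numbers and variable names are made up for this illustration.

```r
total <- 1 + 3 + 5 + 7       # addition
no.brackets <- 2 + 3 * 5     # multiplication happens before addition
with.brackets <- (2 + 3) * 5 # parentheses change the order
power <- 2^3                 # exponent
whole.part <- 7 %/% 2        # whole number division
remainder <- 7 %% 2          # remainder after division

print(total)           # 16
print(no.brackets)     # 17
print(with.brackets)   # 25
print(power)           # 8
print(whole.part)      # 3
print(remainder)       # 1
```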

Have some fun playing with the print function as well as calculating various math problems.

The History and Characteristics of R

R is a programming language and software environment that is used for the development of graphics and data products and the computation of many forms of mathematics. The history of R goes back about 20-30 years. This post will look at the history of R as well as the characteristics of this software.

The History

Ross Ihaka and Robert Gentleman are the developers of R. R is actually based on an older programming language known as S, which goes back to the 1970s. Ihaka and Gentleman developed their own programming language while working together in New Zealand. With the release of R in the early 1990s, several people joined the project to help improve it. By 1995, the software had become “open source”, which means that anyone can use and modify it for themselves without cost. By 2000, version 1.0 of R was released to the public.

Characteristics of R

In many people’s opinion, the best feature of R is the price. Being free, R is by far one of the best software packages for statistical analysis if price is the most important criterion. SPSS and SAS are also great and more user-friendly; however, their price is completely outlandish for most individual researchers. R removes this problem completely.

R also has an active community around it that supports its development. For example, people are able to develop packages that provide assistance in running various tasks in the R software. Naturally, most packages are free as well. The focus on community has enabled R to be run on almost any operating system as well, such as Windows, OS X, or Linux.

R also allows people to make graphs and data products. The graphs are actually very well made. The drawback is understanding the coding necessary to develop these various products. This is discussed more below.

One major drawback that affects the typical computer user is learning the programming language of R. This can be challenging for those who are not techie or able to think abstractly in computer codes. I have never seen a satisfactory way to get around this problem but to crack open a book and practice, practice, practice. With time, the code will start to make sense but it is not a five-minute process for someone who has not studied programming.

Conclusion

Despite the challenges of learning computer programming, R is becoming the software of choice for many. The benefits far outweigh the problems for many individuals. Personally, I am looking forward to continuing to develop skills in understanding this dynamic software.

Action Research Part II

In the last post, we began a discussion on action research. This post will conclude our look at action research. In this post, we will look at the following concepts

  • Steps in action research
  • Pros and Cons of action research

Steps in Action Research

How to approach action research is highly variable. Often action research will include research questions, the gathering of data, data analysis, and the development of an action plan

Identify the Research Question(s)

Action research is about answering questions. These questions can be used in many ways, for example:

  • To ask about information needed to make decisions
  • To ask questions about how well something is doing
  • To ask questions about what people think or feel

The types of questions are endless. For further information on asking questions in research please click here.

Data Collection

All standard forms of methodology are appropriate for action research. Survey, correlational, experimental, and more can all be used. With action research, the approach is often simplified because it is not the rigor but results that are important.

However, there are several forms of data collection that are extremely popular in action research. Observation, interviewing, and document analysis are some of the most common forms of data collection in action research.

The goal is always to try and triangulate whatever information is being collected. The type of collection method depends on what the research question is.

Data Analysis

Data analysis includes the same methods as other forms of research. The difference is that action research is usually much less complex in the approaches taken to analyze data. The primary goal is to create an accurate picture of whatever is under investigation.

Development of Action Plan

This final step depends on the original purpose of the study. If the purpose of the action research was to gather data to make a decision, the actual decisions that are made will be represented in an action plan. The action plan is a document that specifies the changes that will take place based on the findings of the study.

If the purpose of the action research was to assess how well something is working or to see which method is best or some other question, a plan may not be the final result. For example, if a teacher wants to see if lecture or discussion is better for the academic progress of students the results would indicate which is most appropriate. The teacher may not need to develop a plan for this but just be sure to include more or less lecture/discussion in their teaching.

Pros and Cons of Action Research

Pros

Action research can be done by any teacher. The simplicity of action research allows anyone to do it. Results do not need to meet the rigors of publication. The result of teacher-led action research is improved classroom teaching.

Action research improves educational practices. With data, schools can make plans to improve performance. Otherwise, schools are left to guess what to do. Problems are identified systematically at the school and even classroom level.

Cons

There is often a lack of external validity with action research. The results only apply to the local context and generalization is often difficult. The sample size is often small, with the population and sample being the same.

Conclusion

Action research is for strengthening classroom practice. The goal is not necessarily to write and publish but rather to empower local decision making and assist stakeholders directly.

Having said this, action research is an accepted form of research worthy of publication if it is conducted in a systematic fashion. The results may not generalize but they still can provide insights for other practitioners in the field.

Action Research Part 1

Action research is a topic that is spoken of a great deal in education and even other fields. However, this term is often used without people knowing exactly what action research is. Therefore, in this post, we will look at the following about action research.

  • What action research is.
  • Types of action research
  • Levels of participation in action research

Defining Action Research

Action research is research performed for the purpose of solving local problems or obtaining local information to make local decisions. This is important. According to this definition, action research does at least two things

  1. Solves a local problem through research
  2. Provides local data through research to make a local decision

In addition, two unique characteristics of action research are a lack of generalizability, or external validity, and a lack of rigorous research approaches.

Action researchers are only concerned with dealing with local problems in the local context. Therefore it is often difficult to try to apply the local approaches broadly. For example, the sample and the population in action research are often the same. This rarely happens in larger research projects. This means that excellent action research needs to be replicated in several contexts before it is generalizable.

Action research follows any of the most common research methodologies but not at the same level of mastery. Action researchers are normally practitioners in their respective field and not necessarily thoroughly trained scholar-researchers. The purpose of this research is to solve a local problem and not to develop dense, defensible theories.

Action Research Types

There are at least two types of action research. They are practical action research and participatory action research.

Practical action research involves dealing with a local problem by providing solutions to improve short-term performance and/or provide information for making decisions. One result of this type of action research is an action plan, which is a plan that is developed based on research for the purpose of change.

Participatory action research is the same as practical action research in that it deals with local problems and provides solutions or data. The main difference between these two forms of action research is philosophical. Participatory action research focuses on empowering individuals and bringing social change through research.

Participatory action research is about the participation of as many stakeholders as possible in the research process. This is one reason why participatory action research is referred to as collaborative research

Levels of Participation

Some believe that there are nine levels of participation in action research. Rarely does one individual participate in all nine levels in a particular project. The table below provides the nine levels, with a description of what happens at each level, as well as who commonly participates at that level. Please keep in mind that these are not the steps of an action research project but a map of how people participate.

Level | Participation                           | Who Participates
9     | Initiates a study                       | Administrators, teachers, parents
8     | Helps with developing research problems | Administrators, teachers, parents
7     | Designing the project                   | Administrators, teachers, parents
6     | Interpretation of results               | Administrators, teachers, parents
5     | Review results                          | Administrators, teachers, parents
4     | Data collection                         | Administrators, teachers, parents
3     | Receive findings                        | Administrators, teachers, parents, students
2     | Know purpose of study                   | Administrators, teachers, parents, students
1     | Provide information for the study       | Administrators, teachers, parents, students

The table shows that adults can participate at all levels in action research. Students normally do not participate beyond level 3, with exceptions as they grow older, such as high school and university students.

Conclusion

Action research is about change. Looking at a local issue and developing local solutions and/or information for developing a local plan of action is the focus of action research. For this reason, action research skills are an important tool for people in the field.

Developing Research Questions for Quantitative Studies

Research questions in the empirical process set the stage for an entire study. For this reason, it is critically important that the research questions of any study are worded in a way that allows a researcher to answer the questions clearly and succinctly. In this post, we will look at general guidelines for forming research questions as well as look at three common formats that are used when making research questions.

General Guidelines

Below are some common traits of research questions in quantitative studies. Naturally, this list is not exhaustive.

  • At the risk of sounding redundant, research questions pose a question. This is in contrast to hypotheses, which make a statement.
  • Common first words in research questions are “how,” “why,” or “what.”
  • Indicate what are the independent, dependent, and if necessary the mediating and intervening variables.
  • It is important to also include the participants and location of a study in the question(s)
  • Lastly, common verbs used in research question(s) include describe, compare, and relate

The last bullet in the list above is of particular importance because it leads into the next section.

Types of Questions

There are at least three types of research questions in quantitative studies and they are

  • descriptive questions
  • comparison questions
  • relationship questions

Descriptive Questions

Descriptive questions identify participants’ responses to a question/variable. One possible template of a descriptive question is below. The portions in parentheses need to be completed for each study. The template is followed by an example.

TEMPLATE
How often do (participants) (variable) at (research location)?

EXAMPLE
How often do students exercise at the university level?

EXPLANATION
In this question the participants are students, the variable is the amount of exercise, and the research location is university level. Descriptive questions strictly describe a variable. Higher-level inferences are not a part of this approach. If you look carefully you will notice there is no independent or dependent variable because we are not looking for any relationship. There is only a variable that we describe.

Comparison Questions

Comparison questions seek to understand if two groups are different on one or more dependent variables. We will modify the previous example on exercise and students for this question. The portions in parentheses need to be completed for each study. The template is followed by an example.

TEMPLATE
How are/is (group 1) different from (group 2) in terms of (dependent variable) for (participants) at (research location)?

EXAMPLE
How are men different from women in terms of exercise amount for students at university?

EXPLANATION
The groups are men and women, the dependent variable is the amount of exercise, participants are students, and the research location is university. This type of question only points out the difference but does not explain. For an explanation, we need to use relationship questions as described next.

Relationship Questions

Relationship questions try to answer the question of the strength of a relationship between two or more variables. One possible template of a relationship question is below. The portions in parentheses need to be completed for each study. The template is followed by an example.

TEMPLATE
How does (independent variable) relate to/influence (dependent variable) for (participants) at (research location)?

EXAMPLE
How does exercise influence GPA for students at the university level?

EXPLANATION
In this question the independent variable is exercise, the dependent variable is GPA, the participants are students, and the research location is a university. The goal is to see the strength of the relationship. This information can be used to explain exercise’s influence on GPA or to predict potential GPAs of students based on the amount of exercise they get or vice versa.
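To make the three question types concrete, here is a minimal sketch of the kind of calculation each one usually leads to. The records are invented for illustration, and the choice of a mean, a difference in means, and a Pearson correlation is an assumption about how such questions might be answered, not the only option.

```python
from math import sqrt
from statistics import mean

# Hypothetical survey records: (gender, weekly exercise hours, GPA).
students = [
    ("male", 2.0, 2.8),
    ("male", 4.0, 3.1),
    ("male", 6.0, 3.4),
    ("female", 3.0, 3.0),
    ("female", 5.0, 3.3),
    ("female", 7.0, 3.6),
]


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


hours = [h for _, h, _ in students]
gpas = [g for _, _, g in students]

# Descriptive question: how much do students exercise?
average_hours = mean(hours)

# Comparison question: how are men different from women in exercise amount?
men_hours = mean(h for sex, h, _ in students if sex == "male")
women_hours = mean(h for sex, h, _ in students if sex == "female")
difference = women_hours - men_hours

# Relationship question: how does exercise relate to GPA?
r = pearson_r(hours, gpas)

print(f"average hours: {average_hours:.2f}")
print(f"women minus men: {difference:.2f}")
print(f"correlation between exercise and GPA: {r:.3f}")
```

Notice that only the relationship calculation uses two variables at once, which matches the point that descriptive questions have no independent or dependent variable.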

Conclusion

Good questions lead to good answers. This is one reason why research questions matter so much. They must be clearly set forth at the beginning of any study. The examples above are not the only way to approach this. However, they do provide a starting point for those who are new to research.

Tips for Writing Quantitative Purpose Statements

There are several equally acceptable ways to write purpose statements for quantitative studies. This post will share some suggestions for getting started.

Ideas for Writing Quantitative Purpose Statements

A well-written quantitative purpose statement contains the following elements

  • identified variables
  • the relationship among the variables
  • the participants
  • the site of the research

Here is an example

The purpose of this study is to determine the strength of the relationship between height and weight among undergrad students in Thailand.

Here is a breakdown of the elements of the purpose statement above.

  • identified variables [Height and Weight]
  • the relationship among the variables [Height is the independent variable and weight is the dependent variable]
  • the participants [undergrad students]
  • the site of the research [Thailand]

Here are some additional tips

  • Try to write purpose statements in one sentence
  • Start with the phrase “the purpose of this study”; it is a clue to readers
  • Specify all variables in the study, such as independent, dependent, mediating, etc.
  • Introduce the variables in the following order
    1. Independent
    2. Dependent
    3. Mediating or control
  • Variables are used to relate two or more variables, to compare groups, or to describe
  • If you are testing a theory, comparing groups, or describing something, state this in the purpose statement

Below is an example. The characteristics of the purpose statement are identified in parentheses.

The purpose of this study is to test the theory of planned behavior (the theory) by relating social support (independent variable) to intention to drop out of college (dependent variable) for undergrad students (participants) in Thailand (research site).

Comparison is another common form of research. Below is a purpose statement that focuses on comparing groups. The characteristics of the purpose statement are identified in parentheses.

The purpose of this study is to compare music choice (independent variable) of classical (group 1), contemporary (group 2), and no music (group 3) in terms of its influence on academic performance (dependent variable) for undergrad students (participants) in Thailand (research site).

In the above example, music choice is the independent variable that is hypothesized to influence academic performance. Three types of treatment are employed: classical, contemporary, and no music. The goal is to see if there is a difference in the means of academic performance at the completion of the study.
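As a rough illustration of what "a difference in the means" looks like in practice, the sketch below computes the mean for each music group and a one-way ANOVA F statistic by hand. The scores are invented, and the use of ANOVA is an assumption on my part; the purpose statement itself does not name a statistical test.

```python
from statistics import mean

# Hypothetical end-of-study academic performance scores per treatment group.
scores = {
    "classical": [78, 82, 85, 80],
    "contemporary": [74, 79, 77, 76],
    "no music": [72, 75, 70, 73],
}

group_means = {group: mean(values) for group, values in scores.items()}
grand_mean = mean(x for values in scores.values() for x in values)

# Between-group sum of squares: how far each group mean is from the grand mean.
ss_between = sum(
    len(values) * (group_means[group] - grand_mean) ** 2
    for group, values in scores.items()
)

# Within-group sum of squares: how far each score is from its own group mean.
ss_within = sum(
    (x - group_means[group]) ** 2
    for group, values in scores.items()
    for x in values
)

df_between = len(scores) - 1
df_within = sum(len(values) for values in scores.values()) - len(scores)

# F statistic: ratio of between-group variance to within-group variance.
f_statistic = (ss_between / df_between) / (ss_within / df_within)

for group, value in group_means.items():
    print(f"{group}: mean = {value}")
print(f"F({df_between}, {df_within}) = {f_statistic:.2f}")
```

A large F value suggests the group means differ by more than the spread within the groups would explain; a p-value would still be needed before drawing a conclusion.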

Conclusion

Purpose statements for quantitative studies are important as they lay the foundation for a study. A good statement tells a reader what to expect for the rest of the study. For this reason, researchers need to write the purpose statement with care.

Developing Theories in Research

Theories in quantitative research serve the purpose of explaining and predicting the influence of the independent variable(s) on the dependent variable(s). Theories provide researchers with an understanding of the various relationships that are found in research. For example, let’s say a researcher finds a relationship between exercise and income. The researcher finds that as exercise increases so does salary. The relationship is found in many different contexts. It is found among men, women, Africans, Asians, various college majors, etc. Since the relationship holds steady across several different contexts and environments it is considered a theory. It is the general applicability of a theory that makes it strong.

The testing of a theory is one of the most rigorous forms of quantitative research. However, the development of theory does not start there. There are many different ways to develop theories. In this post, we will look at four common ways to develop a theory, which are…

  1. Gut-feeling
  2. Theoretical reason
  3. Conceptual framework
  4. Theory testing

Gut-Feeling

Sometimes a researcher has a gut-feeling that there is a relationship between two variables. For example, going back to the example of the link between exercise and income, a researcher may notice that his most fit friends also make the most money. He has a gut feeling that exercise can predict income. This is an unsophisticated approach but it is a beginning for exploring a potential theory. However, the application of this initial research is highly limited because the motivation for the study comes from the experience of just one individual.

Theoretical Reason

A theoretical reason is a logical conclusion by another scholar that a researcher uses as support for the development of their own theory. In our exercise example, let’s say that another scholar finds that students who exercise perform better academically than students who do not. This leads to a reasonable conclusion that exercise may affect income as well. Our theory is not only a hunch now but based on the scholarly contribution of another person’s work. This allows the conclusion of our study to have a wider application because it is not based only on our own observation.

Conceptual Framework

A conceptual framework draws together the work of several scholars who have come to similar conclusions about a particular relationship. Instead of just one author, let’s say that ten authors made the conclusion that exercise impacts grades. Our question is slightly different. Does the link found between exercise and grades hold when we look at exercise and income? The results of this study have an even larger application because more people are involved in the development of this potential relationship.

Theory Testing

Theory testing is just what it says: the testing of a theory. After we complete our study on exercise and income, several other people replicate the study in various contexts. The study is done in different cultures, nations, regions, ethnicities, etc. After several reasonable replications, our idea about exercise and income is considered a theory.

Conclusion

Developing a theory is not an easy task. It entails a detailed process that involves a lot of oversight and scrutiny. The benefit of a theory is that it takes a phenomenon in the world and explains the cause and effect of it in simple language for others to use as needed. These explanations help us to make sense of our complex world in succinct sentences and paragraphs. This is why we need more theories because we do not understand everything about the world.

Looking at Variables Part III

Over the last few posts, we have been looking at various types of variables used in quantitative research. In this post, we look at several more variables. The variables examined in this post include the following.

  • moderating variable
  • mediating variable
  • confounding variable

Moderating Variable

A moderating variable is a variable that affects or modifies the relationship between two variables. This variable is common in experimental studies. For example, let’s say you are looking at the influence of physical activity (aerobic exercise and weightlifting) on academic achievement. In this example, we currently have two variables as listed below.

independent variable: physical activity
dependent variable: academic achievement 

For physical activity, there are two types, which are weightlifting and aerobics. Suppose that you believe that gender plays a role in the impact on academic achievement. You may think that girls will perform better academically when exposed to aerobic exercise and boys will perform better academically when exposed to weightlifting.

The results of the study show no overall difference. However, when looking at aerobic exercise, the girls who were exposed to it did better academically than the boys exposed to aerobic exercise. In addition, the boys exposed to weightlifting outperformed academically the girls exposed to weightlifting.

What is happening here is an interaction effect. Girls perform well academically when exposed to aerobic exercise while boys perform poorly. However, boys exposed to weightlifting perform better academically while girls perform poorly when weightlifting. When boys go up in performance girls go down and vice versa. This interaction effect is due to gender. Thus, gender is the moderating variable of the study as it impacts the relationship between physical activity and academic achievement. Below is a list of the three variables in the study.

independent variable: physical activity
dependent variable: academic achievement
moderating variable: gender
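
The interaction described above can be sketched with a small table of cell means. The scores are invented so that the overall difference between exercise types is zero while the effect reverses by gender; the variable names are hypothetical.

```python
from statistics import mean

# Hypothetical achievement scores keyed by (gender, exercise type).
scores = {
    ("girls", "aerobics"): [86, 88, 90],
    ("girls", "weightlifting"): [68, 70, 72],
    ("boys", "aerobics"): [68, 70, 72],
    ("boys", "weightlifting"): [86, 88, 90],
}

cell_means = {cell: mean(values) for cell, values in scores.items()}

# Overall (main) effect of exercise type, averaging over gender.
aerobics_mean = mean(
    [cell_means[("girls", "aerobics")], cell_means[("boys", "aerobics")]]
)
weightlifting_mean = mean(
    [cell_means[("girls", "weightlifting")], cell_means[("boys", "weightlifting")]]
)
overall_difference = aerobics_mean - weightlifting_mean

# Effect of exercise type within each gender.
girls_effect = cell_means[("girls", "aerobics")] - cell_means[("girls", "weightlifting")]
boys_effect = cell_means[("boys", "aerobics")] - cell_means[("boys", "weightlifting")]

# Interaction contrast: nonzero when the effect of exercise depends on gender.
interaction = girls_effect - boys_effect

print(f"overall difference: {overall_difference}")
print(f"effect for girls: {girls_effect}, effect for boys: {boys_effect}")
print(f"interaction contrast: {interaction}")
```

With these numbers the overall difference is zero even though each gender shows a large effect in opposite directions, which is why a moderating variable can stay hidden if only the overall result is examined.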

The conclusion from this is that a teacher should include exercise in order to boost academic achievement. However, it may be beneficial to have specific exercises available for males and females as it appears that different forms of exercise are more beneficial for one sex over another. If boys lift weights and girls do aerobics you can potentially expect maximum academic achievement.

Mediating Variable

A mediating variable is a variable that sits between an independent and dependent variable. Mediating variables transmit the effect of the independent variable to the dependent variable. Let’s look at an example.

Returning to our study on physical activity and academic achievement, let’s say that we believe that physical activity leads to higher overall confidence and that the higher confidence is what directly leads to higher academic achievement. Below is a list of the variables.

Independent variable: physical activity
Mediating variable: confidence
Dependent variable: academic achievement

Mediating variables help to further explain what appears to be a simple relationship. There is more to academic achievement than just physical activity. Confidence plays a part as well. Thus mediating variables help to further explain cause and effect. Models can become endlessly complex when including mediating variables. Therefore it is up to the researcher to determine what to include when deciding how to explain a dependent variable.
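
One common way to put numbers on a mediating variable is the product-of-coefficients approach, sketched below with ordinary least squares written out by hand. This technique is my choice for illustration and is not prescribed by the discussion above; the data are invented, and a real analysis would also need a significance test and a design that supports causal claims.

```python
from statistics import mean


def cov(xs, ys):
    """Sum of cross-products of deviations (unscaled covariance)."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys))


# Hypothetical measurements for six students.
activity = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]             # independent variable
confidence = [2.0, 2.5, 3.5, 3.0, 4.5, 5.0]           # mediating variable
achievement = [55.0, 60.0, 66.0, 63.0, 74.0, 77.0]    # dependent variable

sxx = cov(activity, activity)
smm = cov(confidence, confidence)
sxm = cov(activity, confidence)
sxy = cov(activity, achievement)
smy = cov(confidence, achievement)

# Path a: effect of activity on confidence (simple regression slope).
a = sxm / sxx

# Total effect c: effect of activity on achievement, ignoring confidence.
c = sxy / sxx

# Regress achievement on both activity and confidence (two-predictor OLS).
det = sxx * smm - sxm ** 2
c_prime = (sxy * smm - smy * sxm) / det  # direct effect of activity
b = (smy * sxx - sxy * sxm) / det        # path b: confidence -> achievement

# Indirect effect transmitted through confidence.
indirect = a * b

print(f"total effect c = {c:.3f}")
print(f"direct effect c' = {c_prime:.3f}")
print(f"indirect effect a*b = {indirect:.3f}")
```

For least squares estimates, the total effect always equals the direct effect plus the indirect effect, so the share of the relationship carried by confidence can be read from these three numbers.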

Confounding Variable

Confounding variables are variables that are not a part of a study but have an unaccounted-for influence on the results of a study. Returning to our example of physical activity and academic achievement, let’s say that we did not account for age in the study. In other words, participants in the study were as young as five and as old as 40. If we did not control for age it may impact the validity of our study. Since the academic performance of children and adults is different, age is something that needs to be controlled when doing the study. If not, the results may not be accurate.

Confounding variables are a type of extraneous variable. An extraneous variable is any variable that is not a part of a study. When a variable that is left out of a study impacts the results, it goes from being only extraneous to being a confounding variable because it is confusing the results of a study.
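
The age example can be sketched numerically. In the invented data below, adults are both more active and higher scoring, so the pooled correlation looks strong even though there is no relationship within either age group. The records and function names are hypothetical.

```python
from math import sqrt
from statistics import mean


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    spread = sqrt(sum((x - mx) ** 2 for x in xs) * sum((y - my) ** 2 for y in ys))
    return cov / spread


# Hypothetical records: (age group, weekly activity hours, test score).
records = [
    ("child", 1.0, 50.0), ("child", 2.0, 52.0), ("child", 3.0, 50.0),
    ("adult", 4.0, 80.0), ("adult", 5.0, 82.0), ("adult", 6.0, 80.0),
]


def r_for(group=None):
    """Correlation between activity and score, optionally within one age group."""
    rows = [row for row in records if group is None or row[0] == group]
    return pearson_r([row[1] for row in rows], [row[2] for row in rows])


r_pooled = r_for()           # age ignored
r_children = r_for("child")  # age held constant
r_adults = r_for("adult")

print(f"pooled: {r_pooled:.3f}")
print(f"children only: {r_children:.3f}")
print(f"adults only: {r_adults:.3f}")
```

Holding age constant makes the apparent relationship disappear here, which is the sense in which an uncontrolled variable can confuse the results of a study.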

Conclusion

The number of available variables that can be included in a study is intimidating. The goal is not to try and figure out what variables to include and what not. Instead, the focus should always be on the research problem and the research questions that come from the problem. If this is clear, the variables that are needed will emerge and the study will go well.

Looking at Variables Part II

In a previous post, we began a discussion on variables. In this post, we will continue our journey in understanding variables by looking at the family of variables. We will look at the following

  • Dependent variable
  • Independent variable
  • Measured variable
  • Controlled variable
  • Treatment variable

Dependent Variable

Dependent variables are variables whose outcome is influenced by the independent variable. They go by many names in research such as criterion, effect, outcome, and/or consequence variable. It is important to know the many different names for a dependent variable in order to avoid becoming confused when reading research.

When looking at research questions, it is possible to identify the dependent variable by looking at a question that is focused on the outcome of the study.

Independent Variables

Independent variables are variables that influence the dependent variable. Other names for independent variable include treatment, factors, determinants, antecedents, and predictors. Below is an example of a research question using independent and dependent variables.

Do students who spend more time studying math have higher grades than students who spend less time studying? 
Independent variable: study time     Dependent variable: grade

The example above examines how study time (independent variable) influences the outcome of math grades (dependent variable).

Types of Independent Variables

There is a great deal of variety not only in the names of independent variables but also in the types. Below are some independent variables with examples when appropriate.

Measured variable. This is an independent variable that is observed or measured by the researcher. A research question example would be “How does verbal ability influence GPA in college?” You measured verbal ability in this example.

Control variable. Researchers include control variables in order to neutralize their influence on the dependent variable. This is done because the controlled variable is not the focus of explaining the dependent variable. Control variables are often demographic traits such as race, gender, socioeconomic status, etc. By removing the influence of control variables a researcher is able to obtain a clearer picture of how the independent variables of the study influence the dependent variables.

Treatment variable. For experimental studies, you will use treatment variables. These variables are categorical. This is because one group receives the “treatment” while another does not. The desire is to see if the treatment makes a difference in the dependent variable. Below is a research question that indicates a treatment variable.

How does pop music influence reading comprehension?
Independent variable: Music        Dependent variable: reading comprehension

Within the independent treatment variable, there need to be at least two levels or groups. One group might hear pop music and the other group might not hear pop music. Both groups would take some sort of assessment to measure their reading comprehension. Your desire is to see if there is a difference between the groups. If there is, you could attribute this difference to the effect of music, which is what you used as a treatment for one group in the experiment.

Conclusion

Variables serve the purpose of allowing a researcher to examine a phenomenon quantitatively. The type of variable used depends on the purpose of the study. Each type of variable comes with certain rules that indicate when it is applicable to use it. This is important for researchers to know when deciding on the research design of their project.

Looking at Variables Part I

Variables are the heart of quantitative research. This post begins a discussion on the types of variables. This will prepare you to understand how to develop purpose statements, research questions, and hypotheses that involve the use of variables. In this post, we will look at

  • Defining variable
  • Categorical and continuous variable
  • Variable vs construct

Defining Variable

A variable is a trait or characteristic that is measurable and that varies. Common examples include height, weight, income, grade level, gender, etc. All of these examples of variables can be measured, which means that the information can be recorded by the researcher, and they vary, which means their values on individual cases are different. If the principle of measurement or variation is violated, then the concept is not a variable.

Categorical and Continuous Variables

Variables can be measured in two common ways, either in categories or continuously. Categorical measurement is looking at a variable that has discrete groups that cannot overlap. An example would be gender. A person is either male or female and cannot be both. Categorical variables record the frequency of the occurrence of a variable.

Continuous variables measure a variable along a continuum or range. Examples include age, weight, and height. All of these variables can take on an infinite number of values. A person’s weight could be 80 kg, 81 kg, 81.5 kg, 90 kg, and on and on.
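
The practical difference between the two kinds of measurement shows up in how the data are summarized. The sketch below, using invented values, counts frequencies for a categorical variable and averages a continuous one.

```python
from collections import Counter
from statistics import mean

# Hypothetical sample of five participants.
gender = ["male", "female", "female", "male", "female"]  # categorical
weight_kg = [80.0, 81.0, 81.5, 90.0, 62.5]               # continuous

# Categorical variables are summarized by how often each category occurs.
gender_counts = Counter(gender)

# Continuous variables can be summarized along their range, e.g. with a mean.
average_weight = mean(weight_kg)

print(dict(gender_counts))
print(f"average weight: {average_weight} kg")
```

Taking the mean of the gender labels would be meaningless, while counting how many participants weigh exactly 81.5 kg would tell us very little, which is why the type of variable guides the type of analysis.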

Variables vs. Construct

Two words commonly confused in research are variable and construct. A variable is a trait or characteristic that is stated in a specific way. An example would be a person’s blood pressure. This can be measured directly and it is possible for it to vary.

A construct, on the other hand, is a trait or characteristic that is stated in a general way. A construct is too abstract to be measured directly. An example of a construct would be health. You have to wonder how to measure the construct of health. One way would be to measure the variable of a person’s blood pressure as blood pressure is an indication of a person’s overall health. By measuring blood pressure, you are developing an understanding of the construct of health because of the known relationship between blood pressure and health.

Constructs help with the explanation of the results of different variables. What does it mean to have high blood pressure? Often it means that a person has poor health. What does it mean to have poor health? One example of poor health can be found by knowing a person’s blood pressure. Constructs also help us to lump together similar variables. If someone has a high weight, high blood pressure, and low amounts of exercise these are all indicators of poor health.

A construct must be based on a strong review of literature to assure it is theoretically sound. Anybody can make any construct they want. The difference between a good and bad construct is its relation to existing theories.

Conclusion

Variables are critical to quantitative research as they provide the concepts that are measured and varied. Variables can measure frequency or account for a continuum of responses. Variables can also be used to explain constructs. As such, variables will be a topic that we will return to in the future as their use in research is almost the total focus of quantitative research.

Research Purpose, Hypotheses, and Questions

Four key components to a research project are the purpose statement, research questions, hypotheses, and research objectives. In this post, we will define each of these.

Definitions

The purpose statement provides the reader with the overall focus and direction of a study. Both quantitative and qualitative research use purpose statements. Purpose statements normally begin with the phrase “the purpose of this study…” Below is an example of a quantitative purpose statement.

The purpose of this study is to examine the relationship between college completion and organizational commitment of undergraduate students in Thailand. 

Here is an example of a qualitative purpose statement.

The purpose of this study is to explore student experiences at a university in Thailand about completing their tertiary degree.

Both of these examples are short one-sentence responses to what the study will attempt to do. This is a critical first step in shaping the study.

Research Question

The research question(s) in a quantitative or qualitative study narrows the purpose down to a specific question(s) for the researcher to find answers. Below are examples from both the quantitative and qualitative perspective. We are continuing the research themes from the previous section on the purpose statement.

Quantitative

Does organizational commitment affect college completion of students?

Qualitative

What kinds of experiences have students had while completing their degree?

On closer examination, you may have noticed that the research questions sound a lot like the purpose statement. Research questions often split apart a long complex purpose statement into several questions. This is why questions sound so redundant when compared to the purpose statement. Despite this apparent problem, this thought process helps researchers to organize their thinking and proceed in a manner that is much more efficient.

The next two components only relate to quantitative research and they are the hypotheses and research objective(s). For this reason our illustration of qualitative concepts will stop at this point.

Hypotheses

Hypotheses are statements a researcher makes about the potential outcome(s) of a study based on the examination of literature. Below is an example from the same theme as before.

Students who have a higher perception of organizational commitment will also have a higher likelihood of completing college.

Again, the wording of the research questions, hypotheses, and purpose statement is similar. The difference is only slight and is due to context. Seeing these similarities quickly will help you to move faster in finishing a study. The difference between these elements is a matter of perspective rather than a strong difference, as they do sound awfully similar.

Research Objectives

Research objectives are the goals a researcher has for a study. This component is not always included in a study. Below is an example.

To examine the correlation between organizational commitment and the rate of college completion

Conclusion

The purpose statement, research questions, hypotheses, and research objectives help a researcher to focus on what they are studying. With this focus comes a clearer understanding of what to do. Not all forms of study have all components nor are all components always required. The point is that attention to these details will help in the success of a study.

Developing a Statement of the Problem Part II

In the previous post, we looked at two of the five characteristics of developing a statement of the problem, which were the topic and the research problem. In this post, we look at the last three characteristics, which are

  • a justification for the problem
  • exposing the gap
  • determining the audience

Justification of the Problem

When conducting research, it is a researcher’s job to explain the importance of a study or to justify it. There are several ways to do this.

  • Cite literature that recommends such a study as yours
  • Share experiences others had that call for the investigation of the problem
  • Share personal experiences that call for the investigation of the problem

These approaches are mostly self-explanatory. Many journal articles include recommendations for further study at the end of the article. This is a great place to find justification for a new study. The experiences of others and yourself provide anecdotal support for a current study. Recommendations in research are a stronger justification, but anecdotal evidence is often acceptable as well.

Exposing the Gap

Your current study must clearly explain what other studies have not examined. This is called exposing the gap or indicating the deficiencies in prior studies. It is easy to find out what others have done. However, it is much more challenging to notice what people have not done since people do not often share what they did not do in a study. Identifying what is missing in a field is challenging.

Determining the Audience

This aspect of the statement of the problem indicates who will benefit from the study. Not all research is for everybody. Usually, research is very specific in who it will benefit. Your responsibility is to identify who will want to read and possibly use your research.

Conclusion

The statement of the problem is the first section of a study. This component of an article lays the foundation for the entire paper. As such, it must be well-written with a clear sense of direction. One way to enhance and maintain the clarity of this section is by adhering to the five characteristics of a statement of the problem. Through approaching the statement of the problem with these characteristics in mind a researcher can develop a succinct approach to expressing this component of research.

Developing a Statement of the Problem Part I

In the last post, we looked at developing research problems. The research problem is actually a part of a larger section in a research article called the statement of the problem. The statement of the problem is written at the beginning of a research article and has five parts to it.

  1. The topic (discussed previously)
  2. The research problem (discussed previously)
  3. Explanation of the importance of the study
  4. The gap in knowledge the study tries to fill
  5. The audience that will benefit from the study

In this post, we will look at the first two parts of the statement of the problem.

Topic

The topic is the broadest level at which an article can begin. This serves the purpose of gradually introducing the issue of the research article to a reader. By starting broadly, it allows the reader to grasp where the writer is coming from. For example, if we are doing a study about the differences between lecture and discussion teaching one appropriate topic would be the general idea of teaching. From there we can discuss specifically lecture teaching and discussion teaching.

The topic is often found in the first sentence of a statement of the problem and it needs to pull the audience into the paper. There are several ways to do this and they are below with examples from our teaching methods idea.

  • The pull as a question “How do teaching methods impact academic performance?”
  • The pull as a need for research “Teaching methods are under intense scrutiny in the 21st century.”
  • The pull as statistical data “80% of university professors rely on a lecture approach to teaching.”

These are not the only ways to do this, but they do provide some indication of how to develop a topic for a statement of the problem.

Research Problem

After addressing the topic, the next step is to provide the research problem. The research problem is the issue that you as a researcher are going to investigate. There are several ways to indicate a research problem. Two of them are the practical research problem and the gap based research problem.

A practical research problem is a problem coming from a particular environment or setting. In education, the setting might be a school, student homes, or even teacher working conditions. Below is an example of a practical research problem. The example begins with a topic and moves to the research problem. Both the topic and research problem are identified.

Teaching methods are under intense scrutiny in the 21st century (the topic is teaching methods). At the university level, there has traditionally been an emphasis on lecturing. However, many tertiary schools are beginning to promote discussion as a viable alternative to lecture (shift to practical research problem). This leads to the dilemma of deciding whether lecture or discussion teaching is most beneficial to students in terms of their academic performance (practical based research problem).

The practical problem is deciding which teaching method is most beneficial to students.

A gap based research problem is a problem that calls for further research in an existing area because prior research has missed something, there is conflicting evidence, and/or there is a need to extend the research into new fields. Below is an example.

Teaching methods are under intense scrutiny in the 21st century (the topic is teaching methods). At the secondary level, there has been a tremendous amount of research into various teaching approaches. In particular, both lecture-based and discussion based teaching have been examined. These two methods have been compared in terms of their influence on the academic performance of secondary students. However, there is little data on the impact of lecture and discussion teaching at the tertiary level in relation to their influence on student achievement (potential gap). As such, there is a need to examine how lecture and discussion teaching impact the student achievement of tertiary students and to see if there is any difference between the results of tertiary students and secondary students (gap based research problem).

In the example above, the problem is indicating that there is a gap in our knowledge about how lecture and discussion impact tertiary student achievement. The example indicates how there are lots of studies at the secondary level but few at the tertiary level. This lack of data at the tertiary level indicates a gap in the existing knowledge and thus a need for further study.

Conclusion

The statement of the problem sets the stage for the entire study. Therefore, it needs to be clearly explained and developed in order to successfully complete a study. If the beginning is poor the end will be a disaster. As such, the hardest part of a research study is the beginning.

As you can see from the examples employed in this post, the same problem can be practical or gap-based. What makes the difference is the context, your personal background, and perhaps even where you want to publish. Different journals lean in different directions, and a practical-based problem may need to be reworded as a gap-based problem and vice versa. All this comes out of your own experience, which will affect how you see a problem. However you frame a problem, everyone expects to see certain components in a statement of the problem.

Identifying a Research Problem

The research problem is the issue or concern in a particular setting that motivates and guides the need for conducting a study. Identifying a research problem is important because it lays the foundation for an entire project. If the foundation is shaky, the entire project is doomed to failure. This is why great care is necessary in the initial stages of a research project.

In addition to a research problem, there are several terms closely related to it. These terms are:

  • research topic
  • research problem (already discussed)
  • purpose
  • research questions

We will now look at each of these with an appropriate example.

Research Topic

A research topic is the subject matter of a study. This is the broadest aspect of research and many people begin here. An example would be student satisfaction. What we are going to do and how is not explained yet. This is only pointing us in a certain direction without knowing what our destination is.

Research Problem

As mentioned previously, the research problem is the issue being addressed. The problem helps in narrowing the topic down to something that is reasonable for conducting a study. Continuing with our topic of student satisfaction, we may look at student satisfaction with teaching methods at the university level. The problem is that student satisfaction is often low at many universities, and we want to do a study to explain some reasons for this.

Purpose

The purpose is the objective of the study. For our student satisfaction example, the purpose could be "to identify how various teaching methods influence student satisfaction at a university." At this point, we are explaining what we are going to do.

It is common for people to confuse the research problem and the purpose. The problem simply identifies a problem. The purpose explains how you will study the problem or what you are going to do. The purpose looks at the problem and states what you are going to look at in order to generate data about the problem that could be used one day to solve the problem.

This distinction is important because small missteps in the beginning will lead to major issues in completing a study in the future. Think of the problem as passive in terms of what you will do, while the purpose is active in terms of what you will do.

Research Questions

Research questions narrow the purpose down into questions that provide evidence for addressing the research problem. For example, in our student satisfaction example, below are two questions that could be used for the study:

  1. How does a lecture method approach to teaching affect student satisfaction?
  2. How does a discussion method approach to teaching affect student satisfaction?

From here the study can move forward.

Conclusion 

The initial step of research involves several sub-steps, which are the research topic, problem, purpose, and questions. The topic is the broadest aspect of research, and the research questions are the most focused aspect of the initial stage of a study. The purpose of this process is to narrow and shape a study in order to make it feasible and relevant.

Qualitative Research Part II

In a previous post, we looked at the first three steps of the process of qualitative research. The steps of this process are below as a review.

  1. Explore a problem to understand the phenomenon
  2. Minor literature review
  3. State purpose and research questions in a general way
  4. Collect data normally from a small sample relying on words instead of numbers
  5. Analyze the data using text analysis to find themes and descriptions
  6. Write up

In this post, we will look at the last three steps of the qualitative research process.

Data Collection

Data collection allows a researcher to learn about the participants of a study. Usually, a protocol or a form for collecting data is created. The protocol can be a list of questions to ask during an interview or a place to record behavior that the researcher observes during the course of data collection. For example, if we are looking at the experience of African students in Thai government schools, we may use an interview protocol, or a list of questions, when collecting data from the students.

The most common forms of data collection include interviews, observation, and document analysis. An interview is a question-and-answer session with one or more individuals. Observation is the act of watching others. Lastly, document analysis is evaluating written documents or other objects for useful information.

Whatever is collected, whether text from interviews, images, or other sources, becomes a database. Words become a text database. Images become an image database. These databases of information are used for the data analysis.

Data Analysis

Data is analyzed in qualitative research in a number of ways. One technique is text segmentation, which divides sentences from the text database into groups. These various groups are used to explain the central phenomenon of the study.

Identifying themes and categories is another analysis technique. In this approach, the researcher looks for commonalities among the data and attempts to organize these themes in order to explain the central phenomenon. For example, if during the course of the interviews with the African students in Thai schools the students mention rejection and humiliation consistently in several interviews, this could be a theme or category of information about the central phenomenon of the experience of these students in Thai schools.
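The idea of grouping text segments under themes can be sketched in a few lines of Python. This is only a rough illustration and not a real qualitative analysis procedure: the keyword list, the function names, and the interview sentences are all invented for this example, and simple keyword matching is a crude stand-in for the careful reading a researcher would actually do when coding data.

```python
from collections import Counter

# Hypothetical codebook: each theme is linked to keywords that hint at it.
# In a real study the researcher develops these codes by reading the data.
THEME_KEYWORDS = {
    "rejection": ["rejected", "left out", "ignored"],
    "humiliation": ["humiliated", "laughed at", "embarrassed"],
}


def code_segment(segment):
    """Return the set of themes whose keywords appear in one text segment."""
    text = segment.lower()
    return {
        theme
        for theme, keywords in THEME_KEYWORDS.items()
        if any(keyword in text for keyword in keywords)
    }


def themes_by_interview(interviews):
    """Count in how many interviews each theme appears at least once."""
    counts = Counter()
    for segments in interviews.values():
        themes_in_interview = set()
        for segment in segments:
            themes_in_interview |= code_segment(segment)
        counts.update(themes_in_interview)
    return counts


# Invented text database: interview id -> list of text segments.
interviews = {
    "student_a": ["I felt ignored by my classmates.", "The teacher was kind."],
    "student_b": ["They laughed at my accent.", "I was left out of group work."],
    "student_c": ["I was embarrassed in front of the class."],
}

counts = themes_by_interview(interviews)
for theme, n in counts.most_common():
    print(f"{theme}: mentioned in {n} of {len(interviews)} interviews")
```

A theme that shows up across several interviews, as in the counts printed here, is a candidate for explaining the central phenomenon. The researcher would still need to go back to the segments themselves to interpret what the students meant.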

Write Up

The format for qualitative research is similar to quantitative. There is a problem, purpose, literature review, methodology, results, and conclusion. However, this format is much looser in qualitative research and is not strictly followed. For example, some qualitative studies begin with a long narrative that provides the background of the study.

Qualitative studies have an extensive write up of the data collection, which shares the themes and categories as well as the relationships among them. The researcher must also share their biases, values, and assumptions in order to indicate why results were interpreted a certain way. For example, as an African American, I am familiar with the discrimination of Africans in Thailand from my own experience. Therefore, if I were to interview African students about their experience in Thai schools, there would be a temptation to attempt to confirm my own experience as I speak to the students. Sharing this in the write up informs readers of my own biases about living in Thailand.

Conclusion

Qualitative research is about explaining a central phenomenon. Data collection is for the purpose of gathering information about the topic of the study. The analysis is for the purpose of explaining the results. Lastly, the write up is about conveying the results in a way that is clear for the public.