Category Archives: Research

Qualitative Research Part I

Another form of research is qualitative research. This form of research is employed when the researcher does not know what variables to explore in a study. There are six characteristics of qualitative research. The characteristics are below.

  1. Explore a problem to understand the phenomenon
  2. Minor literature review
  3. State purpose and research questions in a general way
  4. Collect data normally from a small sample relying on words instead of numbers
  5. Analyze the data using text analysis to find themes and descriptions
  6. Write up

In this post, we will explore the first three characteristics.

Exploration of a Problem

Qualitative research is often used when numbers are not able to shed light on the research problem. Instead, the problem is explored by examining a central phenomenon. For example, a researcher might want to know the experiences of African primary students in Thai schools. This is not a study that employs numbers; rather, it explores the experiences of African children in Thai schools. The central phenomenon would be the testimony and experiences of these minority children in schools in Thailand.

Minor Literature Review

Since it is often exploratory in nature, qualitative research often includes a minor literature review as there is often little information on the central phenomenon. The literature review is mostly used to justify the need for a study. An extensive search of the literature would foreshadow the results and this is inconsistent with the idea of exploration in qualitative research. The desire is to focus on the views of the participants and less on prior literature that does not take into account the uniqueness of the participants and the setting.

In our example about African students in Thai schools, there is little data on this phenomenon. To justify this study, we may need to find some articles that mention the struggles of international students in school as they deal with culture shock and discrimination. This leads to the question of how African students are coping with their experience in Thai schools. We now know at least that we need to explore culture shock and discrimination as we collect data from African students who attend Thai schools. However, we have no idea what to expect. In other words, we have no hypotheses to test, only a desire to explore.

Purpose and Research Questions

The purpose and research questions are written so that you are able to gather data from the participants about the central phenomenon. For example, in our case of African students in Thai schools, we are exploring the African students’ experiences with culture shock and discrimination. These two components are the central phenomenon of the study.

The phenomenon of culture shock and discrimination as experienced by African students in Thai schools will yield verbal data that needs to be analyzed. In other words, we are not using a numerical survey. Instead, the data collected will be the words of the participants as they share their experience with the central phenomenon.

Conclusion

Qualitative research is about examining small samples normally in a non-numerical way. The researcher explores a central phenomenon through the use of interviews, observation and other means. Since there is often a lack of data on the central phenomenon, many qualitative studies have a minor literature review and lack hypotheses as there is no clear direction from the literature. This form of research is an interesting way to gather first-hand experiences from the lives of people.

Quantitative Research Part II

In a previous post, we looked at the first three characteristics of the quantitative research process, which were problem identification, review of literature, and developing a purpose for the research. In this post, we will look at the last three characteristics of the quantitative approach to research, which are…

  1. Collection of numeric data
  2. Statistical analysis
  3. Write up of the results using a standardized format

Collection of Numerical Data

Once the purpose of the study has been developed you can think of ways to measure the variables of the study. There are various instruments that can be used to measure the data. One common form of an instrument is a survey. The questions on a survey indicate what people perceive or think about the variables in the study. For example, if a variable is student satisfaction, the question on the survey would relate to what the students think about the school.

People normally answer the questions on the survey using some sort of numerical response such as a Likert scale, which has values from strongly disagree to strongly agree. The respondents select the number that most closely aligns with their attitude on the subject. For example, for the student satisfaction survey, we could ask respondents to rate the statement “The teachers are prepared for class.” The students may indicate strongly agree and circle a 5 on the question, or they could strongly disagree and select a 1. A response somewhere in the middle could be 2, 3 or 4.

Statistical Analysis

The analysis is where the data is broken down in order to answer the research questions. The results of this are interpreted in light of your predictions and prior studies. How you analyze the data depends completely on the type of questions you asked. There are many interesting things in almost any data set. However, you must focus on answering your research questions and not on some new discovery you found in your data. New discoveries need to be dealt with in future studies since it is often not acceptable to modify research questions after data collection.

For our student satisfaction example, if the students strongly disagree that the teachers are prepared for class this indicates that the students may not be satisfied with the school since they are not happy with the teachers. One conclusion drawn from this would be that the school must focus on improving the preparedness of their teachers in order to improve student satisfaction.
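To make the Likert example concrete, here is a minimal sketch of how such responses might be summarized. The ten responses are made up purely for illustration and do not come from any real survey, and taking the mean of Likert responses is only one of several ways to summarize them.

```python
from collections import Counter
from statistics import mean

# Hypothetical responses to "The teachers are prepared for class."
# 1 = strongly disagree ... 5 = strongly agree (made-up data for illustration)
responses = [1, 2, 1, 3, 2, 1, 4, 2, 1, 2]

# How many students chose each point on the scale
counts = Counter(responses)

# Average response and the share of students who disagree (1 or 2)
average = mean(responses)
share_disagree = sum(1 for r in responses if r <= 2) / len(responses)

for value in range(1, 6):
    print(f"{value}: {counts.get(value, 0)} responses")
print(f"Mean response: {average:.2f}")
print(f"Share who disagree: {share_disagree:.0%}")
```

A low mean and a large share of 1s and 2s would point toward the kind of dissatisfaction described above.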

Write Up

The reporting of quantitative research is formalized into the following pattern.

  1. Introduction
  2. Review of Literature
  3. Methodology
  4. Results
  5. Discussion

There is little variation on this format when publishing. For internal documents, there is much more variation in reporting a quantitative study. The writing style for publication is usually objective and impersonal. There is also a desire to avoid any bias or opinions in the study.

Conclusion

The quantitative research process involves the development of research questions that are answered in a systematic way. It is highly important that you focus on answering your research questions in a study. This is where many people get lost as they attempt to navigate this experience. The purpose of the study, as specified in the research questions, shapes the rest of the journey.

Quantitative Research Part I

Quantitative research is one of the major forms of research used in the world today. This form of research has several distinct characteristics. In all, there are at least six characteristics of quantitative research and they are listed below.

  1. Description of a research problem
  2. A review of literature that justifies the research problem and questions
  3. Purpose statement that is narrow, observable, and measurable
  4. Collection of numeric data
  5. Statistical analysis
  6. Write up of the results using a standardized format

In this post, we will discuss the first three and look at the next three in a future post.

The Research Problem

The researcher begins by finding a research problem through finding trends in a field or finding something that needs to be explained. An example of a trend would be to assess students’ perception of food services. If the sample does not like the food it can be inferred that the population of the school is disappointed as well.

A different research problem would be to understand the relationship among variables. A variable is a measurable attribute that is studied. For example, a researcher may want to know the relationship between height and weight. He wants to know if an increase in height will lead to an increase in weight.

Review of Literature

The literature review serves two main purposes:

  1. Justify the research problem and the need for the study
  2. Suggest potential purposes and research questions

The review of literature helps you to find something that has not been studied before or that no one has examined thoroughly. For example, a review of literature might indicate that we know little about how height affects weight. This naturally leads to the question “does height affect weight?” The literature review provides the need for the study as well as helps to shape the research questions.

Research Questions

The research question has already been alluded to. Your goal is to develop specific, narrow, and measurable questions. Specific questions help you to have only a handful of variables to deal with. For example, in our height and weight illustration, the question we can ask is “does height influence weight?” In this example, we are looking at two specific variables, which are height and weight. We need to measure the height of people and we also need to measure the weight of the same people. We can then use statistics to see if height influences weight. Clear questions are important in clear research design, as we will see in the future.

Conclusion

These initial steps are critical to successful research. If the beginning is off and not clear, there is little hope for the study. It is at the early stages of a study that students struggle and are most frustrated. If a good job is done here, the rest of the study is relatively easy to complete.

Research Process

The research process, or scientific method, is the default mode for systematically gathering information for the purpose of answering questions and solving problems. This process serves the purpose of defining the goals of research, making predictions, gathering data, and interpreting results.

In general, there are six steps to the research process as listed below.

  1. Identify the research problem
  2. Review the literature
  3. Specify the purpose of the research or develop research questions
  4. Collect data
  5. Analyze and interpret data
  6. Report and evaluate results

Identify the Problem

The problem can come from personal observation, readings, from others, or any other of a host of ways. Finding a problem also helps in focusing your study. When identifying a problem it is important to make sure that you develop a justification for investigating it as well as the importance of it. People need to know why they should care about what you are studying. This has to do with relevancy.

Reviewing the Literature

Reviewing the literature is about knowing what has been done before your study so that you can see how you can build on existing knowledge. Most research tends to add to an existing conversation rather than start a new one. Looking at the literature also helps you to see your contribution to the existing body of knowledge. This is one way in which you can find the “gap” in the knowledge that your study will address.

Purpose of Research or Research Questions

The research purpose is the overall objective of the study. It is a restatement of the research problem. Another term for this is the research questions. The research questions are the questions you are asking about the problem. Many times, you do not solve a problem; instead, you ask questions about a problem. The answers to these questions may help to solve the problem or may not. Many people confuse the research purpose with the research questions when they are one and the same. Your goal at this step is to break apart the aspects of the problem into answerable questions. The answer to each question may contribute to solving the research problem.

Collecting Data

This is where the research design begins. Data collection is influenced by the research questions. What you want to know influences what data you will collect. Data collection includes sampling, methods, procedures, and more.

Analysis and Interpretation

Once data is collected it is analyzed. The method of analysis is also influenced by the nature of the research questions. Interpretation is where you answer the research questions. You found a relationship between variables or you didn’t. These answers to your research questions can be used to solve the research problem.

Reporting and Evaluating Research

At this step, the information is compiled in a way so that you can communicate with your audience. The format of communication depends on who you are writing for. From journal articles to science fair projects, all researchers must know the expected format for communication.

Evaluation is the experience of having your work judged by others based on a certain standard. These standards are not agreed upon. This lack of agreement is another reason to know who you are writing for so you can communicate in a way that is acceptable to them.

Conclusion

The research process serves the purpose of finding answers to questions about problems. A researcher needs to follow the six steps of the research process in order to communicate their findings in a way that is appropriate to their audience.

Defining Research and its Importance

Research is a process that people use to collect and analyze information in order to deepen their understanding of a topic. Generally, there are three steps in this process.

  1. Ask a question
  2. Collect data that relates to answering the question
  3. Present answer(s) that may answer the question

Informally, everyone has done this in their lives. Examples range from looking for one’s keys to deciding what to make for dinner based on what is in the refrigerator. Following this process helps in dealing with the challenges of life. There are also several benefits and problems with research, as we shall see.

Benefits of Research

Research also has the following benefits

Research adds to our knowledge. Research provides more and more information on various topics. Each project can potentially provide another witness of a particular phenomenon. The vast amounts of statistics on various topics provide information that enlarges knowledge of a given subject.

Research improves practice. Research helps people to find the most efficient ways to do things. An example would be evaluation research done at schools. The schools examine their practices and decide if what they are doing is best based on the data they collected.

Research provides information for policy debates. Research allows decision makers a chance to weigh various options and determine what is best based on data. For example, a school might want to decide if having single subject or multi-subject teachers is the best by collecting data from standardized tests.

Problems with Research

Problems with research include the following

Issues with the research questions. If the questions of the study are vague and unclear the study is dead from the beginning. The foundation of a successful project is clear and researchable questions.

Issues with collection. The data was not collected properly. This could be due to sampling issues, unethical practices, or more.

Issues with analysis. The number of problems here is endless, but they include rounding errors, questionable analysis of outliers, improper analysis technique in relation to the questions, and adjusting results to support the hypotheses.

There is more that could be shared but this is just an introduction into the process of research. There are pros and cons to almost anything. Research must be planned and conducted carefully in order to benefit those who are seeking answers through this process.

Chi-Square Goodness-of-Fit-Test

The chi-square test is a non-parametric test used in statistics to determine if an observed distribution or model conforms or is similar to an expected distribution or model. In simple terms, this test will tell you if the data you collected is similar to other data or to what you expected.

There are several types of chi-square test, such as the Chi-square Test of Independence, which checks whether two categorical variables are related, and the Goodness-of-Fit Test, which checks a single categorical variable against an expected distribution. This post is about the Goodness-of-Fit Test. The Goodness-of-Fit Test compares the distribution of the observed data with an expected distribution.

A distinctive feature of the chi-square goodness-of-fit test is that, as researchers, we normally hope not to reject the null hypothesis. This is the opposite of traditional hypothesis testing, where the aim is often to reject the null hypothesis because doing so indicates a statistically significant difference. With the chi-square test, we want our observed values to be similar to the values in the expected model. When they are, our model represents what is happening in the real world and is not only theoretical. If we reject the null, the model we are trying to create is not similar to the expected values that might be found in the real world. In other words, we found something that does not conform to what is expected. If a model does not represent the world, it may not serve much purpose.

Here are the assumptions of Goodness-of-Fit Test

  • Random selection of subjects
  • Mutually exclusive categories

Here are the steps

  1. Determine hypothesis
    • H0: There is no difference between the observed values/model and the expected values/model
    • H1: There is a difference between the observed values/model and the expected values/model
  2. Decide level of significance
  3. Determine the degrees of freedom to find the critical chi-square value
  4. Compute the expected frequencies
  5. Compute chi-square
  6. Decide whether to reject or not reject the null
  7. State conclusion

Here is an example

A principal wants to know if the number of students absent each day of the week is the same. Below are the results for one week.

Day          Absences
Monday       17
Tuesday      20
Wednesday    16
Thursday     14
Friday       13

Step 1: Determine Hypothesis

  • H0: The number of students absent is the same every day
  • H1: The number of students absent is not the same every day

Step 2: Decide level of significance

  • 0.05

Step 3: Determine the critical chi-square value (computer does this for you)

  • With five categories there are 5 - 1 = 4 degrees of freedom, so the critical chi-square value = 9.48

Step 4: Compute expected frequencies

  • Computer does this

Step 5: Compute Chi square (computer does this for you)

  • Chi-square = 1.87

Step 6: Make decision

  • Since the computed chi-square of 1.87 is less than the critical chi-square value of 9.48, we do not reject the null hypothesis

Step 7: Conclusion

  • Since we do not reject the null hypothesis, we can say that there is a lack of evidence of a difference in the number of absences each day of the week. In other words, the data are consistent with the same number of students being absent each day.

NOTE: There is also a way to do this test when the expected frequencies are unequal
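For readers who want to see what the computer is doing in steps 3 to 6, here is a minimal sketch in plain Python using the absence counts from the example. The variable names are my own, and the critical value is hard-coded from a standard chi-square table (df = 4, level 0.05) instead of being computed.

```python
# Observed absences from the example above (Monday to Friday)
observed = [17, 20, 16, 14, 13]

# Under H0 absences are spread evenly, so every day expects the same count
total = sum(observed)
expected = [total / len(observed)] * len(observed)

# Chi-square statistic: sum of (observed - expected)^2 / expected
chi_square = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Degrees of freedom = number of categories - 1
df = len(observed) - 1

# Critical value for df = 4 at the 0.05 level, taken from a chi-square table
# (the post writes this as 9.48)
critical = 9.488

reject_null = chi_square > critical
print(f"expected per day = {expected[0]}")
print(f"chi-square = {chi_square:.3f}, df = {df}")
print("reject H0" if reject_null else "do not reject H0")
```

When the expected frequencies are unequal, the same calculation should work with the `expected` list replaced by the expected count for each category.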

Simple Linear Regression Analysis

Simple linear regression analysis is a technique that is used to model the dependency of one dependent variable upon one independent variable. This relationship between these two variables is explained by an equation.

When regression is employed, the data points are normally graphed on a scatterplot. Next, the computer draws what is called the “best-fitting” line. The line is the best fit because it minimizes the squared error between the actual values and the values predicted by the model. The official name of the model is the least squares model, in that it is the model with the least amount of squared error. As such, it is the best linear model for predicting future values.

It is important to remember that one of the great challenges of statistics is explaining error, or residual. In general, any particular data point that is not the mean is said to have some error in it. For example, if the average is 5 and one of the data points is 3, then 5 - 3 = 2, or an error of 2. Statisticians often want to explain this error. What is causing this variation from the mean is a common question.

There are two ways that simple regression deals with error

  1. The error cannot be explained. This is known as unexplained variation.
  2. The error can be explained. This is known as explained variation.

When these two values are added together you get the total variation, which is also known as the total sum of squares. The unexplained portion on its own is known as the “sum of squares for error.”

Another important term to be familiar with is the standard error of estimate. The standard error of estimate measures the standard deviation of the observed values of the dependent variable around its predicted values. Remember that there is always a slight difference between observed and predicted values and the model wants to explain as much of this as possible.

In general, the smaller the standard error the better because this indicates that there is not much difference between observed data points and predicted data points. In other words, the model fits the data very well.

The proportion of the total variation that is explained by the regression line and the independent variable is called the coefficient of determination. Another name for this value is r². The coefficient of determination is standardized to have a value between 0 and 1, or 0% to 100%.

The higher your r², the better your model is at explaining the dependent variable. However, there are a lot of qualifiers to this statement that go beyond this post.
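The quantities described above can be written compactly as follows. The symbols are my own notation and do not appear in the original post: y_i is an observed value, \hat{y}_i the value predicted by the regression line, \bar{y} the mean of the observed values, and n the number of data points.

```latex
\begin{align*}
\text{Total variation:}        \quad & SST = \sum_{i=1}^{n} (y_i - \bar{y})^2 \\
\text{Explained variation:}    \quad & SSR = \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2 \\
\text{Unexplained variation:}  \quad & SSE = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 \\
\text{Decomposition:}          \quad & SST = SSR + SSE \\
\text{Coefficient of determination:} \quad & r^2 = \frac{SSR}{SST} = 1 - \frac{SSE}{SST} \\
\text{Standard error of estimate:}   \quad & s_e = \sqrt{\frac{SSE}{n - 2}}
\end{align*}
```

Read this way, a small s_e and an r² close to 1 both say the same thing: most of the variation in the dependent variable is accounted for by the line.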

Here are the assumptions of simple regression

  • Linearity–The mean of each error is zero
  • Independence of error terms–The errors are independent of each other
  • Normality of error terms–The error of each variable is normally distributed
  • Homoscedasticity–The variance of the error for the value of each variable is the same

There are many ways to check all of this in SPSS, which are beyond the scope of this post.

Below is an example of simple regression using data from a previous post

You want to know how strongly the exam grade is related to the number of words in the students’ essays. The data is below.

Student    Grade    Words on Essay
1          79       147
2          76       143
3          78       147
4          84       168
5          90       206
6          83       155
7          93       192
8          94       211
9          97       209
10         85       187
11         88       200
12         82       150

Step 1: Find the Slope (The computer does this for you)
slope = 3.74

Step 2: Find the mean of X (exam grade) and Y (words on the essay) (computer does this for you)
mean of X (exam grade) = 85.75
mean of Y (words on essay) = 176.25

Step 3: Compute the intercept of the simple linear regression (computer does this)
intercept = -145.27

Step 4: Create linear regression equation (you do this)
Y (words on essay) = 3.74*(exam grade) – 145.27
NOTE: you can use this equation to predict the number of words on the essay if you know the exam grade or to predict the exam grade if you know how many words they wrote in the essay. It is simple algebra.

Step 5: Calculate Coefficient of Determination r² (computer does this for you)
r² = 0.85
An r² of 0.85 means the regression model explains 85% of the variation in the number of words on the essay. In other words, exam grades strongly predict how many words a student will write in their essay.
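The steps marked “computer does this” can be sketched in a few lines of plain Python. This is only an illustration using the standard least squares formulas, with variable and function names of my own choosing. Because nothing is rounded along the way, the printed slope, intercept and r² can differ in the last digit from the values quoted above.

```python
from statistics import mean

# Data from the table above
grades = [79, 76, 78, 84, 90, 83, 93, 94, 97, 85, 88, 82]             # X
words = [147, 143, 147, 168, 206, 155, 192, 211, 209, 187, 200, 150]  # Y

# Step 2: means of X and Y
x_bar, y_bar = mean(grades), mean(words)

# Sums of squares and cross-products around the means
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(grades, words))
s_xx = sum((x - x_bar) ** 2 for x in grades)
s_yy = sum((y - y_bar) ** 2 for y in words)

# Steps 1 and 3: least squares slope and intercept
slope = s_xy / s_xx
intercept = y_bar - slope * x_bar

# Step 5: coefficient of determination
r_squared = s_xy ** 2 / (s_xx * s_yy)


def predict_words(grade):
    """Step 4: predicted words on the essay for a given exam grade."""
    return slope * grade + intercept


print(f"mean X = {x_bar}, mean Y = {y_bar}")
print(f"slope = {slope:.2f}, intercept = {intercept:.2f}")
print(f"r squared = {r_squared:.2f}")
print(f"predicted words for a grade of 90: {predict_words(90):.0f}")
```

A handy check on any fitted line is that it passes through the point of means, so `predict_words(x_bar)` should return `y_bar`.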

Spearman Rank Correlation

Spearman rank correlation, aka ρ, is used to measure the strength of the relationship between two variables. You may already be wondering what the difference is between Spearman rank correlation and Pearson product moment correlation. The difference is that Spearman rank correlation is a non-parametric test while Pearson product moment correlation is a parametric test.

A non-parametric test does not have to comply with the assumptions of a parametric test, such as the data being normally distributed. This allows a researcher to still make inferences from data that may not have normality. In addition, non-parametric tests are used for data that is at the ordinal or nominal level. In many ways, Spearman correlation and Pearson product moment correlation complement each other. One is used in non-parametric statistics and the other in parametric statistics, and each analyzes the relationship between variables.

If you get suspicious results from your Pearson product moment correlation analysis or your data lacks normality, Spearman rank correlation may be useful if you still want to determine if there is a relationship between the variables. Spearman correlation works by ranking the data within each variable. Next, the Pearson product moment correlation is calculated between the two sets of ranked variables. Below are the assumptions of the Spearman correlation test.

  • Subjects are randomly selected
  • Observations are at the ordinal level at least

Below are the steps of Spearman correlation

  1. Set up the hypotheses
    1. H0: There is no correlation between the variables
    2. H1: There is a correlation between the variables
  2. Set the level of significance
  3. Calculate the degrees of freedom and find the t-critical value (computer does this for you)
  4. Calculate the value of Spearman correlation or ρ (computer does this for you)
  5. Calculate the t-value (computer does this for you) and make a statistical decision
  6. State conclusion

Here is an example

A clerk wants to see if there is a correlation between the overall grade students get on an exam and  the number of words they wrote for their essay. Below are the results

Student    Grade    Words on Essay
1          79       147
2          76       143
3          78       147
4          84       168
5          90       206
6          83       155
7          93       192
8          94       211
9          97       209
10         85       187
11         88       200
12         82       150

Note: The computer will rank the data of each variable with a rank of 1 being the highest value of a variable and a rank 12 being the lowest value of a variable. Remember that the computer does this for you.

Step 1: State hypotheses
H0: There is no relationship between grades and words on the essay
H1: There is a relationship between grades and words on the essay

Step 2: Determine level of significance
Level set to 0.05

Step 3: Determine critical t-value
t = ±2.228 (computer does this for you)

Step 4: Compute Spearman correlation
ρ = 0.97 (computer does this for you)
Note: This correlation is very strong. Remember that the strongest positive relationship possible is +1.

Step 5: Calculate t-value and make a decision
t = 12.62 (the computer does this for you)
Since the computed t-value of 12.62 is greater than the t-critical value of 2.228, we reject the null hypothesis.

Step 6: Conclusion
Since the null hypothesis is rejected, we can conclude that there is evidence of a strong relationship between exam grade and the number of words written on an essay. A teacher might be tempted to tell students to write longer essays if they want a higher grade on exams, although a correlation by itself does not show that longer essays cause higher grades.
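Here is a rough sketch of what the computer does in steps 4 and 5, following the description above: rank each variable, then compute the Pearson correlation of the ranks. The helper names `average_ranks` and `pearson` are my own. Tied values (two essays have 147 words) share the average of their ranks, which is a common convention but an assumption about what the software does. This sketch ranks from the smallest value upward. Ranking from the largest value downward, as described in the note above, gives the same ρ as long as both variables are ranked in the same direction.

```python
import math


def average_ranks(values):
    """Rank values from 1 (smallest) upward; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Find the end of the run of values tied with position i
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson product moment correlation between two equal-length lists."""
    x_bar, y_bar = sum(x) / len(x), sum(y) / len(y)
    s_xy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    s_xx = sum((a - x_bar) ** 2 for a in x)
    s_yy = sum((b - y_bar) ** 2 for b in y)
    return s_xy / math.sqrt(s_xx * s_yy)


grades = [79, 76, 78, 84, 90, 83, 93, 94, 97, 85, 88, 82]
words = [147, 143, 147, 168, 206, 155, 192, 211, 209, 187, 200, 150]

# Step 4: Spearman correlation = Pearson correlation of the ranks
rho = pearson(average_ranks(grades), average_ranks(words))

# Step 5: t-value with n - 2 degrees of freedom
n = len(grades)
t_value = rho * math.sqrt((n - 2) / (1 - rho ** 2))
t_critical = 2.228  # from a t-table, df = 10, two-tailed 0.05 level

print(f"rho = {rho:.2f}, t = {t_value:.2f}")
print("reject H0" if abs(t_value) > t_critical else "do not reject H0")
```

Because ρ is not rounded before the t-value is computed, the printed t can differ slightly from the 12.62 quoted above. The decision is the same.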

Correlation

A correlation is a statistical method used to determine if a relationship exists between variables.  If there is a relationship between the variables it indicates a departure from independence. In other words, the higher the correlation the stronger the relationship and thus the more the variables have in common at least on the surface.

There are four common types of relationships between variables. They are the following:

  1. Positive: Both variables increase or decrease in value together.
  2. Negative: One variable decreases in value while another increases.
  3. Non-linear: Both variables move together for a time, then one decreases while the other continues to increase.
  4. Zero: No relationship.

The most common way to measure the correlation between variables is the Pearson product-moment correlation, aka correlation coefficient, aka r. Correlations are usually measured on a standardized scale that ranges from -1 to +1. The sign of the number indicates the direction of the relationship, and its absolute value indicates the strength of the relationship.

The Pearson Product Moment Correlation test confirms if the r is statistically significant, that is, if such a relationship would exist in the population and not just the sample. Below are the assumptions

  • Subjects are randomly selected
  • Both populations are normally distributed

Here is the process for finding the r.

  1. Determine hypotheses
    • H0: r = 0 (There is no relationship between the variables in the population)
    • H1: r ≠ 0 (There is a relationship between the variables in the population)
  2. Decide what the level of significance will be
  3. Calculate degrees of freedom to determine the t critical value (computer does this)
  4. Calculate Pearson’s r (computer does this)
  5. Calculate t value (computer does this)
  6. State conclusion.

Below is an example

A clerk wants to see if there is a correlation between the overall grade students get on an exam and the number of words they wrote for their essay. Below are the results

Student    Grade    Words on Essay
1          79       147
2          76       143
3          78       147
4          84       168
5          90       206
6          83       155
7          93       192
8          94       211
9          97       209
10         85       187
11         88       200
12         82       150

Step 1: State Hypotheses
H0: There is no relationship between grade and the number of words on the essay
H1: There is a relationship between grade and the number of words on the essay

Step 2: Level of significance
Set to 0.05

Step 3: Determine degrees of freedom and t critical value
t-critical = ±2.228 (This info is found in a chart in the back of most stat books)

Step 4: Compute r
r = 0.93                       (calculated by the computer)

Step 5: Decision rule. Calculate t-value for the r

t-value for r = 8.00  (Computer found this)

Since the computed t-value of 8.00 is greater than the t-critical value of 2.228 we reject the null hypothesis.

Step 6: Conclusion
Since the null hypothesis was rejected, we conclude that there is evidence of a strong relationship between the overall grade on the exam and the number of words written for the essay. To make this practical, the teacher might be tempted to tell the students to write longer essays if they want a better score on the test, although the correlation alone does not show that longer essays cause better scores.
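Steps 3 to 5 can be sketched in plain Python as shown below. The variable names are my own, and the critical value is copied from the example and not computed. The formulas are the standard ones for Pearson’s r and for its t-value with n - 2 degrees of freedom.

```python
import math

grades = [79, 76, 78, 84, 90, 83, 93, 94, 97, 85, 88, 82]
words = [147, 143, 147, 168, 206, 155, 192, 211, 209, 187, 200, 150]

n = len(grades)
x_bar = sum(grades) / n
y_bar = sum(words) / n

# Sums of squares and cross-products around the means
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(grades, words))
s_xx = sum((x - x_bar) ** 2 for x in grades)
s_yy = sum((y - y_bar) ** 2 for y in words)

# Step 4: Pearson's r
r = s_xy / math.sqrt(s_xx * s_yy)

# Step 3: degrees of freedom for the t-test of r
df = n - 2
t_critical = 2.228  # from a t-table, df = 10, two-tailed 0.05 level

# Step 5: t-value for r and the decision
t_value = r * math.sqrt(df / (1 - r ** 2))
reject_null = abs(t_value) > t_critical

print(f"r = {r:.2f}, df = {df}, t = {t_value:.2f}")
print("reject H0" if reject_null else "do not reject H0")
```

When r is not rounded to 0.93 first, the t-value comes out at roughly 7.8 instead of 8.00. Either way it is well above 2.228, so the decision does not change.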

IMPORTANT NOTE

When a null hypothesis is rejected there are several possible relationships between the variables.

  • Direct cause and effect
  • The relationship between X and Y may be due to the influence of a third variable not in the model
  • This could be a chance or nonsense relationship. For example, foot size and vocabulary are related only because older people have bigger feet and also a larger vocabulary.

Two-Way Analysis of Variance

Two-way analysis of variance is used when we want to know the following pieces of information.
• The means of the blocks or subpopulations
• The means of the treatment groups
• The means of the interaction of the subpopulation and treatment groups

Now you are probably confused, but remember that two-way analysis of variance is an extension of randomized block design. With randomized block design, there were two hypotheses, one for the treatment groups and one for the blocks or subpopulations. What we are adding in two-way analysis of variance is an assessment of the interaction effect, which is the amount of variation due to the combination of subpopulation and treatment group. The assessment of the interaction effect gives us the third hypothesis. To put it in simple words, when both the subpopulation and the treatment are present together, they may have some sort of combined influence beyond what each has on its own. Therefore, two-way analysis of variance is randomized block design plus an interaction effect hypothesis.

Another important difference is the use of repeated measurements. In a two-way analysis of variance, each combination of subpopulation and treatment is measured more than once, which is what makes it possible to estimate the interaction effect. In a randomized block design, each combination is measured only one time. Your research questions determine whether repeated measurements are needed.

Below are the assumptions
• Sample randomly selected
• Populations have homogeneous standard deviations
• Population distributions are normal
• Population covariances are equal.

Here are the steps
  1. Set up hypotheses (there will be three of them)
    a. Treatment means (AKA factor A)
      i. H0: There is no difference in the treatment means
      ii. H1: H0 is false
    b. Block means (AKA factor B)
      i. H0: There is no difference in the block means
      ii. H1: H0 is false
    c. Interaction between Factor A and B
      i. H0: There is no interacting effect between factor A & B
      ii. H1: There is an interacting effect between factor A & B
  2. Determine your level of statistical significance
  3. Determine F critical (there will be three now and the computer does this)
  4. Calculate the F-test values (there will be three now and the computer does this)
  5. Test hypotheses
  6. State conclusion

Here is an example
A music teacher wants to study the effect of instrument type and service center on the repair time measured in minutes. Four instruments (sax, trumpet, clarinet, flute) were picked for the analysis. Each service center was assigned to perform the particular repair on two instruments in each category

                          Instrument
Service center    Sax    Trumpet    Clarinet    Flute
1                  60       50         58         60
                   70       56         62         64
2                  50       53         48         54
                   54       57         64         46
3                  62       54         46         51
                   64       66         52         49

Here are your research questions
• Is there a difference in the means of the repair time between service centers?
• Is there a difference in the means of the repair time between instrument type?
• Is there an interaction due to service center and type of instrument on the mean of the repair time?

Let us go through each of our steps
Step 1: State the hypotheses
• Treatment means (AKA factor A)
a. H0: There is no difference in the means of the service centers
b. H1: H0 is false
• Block means (AKA factor B)
a. H0: There is no difference in means of the instrument types
b. H1: H0 is false
• Interaction between Factor A and B
a. H0: There is no interacting effect between service center and instrument type
b. H1: There is an interacting effect between service center and instrument type

Step 2: Significance level
• Set at 0.1

Step 3: Determine F-Critical
For the service centers, F-critical is 2.81
For the instruments, F-critical is 2.61
For the interaction effect, F-critical is 2.33

Step 4: Calculate F-values
Service centers 3.2
Instrument type 1.4
Interaction 2.1

Step 5: Make decision
Since the F-value of 3.2 is greater than the F-critical of 2.81 we reject the null hypothesis for the service centers

Since the F-value of 1.4 is less than the F-critical of 2.61 we do not reject the null hypothesis for the instrument types

Since the F-value of 2.1 is less than the F-critical of 2.33 we do not reject the null hypothesis for the interaction effect of service center and instrument type.

Step 6: Conclusion
Since we reject the null hypothesis that there is no difference in the means of the repair time of the service centers, we conclude that there is evidence of a difference in the repair times between service centers. This means that at least one service center has a different mean repair time than the others. To find out which one, do a post hoc test.

Since we do not reject the null hypothesis that there is no difference in the means of the repair time of the instrument types, we conclude that there is no evidence of a difference in the repair time between instrument types. In other words, it does not matter what type of instrument is being fixed as they will all take about the same amount of time.

Since we do not reject the null hypothesis that there is no interaction effect of service center and instrument type on the mean of the repair time, we conclude that there is no evidence of an interaction effect of service center and instrument type on repair time. In other words, the differences in repair time between service centers do not appear to depend on which type of instrument is being repaired.
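
The three F-values in the example above were left to the computer. As a rough sketch of what the computer is doing, the script below works through the standard sums of squares for a balanced two-way design with two repairs per cell. It is plain Python rather than SPSS or Excel, the function and variable names are made up for illustration, and the critical values are the table values for alpha = 0.1 with 2, 3, and 6 numerator degrees of freedom and 12 error degrees of freedom.

```python
# Repair times in minutes: data[service center][instrument] = the two repairs.
data = {
    1: {"Sax": [60, 70], "Trumpet": [50, 56], "Clarinet": [58, 62], "Flute": [60, 64]},
    2: {"Sax": [50, 54], "Trumpet": [53, 57], "Clarinet": [48, 64], "Flute": [54, 46]},
    3: {"Sax": [62, 64], "Trumpet": [54, 66], "Clarinet": [46, 52], "Flute": [51, 49]},
}


def mean(values):
    return sum(values) / len(values)


def two_way_anova_f(table):
    """F-values for rows, columns and interaction in a balanced design."""
    rows = list(table)
    cols = list(table[rows[0]])
    reps = len(table[rows[0]][cols[0]])  # observations per cell
    grand = mean([x for r in rows for c in cols for x in table[r][c]])

    row_means = [mean([x for c in cols for x in table[r][c]]) for r in rows]
    col_means = [mean([x for r in rows for x in table[r][c]]) for c in cols]
    cell_means = {(r, c): mean(table[r][c]) for r in rows for c in cols}

    # Sums of squares for each source of variation.
    ss_rows = reps * len(cols) * sum((m - grand) ** 2 for m in row_means)
    ss_cols = reps * len(rows) * sum((m - grand) ** 2 for m in col_means)
    ss_cells = reps * sum((m - grand) ** 2 for m in cell_means.values())
    ss_inter = ss_cells - ss_rows - ss_cols
    ss_error = sum(
        (x - cell_means[(r, c)]) ** 2 for r in rows for c in cols for x in table[r][c]
    )

    df_rows, df_cols = len(rows) - 1, len(cols) - 1
    df_inter = df_rows * df_cols
    df_error = len(rows) * len(cols) * (reps - 1)
    ms_error = ss_error / df_error

    return {
        "service center": (ss_rows / df_rows) / ms_error,
        "instrument": (ss_cols / df_cols) / ms_error,
        "interaction": (ss_inter / df_inter) / ms_error,
    }


f_values = two_way_anova_f(data)
f_critical = {"service center": 2.81, "instrument": 2.61, "interaction": 2.33}

for source, f in f_values.items():
    decision = "reject H0" if f > f_critical[source] else "do not reject H0"
    print(f"{source}: F = {f:.1f}, F-critical = {f_critical[source]}, {decision}")
```

The sketch assumes every cell has the same number of observations. An unbalanced design needs a different calculation.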

Analysis of Variance: Randomized Block Design

Randomized block design is used when a researcher wants to compare treatment means. What is unique to this research design is that the experiment is divided into two or more mini-experiments.

The reason behind this is to reduce the variation within treatments so that it is easier to find differences between means.  Another unique characteristic of randomized block design is that since there is more than one experiment happening at the same time, there will be more than one set of hypotheses to consider. There will be a set of hypotheses for the treatment groups and also for the block groups. The block groups are the several subpopulations within the sample. Below are the assumptions

  • Samples are randomly selected
  • Populations are homogeneous
  • Populations are normally distributed
  • Population covariances are equal
    •  Covariance is a measure of how much two variables deviate together from their expected values. If two variables deviate in similar ways the covariance will be high and vice versa. The standardized version of covariance is correlation.

Looking at equations and doing this by hand is tough. It is better to use SPSS or Excel to calculate results. We are going to look at an example and see an application of randomized block design.

A professor wants to see if “time of day” affects his students’ scores on a quiz. He randomly divides his stat class into five groups and has them take the quiz at one of four times during the day.  Below are the results
               Time Period/Treatment
Section    8-9    10-11    11-12    1-2
1           25      22       20      25
2           28      24       29      23
3           30      25       25      27
4           24      27       28      25
5           21      28       30      24

The treatment groups here are the time periods. They run along the top and are 8-9, 10-11, 11-12, and 1-2. The block groups run along the left-hand side and they are sections 1, 2, 3, 4, and 5. The block groups are the 5 different experimental groups of the larger population of the statistics class. What is happening here is that members from every group took the quiz at each of the four times. For example, members from section one took the quiz at 8-9, 10-11, 11-12, and 1-2. The same goes for section 2 and so forth.  By having five different groups take the quiz at each of the time periods it should hopefully improve the accuracy of the results. It is like sampling a population five times instead of one time.

In addition, by having four different time periods, we can hopefully see much more clearly if the time period makes a difference. We have four different time periods instead of two or three. Below are the steps for solving this problem.

Step 1: State hypotheses
For Time periods
Null hypothesis: There is no difference in the means between time periods
Alternative hypothesis: There is a difference in the means between time periods
For Blocks
Null hypothesis: There is no difference in the means among the sections of students
Alternative hypothesis: There is a difference in the means among the sections of students

Step 2: Significance level
Our alpha is set to .05

Step 3: Critical value of F
This is done by the computer and it indicates that the F critical for the treatment (time periods) is 3.49 and the F critical for the blocks (section of students) is 3.26. There are two F criticals because there are two sets of hypotheses, one for the time periods and one for the students.

Step 4: Calculate
The computed F-value for treatment (time periods) is 0.25
The computed F-value for the blocks (section of students) is 0.89

Step 5: Decision
Since the F-value of the treatment (time periods) of 0.25 is less than the F critical of 3.49 at an alpha of .05 we do not reject the null hypothesis

Since the F-value of the blocks (section of students) of 0.89 is less than the F critical of 3.26 at an alpha of .05 we do not reject the null hypothesis

Step 6: Conclusion
Treatment (Time period)
Since we did not reject the null hypothesis, we can conclude that there is no evidence that time of day affects the quiz scores.

Blocks (Section of Student)
Since we did not reject the null hypothesis, we can conclude that there is no evidence that group affects the quiz scores.

From this, we have no evidence that the time of day or the group a student belongs to matters. If the time of day had mattered it might have been due to a host of factors such as early morning or late afternoon. For the groups, the difference could be identified by how they did on individual items. Maybe they struggled with finding the means of question 3.

Remember in this example there was no difference. The ideas above are for determining why there was a difference if that had happened.
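
For readers who want to see where the two F-values come from, here is a minimal sketch of the randomized block calculation in plain Python. It assumes the standard partition of the total variation into treatment, block, and error sums of squares. The function name is hypothetical and the critical values are copied from Step 3.

```python
# Quiz scores: each row is a section (block), each column a time period (treatment).
scores = [
    [25, 22, 20, 25],
    [28, 24, 29, 23],
    [30, 25, 25, 27],
    [24, 27, 28, 25],
    [21, 28, 30, 24],
]


def randomized_block_f(table):
    """Return (F for treatments, F for blocks) with one score per cell."""
    n_blocks = len(table)
    n_treat = len(table[0])
    grand = sum(sum(row) for row in table) / (n_blocks * n_treat)

    treat_means = [sum(row[j] for row in table) / n_blocks for j in range(n_treat)]
    block_means = [sum(row) / n_treat for row in table]

    ss_treat = n_blocks * sum((m - grand) ** 2 for m in treat_means)
    ss_block = n_treat * sum((m - grand) ** 2 for m in block_means)
    ss_total = sum((x - grand) ** 2 for row in table for x in row)
    ss_error = ss_total - ss_treat - ss_block  # what is left over

    df_treat, df_block = n_treat - 1, n_blocks - 1
    ms_error = ss_error / (df_treat * df_block)

    return (ss_treat / df_treat) / ms_error, (ss_block / df_block) / ms_error


f_treatment, f_block = randomized_block_f(scores)
F_CRIT_TREATMENT, F_CRIT_BLOCK = 3.49, 3.26  # alpha = .05, copied from Step 3

print(f"time periods: F = {f_treatment:.2f}, reject H0: {f_treatment > F_CRIT_TREATMENT}")
print(f"sections:     F = {f_block:.2f}, reject H0: {f_block > F_CRIT_BLOCK}")
```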

One-Way Analysis of Variance (ANOVA)

Analysis of variance is a statistical technique that is used to determine if there is a difference among the means of three or more sample populations.  Z-tests and t-tests are used when comparing one sample population to a known value or two sample populations to each other. When three or more sample populations are involved it is necessary to use analysis of variance.  The simple rule is: with 3 or more groups, use analysis of variance.

Analysis of variance is too complicated to do by hand, even though it is possible. It takes a great deal of time and one error will ruin the answer. Therefore, we are not going to look at equations during this example. Instead, we will focus on the hypotheses and practical applications of analysis of variance. To calculate analysis of variance results you can use SPSS or Microsoft Excel.

There are several types of analysis of variance. We are going to first look at one-way analysis of variance.

Here are the assumptions for one-way analysis of variance

  • Samples are randomly selected
  • Samples are independently assigned
  • Samples are homogeneous
  • Sample is normally distributed

One-way analysis of variance is used when three or more groups receive the same treatment or intervention. The treatment is the independent variable while the mean of each group is the dependent variable. This is because as the researcher, you control the treatment but you do not control the resulting mean that is recorded. One-way analysis of variance is often used in performing experiments.

Let’s look at an example. You want to know if there is any difference in the average life of four different breeds of dogs. You take a random sample of five dogs from four different breeds. Below are the results

Terrier    Retriever    Hound    Bulldog
12         11           12       12
13         10           11       15
14         13           15       10
11         15           15       12
15         14           16       11

In this example, the independent variable is the breed of dog. This is because you control this. You can select whatever dog breed you want. The dependent variable is the average length of the dogs’ lives. You have no control over how long they live. You are trying to see if dog breed influences how long the dog will live.

Here are the hypotheses

Null hypothesis: There is no difference in the average length of a dog’s life because of breed

Alternative hypothesis: There is a difference in the average length of a dog’s life because of breed

The significance level is 0.05 and our F critical is 3.24

After running the results in the computer we get an F-value of 0.76. Since 0.76 is less than 3.24, we do not reject our null hypothesis.  This means that there is no evidence of a difference in the average life of the dog breeds in this study.

One-way analysis is used when we have one treatment and three or more groups that experience the treatment. This statistical tool is useful for research designs that call on the need for experiments.
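
The F-value for the dog example can be reproduced with a few lines of code. The sketch below is plain Python and assumes the textbook definition of F as the between-group mean square divided by the within-group mean square. The function name is made up for illustration.

```python
# Life spans in years for five dogs from each breed (table above).
lifespans = {
    "Terrier": [12, 13, 14, 11, 15],
    "Retriever": [11, 10, 13, 15, 14],
    "Hound": [12, 11, 15, 15, 16],
    "Bulldog": [12, 15, 10, 12, 11],
}


def one_way_f(groups):
    """F-value for a one-way analysis of variance on a list of groups."""
    all_values = [x for group in groups for x in group]
    grand_mean = sum(all_values) / len(all_values)
    group_means = [sum(group) / len(group) for group in groups]

    # Variation of the group means around the grand mean.
    ss_between = sum(
        len(group) * (m - grand_mean) ** 2 for group, m in zip(groups, group_means)
    )
    # Variation of individual dogs around their own group mean.
    ss_within = sum(
        (x - m) ** 2 for group, m in zip(groups, group_means) for x in group
    )

    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)


f_value = one_way_f(list(lifespans.values()))
F_CRITICAL = 3.24  # alpha = 0.05, df = 3 and 16 (copied from the example)

print(f"F = {f_value:.2f}, reject H0: {f_value > F_CRITICAL}")
```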

Testing the Difference Between two Means: Paired Samples

The paired sample t-test is used to compare two sample populations that are correlated. This test is most commonly employed for “before and after” or pretest-posttest designs. Below are the assumptions of the paired sample t-test.

  • Only the matched pairs are used to perform the analysis
  • The data is normally distributed
  • The variances of the two samples are homogeneous
  • The observations are independent of each other

Below are the steps involved in conducting a paired sample t-test

  1. Set up the hypotheses
    • H0: The means of the paired samples are the same or (μ1 = μ2)
    • H1: The means of the paired samples are not equal or
      (μ1 ≠ μ2, μ1 > μ2, μ1 < μ2)
  2. Determine the level of statistical significance (.1, .05, or .01) and if it is two-tailed or one-tailed
    • Two-tailed means there are two choices. One mean can be greater or lesser than the other.
    • One-tailed means there is one choice. One of the means is expected to be either greater or lesser, but not both.
  3. You must also take into account the degrees of freedom, which is sample size – 1. This information is needed when looking up the t critical value in the t-test chart.
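  4. Calculate the paired t-test. The formulas are below. Take note that there are three separate formulas labeled A, B, and C. In these formulas D is the difference within each pair and n is the sample size (the number of pairs).

FORMULA A (t computed = mean difference divided by the standard deviation of the differences over the square root of the sample size)

$$t_{\text{computed}} = \frac{\bar{D}}{s_D / \sqrt{n}}$$

FORMULA B (mean difference = sum of the differences divided by the sample size)

$$\bar{D} = \frac{\sum D}{n}$$

FORMULA C (standard deviation of the differences)

$$s_D = \sqrt{\frac{\sum D^2 - \frac{(\sum D)^2}{n}}{n - 1}}$$

Sorry there is no simple way to explain formula C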

  5. Make statistical decision

  6. State conclusion

Below is an example

A teacher develops an incentive plan for his students. Students who were quiet got additional stickers in their notebook. The teacher picked 10 students at random to see if the number of stickers they earned was more after the incentive program was adopted. Here are the results

Student               Before             After
1                               20                        35
2                               30                        41
3                                25                       38
4                                31                       42
5                                19                       18
6                                18                       16
7                                23                       34
8                                32                       19
9                                24                       24
10                             19                       33

Step 1: State the hypotheses
H0: μD ≤ 0 or the number of stickers after the incentive plan is not more than before (μD is the mean of the after minus before differences)
H1: μD > 0 or the number of stickers is greater after the incentive plan

Step 2: The level of statistical significance is .05. This is also a one-tailed test

Step 3: Calculate the critical region
degrees of freedom = sample size – 1
df = 10 – 1 = 9 and the t critical is 1.83 according to the table

Step 4: Compute t computed for paired samples

Student    Before     After      Difference       Difference²
1                       20                35           15                             225
2                       30                41           11                             121
3                       25                38           13                             169
4                       31                42            11                             121
5                       19                18            -1                                    1
6                       18                16            -2                                    4
7                        23                34           11                              121
8                        32                19         -13                              169
9                        24                24             0                                     0
10                     19                33           14                               196
Sum of difference     59 Sum of the difference² 1127

Find the mean of the difference
59 / 10 = 5.9

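Find the standard deviation of the differences with formula C. Note that it is the sum of the differences (59), not the mean, that is squared, and the square root is taken of the entire expression.

$$s_D = \sqrt{\frac{1127 - \frac{(59)^2}{10}}{10 - 1}} = \sqrt{\frac{1127 - 348.1}{9}} = \sqrt{86.54}$$

the standard deviation is 9.30

Find the t computed with formula A, using the mean difference of 5.9

$$t_{\text{computed}} = \frac{5.9}{9.30 / \sqrt{10}} = 2.01$$

Step 5: Decision

Since the t computed of 2.01 is greater than the t critical of 1.83 we reject the null hypothesis

Step 6: Conclusion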

Since we reject the null we can conclude that there is evidence that the incentive program has increased the number of stickers the students earn.
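
The paired sample calculation is easy to get wrong by hand, so here is a minimal sketch that applies formulas A, B, and C to the sticker data in plain Python. The variable names are made up for illustration, and the critical value of 1.83 is copied from Step 3. The sketch squares the sum of the differences (59) inside formula C and uses the mean difference (5.9) in formula A.

```python
import math

before = [20, 30, 25, 31, 19, 18, 23, 32, 24, 19]
after = [35, 41, 38, 42, 18, 16, 34, 19, 24, 33]

# Difference for each student (after minus before).
differences = [a - b for a, b in zip(after, before)]
n = len(differences)

sum_d = sum(differences)                   # sum of the differences
sum_d2 = sum(d ** 2 for d in differences)  # sum of the squared differences

# Formula B: mean difference.
mean_d = sum_d / n

# Formula C: standard deviation of the differences.
sd_d = math.sqrt((sum_d2 - sum_d ** 2 / n) / (n - 1))

# Formula A: t computed.
t_computed = mean_d / (sd_d / math.sqrt(n))

T_CRITICAL = 1.83  # one-tailed, alpha = .05, df = 9 (from the t table)

print(f"mean difference = {mean_d:.1f}")
print(f"standard deviation of the differences = {sd_d:.2f}")
print(f"t computed = {t_computed:.2f}, reject H0: {t_computed > T_CRITICAL}")
```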

Hypothesis Testing for Two Means: Large Independent Samples

Hypothesis testing for two large samples examines again if there is a difference between the two means. We infer that there is a difference between the population means by seeing if there is a difference between the sample means. The assumptions for testing for the difference between two means are below.

  • Subjects are randomly selected and independently assigned to groups
  • Population is normally distributed
  • Sample size is greater than 30

The hypotheses can be stated as follows

  • Null hypothesis: There is no difference between the population means of the two groups
    • The technical way to say this is…  H0: μ1 = μ2
  • Alternative hypothesis: There is a difference between the population means of the two groups. One is greater or smaller than the other
    • The technical way to say this is… H1: μ1≠ μ2 or μ1> μ2 or         μ1< μ2

The process for conducting a z test for independent samples is provided below

  1. Develop your hypotheses
  2. Determine the level of significance (normally .1, .05, or .01)
  3. Decide if it is a one-tail or two tail test.
  4. Determine the critical value of z. This is found in a chart in the back of most stat books. Common values include 1.64 (one-tailed, .05), 1.96 (two-tailed, .05), and 2.33 (one-tailed, .01)
  5. Calculate the means and standard deviations of the two samples.
  6. Calculate the test for the two independent samples. Below is the formula

$$z = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}$$

Here x̄1 and x̄2 are the two sample means, s1 and s2 are the standard deviations of the two samples (squaring them gives the variances), and n1 and n2 are the two sample sizes.

  7. If the computed z is less than the critical z then you do not reject your null hypothesis. This means there is no evidence of a difference between the means. If the computed z is greater than the critical z then you reject the null hypothesis and this indicates that there is evidence that there is a difference.

Below is an example

A businessman is comparing the price of buildings in two different provinces to see if there is a difference. Below are the results. Determine if the buildings in Bangkok cost more than the buildings in Saraburi.

                        Bangkok      Saraburi
average price         2,140,000     1,970,000
standard deviation      226,000       243,000
sample size                  47            45

Now let us go through the steps

  1. Develop your hypotheses
    • Null hypothesis: There is no difference between the average price of buildings in Bangkok and Saraburi
      • In stat language, it would be
      • H0: μ1 = μ2
    • Alternative hypothesis: The  average price of buildings in Bangkok is higher than in  Saraburi
      • In stat language, it would be
      • H1: μ1 > μ2
  2. Determine the level of significance (normally .1, .05, or .01)
    • We will select .05
  3. Decide if it is a one-tail or two tail test.
    • This is a one-tail test. We want to know if one mean is greater than another. Therefore, to reject the null we need a z computed that is positive and larger than our z critical.
  4. Determine the critical value of z. This is found in a chart in the back of most stat books. Common values include 1.64 (one-tailed, .05), 1.96 (two-tailed, .05), and 2.33 (one-tailed, .01)
    • Our z critical is +1.64. Since this is a one-tail test we only have one value, so we do not split the probability and place half on one side and half on the other side. If this were two-tailed we would have -1.96 and +1.96, which indicates that the difference could be greater or less
  5. Calculate the means and standard deviations of the two samples.
    • Already done in the table above
  6. Calculate the test for the two independent samples. Below is the formula.

$$z = \frac{2{,}140{,}000 - 1{,}970{,}000}{\sqrt{\frac{(226{,}000)^2}{47} + \frac{(243{,}000)^2}{45}}} = 3.47$$

Our final answer for our z computed is 3.47

Since 3.47 is greater than our z critical of +1.64 we reject the null hypothesis and state that there is evidence that building prices are higher in Bangkok than in Saraburi.
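
The same calculation can be sketched in a few lines of Python. One assumption is worth flagging: the figures 226,000 and 243,000 are treated here as standard deviations, because the worked calculation above squares them before dividing by the sample sizes. The function name is hypothetical.

```python
import math


def two_sample_z(mean1, sd1, n1, mean2, sd2, n2):
    """z for the difference between two large independent sample means."""
    standard_error = math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
    return (mean1 - mean2) / standard_error


# Bangkok is sample 1, Saraburi is sample 2.
z_computed = two_sample_z(2_140_000, 226_000, 47, 1_970_000, 243_000, 45)
Z_CRITICAL = 1.64  # one-tailed, alpha = .05

print(f"z computed = {z_computed:.2f}, reject H0: {z_computed > Z_CRITICAL}")
```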

What is a One Sample z Test?

There are actually several different situations in which a researcher can use hypothesis testing. The first instance we will look at is the one sample z test. The one sample z test has the following assumptions that need to be met before employing it.

  • Sample size > 30
  • Subjects are randomly selected
  • Population is normally distributed
  • Cases within the sample are independent
  • One sample was taken

If your data collection meets the above assumptions one sample z test may be appropriate.

With the one sample z test, you are comparing your results to a known expected value. For example, if someone states that the average salary for teachers is $63,000.00 you can assess this by collecting data from teachers to compare it to this known value. You collect some data and you find that the average salary for 35 teachers was $65,700.00. The question you have is who is right? Do teachers really make on average $63,000.00 like the report or do they make $65,700.00 as my data says? Before going further let us establish our hypotheses for this example.

  • Null hypothesis: the average salary for my sample of teachers will be the same as the reported average salary of $63,000.00
    1. The mathematical shorthand for this is H0: μ = 63,000.00
  • Alternative hypothesis: the average salary for my sample of teachers will be different (greater or lesser) than the reported average salary of $63,000.00
    1. The mathematical shorthand for this is H1: μ ≠ 63,000.00

Keep in mind that this is a two-tail level of significance. This is because our final value has the option of being either greater or lesser than $63,000.00. Two-tail means two options, greater or lesser than the expected value while one-tail means only one option either we expect greater or we expected lesser but not both. This is why we will have two z critical values to think about in the near future.

We also need two more pieces of information before we put our numbers into the equation. The two items we need to know are the standard deviation of the sample and the level of statistical significance. For the sample data we collected, we will say the standard deviation is $5,250.00 and the level of statistical significance is α = 0.01. When we convert this alpha value to the z critical value we get 2.58 and -2.58 because we are using a two-tail or two option approach, which splits the 0.01 evenly between the two tails. Do not get distracted by the z critical value. It is the same as the alpha value but translated to the scale of the normal distribution. It is similar to switching from one language to another, same meaning but different language.

If our final value is greater than 2.58 or less than -2.58 we will reject the null hypothesis that average teacher salaries are $63,000.00. Now we can take a look at the equation

$$z_{\text{computed}} = \frac{\text{sample mean} - \text{expected value}}{\text{sample standard deviation} / \sqrt{\text{sample size}}}$$

With our numbers

$$z_{\text{computed}} = \frac{65{,}700 - 63{,}000}{5{,}250 / \sqrt{35}} = 3.04$$

Our answer is 3.04, which is greater than +2.58. This indicates that we can reject the null hypothesis that the average teacher salary is $63,000.00, as our data indicate that there is evidence that teachers make more on average.

We don’t want to get too excited here. We found evidence that teachers make more but further testing would be needed to validate these claims. As more data confirms our findings we can confidently state that teachers make more.
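
For anyone who wants to check the teacher salary example on a computer, here is a minimal sketch in Python. It uses `NormalDist` from the standard library to look up the two-tailed critical value for α = 0.01 instead of a printed chart. The variable names are made up for illustration.

```python
import math
from statistics import NormalDist

sample_mean = 65_700
expected_value = 63_000
sample_sd = 5_250
sample_size = 35
alpha = 0.01

# z computed: how many standard errors the sample mean is from the expected value.
z_computed = (sample_mean - expected_value) / (sample_sd / math.sqrt(sample_size))

# Two-tailed test: alpha is split evenly between the two tails.
z_critical = NormalDist().inv_cdf(1 - alpha / 2)

reject_null = abs(z_computed) > z_critical
print(f"z computed = {z_computed:.2f}, z critical = ±{z_critical:.2f}, reject H0: {reject_null}")
```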

I would like to thank andydevil12 for the question and suggestion. If there are any other questions please send them to me as they help me to understand research and statistics much better as well.

What is Hypothesis Testing?

Hypothesis testing is a statistical approach used in making decisions about data.  In hypothesis testing, there are two hypotheses that are posed by the researcher and they are…

  1. Null hypothesis-There is no difference between the sample population and the statistical population in relation to the mean or some other parameter that is being assessed
  2. Alternative hypothesis-There is a difference between the sample population and the statistical population in relation to the mean or some other parameter that is being assessed

Generally, researchers hope to reject the null hypothesis, which provides evidence in favor of the alternative hypothesis.  However, strictly speaking, a researcher never accepts any hypothesis. Instead, you reject or you do not reject the null hypothesis. This is because further testing will always be needed to confirm the results.

How to know whether to reject or not reject the null depends on the results of the analysis. A researcher needs to select a level of statistical significance which is usually 1%, 5%, or 10%. The significance level changes the size of the rejection region at the tails of the normal distribution. The lower the significance level the smaller the rejection region which influences the interpretation of the results.  To reject a null hypothesis, the results of the analysis must fall within the rejection region.
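
To make the link between significance level and rejection region concrete, the sketch below prints the z critical values for the three common significance levels. It is only an illustration using `NormalDist` from Python's standard library, and the helper name `z_critical` is made up. A lower significance level pushes the critical value further into the tail, so the rejection region gets smaller.

```python
from statistics import NormalDist

standard_normal = NormalDist()  # mean 0, standard deviation 1


def z_critical(alpha, two_tailed=True):
    """Positive z value that marks the start of the upper rejection region."""
    tail_area = alpha / 2 if two_tailed else alpha
    return standard_normal.inv_cdf(1 - tail_area)


for alpha in (0.10, 0.05, 0.01):
    two = z_critical(alpha)
    one = z_critical(alpha, two_tailed=False)
    print(f"alpha = {alpha:.2f}: two-tailed ±{two:.2f}, one-tailed {one:.2f}")
```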

After determining the level of significance a researcher analyzes the data to determine the results. The results then need to be interpreted by stating them in simple English.  From this, the researcher can develop a conclusion about what the results mean.

Methods of Collecting Data

After developing a research question, it is necessary to determine how data collection will take place. There are several different ways and the list below is a partial list of various methods

  • Interview method-Data is collected through a face to face encounter. This is a method associated with qualitative research
  • Questionnaire method-The respondents complete a survey or some other instrument. This method can be for either qualitative or quantitative
  • Observation method-Watching for particular behaviors as they are exhibited by an individual or group. This can be for either qualitative or quantitative as well.
  • Experiment method-This method establishes cause and effect between variable(s) in controlled conditions. This is a quantitative method.

 

Sampling Part II

Random sampling has already been discussed. This post deals with non-random sampling which is a sample that is selected in a deliberate manner.  Below are a few of the more common forms of non-random sampling

  1. Convenience sampling is the selection of individuals who are available for the study.  Whoever is free and willing is a part of the study.
  2. Purposive sampling is the inclusion of participants based on criteria developed by the researcher. For example, a researcher wants to only include middle-aged male teachers in their study. Individuals who meet these criteria will be asked to be a part of the study.
  3. Quota sampling is used when a researcher collects data from a certain number of people from several sample units or sub-populations who meet certain criteria. For example, at a university, selecting students from several different departments such as English and Education to be a part of the study. Whoever is from either department may be a part of the study
  4. Snowball sampling is a technique in which the researcher locates one member of the sample population and collects data from them. The participant then recommends other people the researcher can collect data from. An example would be a detective interviewing various people about a crime. One witness suggests someone else the detective should talk to. This form of sampling is common in qualitative research.

Sampling is normally influenced by circumstances. Random is often preferred to non-random, but the context often dictates that a researcher do the best they can and choose a technique that is appropriate for the situation.

Sampling

There are a plethora of sampling approaches in research. Below is a partial list of the more common approaches.  Please remember that the population is the group you are studying. The sample is a smaller portion of the population. Often it is not practical to collect data from an entire population. Therefore, researchers collect data from a sample and make inferences about the population based on the sample.

The sampling approaches below are all forms of random sampling, which is a process in which any member of the population has an equal likelihood of being selected as part of the sample. Non-random sampling will be dealt with in a future post.

  1. Simple random sampling. The sample is derived via random numbers or lottery from the population.
  2. Systematic sampling. The selection of every kth element in the population. For example, selecting every fifth student at a school.
  3. Stratified sampling. Subdividing the population into subgroups and taking members at random from each subgroup. This helps to replicate the proportions of the population in the sample. For example, if a school is 75% men and 25% women, these same proportions need to exist in the sample.
  4. Cluster sampling. Randomly selecting clusters from a population that is spread over a large geographical area. For example, subdividing a country into provinces and then randomly selecting some of the provinces to participate in the study.
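
The four approaches above can be sketched with Python's standard `random` module. Treat this as a rough illustration rather than a definitive implementation: the function names and the student, school, and province data are assumptions made up for the example, and the stratified version simply rounds each subgroup's share, so the total may be off by one or two when the proportions do not divide evenly.

```python
import random

def simple_random_sample(population, n, rng):
    """Draw n members by lottery, without replacement."""
    return rng.sample(population, n)

def systematic_sample(population, k, rng):
    """Pick a random start among the first k members, then every kth one."""
    start = rng.randrange(k)
    return population[start::k]

def stratified_sample(strata, n, rng):
    """Sample each subgroup in proportion to its share of the population.

    `strata` maps a subgroup label to the list of its members.
    """
    total = sum(len(members) for members in strata.values())
    sample = []
    for members in strata.values():
        share = round(n * len(members) / total)
        sample.extend(rng.sample(members, share))
    return sample

def cluster_sample(clusters, n_clusters, rng):
    """Randomly choose whole clusters and keep everyone inside them."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    return [member for name in chosen for member in clusters[name]]

rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Invented data for illustration only.
students = [f"student{i}" for i in range(100)]
school = {"men": [f"m{i}" for i in range(75)],
          "women": [f"w{i}" for i in range(25)]}
provinces = {"north": ["n1", "n2"], "south": ["s1", "s2", "s3"], "east": ["e1"]}

print(simple_random_sample(students, 10, rng))
print(systematic_sample(students, 5, rng))   # every fifth student
print(stratified_sample(school, 20, rng))    # keeps the 75/25 split
print(cluster_sample(provinces, 2, rng))     # everyone in two provinces
```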

The sampling approach you use is determined by the purpose of the research, finances, and practicality.

Levels of Measurement Part II

There are four levels of measurement used in statistics. They are nominal, ordinal, interval, and ratio. This post focuses on the last two levels of measurement, interval and ratio.

Interval level of measurement is used to classify and differentiate between categories based on how different they are. The difference is determined by amount and direction. The difference can also be discrete (finite difference) or continuous (infinite amount of difference). An example of an interval level is temperature. Temperature indicates the difference between hot and cold, you can tell the direction (whether it is increasing or decreasing), and it is continuous in that there are an infinite number of potential temperatures.

Ratio level of measurement is the same as interval, with the only difference being that it has an absolute zero. One example is weight. It has all the characteristics of a continuous interval variable (there is direction, amount, and an infinite amount of difference). The only difference is that nothing can have a negative weight. Temperature in Celsius or Fahrenheit, on the other hand, can go negative because the zero point on those scales is arbitrary (for the sake of illustration, please ignore that temperature has an absolute zero on the Kelvin scale).
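
A little arithmetic shows why the zero point matters. The sketch below is my own illustration, not part of any standard library: the Kelvin conversion and the kilogram-to-pound factor are added here only to make the point. It compares 20 and 10 degrees Celsius with 20 and 10 kilograms.

```python
def celsius_to_kelvin(celsius):
    """Shift Celsius onto the Kelvin scale, which starts at absolute zero."""
    return celsius + 273.15

KG_TO_LB = 2.20462  # approximate pounds per kilogram

# Interval data: differences are meaningful, ratios are not.
difference_celsius = 20 - 10
difference_kelvin = celsius_to_kelvin(20) - celsius_to_kelvin(10)
ratio_celsius = 20 / 10
ratio_kelvin = celsius_to_kelvin(20) / celsius_to_kelvin(10)

# Ratio data: ratios survive a change of units.
ratio_kg = 20 / 10
ratio_lb = (20 * KG_TO_LB) / (10 * KG_TO_LB)

print(round(difference_celsius, 2), round(difference_kelvin, 2))
print(round(ratio_celsius, 3), round(ratio_kelvin, 3))
print(round(ratio_kg, 3), round(ratio_lb, 3))
```

The 10 degree difference is the same on both temperature scales, but the "twice as hot" ratio disappears once the zero point moves. Twenty kilograms stays twice as heavy as ten no matter which unit is used.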

Levels of Measurement

A variable can be measured in several different ways. This variety in variable measurement is broken down into four levels. These levels are nominal, ordinal, interval, and ratio. In this post, I will talk about nominal and ordinal, and I will address interval and ratio in the next post.

Nominal data is data that is broken into separate and discrete categories. The categories are mutually exclusive, which means that no data can be in more than one category. Nominal data is also exhaustive in that all the data must go into one of the categories. This is one of the weakest forms of measurement because all data is forced to conform to a category, so differences within a category cannot be accounted for. An example of nominal measurement is gender, because everyone who responds must be placed in one category or the other and there is no way for someone to be half male and half female when using nominal classifications.

Ordinal measurement is used for ranking data.  At this level, data is still nominal but the order matters. An example would be class standing which is freshman, sophomore, junior,  and senior. The data is nominal in that there are categories but the order matters as a senior is a higher level in comparison to a freshman. There is still no attempt to differentiate within categories which weakens this level of measurement.
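
The difference between the two levels shows up in what you can do with the data. Here is a small Python sketch, using made-up department and class standing lists and a hypothetical `standing_rank` helper. Nominal categories can only be counted, while ordinal categories can also be sorted.

```python
from collections import Counter

# Nominal: labels only. Counting works, but there is no order to sort by.
departments = ["English", "Education", "English", "Science"]
department_counts = Counter(departments)

# Ordinal: the same kind of labels plus an agreed rank order.
STANDING_ORDER = ["freshman", "sophomore", "junior", "senior"]

def standing_rank(standing):
    """Return the position of a class standing in the rank order."""
    return STANDING_ORDER.index(standing)

students = ["junior", "freshman", "senior", "sophomore", "freshman"]
ranked = sorted(students, key=standing_rank)
highest = max(students, key=standing_rank)

print(department_counts.most_common(1))
print(ranked)
print(highest)
```

The rank positions only say which standing comes first. They do not say how much higher a senior is than a freshman, which is why ordinal data stops short of the interval level.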

What level of measurement to use is dependent on what your research questions are. Research is guided by the question you ask.

Classification of Variables

In addition to the types of variables, there are also several ways to classify variables. Two ways to classify variables are experimental and mathematical.

Experimental classification is used to classify variables by the function they serve in the experiment. In experimental research, we have independent and dependent variables.  Independent variables are variables that are controlled by the researcher and are believed to have an effect on the dependent variable. Dependent variables are affected by the independent variables.

For example, let’s say we want to see how sleep affects GPA. We would manipulate the amount of sleep a person gets, which is the independent variable, to see how their GPA changes. GPA is the dependent variable influenced by sleep.

The second type of classification is mathematical. A continuous variable can assume an infinite number of values. An example would be weight or height.

A discrete variable consists of a finite number of values. Examples include gender and the number of computers. You can’t be half a gender; you are a man or a woman.

What type of variable to use depends again on the research questions of the study.

Types of Data

There are two basic types of data: qualitative and quantitative. Qualitative data is data that is put into categories based not on numbers but on some other form of commonality. For example, if a person conducts interviews about student satisfaction, certain concepts, such as good teaching, may be repeated several times by different students. These statements are combined into one category of student satisfaction, which would be good teaching. There is no continuum of data in qualitative research. It is strictly the development of categories based on criteria developed by the researcher.

Quantitative data is numerical data that is often based on a continuum. Examples of quantitative data are height, weight, and age. You can treat quantitative data like qualitative data by developing categories, but this is a discussion for the future.

When to collect qualitative or quantitative data depends on the research questions of the researcher. Neither is superior to the other, and it is the context that determines what is best.

Sources of Data

In research, there are two major forms of data, primary and secondary. Primary data is data that is collected directly from a source, usually by the researcher. Secondary data is data that was collected previously by someone else.

Examples of primary data include interviews, surveys, and experimentation. Examples of secondary data include the results of others' research, government reports, and major studies.

Deciding what type of data to use depends on the research questions of the study. In many ways, it is the questions you are trying to answer that influence where you look and what data you use.

Population vs Sample

In statistics, one of the most fundamental concepts is the distinction between a population and a sample. A population is all the members of a group. For example, if my population is the United States, I would have to collect data from everyone in the country. This is, to say the least, very challenging.

To deal with this, most studies take a sample from the population. A sample is a portion of the population. Continuing our example, instead of collecting data from everyone in the US, I would collect data from several hundred or several thousand people, depending on the research question of my study.
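
To see why a sample can stand in for a population, here is a minimal Python sketch. The population is entirely invented (100,000 random ages between 18 and 90), and the sample size of 1,000 is just an assumption for illustration.

```python
import random
import statistics

rng = random.Random(42)  # fixed seed so the sketch is repeatable

# Hypothetical population: ages invented purely for this illustration.
population = [rng.randint(18, 90) for _ in range(100_000)]

# Collect data from a random sample instead of from everyone.
sample = rng.sample(population, 1_000)

population_mean = statistics.mean(population)
sample_mean = statistics.mean(sample)

print(f"population mean: {population_mean:.2f}")
print(f"sample mean:     {sample_mean:.2f}")
```

The two averages should land close to each other even though the sample is only 1% of the population. In a real study, of course, the population average is unknown, which is the whole reason for sampling.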

There are several different techniques to sampling that will be covered later.  For now, the most important thing to remember is that your research questions and circumstances of the study influence what steps you take. There is rarely one way to do this.