# Calculating Chi-Square in R

There are times when conducting research that you want to know if there is a difference in categorical data. For example, is there a difference between the number of men who have blue eyes and the number who have brown eyes? Or is there a relationship between gender and hair color? In other words, is there a difference in the count of a particular characteristic, or is there a relationship between two or more categorical variables?

In statistics, the chi-square test is used to compare categorical data. In this post, we will look at how you can use the chi-square test in R.

For our example, we are going to use data that is already available in R called “HairEyeColor”. Below is the data.

```
> HairEyeColor
, , Sex = Male

       Eye
Hair    Brown Blue Hazel Green
  Black    32   11    10     3
  Brown    53   50    25    15
  Red      10   10     7     7
  Blond     3   30     5     8

, , Sex = Female

       Eye
Hair    Brown Blue Hazel Green
  Black    36    9     5     2
  Brown    66   34    29    14
  Red      16    7     7     7
  Blond     4   64     5     8
```

As you can see, the data comes in the form of a three-dimensional table that shows hair and eye color for men and women in separate layers. In this form we cannot test hair color against eye color directly. However, by using the ‘margin.table’ function we can collapse the data into a single two-way table, as shown in the example below.

```
> HairEyeNew <- margin.table(HairEyeColor, margin = c(1,2))
> HairEyeNew
       Eye
Hair    Brown Blue Hazel Green
  Black    68   20    15     5
  Brown   119   84    54    29
  Red      26   17    14    14
  Blond     7   94    10    16
```

Here is what we did. We created the variable ‘HairEyeNew’ and stored the information from ‘HairEyeColor’ in one table using the ‘margin.table’ function. Setting the margin to c(1,2) keeps the first and second dimensions (hair and eye color) and sums the counts over the third (sex).

Now all of our data from the two separate tables is combined into one table.
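If you want to convince yourself of what ‘margin.table’ is doing, the short sketch below inspects the structure of the data and rebuilds the same totals with ‘apply’. The variable names ‘hair_eye’ and ‘hair_eye_check’ are only illustrative and are not part of the original example.

```r
# HairEyeColor ships with base R and is a three-way table, not a list
class(HairEyeColor)             # "table"
dim(HairEyeColor)               # 4 4 2
names(dimnames(HairEyeColor))   # "Hair" "Eye" "Sex"

# margin = c(1, 2) keeps Hair and Eye and sums the counts over Sex
hair_eye <- margin.table(HairEyeColor, margin = c(1, 2))

# The same totals computed by summing over the third dimension with apply()
hair_eye_check <- apply(HairEyeColor, c(1, 2), sum)

all(hair_eye == hair_eye_check) # TRUE
```

Both approaches give the same 4 x 4 table of counts, so either one can be used before running the test.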

We now want to see if there is a relationship between hair color and eye color, that is, whether some combinations occur more often than we would expect if the two were unrelated. To do this, we calculate the chi-square statistic as in the example below.

```
> chisq.test(HairEyeNew)

	Pearson's Chi-squared test

data:  HairEyeNew
X-squared = 138.29, df = 9, p-value < 2.2e-16
```
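The numbers in the output above can be reproduced by hand, which helps show what the test is measuring. The sketch below is one way to do it, assuming the usual Pearson formula (observed minus expected counts, squared, divided by expected). The names ‘observed’, ‘expected’, ‘stat’, ‘df’ and ‘p_value’ are illustrative only.

```r
# Two-way table of hair color by eye color, summed over sex
observed <- margin.table(HairEyeColor, margin = c(1, 2))

# Expected counts if hair and eye color were independent:
# (row total * column total) / grand total
expected <- outer(rowSums(observed), colSums(observed)) / sum(observed)

# Pearson chi-square statistic
stat <- sum((observed - expected)^2 / expected)

# Degrees of freedom: (rows - 1) * (columns - 1)
df <- (nrow(observed) - 1) * (ncol(observed) - 1)

# Upper-tail probability from the chi-square distribution
p_value <- pchisq(stat, df = df, lower.tail = FALSE)

round(stat, 2)   # 138.29
df               # 9
```

The larger the gap between the observed and expected counts, the larger the statistic and the smaller the p-value.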

The very small p-value tells us that hair color and eye color are not independent: some combinations occur more often, and others less often, than chance alone would produce. To see which combinations of hair and eye color are most common, we will calculate the proportions for the table as seen below.

```
> HairEyeNew/sum(HairEyeNew)
       Eye
Hair          Brown        Blue       Hazel       Green
  Black 0.114864865 0.033783784 0.025337838 0.008445946
  Brown 0.201013514 0.141891892 0.091216216 0.048986486
  Red   0.043918919 0.028716216 0.023648649 0.023648649
  Blond 0.011824324 0.158783784 0.016891892 0.027027027
```

As you can see from the table, brown hair and brown eyes are the most common combination (0.20 or 20%), followed by blond hair and blue eyes (0.16 or 16%).
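Rather than reading the largest values off the table by eye, you can let R rank the combinations. The sketch below is one possible approach using the base R functions ‘prop.table’ and ‘as.data.frame’. The names ‘hair_eye’, ‘props’, ‘cells’ and ‘ranked’ are illustrative only.

```r
# Two-way table of hair color by eye color, summed over sex
hair_eye <- margin.table(HairEyeColor, margin = c(1, 2))

# prop.table() with no margin is the same as hair_eye / sum(hair_eye)
props <- prop.table(hair_eye)
round(props, 2)

# Convert to one row per hair/eye combination (columns: Hair, Eye, Freq)
cells <- as.data.frame(props)

# Sort from the most to the least common combination
ranked <- cells[order(-cells$Freq), ]
head(ranked, 3)
```

The first rows of ‘ranked’ are the most common hair and eye color combinations in the data.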

## Conclusion

The chi-square test serves to detect differences and relationships among categorical data. This tool is useful for examining potential relationships among non-continuous variables.