Factors are used in R for categorical data. Categorical data is data that does not normally involve numbers but rather descriptions. For example, people can be right- or left-handed, and gender is often recorded as male or female. Each of these descriptions is a category within a variable.
To make a factor in R you need to use the “factor” function. Within the “factor” function there are three important arguments, which are explained below.
- x: This is where you place the vector (or the name of the variable containing the vector) that you want to turn into a factor
- levels: This is an optional vector of the distinct values that x can take, in the order you want them. If you leave it out, R uses the unique values of x in sorted order
- labels: This optional argument allows you to rename the levels within the factor. The labels are matched to the levels by position
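As a rough sketch of how the three arguments fit together, here is the handedness example from the introduction. The variable names `hand` and `hand.f` and the codes "R" and "L" are made up for illustration.

```r
# Hypothetical data: handedness recorded as short codes
hand <- c("R", "L", "R", "R", "L")

# x is the data, levels lists the allowed values in the order we want,
# and labels renames those levels position by position
hand.f <- factor(hand, levels = c("L", "R"), labels = c("left", "right"))

print(hand.f)          # values shown with their new labels
print(levels(hand.f))  # "left" "right"
```

Here "L" becomes “left” and "R" becomes “right” because each label is paired with the level in the same position.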
In the example below we are going to make a factor that contains several different car manufacturers.
> car.makers <- c("Ford", "Isuzu", "Honda", "Toyota")
> factor(car.makers)
[1] Ford   Isuzu  Honda  Toyota
Levels: Ford Honda Isuzu Toyota
This is what happened
- We created the variable ‘car.makers’ and stored the values or names of car makers in the variable using a vector
- We used the “factor” function on the variable ‘car.makers’.
- R prints the values in the factor as well as the levels. Notice that the levels are the unique values, listed in alphabetical order rather than in the order they were entered.
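To see that the levels are the distinct values rather than a copy of the data, here is a small sketch with one maker repeated. The name `makers` and the extra "Ford" are assumptions added for illustration.

```r
# Hypothetical variant of the vector with "Ford" entered twice
makers <- c("Ford", "Isuzu", "Honda", "Toyota", "Ford")
f <- factor(makers)

length(f)    # 5 values
levels(f)    # 4 distinct levels, in sorted order
nlevels(f)   # 4
table(f)     # how many times each level occurs
```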
You can also add labels to a factor. For example, let’s say you wanted to abbreviate the names in the ‘car.makers’ variable by removing vowels. Here is how.
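> car.makers <- factor(car.makers, labels=c("Frd", "Isz", "Hnd", "Tyt"))
> car.makers
[1] Frd Hnd Isz Tyt
Levels: Frd Isz Hnd Tyt
All that we did was add the ‘labels’ argument to the “factor” function, put in our substitute values, and store the result back in ‘car.makers’. Be careful with the order, though: R matches labels to levels by position, and the levels are sorted alphabetically (Ford, Honda, Isuzu, Toyota). In the call above, “Isz” is therefore attached to Honda and “Hnd” to Isuzu, which is why the values print as Frd Hnd Isz Tyt. To get the intended abbreviations, list the labels in the same order as the levels.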
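Because labels are matched to levels by position, and the default levels are sorted, one way to keep each abbreviation attached to the intended maker is to pass ‘levels’ explicitly. This is only a sketch, and the name `car.abbr` is an assumption.

```r
car.makers <- c("Ford", "Isuzu", "Honda", "Toyota")

# Spell out the levels so that each label lines up with its maker
car.abbr <- factor(car.makers,
                   levels = c("Ford", "Isuzu", "Honda", "Toyota"),
                   labels = c("Frd", "Isz", "Hnd", "Tyt"))

# Pair each original value with its label to check the mapping
data.frame(original = car.makers, label = as.character(car.abbr))
```

Listing the labels in alphabetical order of the makers, without the ‘levels’ argument, would also keep the pairing correct.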
A unique thing about R is that when you look at the structure of a factor using the “str” function, you get the following printout. This assumes the labelled factor has been stored back in ‘car.makers’.
> str(car.makers)
 Factor w/ 4 levels "Frd","Isz","Hnd",..: 1 3 2 4
This is telling us that the factor ‘car.makers’ has four levels. R then lists the first three of them. After this we get the numbers 1, 3, 2, 4. What do these numbers mean?
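R stores each value in a factor as an integer code that points to a level, and by default the levels are numbered in alphabetical order of the original values. Below is a translation of the printout above.
- Ford is first alphabetically, so it gets 1
- Honda is second alphabetically and gets 2
- Isuzu is next and receives a 3
- Toyota is last alphabetically and receives a 4. Its level is not shown because “str” only prints the first few levels
The vector was entered as Ford, Isuzu, Honda, Toyota, so the codes print as 1, 3, 2, 4. The labels in the printout are listed in level order, so level 2 (Honda) is the one carrying the label “Isz” and level 3 (Isuzu) carries “Hnd”, because the labels were supplied in a different order from the sorted levels.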
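The codes can be inspected directly with “as.integer”. The sketch below uses the unlabelled factor so the level names are easy to follow. The variable names `f` and `codes` are assumptions.

```r
f <- factor(c("Ford", "Isuzu", "Honda", "Toyota"))

codes <- as.integer(f)   # the underlying integer codes
codes                    # 1 3 2 4

levels(f)                # "Ford" "Honda" "Isuzu" "Toyota"
levels(f)[codes]         # looking each code up recovers the original values
```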
These numbers can be used to create subsets, just as with vectors. Below is another example.
> levels(car.makers)[3:4]
[1] "Hnd" "Tyt"
In the example above, we told R that we want the levels of the factor ‘car.makers’. However, we specifically asked for levels 3 and 4 by using the brackets. R then prints the names of levels 3 and 4.
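The same bracket idea can be used to pick out the values of the factor that belong to certain levels. This is a minimal sketch on the unlabelled factor. The names `f`, `keep` and `picked` are assumptions.

```r
f <- factor(c("Ford", "Isuzu", "Honda", "Toyota"))

keep <- levels(f)[3:4]        # the third and fourth levels
picked <- f[f %in% keep]      # values of the factor that fall in those levels

picked                        # the subset still remembers all four levels
droplevels(picked)            # drops the levels that are no longer used
```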
This provides a basic understanding of factors.