In this post, we will explore how to create data frames and look at a few other aspects of using data frames in R. The first example below builds a data frame that contains information about fictional faculty members. Our job will be to put this information into a data frame and then rename one of its columns. The example is followed by an explanation.
```r
> Faculty <- c('Darrin Thomas', 'Hank Smith', 'Sarah William')
> Salary <- c(60000, 50000, 53000)
> Hire_Date <- as.Date(c('2015-1-1', '2000-6-1', '2012-9-1'))
> Lecturers.data <- data.frame(Faculty, Salary, Hire_Date)
> str(Lecturers.data)
'data.frame': 3 obs. of 3 variables:
 $ Faculty  : Factor w/ 3 levels "Darrin Thomas",..: 1 2 3
 $ Salary   : num  60000 50000 53000
 $ Hire_Date: Date, format: "2015-01-01" "2000-06-01" "2012-09-01"
```
Here is what happened:
- We started by making three different vectors and assigning each one to a variable. The variables are ‘Faculty’, ‘Salary’, and ‘Hire_Date’.
- We then combined all three vectors into the data frame ‘Lecturers.data’ with the ‘data.frame’ function. In the data frame, ‘Faculty’ is a factor, ‘Salary’ is a numeric vector, and ‘Hire_Date’ is a Date vector. Again, the advantage of data frames is their ability to hold several different types of data.
- We then used the ‘str’ function to display the structure of the ‘Lecturers.data’ data frame, including the type of each column.
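If you only want to check the type of each column, a compact alternative to ‘str’ is to apply ‘class’ to every column with ‘sapply’. The sketch below rebuilds the same data frame and returns a named character vector of column classes; the variable name ‘col_types’ is just an illustrative choice. Note that the class reported for ‘Faculty’ depends on your R version, as explained further below.

```r
# Rebuild the example data frame from the post
Faculty   <- c('Darrin Thomas', 'Hank Smith', 'Sarah William')
Salary    <- c(60000, 50000, 53000)
Hire_Date <- as.Date(c('2015-1-1', '2000-6-1', '2012-9-1'))
Lecturers.data <- data.frame(Faculty, Salary, Hire_Date)

# class() of each column; sapply() simplifies the result to a named character vector
col_types <- sapply(Lecturers.data, class)
print(col_types)
```

‘Salary’ is reported as "numeric" and ‘Hire_Date’ as "Date"; ‘Faculty’ shows up as "factor" or "character" depending on how the data frame treated strings.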
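There is one small problem with the data frame above. ‘Faculty’ is a factor, but our original ‘Faculty’ vector was a character vector. We want ‘Faculty’ to remain a character vector instead of becoming a factor. (This automatic conversion is the default behavior in R versions before 4.0.0; since R 4.0.0, ‘data.frame’ uses ‘stringsAsFactors = FALSE’ by default, so on a current installation ‘Faculty’ stays a character vector.) The example below shows one way to deal with this small problem.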
```r
> Lecturers.data <- data.frame(Faculty, Salary, Hire_Date, stringsAsFactors=FALSE)
> str(Lecturers.data)
'data.frame': 3 obs. of 3 variables:
 $ Faculty  : chr  "Darrin Thomas" "Hank Smith" "Sarah William"
 $ Salary   : num  60000 50000 53000
 $ Hire_Date: Date, format: "2015-01-01" "2000-06-01" "2012-09-01"
```
Adding the argument ‘stringsAsFactors=FALSE’ tells ‘data.frame’ to keep character vectors as characters instead of converting them to factors. If you look closely, you will see that ‘$ Faculty’ is no longer a factor as in the previous example. Instead, it is now a ‘chr’, or character, variable.
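Another way to handle the problem, if the data frame already exists with a factor column, is to convert that column back with ‘as.character’. The sketch below asks for factors explicitly with ‘stringsAsFactors = TRUE’ so that it reproduces the first example on any R version, then converts ‘Faculty’ back to a character vector.

```r
Faculty   <- c('Darrin Thomas', 'Hank Smith', 'Sarah William')
Salary    <- c(60000, 50000, 53000)
Hire_Date <- as.Date(c('2015-1-1', '2000-6-1', '2012-9-1'))

# Explicitly request factors to reproduce the situation from the first example
Lecturers.data <- data.frame(Faculty, Salary, Hire_Date, stringsAsFactors = TRUE)
str(Lecturers.data$Faculty)

# Convert the existing factor column back to a character vector
Lecturers.data$Faculty <- as.character(Lecturers.data$Faculty)
str(Lecturers.data$Faculty)
```

‘as.character’ returns the factor's labels. Avoid ‘as.numeric’ here, since on a factor it returns the underlying integer codes rather than the names.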
It is also possible to rename the columns of a data frame, much as you would in a matrix. For example, suppose you made a mistake with the ‘Hire_Date’ variable: you did not mean the date the lecturers were hired but the date they resigned. Below is an example of how to fix this.
```r
> Lecturers.data
        Faculty Salary  Hire_Date
1 Darrin Thomas  60000 2015-01-01
2    Hank Smith  50000 2000-06-01
3 Sarah William  53000 2012-09-01
> names(Lecturers.data)[3] <- 'Resign_Date'
> Lecturers.data
        Faculty Salary Resign_Date
1 Darrin Thomas  60000  2015-01-01
2    Hank Smith  50000  2000-06-01
3 Sarah William  53000  2012-09-01
```
Here is what happened:
- We displayed the data frame ‘Lecturers.data’ as a reference point.
- We noticed that we did not want a column named ‘Hire_Date’ and wanted to change its name to ‘Resign_Date’.
- To change the name, we used the ‘names’ function on ‘Lecturers.data’. We specifically told R to rename the third column by using the subset brackets [3] and assigned the name ‘Resign_Date’.
- We then redisplayed the ‘Lecturers.data’ data frame. If you compare it with the first display, you can see that the third column has been renamed as desired.
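Selecting the column by position works, but it silently renames the wrong column if the column order ever changes. A slightly more robust variant, sketched below, selects the column by its current name instead of by the index [3].

```r
Faculty   <- c('Darrin Thomas', 'Hank Smith', 'Sarah William')
Salary    <- c(60000, 50000, 53000)
Hire_Date <- as.Date(c('2015-1-1', '2000-6-1', '2012-9-1'))
Lecturers.data <- data.frame(Faculty, Salary, Hire_Date, stringsAsFactors = FALSE)

# Build a logical index that is TRUE only for the column currently named 'Hire_Date',
# then assign the new name to that position
names(Lecturers.data)[names(Lecturers.data) == 'Hire_Date'] <- 'Resign_Date'
print(names(Lecturers.data))
```

‘colnames’ can be used in place of ‘names’ in either version, and that form also works on a matrix.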
This post provided some basic information on creating data frames. We learned how to combine vectors into a data frame, how to keep a character vector from being converted to a factor, and how to rename a column. Skills such as these are useful to anyone who needs to work with data frames.