Apply Functions in R

The last R post focused on the use of the ‘for’ loop. A ‘for’ loop is powerful for repeating an action that cannot be done as a single vectorized calculation. However, ‘for’ loops have some drawbacks that are technical and hard to explain to beginners. For example, a ‘for’ loop leaves its loop variable, and any objects created inside it, behind in the workspace, which can lead to strange results later in the session.
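To make the workspace issue concrete, here is a small sketch that squares the numbers 1 to 3 in two ways. The object names ‘squares’, ‘squares2’, ‘i’, and ‘k’ are made up for illustration, and a fresh R session is assumed.

```r
# Hypothetical example: square the numbers 1 to 3 in two different ways.

# 1. With a 'for' loop. The loop variable 'i' is created in the workspace
#    and is still there, holding its last value, after the loop finishes.
squares <- numeric(3)
for (i in 1:3) {
  squares[i] <- i^2
}
print(exists("i"))  # TRUE
print(i)            # 3

# 2. With sapply(). The argument 'k' only exists inside the anonymous
#    function, so nothing is left behind in the workspace.
squares2 <- sapply(1:3, function(k) k^2)
print(exists("k"))  # FALSE in a fresh session

# Both approaches produce the same vector: 1 4 9
print(identical(squares, squares2))  # TRUE
```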

To avoid these problems, you can use functions from the apply family instead. The apply functions give the same results as a ‘for’ loop without these side effects. Three of the most commonly used functions in the apply family are:

  • apply
  • sapply
  • lapply

We discuss each with a realistic example.

Apply

The ‘apply’ function is useful for producing results from a matrix, array, or data frame (a data frame is first converted to a matrix). It does this by applying a function to the rows and/or columns. Depending on what that function returns, the result of ‘apply’ is a vector, matrix, or list. Below is an example of the use of the ‘apply’ function.

You make a matrix that contains how many points James, Kevin, and Williams scored in each of the past three games. Below is the code for this.

> points.per.game<- matrix(c(25,23,32,20,18,24,12,15,16), ncol=3)
> colnames(points.per.game)<-c('James', 'Kevin', 'Williams')
> rownames(points.per.game)<-c('Game1', 'Game2', 'Game3')
> points.per.game
      James Kevin Williams
Game1    25    20       12
Game2    23    18       15
Game3    32    24       16

You want to know the most points each of James, Kevin, and Williams scored in any single game. To do this, you use the ‘apply’ function as follows.

> apply(points.per.game, 2, max)
   James    Kevin Williams 
      32       24       16

Here is what we did

  1. We used the ‘apply’ function with three arguments: “points.per.game”, the name of the matrix; ‘2’, which tells R to examine the matrix by column; and ‘max’, which tells R to find the maximum value in each column.
  2. R prints out the results telling us the most points each player scored regardless of the game.
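The second argument of ‘apply’ (called MARGIN in R's documentation) chooses the direction: 1 for rows, 2 for columns. The sketch below rebuilds the same matrix, asks for the highest score in each game instead, shows that a function returning two values gives a matrix back, and checks the column result against an equivalent ‘for’ loop. The name ‘best.by.loop’ is made up for illustration.

```r
# Rebuild the matrix from the example above
points.per.game <- matrix(c(25, 23, 32, 20, 18, 24, 12, 15, 16), ncol = 3)
colnames(points.per.game) <- c('James', 'Kevin', 'Williams')
rownames(points.per.game) <- c('Game1', 'Game2', 'Game3')

# MARGIN = 1 works row by row: the highest score in each game
apply(points.per.game, 1, max)
# Game1 Game2 Game3
#    25    23    32

# 'range' returns two values (min and max) per column, so the result
# is a matrix: row 1 holds the lowest score, row 2 the highest
apply(points.per.game, 2, range)

# The same column maximums computed with a 'for' loop, for comparison
best.by.loop <- numeric(ncol(points.per.game))
names(best.by.loop) <- colnames(points.per.game)
for (j in seq_len(ncol(points.per.game))) {
  best.by.loop[j] <- max(points.per.game[, j])
}
identical(best.by.loop, apply(points.per.game, 2, max))  # TRUE
```

The ‘apply’ version does the same work in one line and leaves no loop variable behind.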

Sapply

The ‘apply’ function works on multidimensional objects such as matrices, arrays, and data frames. ‘sapply’ works on vectors, lists, and data frames (it treats a data frame as a list of columns). This makes ‘sapply’ more flexible, since it can be used for one-dimensional objects (vectors) as well as multidimensional ones (data frames). The output of ‘sapply’ is a vector, matrix, or list. Below is an example.

Let’s say you make the following data frame:

> GameInfo
  points      GameType
1    101          Home
2    115          Away
3     98 International
4     89          Away
5    104          Home
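The post does not show how ‘GameInfo’ was created. One possible way to build it is sketched below. It assumes `stringsAsFactors = TRUE`, which makes ‘GameType’ a factor as in the ‘sapply’ output that follows; since R 4.0.0, ‘data.frame’ no longer converts text to factors by default, so without this argument ‘class’ would report "character" instead.

```r
# One possible way to create the GameInfo data frame shown above
GameInfo <- data.frame(
  points   = c(101, 115, 98, 89, 104),
  GameType = c("Home", "Away", "International", "Away", "Home"),
  stringsAsFactors = TRUE  # store GameType as a factor, not character
)
GameInfo
```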

You now want to know what kind of variable each column of the ‘GameInfo’ data frame holds. You can check each column one at a time, or you can use the ‘sapply’ function:

> sapply(GameInfo, class)
   points  GameType 
"numeric"  "factor"

Here is what we did

  1. We used the ‘sapply’ function with two arguments. The first is the name of the data frame, “GameInfo”; the second is ‘class’, which tells R to report what kind of variable each column of “GameInfo” is.
  2. The answer is then printed. The variable ‘points’ is a numeric variable while the variable ‘GameType’ is a factor.
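‘sapply’ also works on a plain vector, and the shape of its output depends on what the function returns. A small sketch (the vector ‘scores’ is made up for illustration):

```r
scores <- c(a = 1, b = 2, c = 3)

# One value per element: sapply() simplifies the result to a named vector
sapply(scores, function(x) x * 2)
# a b c
# 2 4 6

# Several values per element, all the same length: the result is a matrix
# with one column per element of 'scores'
sapply(scores, function(x) c(x, x^2))

# Results of different lengths cannot be simplified, so a list comes back
sapply(1:3, seq_len)
```

This is why the output of ‘sapply’ can be a vector, a matrix, or a list.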

Lapply

The ‘lapply’ function works like the ‘sapply’ function but always returns a list. (In fact, ‘sapply’ calls ‘lapply’ and then tries to simplify the result.) See the example below.

> lapply(GameInfo, class)
$points
[1] "numeric"

$GameType
[1] "factor"

This is the same information as in the ‘sapply’ example we ran earlier, only returned as a list.
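Because ‘lapply’ always returns a list, ‘unlist’ turns its result into the same named vector that ‘sapply’ gives, and ‘sapply’ with `simplify = FALSE` gives back the list. The sketch below rebuilds ‘GameInfo’ (assuming `stringsAsFactors = TRUE`, as the original output suggests) and checks both relationships. The names ‘types.list’ and ‘types.vec’ are made up for illustration.

```r
GameInfo <- data.frame(
  points   = c(101, 115, 98, 89, 104),
  GameType = c("Home", "Away", "International", "Away", "Home"),
  stringsAsFactors = TRUE
)

types.list <- lapply(GameInfo, class)  # a list with one element per column
types.vec  <- sapply(GameInfo, class)  # a named character vector

# Flattening the list gives the same named vector that sapply() returns
identical(unlist(types.list), types.vec)                          # TRUE

# Turning off simplification makes sapply() return the same list as lapply()
identical(sapply(GameInfo, class, simplify = FALSE), types.list)  # TRUE

# Individual elements of the list are accessed with $ or [[ ]]
types.list$GameType
```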

Conclusion

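The apply family lets you repeat an operation over the rows or columns of a matrix, or over the elements of a vector, list, or data frame, without the side effects of a ‘for’ loop. Use ‘apply’ for matrices, ‘sapply’ when you want a simplified result such as a vector, and ‘lapply’ when you want a list.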
