The last R post focused on the use of the ‘for’ loop. A ‘for’ loop is useful for repeating an action that cannot be expressed as a single vectorized calculation. However, ‘for’ loops have some drawbacks that are highly technical and hard to explain to beginners. The problems mostly concern unexpected side effects in the workspace and environment.
As an alternative to ‘for’ loops, R offers the functions of the apply family, which produce the same results as a ‘for’ loop without these occasional problems. This post covers three functions in the apply family: ‘apply’, ‘sapply’, and ‘lapply’.
We discuss each with a realistic example.
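Before the examples, here is a minimal sketch of one workspace side effect that a ‘for’ loop can have. It may or may not be the exact problem alluded to above; that link is an assumption. All of the names used (‘squares.loop’, ‘loop.tmp’, ‘fn.tmp’, and so on) are made up for illustration.

```r
# Square the numbers 1 to 3 with a 'for' loop.
squares.loop <- numeric(3)
for (loop.index in 1:3) {
  loop.tmp <- loop.index^2            # created in the calling environment
  squares.loop[loop.index] <- loop.tmp
}
# Both helper variables are still around after the loop has finished.
loop.leftovers <- c(exists("loop.index", inherits = FALSE),
                    exists("loop.tmp", inherits = FALSE))

# The same calculation with 'sapply' and an anonymous function.
squares.apply <- sapply(1:3, function(fn.index) {
  fn.tmp <- fn.index^2                # exists only inside the function call
  fn.tmp
})
# Nothing from inside the function is left behind.
apply.leftovers <- c(exists("fn.index", inherits = FALSE),
                     exists("fn.tmp", inherits = FALSE))

print(loop.leftovers)
print(apply.leftovers)
```

Both approaches give the same squares, but only the ‘for’ loop leaves its helper variables in the workspace, where they can silently overwrite objects with the same name.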
The ‘apply’ function is useful for producing results for a matrix, array, or data frame. It does this by applying a function to the rows and/or columns. The result of ‘apply’ is always returned as a vector, matrix, or list. Below is an example of the use of the ‘apply’ function.
Suppose you make a matrix that contains how many points James, Kevin, and Williams have scored in the past three games. Below is the code for this.
> points.per.game <- matrix(c(25,23,32,20,18,24,12,15,16), ncol=3)
> colnames(points.per.game) <- c('James', 'Kevin', 'Williams')
> rownames(points.per.game) <- c('Game1', 'Game2', 'Game3')
> points.per.game
      James Kevin Williams
Game1    25    20       12
Game2    23    18       15
Game3    32    24       16
You want to know the most points James, Kevin, and Williams scored for any game. To do this, you use the ‘apply’ function as follows.
> apply(points.per.game, 2, max)
   James    Kevin Williams 
      32       24       16 
Here is what we did:
- We used the ‘apply’ function with three arguments: ‘points.per.game’, which is the name of the matrix; ‘2’, which tells R to examine the matrix by column; and ‘max’, which tells R to find the maximum value in each column.
- R prints out the results telling us the most points each player scored regardless of the game.
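The second argument can also be ‘1’, which tells R to work by row instead of by column. As a small sketch using the same matrix, the code below adds up the points scored in each game and repeats the column maximum from above. The object names ‘game.totals’ and ‘player.max’ are made up for this illustration.

```r
# Rebuild the matrix from the example so this block runs on its own.
points.per.game <- matrix(c(25, 23, 32, 20, 18, 24, 12, 15, 16), ncol = 3)
colnames(points.per.game) <- c('James', 'Kevin', 'Williams')
rownames(points.per.game) <- c('Game1', 'Game2', 'Game3')

# Margin 1 = rows: total points scored by all three players in each game.
game.totals <- apply(points.per.game, 1, sum)

# Margin 2 = columns: highest score of each player across the games.
player.max <- apply(points.per.game, 2, max)

print(game.totals)   # Game1 57, Game2 56, Game3 72
print(player.max)    # James 32, Kevin 24, Williams 16
```

The result keeps the row or column names, so each number stays labeled with the game or player it belongs to.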
The ‘apply’ function works for multidimensional objects such as matrices, arrays, and data frames. The ‘sapply’ function is used for vectors, data frames, and lists. It is more flexible in that it can be used for both one-dimensional objects (vectors) and multidimensional objects (data frames). The output of the ‘sapply’ function is always a vector, matrix, or list. Below is an example.
Let’s say you make the following data frame
> GameInfo
  points      GameType
1    101          Home
2    115          Away
3     98 International
4     89          Away
5    104          Home
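The code that created this data frame is not shown, so here is one way it could have been built. This is a sketch, not necessarily the original code. The argument ‘stringsAsFactors = TRUE’ is an assumption made so that ‘GameType’ is stored as a factor; since R 4.0.0 the default is FALSE, which would store it as a character variable instead.

```r
# One possible way to build the data frame shown above.
GameInfo <- data.frame(
  points = c(101, 115, 98, 89, 104),
  GameType = c("Home", "Away", "International", "Away", "Home"),
  stringsAsFactors = TRUE   # assumption: keep 'GameType' as a factor
)

print(GameInfo)
```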
You now want to know what kind of variables you have in the ‘GameInfo’ data frame. You can check each variable one at a time, or you can use the ‘sapply’ function as follows.
> sapply(GameInfo, class)
   points  GameType 
"numeric"  "factor" 
Here is what we did:
- We used the ‘sapply’ function with two arguments. The first is the name of the data frame, ‘GameInfo’. The second is ‘class’, which tells R to determine what kind of variable each column of ‘GameInfo’ is.
- The answer is then printed. The variable ‘points’ is a numeric variable while the variable ‘GameType’ is a factor.
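The ‘sapply’ function is not limited to data frames. As a brief sketch, the code below uses it on a plain vector and on a list, with made-up object names (‘over.100’, ‘score.range’) and the scores from the earlier examples. It shows two of the possible output shapes: a vector when each result is a single value, and a matrix when each result has the same length greater than one.

```r
# A vector in, a vector out: was each game total above 100 points?
points <- c(101, 115, 98, 89, 104)
over.100 <- sapply(points, function(p) p > 100)

# A list in, a matrix out: 'range' returns two values (min and max)
# for each player, so the results are combined into a 2 x 2 matrix.
scores <- list(James = c(25, 23, 32), Kevin = c(20, 18, 24))
score.range <- sapply(scores, range)

print(over.100)      # TRUE TRUE FALSE FALSE TRUE
print(score.range)   # columns James (23, 32) and Kevin (18, 24)
```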
The ‘lapply’ function works like the ‘sapply’ function but always returns a list. See the example below.
> lapply(GameInfo, class)
$points
[1] "numeric"

$GameType
[1] "factor"
This is the same information as in the ‘sapply’ example we ran earlier, only returned as a list.
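To see how the two results relate, the sketch below runs both functions on the same data frame and compares them. The data frame is rebuilt here so the block runs on its own. The ‘stringsAsFactors = TRUE’ setting and the names ‘class.list’ and ‘class.vector’ are assumptions made for this illustration.

```r
# Rebuild the example data frame (assumed construction).
GameInfo <- data.frame(
  points = c(101, 115, 98, 89, 104),
  GameType = c("Home", "Away", "International", "Away", "Home"),
  stringsAsFactors = TRUE
)

class.list <- lapply(GameInfo, class)     # always a list
class.vector <- sapply(GameInfo, class)   # simplified to a named vector here

# Flattening the list with 'unlist' gives the same named vector.
same.result <- identical(unlist(class.list), class.vector)

print(is.list(class.list))
print(same.result)
```

In this case ‘sapply’ simply does the extra step of simplifying the list for you. When you need a predictable list no matter what the function returns, ‘lapply’ is the safer choice.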
The ‘apply’ family serves the purpose of running a loop without the concerns of the ‘for’ loop. It works on a variety of objects, including vectors, matrices, data frames, and lists.