A common goal of statistics is to try and identify trends in the data as well as to predict what may happen. Both of these goals can be partially achieved through the development of graphs and or charts.
In this post, we will look at adding a smooth line to a scatterplot using the “ggplot2” package.
To accomplish this, we will use the “Carseats” dataset from the “ISLR” package. We will explore the relationship between the price of carseats with actual sales along with whether the carseat was purchase in an urban location or not. Below is some initial code to prepare for the analysis.
We are going to us a layering approach in this example. This means we will add one piece of code at a time until we have the complete plot.We are now going to plot the initial scatterplot. We simply want a scatterplot depicting the relationship between Price and Sales of carseats.
ggplot(data=Carseats, aes(x=Price, y=Sales, col=Urban))+geom_point()
The general trend appears to be negative. As price increases sales decrease regardless if carseat was purchase in an urban setting or not.
We will now add ar LOESS line to the graph. LOESS stands for “Locally weighted smoothing” this is a commonly used tool in regression analysis. The addition of a LOESS line allows in identifying trends visually much easily. Below is the code
ggplot(data=Carseats, aes(x=Price, y=Sales, col=Urban))+geom_point()+ stat_smooth()
Unlike a regression line which is strictly straight, a LOESS line curves with the data. As you look at the graph the LOESS line is mostly straight with curves at the extremes and for small rise in fall in the middle for carseats purchased in urban areas.
So far we have created LOESS lines by the categorical variable Urban. We can actually make a graph with three LOESS lines. One for Yes urban, another for No Urban, and a last one that is an overall line that does not take into account the Urban variable. Below is the code.
ggplot()+ geom_point(data=Carseats, aes(x=Price, y=Sales, col=Urban))+ stat_smooth(data=Carseats, aes(x=Price, y=Sales))+stat_smooth(data=Carseats, aes(x=Price, y=Sales, col=Urban))
Notice that the code is slightly different with the information being mostly outside of the “ggplot” function. You can barely see the third line in the graph but if you look closely you will see a new blue line that was not there previously. This is the overall trend line. If you want you can see the overall trend line with the code below.
ggplot()+ geom_point(data=Carseats, aes(x=Price, y=Sales, col=Urban))+ stat_smooth(data=Carseats, aes(x=Price, y=Sales))
The very first graph we generated in this post only contained points. This is because we used the “geom_point” function. Any of the graphs we created could be generated with points by removing the “geom_point” function and only using the “stat_smooth” function as shown below.
ggplot(data=Carseats, aes(x=Price, y=Sales, col=Urban))+ stat_smooth()
Conclusion This post provided an introduction to adding LOESS lines to a graph using ggplot2. For presenting data in a visually appealing way, adding lines can help in identifying key characteristics in the data.