Annotations let you call out important information in a visualization for your audience. In the example below, we will look at how to annotate a visualization in Python.
Libraries and Data Preparation
We will begin by loading the needed libraries and preparing the data. In the code below, lines 1 and 3 load our visualization libraries. Line 2 loads the function we will need to load our data.
import seaborn as sns
from pydataset import data
import matplotlib.pyplot as plt
In the code below, we use the data() function to load the Prestige data from pydataset into an object called df. Then, we display the head of this data using the .head() method.
df=data('Prestige')
df.head()
Our dataset contains various jobs described by six variables: education, income, women, prestige, census, and type. In our code below, we will plot education against income and use the remaining variables to locate and color specific jobs.
Making a Comment
Now we will add a comment to our visualization. Specifically, we will point out the highest income value. Below is the code, followed by the visualization.
# Draw basic scatter plot of education data and income
sns.scatterplot(x = 'education', y = 'income', data = df)
# Label highest income value with text annotation
plt.text(6, 25000,
         'The max income is over 25000',
         # Left-align the text and set the font size to large
         fontdict = {'ha': 'left', 'size': 'large'})
plt.show()
The first step was to make a basic scatterplot. We use the scatterplot() function from seaborn to plot education against income from our df dataset. Next, we add our text using the text() function from matplotlib. We give it the x and y coordinates where the text should be placed and then the string to display. Finally, the fontdict argument left-aligns the text horizontally ('ha': 'left') and sets the font size to large.
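The text position above is hard-coded. As a minimal sketch of an alternative, the coordinates can be read from the row with the highest income. The toy DataFrame and the label_max_income helper below are hypothetical stand-ins (not part of the original code), and plain matplotlib is used instead of seaborn to keep the sketch small; the same idea should carry over to the df loaded earlier.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-in rows with made-up values; swap in the Prestige df
toy = pd.DataFrame(
    {"education": [12.3, 13.1, 8.5, 10.0],
     "income": [25900, 12400, 4700, 8000]},
    index=["job_a", "job_b", "job_c", "job_d"],
)

def label_max_income(frame):
    """Scatter education vs. income and label the highest-income point."""
    # idxmax() returns the row label of the largest income value
    top = frame.loc[frame["income"].idxmax()]
    fig, ax = plt.subplots()
    ax.scatter(frame["education"], frame["income"])
    # Anchor the text at the point itself; right-align it so it ends there
    label = ax.text(top["education"], top["income"],
                    f"Max income: {top['income']:,.0f}",
                    ha="right", va="bottom", size="large")
    return fig, label

fig, label = label_max_income(toy)
```

Anchoring the text to the data keeps the label in the right place if the data changes, at the cost of less manual control over where the text sits.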
Arrow Annotation
Using an arrow is another way to bring attention to data in a visualization. In the code below, we will use an arrow that points to the same data point as in the previous example, the highest income value, which belongs to general managers. Below is the code, followed by the visualization.
# Query and filter to General Managers
women_census = df.query("(women == 4.02) & (census == 1130)")
prestige_type = df.query("(prestige == 69.1) & (type == 'prof')")
sns.scatterplot(x = 'education', y = 'income',
                data = df)
# Point arrow to General Managers
plt.annotate('General Managers',
             # Take the single matching value from each query as a scalar
             xy = (women_census['education'].iloc[0], prestige_type['income'].iloc[0]),
             xytext = (6.5, 15000),
             # Shrink the arrow to avoid occlusion
             arrowprops = {'facecolor':'gray', 'width': 3, 'shrink': 0.03},
             backgroundcolor = 'white')
plt.show()
Here is what we did:
1. We create two queries to locate the data point we want the arrow to point to. All the values in the .query() method for both women_census and prestige_type are values from the general managers row, so these two objects are used to locate general managers in the dataset.
2. We make the same scatterplot as before.
3. The .annotate() function is used. We start by writing in quotes what we want to appear in the scatterplot. Next, we set the x and y coordinates of the data point we want to highlight using the women_census and prestige_type queries we did previously. From there, we set the text location with xytext. After this, we set the arrow properties: the color, the width, and the shrink value, which pulls the ends of the arrow back slightly so that it does not cover the point. Lastly, the background color of the label is set to white.
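The steps above can be sketched with a single query and explicit scalar coordinates. This is only an illustration under stated assumptions: the toy DataFrame is a hypothetical stand-in, and apart from the women and census values taken from the query above, its numbers are made up. Plain matplotlib is used here instead of seaborn.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-in rows; only women/census of the first row come from the text
toy = pd.DataFrame({
    "education": [12.3, 13.1, 8.5],
    "income": [25900, 12400, 4700],
    "women": [4.02, 11.16, 50.0],
    "census": [1130, 1113, 4000],
})

# One query is enough when it matches exactly one row
target = toy.query("(women == 4.02) & (census == 1130)")
x = float(target["education"].iloc[0])
y = float(target["income"].iloc[0])

fig, ax = plt.subplots()
ax.scatter(toy["education"], toy["income"])
note = ax.annotate("General Managers",
                   xy=(x, y),              # point the arrow aims at
                   xytext=(6.5, 15000),    # where the label text starts
                   arrowprops={"facecolor": "gray", "width": 3, "shrink": 0.03},
                   backgroundcolor="white")
```

Pulling x and y out as plain floats makes it explicit that annotate() receives one coordinate pair rather than whole columns.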
Annotation with Color & Text
Color annotation draws attention to a group of points by giving it a color that contrasts with the rest of the data. Below is the code and the visualization when this approach is used.
# Make a vector where prof is orangered; else lightgray
prof = ['orangered' if job_type == 'prof' else 'lightgray' for job_type in df['type']]
# Map facecolors to the list prof and set alpha to 0.3
sns.regplot(x = 'education',
            y = 'income',
            data = df,
            fit_reg = False,
            scatter_kws = {'facecolors':prof, 'alpha': 0.3})
# Add annotation to plot
plt.text(11, 23000, 'General Managers')
plt.show()
This approach is simpler than the last one. We begin by building a list of colors based on type: professionals are colored orangered, and the rest are colored light gray. Next, we create our scatterplot, using the regplot() function this time. Education and income are the x and y axes, the regression line is removed with fit_reg = False, and the color of the dots is set using the scatter_kws argument. Inside scatter_kws, the prof list supplies the color of each point, and alpha is set to 0.3 to make the points transparent. The last step uses the text() function to place the label at the given x and y coordinates.
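The coloring rule can be checked on its own before plotting. Below is a minimal sketch that assumes a hypothetical helper, highlight_colors, and a made-up toy DataFrame. It passes the colors to matplotlib's scatter() directly rather than through regplot(), so it is a variation on the code above and not the same call.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-in rows with made-up values
toy = pd.DataFrame({
    "education": [12.3, 8.5, 10.0, 14.6],
    "income": [25900, 4700, 8000, 8400],
    "type": ["prof", "bc", "wc", "prof"],
})

def highlight_colors(types, target="prof", on="orangered", off="lightgray"):
    """Return one color per row: `on` for the target type, `off` otherwise."""
    return [on if t == target else off for t in types]

colors = highlight_colors(toy["type"])

fig, ax = plt.subplots()
# c accepts one color per point; alpha makes overlapping points visible
points = ax.scatter(toy["education"], toy["income"], c=colors, alpha=0.3)
ax.text(11, 23000, "General Managers")
```

Any value that is not equal to the target, including a missing type, falls through to the light gray color under this rule.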
Conclusion
Annotation is one of many ways to bring attention to crucial insights in a visualization. The examples above show a few of the ways text, arrows, and color can be used to highlight important information when using Python.
