Redundant coding, text annotations
Introduction
In this worksheet, we will discuss how to encode data using multiple visual channels (such as color and point shape), and we will also discuss text annotations.
First we need to load the required R packages. Please wait a moment until the live R session is fully set up and all packages are loaded.
We will be working with two datasets, iris
and cars93
. The iris
dataset contains measurements on the flowers of three Iris species.
Hint: Pay attention to the column names in the iris
dataset. They are all capitalized (e.g., Species
), and the first four use a point as a separator (e.g., Sepal.Length
). It is easy to misspell them and then the R code doesn’t work correctly.
The cars93
dataset contains information about various passenger cars that were on the market in 1993.
Mapping variables to color and shape
First, we will do an exercise to practice using multiple visual channels (color and shape) to represent the same qualitative variable. We will do this exercise with the iris
dataset.
Make a plot of Sepal.Width
versus Sepal.Length
for the three species in the iris
dataset. Map Species
to both color
and shape
.
ggplot(iris, aes(Sepal.Length, Sepal.Width, ___)) +
geom_point()
ggplot(iris, aes(Sepal.Length, Sepal.Width, color = Species, shape = Species)) +
geom_point()
You can set the shapes with scale_shape_manual()
, just like you do with colors. There are five special shapes, 21 through 25, that have a line color and a fill color. Modify the plot from the previous exercise so it uses these shapes. Hint: This means you should use the fill
aesthetic rather than the color
aesthetic.
ggplot(iris, aes(Sepal.Length, Sepal.Width, ___)) +
geom_point() +
scale_shape_manual(values = ___)
ggplot(iris, aes(Sepal.Length, Sepal.Width, fill = Species, shape = Species)) +
geom_point() +
scale_shape_manual(values = c(21, 23, 25))
Manually applying text labels
We can place text labels with geom_text()
. Oftentimes it makes sense to manually fine-tune exactly where the text labels will be located. To practice this, we will work with a simple dataset that contains three points:
Plot these three points with geom_point()
, and use geom_text()
to add the label text to the right side of each point. Remember that hjust = 0
plots text left-justified. Hints: Add xlim(1, 4)
to ensure the text labels don’t run beyond the edge of the plot panel.
ggplot(data, aes(x, y)) +
geom_point() +
geom_text(___) +
xlim(1, 4)
ggplot(data, aes(x, y)) +
geom_point() +
geom_text(aes(label = ___), hjust = ___) +
xlim(1, 4)
ggplot(data, aes(x, y)) +
geom_point() +
geom_text(aes(label = label), hjust = 0) +
xlim(1, 4)
Now place the text labels centered below the points. Remember: hjust = 0.5
means horizontally centered, and vjust = 1
means vertically below the reference point. You may also have to adjust x and y limits to make sure all labels are within the plot area.
ggplot(data, aes(x, y)) +
geom_point() +
geom_text(
aes(label = label),
___+
) xlim(0.5, 3.5) +
ylim(0.5, 3)
ggplot(data, aes(x, y)) +
geom_point() +
geom_text(
aes(label = label),
hjust = ___,
vjust = ___
+
) xlim(0.5, 3.5) +
ylim(0.5, 3)
ggplot(data, aes(x, y)) +
geom_point() +
geom_text(
aes(label = label),
hjust = 0.5,
vjust = 1
+
) xlim(0.5, 3.5) +
ylim(0.5, 3)
Finally, place each label in a different relative orientation to the point. Place “alpha” horizontally centered underneath the point, “beta” vertically centered left of the point, and “gamma” horizontally centered above the point. This will require adding justification data columns to the data table and then mapping them to hjust
and vjust
in geom_text()
.
<- tibble(
data x = c(1, 2, 3),
y = c(1, 3, 2),
label = c("alpha", "beta", "gamma"),
hjust = ___,
vjust = ___
)
ggplot(data, aes(x, y)) +
geom_point() +
geom_text(
aes(label = ___, ___)
+
) xlim(0.5, 3.5) +
ylim(0.5, 3)
<- tibble(
data x = c(1, 2, 3),
y = c(1, 3, 2),
label = c("alpha", "beta", "gamma"),
hjust = c(0.5, 1, 0.5),
vjust = c(1, 0.5, 0)
)
ggplot(data, aes(x, y)) +
geom_point() +
geom_text(
aes(label = ___, ___)
+
) xlim(0.5, 3.5) +
ylim(0.5, 3)
<- tibble(
data x = c(1, 2, 3),
y = c(1, 3, 2),
label = c("alpha", "beta", "gamma"),
hjust = c(0.5, 1, 0.5),
vjust = c(1, 0.5, 0)
)
ggplot(data, aes(x, y)) +
geom_point() +
geom_text(
aes(label = label, hjust = hjust, vjust = vjust)
+
) xlim(0.5, 3.5) +
ylim(0.5, 3)
Non-overlapping text labels
When there are many points to be labeled, we frequently run into the issue that labels overlap and become unreadable. This problem can be resolved with geom_text_repel()
from the ggrepel package. This geom ensures that none of the text labels overlap. It is also highly customizable, and nearly any labeling problem can be solved with it.
Consider the following plot of fuel-tank capacity versus price, for cars costing more than $30k.
Use geom_text_repel()
to add a text label to each point that shows the make of the car (column Make
). Hint: Set max.overlaps = Inf
to avoid a warning about unlabeled data points.
|>
cars93 filter(Price > 30) |>
ggplot(aes(Price, Fuel.tank.capacity)) +
geom_point() +
geom_text_repel(
aes(label = ___),
max.overlaps = ___
)
|>
cars93 filter(Price > 30) |>
ggplot(aes(Price, Fuel.tank.capacity)) +
geom_point() +
geom_text_repel(
aes(label = Make),
max.overlaps = Inf
)
The value of the argument box.padding
determines how far the labels are drawn from the data points. The default is box.padding = 0.25
. Try out what larger values do. E.g., use 0.8 or 1.2.
|>
cars93 filter(Price > 30) |>
ggplot(aes(Price, Fuel.tank.capacity)) +
geom_point() +
geom_text_repel(
aes(label = Make),
max.overlaps = Inf,
___ )
|>
cars93 filter(Price > 30) |>
ggplot(aes(Price, Fuel.tank.capacity)) +
geom_point() +
geom_text_repel(
aes(label = Make),
max.overlaps = Inf,
box.padding = 0.8
)
|>
cars93 filter(Price > 30) |>
ggplot(aes(Price, Fuel.tank.capacity)) +
geom_point() +
geom_text_repel(
aes(label = Make),
max.overlaps = Inf,
box.padding = 1.2
)
See if you can pull the text labels towards the left edge of the plot. This will require using the arguments force_pull
, hjust
, nudge_x
, and direction
. It will also require manual setting of the x limits. For additional hints, see the ggrepel documentation here.
|>
cars93 filter(Price > 30) |>
ggplot(aes(Price, Fuel.tank.capacity)) +
geom_point() +
geom_text_repel(
aes(label = Make),
max.overlaps = Inf,
___+
) ___
|>
cars93 filter(Price > 30) |>
ggplot(aes(Price, Fuel.tank.capacity)) +
geom_point() +
geom_text_repel(
aes(label = Make),
max.overlaps = Inf,
force_pull = ___,
hjust = ___,
nudge_x = ___,
direction = ___
+
) xlim(___)
|>
cars93 filter(Price > 30) |>
ggplot(aes(Price, Fuel.tank.capacity)) +
geom_point() +
geom_text_repel(
aes(label = Make),
max.overlaps = Inf,
force_pull = 0,
hjust = 1,
nudge_x = -10,
direction = "y"
+
) xlim(20, 65)
Experiment with the various options for force_pull
, hjust
/vjust
, nudge_x
/nudge_y
, and direction
to get a sense of how they work.