Redundant coding, text annotations

Author

Claus O. Wilke

Introduction

In this worksheet, we will discuss how to encode data using multiple visual channels (such as color and point shape), and we will also discuss text annotations.

First we need to load the required R packages.
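The setup code itself is not reproduced here. A minimal sketch, assuming the worksheet relies on the tidyverse (for ggplot(), tibble(), and filter()) and on ggrepel (for geom_text_repel()), since those are the functions used in the exercises below:

```r
# ASSUMPTION: these two packages cover every function used in this worksheet.
library(tidyverse)  # ggplot2, dplyr, tibble, and related packages
library(ggrepel)    # geom_text_repel() for non-overlapping labels
```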

We will be working with two datasets, iris and cars93. The iris dataset contains measurements on the flowers of three Iris species.

Hint: Pay attention to the column names in the iris dataset. They are all capitalized (e.g., Species), and the first four use a period as a separator (e.g., Sepal.Length). R is case-sensitive, so a misspelling such as species or Sepal_Length will cause the code to fail.
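One quick way to check the spelling is to print the column names before plotting. The iris dataset ships with base R, so this needs no packages:

```r
# Print the exact column names of the built-in iris data frame.
names(iris)
# [1] "Sepal.Length" "Sepal.Width"  "Petal.Length" "Petal.Width"  "Species"

# Look at the first few rows to see what the values look like.
head(iris, 3)
```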

The cars93 dataset contains information about various passenger cars that were on the market in 1993.

Mapping variables to color and shape

First, we will do an exercise to practice using multiple visual channels (color and shape) to represent the same qualitative variable. We will do this exercise with the iris dataset.

Make a plot of Sepal.Width versus Sepal.Length for the three species in the iris dataset. Map Species to both color and shape.

Hint
ggplot(iris, aes(Sepal.Length, Sepal.Width, ___)) +
  geom_point()
Solution
ggplot(iris, aes(Sepal.Length, Sepal.Width, color = Species, shape = Species)) +
  geom_point()

You can set the shapes with scale_shape_manual(), just like you do with colors. There are five special shapes, 21 through 25, that have a line color and a fill color. Modify the plot from the previous exercise so it uses these shapes. Hint: This means you should use the fill aesthetic rather than the color aesthetic.

Hint
ggplot(iris, aes(Sepal.Length, Sepal.Width, ___)) +
  geom_point() +
  scale_shape_manual(values = ___)
Solution
ggplot(iris, aes(Sepal.Length, Sepal.Width, fill = Species, shape = Species)) +
  geom_point() +
  scale_shape_manual(values = c(21, 23, 25))
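
To see what the five filled shapes look like, we can draw them side by side. This is only an illustrative sketch: the data frame shapes and the colors "black" and "skyblue" are made up for this example and are not part of the exercise.

```r
library(ggplot2)

# One row per filled shape; the codes 21 to 25 are the ones named in the text.
shapes <- data.frame(x = 1:5, y = 1, shape = 21:25)

p <- ggplot(shapes, aes(x, y)) +
  # color sets the outline of each shape, fill sets its interior
  geom_point(aes(shape = shape), size = 6, color = "black", fill = "skyblue") +
  # print the shape code underneath each symbol
  geom_text(aes(label = shape), vjust = 3) +
  # use the numbers in the shape column directly as shape codes
  scale_shape_identity() +
  ylim(0.5, 1.5)
p
```

Shapes outside this range ignore the fill aesthetic, which is why the exercise above maps Species to fill rather than color.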

Manually applying text labels

We can place text labels with geom_text(). Oftentimes it makes sense to manually fine-tune exactly where the text labels will be located. To practice this, we will work with a simple dataset that contains three points:
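
The dataset definition is not shown at this point. Judging from the tibble() calls in the later exercises, it presumably looks like this:

```r
library(tibble)

# Three points, each with a text label (values taken from the later exercises).
data <- tibble(
  x = c(1, 2, 3),
  y = c(1, 3, 2),
  label = c("alpha", "beta", "gamma")
)
data
```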

Plot these three points with geom_point(), and use geom_text() to add the label text to the right side of each point. Remember that hjust = 0 plots text left-justified. Hint: Add xlim(1, 4) to ensure the text labels don’t run beyond the edge of the plot panel.

Hint
ggplot(data, aes(x, y)) +
  geom_point() +
  geom_text(___) +
  xlim(1, 4)
Hint
ggplot(data, aes(x, y)) +
  geom_point() +
  geom_text(aes(label = ___), hjust = ___) +
  xlim(1, 4)
Solution
ggplot(data, aes(x, y)) +
  geom_point() +
  geom_text(aes(label = label), hjust = 0) +
  xlim(1, 4)

Now place the text labels centered below the points. Remember: hjust = 0.5 centers the text horizontally on the reference point, and vjust = 1 aligns the top of the text with the reference point, so the label appears below it. You may also have to adjust the x and y limits to make sure all labels are within the plot area.

Hint
ggplot(data, aes(x, y)) +
  geom_point() +
  geom_text(
    aes(label = label),
    ___
  ) +
  xlim(0.5, 3.5) +
  ylim(0.5, 3)
Hint
ggplot(data, aes(x, y)) +
  geom_point() +
  geom_text(
    aes(label = label),
    hjust = ___,
    vjust = ___
  ) +
  xlim(0.5, 3.5) +
  ylim(0.5, 3)
Solution
ggplot(data, aes(x, y)) +
  geom_point() +
  geom_text(
    aes(label = label),
    hjust = 0.5,
    vjust = 1
  ) +
  xlim(0.5, 3.5) +
  ylim(0.5, 3)

Finally, place each label in a different relative orientation to the point. Place “alpha” horizontally centered underneath the point, “beta” vertically centered left of the point, and “gamma” horizontally centered above the point. This will require adding justification data columns to the data table and then mapping them to hjust and vjust in geom_text().

Hint
data <- tibble(
  x = c(1, 2, 3),
  y = c(1, 3, 2),
  label = c("alpha", "beta", "gamma"),
  hjust = ___,
  vjust = ___
)

ggplot(data, aes(x, y)) +
  geom_point() +
  geom_text(
    aes(label = ___, ___)
  ) +
  xlim(0.5, 3.5) +
  ylim(0.5, 3)
Hint
data <- tibble(
  x = c(1, 2, 3),
  y = c(1, 3, 2),
  label = c("alpha", "beta", "gamma"),
  hjust = c(0.5, 1, 0.5),
  vjust = c(1, 0.5, 0)
)

ggplot(data, aes(x, y)) +
  geom_point() +
  geom_text(
    aes(label = ___, ___)
  ) +
  xlim(0.5, 3.5) +
  ylim(0.5, 3)
Solution
data <- tibble(
  x = c(1, 2, 3),
  y = c(1, 3, 2),
  label = c("alpha", "beta", "gamma"),
  hjust = c(0.5, 1, 0.5),
  vjust = c(1, 0.5, 0)
)

ggplot(data, aes(x, y)) +
  geom_point() +
  geom_text(
    aes(label = label, hjust = hjust, vjust = vjust)
  ) +
  xlim(0.5, 3.5) +
  ylim(0.5, 3)
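
As a reference, the following sketch draws a label for every combination of the justification values 0, 0.5, and 1, so we can see where the text ends up relative to its point. The data frame just and the axis limits are invented for this illustration.

```r
library(ggplot2)

# All nine combinations of the three common justification values.
just <- expand.grid(hjust = c(0, 0.5, 1), vjust = c(0, 0.5, 1))
just$label <- paste0("h=", just$hjust, ", v=", just$vjust)

# Each point is placed at (hjust, vjust) and its label uses the same
# values for justification, so position and alignment can be compared.
p <- ggplot(just, aes(hjust, vjust)) +
  geom_point(color = "red") +
  geom_text(aes(label = label, hjust = hjust, vjust = vjust)) +
  xlim(-0.5, 1.5) +
  ylim(-0.25, 1.25)
p
```

Labels with hjust = 0 extend to the right of their point and labels with hjust = 1 extend to the left. Likewise, vjust = 0 puts the text above the point and vjust = 1 puts it below.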

Non-overlapping text labels

When there are many points to be labeled, we frequently run into the issue that labels overlap and become unreadable. This problem can be resolved with geom_text_repel() from the ggrepel package. This geom ensures that none of the text labels overlap. It is also highly customizable, and nearly any labeling problem can be solved with it.

Consider the following plot of fuel-tank capacity versus price, for cars costing more than $30k.
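
The code for this plot is not reproduced here. Based on the exercise code that follows, it is presumably the same scatter plot without labels. The sketch below additionally assumes that cars93 has the same columns as the Cars93 data frame from the MASS package, which is used as a stand-in.

```r
library(tidyverse)

# ASSUMPTION: MASS::Cars93 stands in for the worksheet's cars93 dataset.
# Price is recorded in thousands of dollars, so Price > 30 means more than $30k.
cars93 <- MASS::Cars93

p <- cars93 |>
  filter(Price > 30) |>
  ggplot(aes(Price, Fuel.tank.capacity)) +
  geom_point()
p
```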

Use geom_text_repel() to add a text label to each point that shows the make of the car (column Make). Hint: Set max.overlaps = Inf to avoid a warning about unlabeled data points.

Hint
cars93 |>
  filter(Price > 30) |>
  ggplot(aes(Price, Fuel.tank.capacity)) +
  geom_point() +
  geom_text_repel(
    aes(label = ___),
    max.overlaps = ___
  )
Solution
cars93 |>
  filter(Price > 30) |>
  ggplot(aes(Price, Fuel.tank.capacity)) +
  geom_point() +
  geom_text_repel(
    aes(label = Make),
    max.overlaps = Inf
  )

The value of the argument box.padding determines how far the labels are drawn from the data points. The default is box.padding = 0.25. Try larger values, such as 0.8 or 1.2, and see how the label placement changes.

Hint
cars93 |>
  filter(Price > 30) |>
  ggplot(aes(Price, Fuel.tank.capacity)) +
  geom_point() +
  geom_text_repel(
    aes(label = Make),
    max.overlaps = Inf,
    ___
  )
Solution
cars93 |>
  filter(Price > 30) |>
  ggplot(aes(Price, Fuel.tank.capacity)) +
  geom_point() +
  geom_text_repel(
    aes(label = Make),
    max.overlaps = Inf,
    box.padding = 0.8
  )

cars93 |>
  filter(Price > 30) |>
  ggplot(aes(Price, Fuel.tank.capacity)) +
  geom_point() +
  geom_text_repel(
    aes(label = Make),
    max.overlaps = Inf,
    box.padding = 1.2
  )

See if you can pull the text labels towards the left edge of the plot. This will require using the arguments force_pull, hjust, nudge_x, and direction. It will also require manually setting the x limits. For additional hints, see the ggrepel documentation.

Hint
cars93 |>
  filter(Price > 30) |>
  ggplot(aes(Price, Fuel.tank.capacity)) +
  geom_point() +
  geom_text_repel(
    aes(label = Make),
    max.overlaps = Inf,
    ___
  ) +
  ___
Hint
cars93 |>
  filter(Price > 30) |>
  ggplot(aes(Price, Fuel.tank.capacity)) +
  geom_point() +
  geom_text_repel(
    aes(label = Make),
    max.overlaps = Inf,
    force_pull = ___,
    hjust = ___,
    nudge_x = ___,
    direction = ___
  ) +
  xlim(___)
Solution
cars93 |>
  filter(Price > 30) |>
  ggplot(aes(Price, Fuel.tank.capacity)) +
  geom_point() +
  geom_text_repel(
    aes(label = Make),
    max.overlaps = Inf,
    force_pull = 0,
    hjust = 1,
    nudge_x = -10,
    direction = "y"
  ) +
  xlim(20, 65)

Experiment with the various options for force_pull, hjust/vjust, nudge_x/nudge_y, and direction to get a sense of how they work.
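
As one possible starting point for such experiments, the sketch below moves the labels upward instead of to the left, by swapping the roles of the x and y arguments. The specific values (nudge_y = 2 and the expanded y limit of 30) are guesses chosen for illustration, and MASS::Cars93 is again assumed as a stand-in for cars93.

```r
library(tidyverse)
library(ggrepel)

# ASSUMPTION: MASS::Cars93 stands in for the worksheet's cars93 dataset.
cars93 <- MASS::Cars93

p <- cars93 |>
  filter(Price > 30) |>
  ggplot(aes(Price, Fuel.tank.capacity)) +
  geom_point() +
  geom_text_repel(
    aes(label = Make),
    max.overlaps = Inf,
    force_pull = 0,   # do not pull labels back towards their points
    vjust = 0,        # bottom-align the text
    nudge_y = 2,      # shift the starting position of each label upward
    direction = "x"   # let labels repel each other only horizontally
  ) +
  # make room above the points without dropping any data
  expand_limits(y = 30)
p
```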