Dealing with overplotting

Author

Claus O. Wilke

Introduction

In this worksheet, we will discuss how to make 2D density plots and histograms, to effectively visualize scatter plots in which many points lie on top of one another.

First we need to load the required R packages. Please wait a moment until the live R session is fully set up and all packages are loaded.

Next we set up the data. We will be working with data on individual penguins in Antarctica.

penguins

2D density plots

2D density plots are a replacement for and alternative to scatter plots. They visualize the density of points in the 2D plane, and they are particularly useful when the point density is very high, so that many points lie on top of one another. We will demonstrate 2D density plots for the penguins dataset, specifically for a scatter plot of bill length versus body mass.

To create a 2D density plot, we simply add geom_density_2d(). Try this out.

Hint
ggplot(penguins, aes(body_mass_g, bill_length_mm)) +
  ___ +
  geom_point(na.rm = TRUE) +
  theme_bw()
Solution
ggplot(penguins, aes(body_mass_g, bill_length_mm)) +
  geom_density_2d(na.rm = TRUE) + 
  geom_point(na.rm = TRUE) +
  theme_bw()

You can change the number of contour lines shown by providing the bins argument to geom_density_2d(). For example, try bins = 5.

Hint
ggplot(penguins, aes(body_mass_g, bill_length_mm)) +
  geom_density_2d(na.rm = TRUE, bins = ___) + 
  geom_point(na.rm = TRUE) +
  theme_bw()
Solution
ggplot(penguins, aes(body_mass_g, bill_length_mm)) +
  geom_density_2d(na.rm = TRUE, bins = 5) + 
  geom_point(na.rm = TRUE) +
  theme_bw()

Also try other values for bins.

The plots we have made so far did not consider different penguin species, but we know from earlier worksheets that there are three penguins species with quite different values for body mass and bill length. Modify the above plot so that both points and contour lines are colored by species.

Solution
ggplot(penguins, aes(body_mass_g, bill_length_mm, color = species)) +
  geom_density_2d(na.rm = TRUE, bins = 5) + 
  geom_point(na.rm = TRUE) +
  theme_bw()

Also try faceting by species.

Solution
ggplot(penguins, aes(body_mass_g, bill_length_mm, color = species)) +
  geom_density_2d(na.rm = TRUE, bins = 5) + 
  geom_point(na.rm = TRUE) +
  theme_bw() +
  facet_wrap(~species)

Finally, you can use geom_density_2d_filled() to draw filled contour bands instead of contour lines. Try this out.

Hints:

  • You cannot map penguin species to color when drawing contour bands.
  • It helps to set alpha transparency for contour bands, e.g. alpha = 0.5.
  • You may want to make the point size smaller so you can see the contour bands underneath the points.

Note: This example may not work in the live web environment.

Hint
ggplot(penguins, aes(body_mass_g, bill_length_mm)) +
  geom_density_2d_filled(___) + 
  geom_point(na.rm = TRUE, size = ___) +
  theme_bw() +
  facet_wrap(~species)
Solution
ggplot(penguins, aes(body_mass_g, bill_length_mm)) +
  geom_density_2d_filled(na.rm = TRUE, bins = 5, alpha = 0.5) + 
  geom_point(na.rm = TRUE, size = 0.2) +
  theme_bw() +
  facet_wrap(~species)

2D histograms

2D histograms are very similar to 2D density plots. They are generated by simply subdividing the 2D plane into regularly shaped regions (rectangles or hexagons), counting how many data points fall into each region, and then coloring each region by its count.

To make rectangular 2D hexagons, you can use geom_bin2d(). Try this out.

Hint: Set bins = 5 to get a reasonable number of bins for this dataset.

Hint
ggplot(penguins, aes(body_mass_g, bill_length_mm)) +
  geom_bin2d(___) + 
  geom_point(na.rm = TRUE, size = 0.2) +
  theme_bw() +
  facet_wrap(~species)
Solution
ggplot(penguins, aes(body_mass_g, bill_length_mm)) +
  geom_bin2d(na.rm = TRUE, bins = 5) + 
  geom_point(na.rm = TRUE, size = 0.2) +
  theme_bw() +
  facet_wrap(~species)

It helps to make the bins partially transparent (i.e., set alpha = 0.5). Also use a different sequential color scale.

Hint
ggplot(penguins, aes(body_mass_g, bill_length_mm)) +
  geom_bin2d(na.rm = TRUE, bins = 5, ___) + 
  geom_point(na.rm = TRUE, size = 0.2) +
  theme_bw() +
  facet_wrap(~species) +
  scale_fill____
Solution
ggplot(penguins, aes(body_mass_g, bill_length_mm)) +
  geom_bin2d(na.rm = TRUE, bins = 5, alpha = 0.5) + 
  geom_point(na.rm = TRUE, size = 0.2) +
  theme_bw() +
  facet_wrap(~species) +
  scale_fill_viridis_c()

You can control bins in a more fine-grained manner by setting the binwidth argument. It takes a vector of two numbers, where the first is the width of the bins (in data units) and the second is the height (also in data units). Make bins that are 10000 units wide and 5 units tall.

Hint
ggplot(penguins, aes(body_mass_g, bill_length_mm)) +
  geom_bin2d(
    na.rm = TRUE,
    binwidth = ___,
    alpha = 0.5
  ) + 
  geom_point(na.rm = TRUE, size = 0.2) +
  theme_bw() +
  facet_wrap(~species) +
  scale_fill_viridis_c()
Solution
ggplot(penguins, aes(body_mass_g, bill_length_mm)) +
  geom_bin2d(
    na.rm = TRUE,
    binwidth = c(10000, 5),
    alpha = 0.5
  ) + 
  geom_point(na.rm = TRUE, size = 0.2) +
  theme_bw() +
  facet_wrap(~species) +
  scale_fill_viridis_c()

Also try making bins that are 30 units tall and 2500 units wide.

Instead of rectangular bins, you can also make hexbins, with geom_hex(). It mostly works the same as geom_bin2d(). Try it out.

Hint: You will need to set the argument bins to an appropriate value for the hexbins to look good.

Hint
ggplot(penguins, aes(body_mass_g, bill_length_mm)) +
  geom_hex(
    ___
  ) + 
  geom_point(na.rm = TRUE, size = 0.2) +
  theme_bw() +
  facet_wrap(~species) +
  scale_fill_viridis_c()
Solution
ggplot(penguins, aes(body_mass_g, bill_length_mm)) +
  geom_hex(
    na.rm = TRUE,
    bins = 5,
    alpha = 0.5
  ) + 
  geom_point(na.rm = TRUE, size = 0.2) +
  theme_bw() +
  facet_wrap(~species) +
  scale_fill_viridis_c()

Just as was the case with geom_bin2d(), you can provide an argument binwidth consisting of two values, one controling the width and the other the height. Try this out.

Hint
ggplot(penguins, aes(body_mass_g, bill_length_mm)) +
  geom_hex(
    na.rm = TRUE,
    binwidth = ___
    alpha = 0.5
  ) + 
  geom_point(na.rm = TRUE, size = 0.2) +
  theme_bw() +
  facet_wrap(~species) +
  scale_fill_viridis_c()
Solution
ggplot(penguins, aes(body_mass_g, bill_length_mm)) +
  geom_hex(
    na.rm = TRUE,
    binwidth = c(1000, 5),
    alpha = 0.5
  ) + 
  geom_point(na.rm = TRUE, size = 0.2) +
  theme_bw() +
  facet_wrap(~species) +
  scale_fill_viridis_c()