Dealing with issues of overplotting

Claus O. Wilke

2025-04-10

Be aware of points plotted exactly on top of one another

 

Technical term for this problem: overplotting

Partial transparency helps highlight overlapping points
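
A minimal sketch of this idea in R. The dataset is an assumption: `mpg` ships with ggplot2 and stands in for whatever data the figure uses, and the `alpha` and `size` values are illustrative only.

```r
library(ggplot2)

# displ and cty are recorded as rounded values, so many cars share
# exactly the same (x, y) position and hide one another
p_alpha <- ggplot(mpg, aes(displ, cty)) +
  geom_point(alpha = 0.3, size = 2) +  # alpha < 1: stacked points look darker
  theme_bw()

p_alpha
```

With partial transparency, a location holding several points appears darker than a location holding one.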

 

A little jitter shows overlaps even more clearly
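
One way to add a small amount of jitter, again sketched with ggplot2's built-in `mpg` data as a stand-in. The `width` and `height` amounts are hypothetical and should stay small relative to the spacing between data values; larger amounts would misrepresent where the points actually lie.

```r
library(ggplot2)

p_jitter <- ggplot(mpg, aes(displ, cty)) +
  geom_point(
    alpha = 0.3,
    size = 2,
    # displace each point by a small random offset; a fixed seed makes
    # the figure reproducible
    position = position_jitter(width = 0.05, height = 0.2, seed = 1234)
  ) +
  theme_bw()

p_jitter
```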

 

But don’t jitter too much

 

2D density plots: Contour lines and contour bands

Contour lines are the 2D version of density plots

 

We can vary shading for added effect

 

What do we do when there are multiple groups?

 

Colored contour lines can work for 2 to 3 groups
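
A sketch of colored contour lines for a small number of groups. It assumes R's built-in `iris` data (three species) as a stand-in for the data in the figure; mapping the grouping variable to `color` makes `geom_density_2d()` estimate one density per group.

```r
library(ggplot2)

p_groups <- ggplot(iris, aes(Sepal.Length, Sepal.Width, color = Species)) +
  geom_density_2d(bins = 5, linewidth = 0.4) +  # one set of contours per species
  geom_point(size = 1) +
  theme_bw()

p_groups
```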

 

What if there are multiple groups intermingled?

 

Don’t make plots that look like spaghetti

 

Contour lines work well with small multiples (facets)
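
A sketch of the small-multiples approach, assuming the built-in `iris` data as a stand-in: each group gets its own panel, so the contour lines never cross those of another group.

```r
library(ggplot2)

p_facets <- ggplot(iris, aes(Sepal.Length, Sepal.Width)) +
  geom_density_2d(bins = 5) +
  facet_wrap(vars(Species)) +  # one panel per species
  theme_bw()

p_facets
```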

 

2D histograms: Rectangular and hex bins

2D histograms: rectangular bins

 

We need to choose a bin size as in regular histograms

2D histograms: hex bins

 

Choosing the right color scale

 

palettes compared: SunsetDark, Batlow, YlOrRd, BluYl, Heat, ag_GrnYl
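
The palette names listed above match sequential palettes in the colorspace package, so one plausible way to apply them is `scale_fill_continuous_sequential()`. This is a sketch under that assumption; the `diamonds` data (shipped with ggplot2) and `bins = 30` are illustrative choices, not taken from the slides.

```r
library(ggplot2)
library(colorspace)  # provides scale_fill_continuous_sequential()

p_scale <- ggplot(diamonds, aes(carat, price)) +
  geom_bin_2d(bins = 30) +
  # swap in "Batlow", "YlOrRd", "BluYl", "Heat", or "ag_GrnYl" to compare
  scale_fill_continuous_sequential(palette = "SunsetDark") +
  theme_bw()

p_scale
```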

Creating 2D density plots and histograms in R

Contour lines

Getting the data:

library(tidyverse)  # provides read_csv() and ggplot2
blue_jays <- read_csv("https://wilkelab.org/SDS366/datasets/blue_jays.csv")

blue_jays
# A tibble: 123 × 8
   bird_id    sex   bill_depth_mm bill_width_mm bill_length_mm head_length_mm
   <chr>      <chr>         <dbl>         <dbl>          <dbl>          <dbl>
 1 0000-00000 M              8.26          9.21           25.9           56.6
 2 1142-05901 M              8.54          8.76           25.0           56.4
 3 1142-05905 M              8.39          8.78           26.1           57.3
 4 1142-05907 F              7.78          9.3            23.5           53.8
 5 1142-05909 M              8.71          9.84           25.5           57.3
 6 1142-05911 F              7.28          9.3            22.2           52.2
 7 1142-05912 M              8.74          9.28           25.4           57.1
 8 1142-05914 M              8.72          9.94           30             60.7
 9 1142-05917 F              8.2           9.01           22.8           52.8
10 1142-05920 F              7.67          9.31           24.6           54.9
# ℹ 113 more rows
# ℹ 2 more variables: body_mass_g <dbl>, skull_size_mm <dbl>

Contour lines

blue_jays |>
  ggplot() +
  aes(body_mass_g, head_length_mm) +
  geom_point() +
  theme_bw()

 

Contour lines

blue_jays |>
  ggplot() +
  aes(body_mass_g, head_length_mm) +
  geom_density_2d() +
  geom_point() +
  theme_bw()

 

Contour lines

blue_jays |>
  ggplot() +
  aes(body_mass_g, head_length_mm) +
  geom_density_2d(bins = 5) +
  geom_point() +
  theme_bw()

 

Contour bands

blue_jays |>
  ggplot() +
  aes(body_mass_g, head_length_mm) +
  geom_density_2d_filled(bins = 5) +
  geom_point() +
  theme_bw()

 

Contour bands

blue_jays |>
  ggplot() +
  aes(body_mass_g, head_length_mm) +
  geom_density_2d_filled(
    bins = 5,
    alpha = 0.5
  ) +
  geom_point() +
  theme_bw()

 

Contour bands

blue_jays |>
  ggplot() +
  aes(body_mass_g, head_length_mm) +
  geom_density_2d_filled(
    bins = 5,
    alpha = 0.5
  ) +
  geom_density_2d(
    bins = 5,
    color = "black",
    linewidth = 0.2
  ) +
  geom_point() +
  theme_bw()

 

2D histograms

blue_jays |>
  ggplot() +
  aes(body_mass_g, head_length_mm) +
  geom_bin_2d() +
  theme_bw()

 

2D histograms

blue_jays |>
  ggplot() +
  aes(body_mass_g, head_length_mm) +
  geom_bin_2d(binwidth = c(3, 3)) +
  theme_bw()

 

2D histograms

blue_jays |>
  ggplot() +
  aes(body_mass_g, head_length_mm) +
  geom_bin_2d(binwidth = c(1, 5)) +
  theme_bw()

 

2D histograms

blue_jays |>
  ggplot() +
  aes(body_mass_g, head_length_mm) +
  geom_bin_2d(binwidth = c(5, 1)) +
  theme_bw()

 

Hex bins

blue_jays |>
  ggplot() +
  aes(body_mass_g, head_length_mm) +
  geom_hex() +  # needs the hexbin package to be installed
  theme_bw()

 

Hex bins

blue_jays |>
  ggplot() +
  aes(body_mass_g, head_length_mm) +
  geom_hex(bins = 15) +
  theme_bw()

 

Hex bins

blue_jays |>
  ggplot() +
  aes(body_mass_g, head_length_mm) +
  geom_hex(bins = 10) +
  theme_bw()

 

Further reading