Visualizing distributions 2

Claus O. Wilke

2025-01-13

Reminder: Density estimates visualize distributions

Mean temperatures in Lincoln, NE, in January 2016:

date        mean temp
2016-01-01  24
2016-01-02  23
2016-01-03  23
2016-01-04  17
2016-01-05  29
2016-01-06  33
2016-01-07  30
2016-01-08  25
2016-01-09   9
2016-01-10  11

 

How can we compare distributions across months?

A bad idea: Many overlapping density plots

 

Another bad idea: Stacked density plots

 

Somewhat better: Small multiples

 

Instead: Show values along y, conditions along x

 

A boxplot is a crude way of visualizing a distribution.

How to read a boxplot
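
As a rough numerical companion, the sketch below computes the quantities a boxplot typically encodes, using the ten January values listed earlier. It assumes the common Tukey convention, which is also the default in geom_boxplot(): the box spans the first to the third quartile, the line inside marks the median, and each whisker reaches the most extreme observation within 1.5 × IQR of the box. The variable names are illustrative only.

```r
# First ten daily mean temperatures from the January 2016 table
temps <- c(24, 23, 23, 17, 29, 33, 30, 25, 9, 11)

# Box: first quartile, median, third quartile
q <- quantile(temps, c(0.25, 0.5, 0.75), names = FALSE)
iqr <- q[3] - q[1]

# Whiskers: most extreme observations within 1.5 * IQR of the box
lower_whisker <- min(temps[temps >= q[1] - 1.5 * iqr])
upper_whisker <- max(temps[temps <= q[3] + 1.5 * iqr])

# Outliers: anything beyond the whiskers, drawn as individual points
outliers <- temps[temps < lower_whisker | temps > upper_whisker]

box_summary <- c(
  lower_whisker = lower_whisker, q1 = q[1], median = q[2],
  q3 = q[3], upper_whisker = upper_whisker
)
print(box_summary)
```

For these ten values the box runs from 18.5 to 28 with the median at 23.5, and no observation falls outside the whiskers.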

 

If you like density plots, consider violins

 

A violin plot is a density plot rotated 90 degrees and then mirrored.
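
The construction can be sketched in base R as follows. This is a conceptual illustration under simple assumptions (default bandwidth from density(), the ten January values listed earlier), not how geom_violin() is implemented internally. The column names half_width and temp are made up for this example.

```r
temps <- c(24, 23, 23, 17, 29, 33, 30, 25, 9, 11)

# Ordinary kernel density estimate: d$x = temperature grid, d$y = density
d <- density(temps)

# Rotate: temperature goes on the vertical axis, density on the horizontal.
# Mirror: trace the density up the right side, then back down the left side.
violin <- data.frame(
  half_width = c(d$y, -rev(d$y)),
  temp       = c(d$x, rev(d$x))
)

plot(violin$half_width, violin$temp, type = "n",
     xlab = "density (mirrored)", ylab = "mean temp")
polygon(violin$half_width, violin$temp, col = "skyblue", border = NA)
```

The width of the violin at a given temperature is therefore proportional to the estimated density at that temperature.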

How to read a violin plot

 

For small datasets, you can also use a strip chart

Advantage: Can see raw data points instead of abstract representation.

 

Horizontal jittering may be necessary to avoid overlapping points.
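
To make the idea concrete, here is a minimal base R sketch of horizontal jittering: each point receives a small random offset in x while its y value stays unchanged. It uses the ten January values listed earlier. The width of 0.15 and the variable names are illustrative choices, and the sketch is not meant to mirror the exact internals of position_jitter().

```r
set.seed(1) # make the random offsets reproducible

temps <- c(24, 23, 23, 17, 29, 33, 30, 25, 9, 11)
x <- rep(1, length(temps)) # all points belong to one category at x = 1

width <- 0.15 # maximum horizontal displacement (illustrative value)
x_jittered <- x + runif(length(x), min = -width, max = width)

# The y values are left untouched, so the data themselves are not distorted
plot(x_jittered, temps, xlim = c(0.5, 1.5), pch = 19,
     xaxt = "n", xlab = "Jan", ylab = "mean temp")
```

Without the offset, the two days with a mean of 23 would sit exactly on top of each other and appear as a single point.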

We can also jitter points into violins

 

Such plots are called sina plots, to honor Sina Hadi Sohi.
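
The principle behind a sina plot can be sketched in base R: the amount of horizontal jitter at each point is limited by the width of the violin at that point's y value. This is a simplified illustration under stated assumptions (default density() bandwidth, uniform offsets, a hypothetical max_width of 0.4) and is not the algorithm ggforce actually uses.

```r
set.seed(1) # make the random offsets reproducible

temps <- c(24, 23, 23, 17, 29, 33, 30, 25, 9, 11)

d <- density(temps)
# Density at each observation, interpolated from the estimate's grid
dens_at_point <- approx(d$x, d$y, xout = temps)$y

max_width <- 0.4 # half-width of the violin at its widest point (illustrative)
half_width <- max_width * dens_at_point / max(d$y)

# Offset each point uniformly within the violin's width at its y position
offset <- runif(length(temps), min = -1, max = 1) * half_width
x_sina <- 1 + offset

plot(x_sina, temps, xlim = c(0.5, 1.5), pch = 19,
     xaxt = "n", xlab = "Jan", ylab = "mean temp")
```

Points in dense regions spread out widely, while points in sparse regions stay close to the center line, so the point cloud takes on the shape of the violin.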

But maybe there’s hope for overlapping density plots?

 

How about we stagger the densities vertically?

Vertically staggered density plots are called ridgelines

 

Notice the single fill color. More colors would be distracting.
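
The staggering itself is simple to sketch in base R: estimate one density per group and add a different vertical offset to each. Because the full Lincoln dataset is only loaded later, this example assumes R's built-in airquality data (daily temperatures in New York, May to September 1973) as a stand-in. The scale factor of 20 is an arbitrary choice that lets neighboring ridges overlap slightly.

```r
# airquality ships with R: Temp is in degrees F, Month runs from 5 to 9
months <- sort(unique(airquality$Month))
dens <- lapply(months, function(m) density(airquality$Temp[airquality$Month == m]))

scale <- 20 # vertical stretch applied to every density (illustrative)

plot(NULL, xlim = range(airquality$Temp) + c(-10, 10),
     ylim = c(1, length(months) + 2),
     xlab = "temperature (degrees F)", ylab = "", yaxt = "n")
axis(2, at = seq_along(months), labels = month.abb[months], las = 1)

# Draw from top to bottom so that lower ridges overlap the ones behind them
for (i in rev(seq_along(months))) {
  d <- dens[[i]]
  # The baseline of ridge i sits at height i; the density is added on top
  polygon(c(d$x, rev(d$x)), c(i + scale * d$y, rep(i, length(d$y))),
          col = "skyblue", border = "white")
}
```

All ridges share one fill color, in line with the note above.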

Making boxplots, violins, etc. in ggplot2

Getting the data

All examples will use the lincoln_temps dataset:

library(ggplot2) # for ggplot() and the geoms used in the examples below

lincoln_temps <- readRDS(url("https://wilkelab.org/SDS366/datasets/lincoln_temps.rds"))

lincoln_temps
# A tibble: 366 × 4
   date       month month_long mean_temp
   <date>     <fct> <fct>          <int>
 1 2016-01-01 Jan   January           24
 2 2016-01-02 Jan   January           23
 3 2016-01-03 Jan   January           23
 4 2016-01-04 Jan   January           17
 5 2016-01-05 Jan   January           29
 6 2016-01-06 Jan   January           33
 7 2016-01-07 Jan   January           30
 8 2016-01-08 Jan   January           25
 9 2016-01-09 Jan   January            9
10 2016-01-10 Jan   January           11
# ℹ 356 more rows

Making boxplots, violins, etc. in ggplot2

Plot type    Geom                   Notes
boxplot      geom_boxplot()
violin plot  geom_violin()
strip chart  geom_point()           Jittering requires position_jitter()
sina plot    geom_sina()            From package ggforce
ridgeline    geom_density_ridges()  From package ggridges

Examples: Boxplot

ggplot(lincoln_temps, aes(x = month, y = mean_temp)) +
  geom_boxplot(fill = "skyblue") 

 

Examples: Violins

ggplot(lincoln_temps, aes(x = month, y = mean_temp)) +
  geom_violin(fill = "skyblue") 

 

Examples: Strip chart (no jitter)

ggplot(lincoln_temps, aes(x = month, y = mean_temp)) +
  geom_point(size = 0.75)  # reduce point size to minimize overplotting 

 

Examples: Strip chart (w/ jitter)

ggplot(lincoln_temps, aes(x = month, y = mean_temp)) +
  geom_point(size = 0.75,  # reduce point size to minimize overplotting 
    position = position_jitter(
      width = 0.15,  # amount of jitter in horizontal direction
      height = 0     # amount of jitter in vertical direction (0 = none)
    )
  )

 

Examples: Sina plot

library(ggforce) # for geom_sina()

ggplot(lincoln_temps, aes(x = month, y = mean_temp)) +
  geom_violin(fill = "skyblue", color = NA) + # violins in background
  geom_sina(size = 0.75) # sina jittered points in foreground

 

Examples: Ridgeline plot

library(ggridges) # for geom_density_ridges()

ggplot(lincoln_temps, aes(x = mean_temp, y = month_long)) +
  geom_density_ridges()

 

Further reading