2025-03-19
Milk butterfat contents by cattle breed. Source: Canadian Record of Performance for Purebred Dairy Cattle
Income versus age for 67 counties in Pennsylvania. Source: 2015 Five-Year American Community Survey
Head length versus body mass for male blue jays. Data source: Keith Tarvin, Oberlin College
In particular, error bars can represent many different quantities
Chocolate bar ratings. Source: Manhattan Chocolate Society
Lace Padilla, Matthew Kay, and Jessica Hullman (2022): Uncertainty Visualization
Chocolate bars from four countries compared to bars from the US
Relative rankings compared to US chocolate bars. Source: Manhattan Chocolate Society
Determinstic Construal Error:
Error bars are interpreted as min/max values
Relative rankings compared to US chocolate bars. Source: Manhattan Chocolate Society
Categorical thinking:
Areas outside and inside the error bars are categorically different
Relative rankings compared to US chocolate bars. Source: Manhattan Chocolate Society
You can remove caps to make the boundary visually less severe
Relative rankings compared to US chocolate bars. Source: Manhattan Chocolate Society
You can show multiple confidence levels to de-emphasize existence of boundary
Relative rankings compared to US chocolate bars. Source: Manhattan Chocolate Society
You can use faded strips (but hard to read/interpret)
Relative rankings compared to US chocolate bars. Source: Manhattan Chocolate Society
You can show actual distributions
Popular in Bayesian inference, but still difficult to interpret
Relative rankings compared to US chocolate bars. Source: Manhattan Chocolate Society
If I can buy either a Canadian or a US bar, what is the probability that the Canadian bar will be better?
Answer: The Canadian bar has a 53% chance of being better
How can we communicate this?
Hypothetical Outcome Plots use animation to let viewers experience uncertainty
Hypothetical Outcome Plots use animation to let viewers experience uncertainty
What does the confidence band show?
Head length versus body mass for male blue jays. Data source: Keith Tarvin, Oberlin College
Both the intercept and the slope have uncertainty
Head length versus body mass for male blue jays. Data source: Keith Tarvin, Oberlin College
Individual sample fits tend to be more wiggly than the mean
Fuel efficiency versus displacement, for 32 cars (1973–74 models). Source: Motor Trend, 1974
Fuel efficiency versus displacement, for 32 cars (1973–74 models). Source: Motor Trend, 1974
Example: Relationship between life expectancy and GDP per capita
Example: Relationship between life expectancy and GDP per capita
# A tibble: 60 × 3
continent year data
<fct> <int> <list>
1 Asia 1952 <tibble [33 × 4]>
2 Asia 1957 <tibble [33 × 4]>
3 Asia 1962 <tibble [33 × 4]>
4 Asia 1967 <tibble [33 × 4]>
5 Asia 1972 <tibble [33 × 4]>
6 Asia 1977 <tibble [33 × 4]>
7 Asia 1982 <tibble [33 × 4]>
8 Asia 1987 <tibble [33 × 4]>
9 Asia 1992 <tibble [33 × 4]>
10 Asia 1997 <tibble [33 × 4]>
# ℹ 50 more rows
lm_data <- gapminder |>
nest(data = -c(continent, year)) |>
mutate(
fit = map(data, ~lm(lifeExp ~ log(gdpPercap), data = .x))
)
lm_data
# A tibble: 60 × 4
continent year data fit
<fct> <int> <list> <list>
1 Asia 1952 <tibble [33 × 4]> <lm>
2 Asia 1957 <tibble [33 × 4]> <lm>
3 Asia 1962 <tibble [33 × 4]> <lm>
4 Asia 1967 <tibble [33 × 4]> <lm>
5 Asia 1972 <tibble [33 × 4]> <lm>
6 Asia 1977 <tibble [33 × 4]> <lm>
7 Asia 1982 <tibble [33 × 4]> <lm>
8 Asia 1987 <tibble [33 × 4]> <lm>
9 Asia 1992 <tibble [33 × 4]> <lm>
10 Asia 1997 <tibble [33 × 4]> <lm>
# ℹ 50 more rows
lm_data <- gapminder |>
nest(data = -c(continent, year)) |>
mutate(
fit = map(data, ~lm(lifeExp ~ log(gdpPercap), data = .x)),
tidy_out = map(fit, tidy)
)
lm_data
# A tibble: 60 × 5
continent year data fit tidy_out
<fct> <int> <list> <list> <list>
1 Asia 1952 <tibble [33 × 4]> <lm> <tibble [2 × 5]>
2 Asia 1957 <tibble [33 × 4]> <lm> <tibble [2 × 5]>
3 Asia 1962 <tibble [33 × 4]> <lm> <tibble [2 × 5]>
4 Asia 1967 <tibble [33 × 4]> <lm> <tibble [2 × 5]>
5 Asia 1972 <tibble [33 × 4]> <lm> <tibble [2 × 5]>
6 Asia 1977 <tibble [33 × 4]> <lm> <tibble [2 × 5]>
7 Asia 1982 <tibble [33 × 4]> <lm> <tibble [2 × 5]>
8 Asia 1987 <tibble [33 × 4]> <lm> <tibble [2 × 5]>
9 Asia 1992 <tibble [33 × 4]> <lm> <tibble [2 × 5]>
10 Asia 1997 <tibble [33 × 4]> <lm> <tibble [2 × 5]>
# ℹ 50 more rows
lm_data <- gapminder |>
nest(data = -c(continent, year)) |>
mutate(
fit = map(data, ~lm(lifeExp ~ log(gdpPercap), data = .x)),
tidy_out = map(fit, tidy)
) |>
unnest(cols = tidy_out)
lm_data
# A tibble: 120 × 9
continent year data fit term estimate std.error statistic p.value
<fct> <int> <list> <list> <chr> <dbl> <dbl> <dbl> <dbl>
1 Asia 1952 <tibble> <lm> (Interc… 15.8 9.27 1.71 9.78e-2
2 Asia 1952 <tibble> <lm> log(gdp… 4.16 1.25 3.33 2.28e-3
3 Asia 1957 <tibble> <lm> (Interc… 18.1 9.70 1.86 7.20e-2
4 Asia 1957 <tibble> <lm> log(gdp… 4.17 1.28 3.26 2.71e-3
5 Asia 1962 <tibble> <lm> (Interc… 16.6 9.52 1.74 9.11e-2
6 Asia 1962 <tibble> <lm> log(gdp… 4.59 1.24 3.72 7.94e-4
7 Asia 1967 <tibble> <lm> (Interc… 19.8 9.05 2.19 3.64e-2
8 Asia 1967 <tibble> <lm> log(gdp… 4.50 1.15 3.90 4.77e-4
9 Asia 1972 <tibble> <lm> (Interc… 21.9 8.14 2.69 1.13e-2
10 Asia 1972 <tibble> <lm> log(gdp… 4.44 1.01 4.41 1.16e-4
# ℹ 110 more rows
lm_data <- gapminder |>
nest(data = -c(continent, year)) |>
mutate(
fit = map(data, ~lm(lifeExp ~ log(gdpPercap), data = .x)),
tidy_out = map(fit, tidy)
) |>
unnest(cols = tidy_out) |>
select(-fit, -data)
lm_data
# A tibble: 120 × 7
continent year term estimate std.error statistic p.value
<fct> <int> <chr> <dbl> <dbl> <dbl> <dbl>
1 Asia 1952 (Intercept) 15.8 9.27 1.71 0.0978
2 Asia 1952 log(gdpPercap) 4.16 1.25 3.33 0.00228
3 Asia 1957 (Intercept) 18.1 9.70 1.86 0.0720
4 Asia 1957 log(gdpPercap) 4.17 1.28 3.26 0.00271
5 Asia 1962 (Intercept) 16.6 9.52 1.74 0.0911
6 Asia 1962 log(gdpPercap) 4.59 1.24 3.72 0.000794
7 Asia 1967 (Intercept) 19.8 9.05 2.19 0.0364
8 Asia 1967 log(gdpPercap) 4.50 1.15 3.90 0.000477
9 Asia 1972 (Intercept) 21.9 8.14 2.69 0.0113
10 Asia 1972 log(gdpPercap) 4.44 1.01 4.41 0.000116
# ℹ 110 more rows
lm_data <- gapminder |>
nest(data = -c(continent, year)) |>
mutate(
fit = map(data, ~lm(lifeExp ~ log(gdpPercap), data = .x)),
tidy_out = map(fit, tidy)
) |>
unnest(cols = tidy_out) |>
select(-fit, -data) |>
filter(term != "(Intercept)", continent != "Oceania")
lm_data
# A tibble: 48 × 7
continent year term estimate std.error statistic p.value
<fct> <int> <chr> <dbl> <dbl> <dbl> <dbl>
1 Asia 1952 log(gdpPercap) 4.16 1.25 3.33 0.00228
2 Asia 1957 log(gdpPercap) 4.17 1.28 3.26 0.00271
3 Asia 1962 log(gdpPercap) 4.59 1.24 3.72 0.000794
4 Asia 1967 log(gdpPercap) 4.50 1.15 3.90 0.000477
5 Asia 1972 log(gdpPercap) 4.44 1.01 4.41 0.000116
6 Asia 1977 log(gdpPercap) 4.87 1.03 4.75 0.0000442
7 Asia 1982 log(gdpPercap) 4.78 0.852 5.61 0.00000377
8 Asia 1987 log(gdpPercap) 5.17 0.727 7.12 0.0000000531
9 Asia 1992 log(gdpPercap) 5.09 0.649 7.84 0.00000000760
10 Asia 1997 log(gdpPercap) 5.11 0.628 8.15 0.00000000335
# ℹ 38 more rows
The ggdist package provides many different visualizations of uncertainty
The ggdist package provides many different visualizations of uncertainty
library(ggdist)
library(distributional) # for dist_normal()
lm_data |>
filter(year == 1952) |>
mutate(
continent =
fct_reorder(continent, estimate)
) |>
ggplot(aes(x = estimate, y = continent)) +
stat_dist_gradientinterval(
aes(dist = dist_normal(
mu = estimate, sigma = std.error
)),
point_size = 4,
fill = "skyblue"
)
The ggdist package provides many different visualizations of uncertainty
library(ggdist)
library(distributional) # for dist_normal()
lm_data |>
filter(year == 1952) |>
mutate(
continent =
fct_reorder(continent, estimate)
) |>
ggplot(aes(x = estimate, y = continent)) +
stat_dist_dotsinterval(
aes(dist = dist_normal(
mu = estimate, sigma = std.error
)),
point_size = 4,
fill = "skyblue",
quantiles = 20
)
The ggdist package provides many different visualizations of uncertainty
library(ggdist)
library(distributional) # for dist_normal()
lm_data |>
filter(year == 1952) |>
mutate(
continent =
fct_reorder(continent, estimate)
) |>
ggplot(aes(x = estimate, y = continent)) +
stat_dist_dotsinterval(
aes(dist = dist_normal(
mu = estimate, sigma = std.error
)),
point_size = 4,
fill = "skyblue",
quantiles = 10
)