class: center, middle, title-slide

# Visualizing uncertainty
### Claus O. Wilke
### last updated: 2021-09-23

---
class: center middle

# Let's imagine we're playing a game

---
class: center middle

# The odds are in your favor:<br>You have a 90% chance of winning!

---
class: center middle
background-image: url("visualizing-uncertainty_files/Disappearing_dots.gif")
background-size: contain
background-color: #cccccc

<style>
.move-down { margin-top: -6em; }
</style>

.move-down[
# playing
]

???

Image by Wikimedia user [Jahobr](https://commons.wikimedia.org/wiki/User:Jahobr), released into the public domain.
https://commons.wikimedia.org/wiki/File:Disappearing_dots.gif

---
class: center middle

# Sorry, you lost.

---
class: center middle

# How does that make you feel?

---

## We are bad at judging uncertainty

--

* You had a 10% chance of losing

--

* One in ten people playing this game will lose

--

* A 90% chance of winning is nowhere near a certain win

---

## It helps to visualize a set of possible outcomes

.center[
![](visualizing-uncertainty_files/figure-html/freq-waffle-1.svg)<!-- -->
]

Possible outcomes from 100 individual games played

---

<br>

.center[
![](visualizing-uncertainty_files/figure-html/freq-waffle2-1.svg)<!-- -->
]

--

This type of visualization is called "frequency framing"

[//]: # "segment ends here"

---
class: center middle

## Visualizing the uncertainty of point estimates

---

## Visualizing the uncertainty of point estimates

--

- A point estimate is a single number, such as a mean

--

- Uncertainty is expressed as standard error, confidence interval, or credible interval

--

- Important:<br>Don't confuse the uncertainty of a point estimate with the variation in the sample

---

## Key concepts of statistical sampling

.center[
![](visualizing-uncertainty_files/figure-html/sampling-schematic1-1.svg)<!-- -->
]

???

Figure redrawn from [Claus O. Wilke. Fundamentals of Data Visualization.
O'Reilly, 2019.](https://clauswilke.com/dataviz)

---

## Key concepts of statistical sampling

.center[
![](visualizing-uncertainty_files/figure-html/sampling-schematic2-1.svg)<!-- -->
]

???

Figure redrawn from [Claus O. Wilke. Fundamentals of Data Visualization. O'Reilly, 2019.](https://clauswilke.com/dataviz)

---

## Key concepts of statistical sampling

.center[
![](visualizing-uncertainty_files/figure-html/sampling-schematic3-1.svg)<!-- -->
]

???

Figure redrawn from [Claus O. Wilke. Fundamentals of Data Visualization. O'Reilly, 2019.](https://clauswilke.com/dataviz)

---

## Frequency interpretation of a confidence interval

.center[
<img src = "https://clauswilke.com/dataviz/visualizing_uncertainty_files/figure-html/ci-frequentist-expl-1.png" width = "500" />
]

???

Figure from [Claus O. Wilke. Fundamentals of Data Visualization. O'Reilly, 2019.](https://clauswilke.com/dataviz)

---

## Example: Highest point reached on Everest in 2019

.center[
![](visualizing-uncertainty_files/figure-html/everest-highest-point-1.svg)<!-- -->
]

Includes only climbers and expedition members who **did not** summit

---

## Marginal effects example: Height reached on Everest

--

Average height reached relative to:<br>
a male climber who climbed with oxygen, summited, and survived

--

.center[
![](visualizing-uncertainty_files/figure-html/everest_margins-1.svg)<!-- -->
]

---

## Marginal effects example: Height reached on Everest

Other visualization options: half-eye

.center[
![](visualizing-uncertainty_files/figure-html/everest_margins2-1.svg)<!-- -->
]

---

## Marginal effects example: Height reached on Everest

Other visualization options: gradient interval

.center[
![](visualizing-uncertainty_files/figure-html/everest_margins3-1.svg)<!-- -->
]

---

## Marginal effects example: Height reached on Everest

Other visualization options: quantile dotplot

.center[
![](visualizing-uncertainty_files/figure-html/everest_margins4-1.svg)<!-- -->
]

---

## Marginal effects example: Height reached on Everest
Other visualization options: quantile dotplot

.center[
![](visualizing-uncertainty_files/figure-html/everest_margins5-1.svg)<!-- -->
]

---

## Marginal effects example: Height reached on Everest

Other visualization options: quantile dotplot

.center[
![](visualizing-uncertainty_files/figure-html/everest_margins6-1.svg)<!-- -->
]

[//]: # "segment ends here"

---
class: center middle

## Uncertainty visualizations in R

---

## Making a plot with error bars

--

.small-font[
Example: Relationship between life expectancy and GDP per capita
]

.center[
![](visualizing-uncertainty_files/figure-html/gapminder-regressions-1.svg)<!-- -->
]

---

## Making a plot with error bars

.small-font[
Example: Relationship between life expectancy and GDP per capita
]

.pull-left[
![](visualizing-uncertainty_files/figure-html/gapminder-regressions2-1.svg)<!-- -->
]

.pull-right[
<br>
![](visualizing-uncertainty_files/figure-html/gapminder-summary-1.svg)<!-- -->
]

---

## Making a plot with error bars

.tiny-font[
```r
library(tidyverse)  # nest(), map(), unnest(), ggplot2
library(broom)      # tidy()
library(gapminder)  # gapminder dataset

lm_data <- gapminder %>%
  nest(data = -c(continent, year))

lm_data
```

```
# A tibble: 60 × 3
   continent  year data
   <fct>     <int> <list>
 1 Asia       1952 <tibble [33 × 4]>
 2 Asia       1957 <tibble [33 × 4]>
 3 Asia       1962 <tibble [33 × 4]>
 4 Asia       1967 <tibble [33 × 4]>
 5 Asia       1972 <tibble [33 × 4]>
 6 Asia       1977 <tibble [33 × 4]>
 7 Asia       1982 <tibble [33 × 4]>
 8 Asia       1987 <tibble [33 × 4]>
 9 Asia       1992 <tibble [33 × 4]>
10 Asia       1997 <tibble [33 × 4]>
# … with 50 more rows
```
]

---

## Making a plot with error bars

.tiny-font[
```r
lm_data <- gapminder %>%
  nest(data = -c(continent, year)) %>%
  mutate(
    fit = map(data, ~lm(lifeExp ~ log(gdpPercap), data = .x))
  )

lm_data
```

```
# A tibble: 60 × 4
   continent  year data              fit
   <fct>     <int> <list>            <list>
 1 Asia       1952 <tibble [33 × 4]> <lm>
 2 Asia       1957 <tibble [33 × 4]> <lm>
 3 Asia       1962 <tibble [33 × 4]> <lm>
 4 Asia       1967 <tibble [33 × 4]> <lm>
 5 Asia       1972 <tibble [33 × 4]> <lm>
 6 Asia       1977 <tibble [33 × 4]> <lm>
 7 Asia       1982 <tibble [33 × 4]> <lm>
 8 Asia       1987 <tibble [33 × 4]> <lm>
 9 Asia       1992 <tibble [33 × 4]> <lm>
10 Asia       1997 <tibble [33 × 4]> <lm>
# … with 50 more rows
```
]

---

## Making a plot with error bars

.tiny-font[
```r
lm_data <- gapminder %>%
  nest(data = -c(continent, year)) %>%
  mutate(
    fit = map(data, ~lm(lifeExp ~ log(gdpPercap), data = .x)),
    tidy_out = map(fit, tidy)
  )

lm_data
```

```
# A tibble: 60 × 5
   continent  year data              fit    tidy_out
   <fct>     <int> <list>            <list> <list>
 1 Asia       1952 <tibble [33 × 4]> <lm>   <tibble [2 × 5]>
 2 Asia       1957 <tibble [33 × 4]> <lm>   <tibble [2 × 5]>
 3 Asia       1962 <tibble [33 × 4]> <lm>   <tibble [2 × 5]>
 4 Asia       1967 <tibble [33 × 4]> <lm>   <tibble [2 × 5]>
 5 Asia       1972 <tibble [33 × 4]> <lm>   <tibble [2 × 5]>
 6 Asia       1977 <tibble [33 × 4]> <lm>   <tibble [2 × 5]>
 7 Asia       1982 <tibble [33 × 4]> <lm>   <tibble [2 × 5]>
 8 Asia       1987 <tibble [33 × 4]> <lm>   <tibble [2 × 5]>
 9 Asia       1992 <tibble [33 × 4]> <lm>   <tibble [2 × 5]>
10 Asia       1997 <tibble [33 × 4]> <lm>   <tibble [2 × 5]>
# … with 50 more rows
```
]

---

## Making a plot with error bars

.tiny-font[
```r
lm_data <- gapminder %>%
  nest(data = -c(continent, year)) %>%
  mutate(
    fit = map(data, ~lm(lifeExp ~ log(gdpPercap), data = .x)),
    tidy_out = map(fit, tidy)
  ) %>%
  unnest(cols = tidy_out)

lm_data
```

```
# A tibble: 120 × 9
   continent  year data      fit    term    estimate std.error statistic p.value
   <fct>     <int> <list>    <list> <chr>      <dbl>     <dbl>     <dbl>   <dbl>
 1 Asia       1952 <tibble … <lm>   (Inter…    15.8       9.27      1.71 9.78e-2
 2 Asia       1952 <tibble … <lm>   log(gd…     4.16      1.25      3.33 2.28e-3
 3 Asia       1957 <tibble … <lm>   (Inter…    18.1       9.70      1.86 7.20e-2
 4 Asia       1957 <tibble … <lm>   log(gd…     4.17      1.28      3.26 2.71e-3
 5 Asia       1962 <tibble … <lm>   (Inter…    16.6       9.52      1.74 9.11e-2
 6 Asia       1962 <tibble … <lm>   log(gd…     4.59      1.24      3.72 7.94e-4
 7 Asia       1967 <tibble … <lm>   (Inter…    19.8       9.05      2.19 3.64e-2
 8 Asia       1967 <tibble … <lm>   log(gd…     4.50      1.15      3.90 4.77e-4
 9 Asia       1972 <tibble … <lm>   (Inter…    21.9       8.14      2.69 1.13e-2
10 Asia       1972 <tibble … <lm>   log(gd…     4.44      1.01      4.41 1.16e-4
# … with 110 more rows
```
]

---

## Making a plot with error bars

.tiny-font[
```r
lm_data <- gapminder %>%
  nest(data = -c(continent, year)) %>%
  mutate(
    fit = map(data, ~lm(lifeExp ~ log(gdpPercap), data = .x)),
    tidy_out = map(fit, tidy)
  ) %>%
  unnest(cols = tidy_out) %>%
  select(-fit, -data)

lm_data
```

```
# A tibble: 120 × 7
   continent  year term           estimate std.error statistic  p.value
   <fct>     <int> <chr>             <dbl>     <dbl>     <dbl>    <dbl>
 1 Asia       1952 (Intercept)       15.8       9.27      1.71 0.0978
 2 Asia       1952 log(gdpPercap)     4.16      1.25      3.33 0.00228
 3 Asia       1957 (Intercept)       18.1       9.70      1.86 0.0720
 4 Asia       1957 log(gdpPercap)     4.17      1.28      3.26 0.00271
 5 Asia       1962 (Intercept)       16.6       9.52      1.74 0.0911
 6 Asia       1962 log(gdpPercap)     4.59      1.24      3.72 0.000794
 7 Asia       1967 (Intercept)       19.8       9.05      2.19 0.0364
 8 Asia       1967 log(gdpPercap)     4.50      1.15      3.90 0.000477
 9 Asia       1972 (Intercept)       21.9       8.14      2.69 0.0113
10 Asia       1972 log(gdpPercap)     4.44      1.01      4.41 0.000116
# … with 110 more rows
```
]

---

## Making a plot with error bars

.tiny-font[
```r
lm_data <- gapminder %>%
  nest(data = -c(continent, year)) %>%
  mutate(
    fit = map(data, ~lm(lifeExp ~ log(gdpPercap), data = .x)),
    tidy_out = map(fit, tidy)
  ) %>%
  unnest(cols = tidy_out) %>%
  select(-fit, -data) %>%
  filter(term != "(Intercept)", continent != "Oceania")

lm_data
```

```
# A tibble: 48 × 7
   continent  year term           estimate std.error statistic       p.value
   <fct>     <int> <chr>             <dbl>     <dbl>     <dbl>         <dbl>
 1 Asia       1952 log(gdpPercap)     4.16     1.25       3.33 0.00228
 2 Asia       1957 log(gdpPercap)     4.17     1.28       3.26 0.00271
 3 Asia       1962 log(gdpPercap)     4.59     1.24       3.72 0.000794
 4 Asia       1967 log(gdpPercap)     4.50     1.15       3.90 0.000477
 5 Asia       1972 log(gdpPercap)     4.44     1.01       4.41 0.000116
 6 Asia       1977 log(gdpPercap)     4.87     1.03       4.75 0.0000442
 7 Asia       1982 log(gdpPercap)     4.78     0.852      5.61 0.00000377
 8 Asia       1987 log(gdpPercap)     5.17     0.727      7.12 0.0000000531
 9 Asia       1992 log(gdpPercap)     5.09     0.649      7.84 0.00000000760
10 Asia       1997 log(gdpPercap)     5.11     0.628      8.15 0.00000000335
# … with 38 more rows
```
]

---

## Making a plot with error bars

.tiny-font.pull-left[
```r
ggplot(lm_data) +
  aes(
    x = year,
    y = estimate,
    ymin = estimate - 1.96*std.error,
    ymax = estimate + 1.96*std.error,
    color = continent
  ) +
  geom_pointrange(
    position = position_dodge(width = 1)
  ) +
  scale_x_continuous(
    breaks = unique(gapminder$year)
  ) +
  theme(legend.position = "top")
```
]

.pull-right[
![](visualizing-uncertainty_files/figure-html/gapminder-model-out-1.svg)<!-- -->
]

???

Figure and code idea from [Kieran Healy. Data Visualization: A practical introduction. Princeton University Press, 2019.](https://socviz.co/)

---

## Half-eyes, gradient intervals, etc

--

The **ggdist** package provides many different visualizations of uncertainty

--

.tiny-font.pull-left[
```r
library(ggdist)
library(distributional) # for dist_normal()

lm_data %>%
  filter(year == 1952) %>%
  mutate(
    continent = fct_reorder(continent, estimate)
  ) %>%
  ggplot(aes(x = estimate, y = continent)) +
  stat_dist_halfeye(
    aes(dist = dist_normal(
      mu = estimate, sigma = std.error
    )),
    point_size = 4
  )
```
]

.pull-right[
![](visualizing-uncertainty_files/figure-html/gapminder-halfeye-out-1.svg)<!-- -->
]

---

## Half-eyes, gradient intervals, etc

The **ggdist** package provides many different visualizations of uncertainty

.tiny-font.pull-left[
```r
library(ggdist)
library(distributional) # for dist_normal()

lm_data %>%
  filter(year == 1952) %>%
  mutate(
    continent = fct_reorder(continent, estimate)
  ) %>%
  ggplot(aes(x = estimate, y = continent)) +
  stat_dist_gradientinterval(
    aes(dist = dist_normal(
      mu = estimate, sigma = std.error
    )),
    point_size = 4,
    fill = "skyblue"
  )
```
]

.pull-right[
![](visualizing-uncertainty_files/figure-html/gapminder-gradinterval-out-1.svg)<!-- -->
]

---

## Half-eyes, gradient intervals, etc

The **ggdist** package provides many different visualizations of uncertainty

.tiny-font.pull-left[
```r
library(ggdist)
library(distributional) # for dist_normal()

lm_data %>%
  filter(year == 1952) %>%
  mutate(
    continent = fct_reorder(continent, estimate)
  ) %>%
  ggplot(aes(x = estimate, y = continent)) +
  stat_dist_dotsinterval(
    aes(dist = dist_normal(
      mu = estimate, sigma = std.error
    )),
    point_size = 4,
    fill = "skyblue",
    quantiles = 20
  )
```
]

.pull-right[
![](visualizing-uncertainty_files/figure-html/gapminder-quantiledots-out-1.svg)<!-- -->
]

---

## Half-eyes, gradient intervals, etc

The **ggdist** package provides many different visualizations of uncertainty

.tiny-font.pull-left[
```r
library(ggdist)
library(distributional) # for dist_normal()

lm_data %>%
  filter(year == 1952) %>%
  mutate(
    continent = fct_reorder(continent, estimate)
  ) %>%
  ggplot(aes(x = estimate, y = continent)) +
  stat_dist_dotsinterval(
    aes(dist = dist_normal(
      mu = estimate, sigma = std.error
    )),
    point_size = 4,
    fill = "skyblue",
    quantiles = 10
  )
```
]

.pull-right[
![](visualizing-uncertainty_files/figure-html/gapminder-quantiledots2-out-1.svg)<!-- -->
]

[//]: # "segment ends here"

---

## Further reading

- Fundamentals of Data Visualization: [Chapter 16: Visualizing uncertainty](https://clauswilke.com/dataviz/visualizing-uncertainty.html)
- Data Visualization: A Practical Introduction: [Chapter 6.6: Grouped analysis and list columns](https://socviz.co/modeling.html#grouped-analysis-and-list-columns)
- Data Visualization: A Practical Introduction: [Chapter 6.7: Plot marginal effects](https://socviz.co/modeling.html#plot-marginal-effects)
- **ggdist** reference documentation: https://mjskay.github.io/ggdist/index.html
- **ggdist** vignette: [Frequentist uncertainty visualization](https://mjskay.github.io/ggdist/articles/freq-uncertainty-vis.html)
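
---

## Aside: where the 1.96 multiplier comes from

The error-bar plot earlier in the deck draws `estimate ± 1.96*std.error`. As a minimal sketch, assuming the estimate is approximately normally distributed, 1.96 is the 97.5% quantile of the standard normal, which yields a 95% interval. The numbers below are the Asia 1952 slope and standard error as rounded in the printed table; `conf_level`, `z`, and `ci` are illustrative names, not part of the original code.

```r
# Sketch: normal-approximation confidence interval for a single estimate.
# Values are the rounded Asia 1952 slope and standard error from the table.
estimate  <- 4.16
std_error <- 1.25

conf_level <- 0.95                    # assumed confidence level
z <- qnorm(1 - (1 - conf_level) / 2)  # standard-normal quantile, about 1.96

# Lower and upper interval limits around the point estimate
ci <- c(
  lower = estimate - z * std_error,
  upper = estimate + z * std_error
)

ci
```

For small groups, a t quantile such as `qt(0.975, df)` would give a slightly wider interval; `tidy(fit, conf.int = TRUE)` from **broom** reports t-based limits directly.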