class: center, middle, title-slide

.title[
# Visualizing amounts
]
.author[
### Claus O. Wilke
]
.date[
### last updated: 2024-01-29
]

---

## We often encounter datasets containing simple amounts

---

## We often encounter datasets containing simple amounts

Example: Highest grossing movies Dec. 2017

<br>

.center[
<table>
 <thead>
  <tr>
   <th style="text-align:right;"> rank </th>
   <th style="text-align:left;"> title </th>
   <th style="text-align:right;"> amount </th>
  </tr>
 </thead>
<tbody>
  <tr>
   <td style="text-align:right;"> 1 </td>
   <td style="text-align:left;"> Star Wars </td>
   <td style="text-align:right;"> 71.57 </td>
  </tr>
  <tr>
   <td style="text-align:right;"> 2 </td>
   <td style="text-align:left;"> Jumanji </td>
   <td style="text-align:right;"> 36.17 </td>
  </tr>
  <tr>
   <td style="text-align:right;"> 3 </td>
   <td style="text-align:left;"> Pitch Perfect 3 </td>
   <td style="text-align:right;"> 19.93 </td>
  </tr>
  <tr>
   <td style="text-align:right;"> 4 </td>
   <td style="text-align:left;"> Greatest Showman </td>
   <td style="text-align:right;"> 8.81 </td>
  </tr>
  <tr>
   <td style="text-align:right;"> 5 </td>
   <td style="text-align:left;"> Ferdinand </td>
   <td style="text-align:right;"> 7.32 </td>
  </tr>
</tbody>
</table>
]

.tiny-font.absolute-bottom-right[
Data source: Box Office Mojo
]

---

## We can visualize amounts with bar plots

<br>

.center.move-up-1em[
<img src="visualizing-amounts_files/figure-html/boxoffice-vertical-1.svg" width="90%" />
]

---

## Bars can also run horizontally

<br>

.center.move-up-1em[
<img src="visualizing-amounts_files/figure-html/boxoffice-horizontal-1.svg" width="70%" />
]

---

## Avoid rotated axis labels

.center[
<img src="visualizing-amounts_files/figure-html/boxoffice-rot-axis-tick-labels-1.svg" width="60%" />
]

---

## Avoid rotated axis labels

<br>

.center.move-up-1em[
<img src="visualizing-amounts_files/figure-html/boxoffice-horizontal-repeat-1.svg" width="70%" />
]

---

## Pay attention to the order of the bars

<br>

.center.move-up-1em[
<img
src="visualizing-amounts_files/figure-html/boxoffice-horizontal-unordered-1.svg" width="70%" />
]

---

## Pay attention to the order of the bars

<br>

.center.move-up-1em[
<img src="visualizing-amounts_files/figure-html/boxoffice-horizontal-repeat2-1.svg" width="70%" />
]

---

## We can use dots instead of bars

<br>

.center.move-up-1em[
<img src="visualizing-amounts_files/figure-html/boxoffice-dotplot-1.svg" width="70%" />
]

---

## Dots are preferable if we want to truncate the axes

.center.move-up-1em[
<img src="visualizing-amounts_files/figure-html/Americas-life-expect-1.svg" width="60%" />
]

---

## Dots are preferable if we want to truncate the axes

.center.move-up-1em[
<img src="visualizing-amounts_files/figure-html/Americas-life-expect-bad1-1.svg" width="60%" />
]

.absolute-bottom-right[
bar lengths do<br>not accurately<br>represent the<br>data values
]

---

## Dots are preferable if we want to truncate the axes

.center.move-up-1em[
<img src="visualizing-amounts_files/figure-html/Americas-life-expect-bad2-1.svg" width="60%" />
]

.absolute-bottom-right[
key features<br>of the data<br>are obscured
]

---

## Dots are preferable if we want to truncate the axes

.center.move-up-1em[
<img src="visualizing-amounts_files/figure-html/Americas-life-expect-repeat-1.svg" width="60%" />
]

[//]: # "segment ends here"

---
class: center middle

## Grouped bars

---

## We use grouped bars for higher-dimensional datasets

--

<br>

.center.move-up-1em[
<img src="visualizing-amounts_files/figure-html/income-by-age-race-dodged-1.svg" width="80%" />
]

.absolute-bottom-right[
Data source: United States Census Bureau, 2016
]

---

## We are free to choose by which variable to group

<br>

.center[
<img src="visualizing-amounts_files/figure-html/income-by-race-age-dodged-1.svg" width="80%" />
]

.absolute-bottom-right[
Data source: United States Census Bureau, 2016
]

---

## We can also use multiple plot panels (facets)

.center[
<img
src="visualizing-amounts_files/figure-html/income-by-age-race-faceted-1.svg" width="75%" />
]

.absolute-bottom-right[
Data source: United States Census Bureau, 2016
]

[//]: # "segment ends here"

---
class: center middle

## Making bar plots in **ggplot2**

---

## Dataset: Highest grossing movies Dec. 2017

.tiny-font[

```r
library(tidyverse) # provides tibble(), ggplot(), and the forcats functions used below

# Data from Box Office Mojo for Dec. 22-24, 2017.
boxoffice <- tibble(
  rank = 1:5,
  title = c("Star Wars", "Jumanji", "Pitch Perfect 3", "Greatest Showman", "Ferdinand"),
  amount = c(71.57, 36.17, 19.93, 8.81, 7.32) # million USD
)

boxoffice
```

```
# A tibble: 5 × 3
   rank title            amount
  <int> <chr>             <dbl>
1     1 Star Wars         71.6 
2     2 Jumanji           36.2 
3     3 Pitch Perfect 3   19.9 
4     4 Greatest Showman   8.81
5     5 Ferdinand          7.32
```
]

---

## Visualize as a bar plot

.small-font[

```r
ggplot(boxoffice, aes(title, amount)) +
  geom_col()  # "col" stands for column
```
]

--

.center[
<img src="visualizing-amounts_files/figure-html/boxoffice-naive-out-1.svg" width="75%" />
]

---

## Order by data value

.small-font[

```r
ggplot(boxoffice, aes(fct_reorder(title, amount), amount)) +
  geom_col()
```
]

.center[
<img src="visualizing-amounts_files/figure-html/boxoffice-ordered-out-1.svg" width="75%" />
]

---

## Order by data value, descending

.small-font[

```r
ggplot(boxoffice, aes(fct_reorder(title, -amount), amount)) +
  geom_col() +
  xlab(NULL) # remove x axis label
```
]

.center[
<img src="visualizing-amounts_files/figure-html/boxoffice-ordered2-out-1.svg" width="75%" />
]

---

## Flip x and y, set custom x axis label

.small-font[

```r
ggplot(boxoffice, aes(amount, fct_reorder(title, amount))) +
  geom_col() +
  xlab("amount (in million USD)") +
  ylab(NULL)
```
]

.center[
<img src="visualizing-amounts_files/figure-html/boxoffice-ordered3-out-1.svg" width="55%" />
]

---
class: center middle

## Sometimes we need to count before visualization

---
background-image: url(https://raw.githubusercontent.com/allisonhorst/palmerpenguins/master/man/figures/logo.png)
background-position: 95% 5%
background-size: 8%

## Goal: Visualize number of penguins per species

.small-font[

```r
library(palmerpenguins)
head(penguins)
```

```
# A tibble: 6 × 8
  species island    bill_length_mm bill_depth_mm flipper_length_mm body_mass_g
  <fct>   <fct>              <dbl>         <dbl>             <int>       <int>
1 Adelie  Torgersen           39.1          18.7               181        3750
2 Adelie  Torgersen           39.5          17.4               186        3800
3 Adelie  Torgersen           40.3          18                 195        3250
4 Adelie  Torgersen           NA            NA                  NA          NA
5 Adelie  Torgersen           36.7          19.3               193        3450
6 Adelie  Torgersen           39.3          20.6               190        3650
# ℹ 2 more variables: sex <fct>, year <int>
```
]

???

Image credit: [Artwork by @allison_horst](https://github.com/allisonhorst/palmerpenguins/#artwork)

---

## Use `geom_bar()` to count before plotting

.small-font[

```r
ggplot(penguins, aes(y = species)) + # note: no x aesthetic defined
  geom_bar()
```
]

.center[
<img src="visualizing-amounts_files/figure-html/penguins-bars-out-1.svg" width="55%" />
]

---

## Getting the bars into the right order

---

## Getting the bars into the right order

Option 1: Manually, using `fct_relevel()`

.tiny-font[

```r
ggplot(penguins, aes(y = fct_relevel(species, "Chinstrap", "Gentoo", "Adelie"))) +
  geom_bar() +
  ylab(NULL)
```
]

.center[
<img src="visualizing-amounts_files/figure-html/penguins-bars2-out-1.svg" width="55%" />
]

---

## Getting the bars into the right order

Option 2: Using `fct_rev()` and `fct_infreq()` from the **forcats** package

.tiny-font[

```r
ggplot(penguins, aes(y = fct_rev(fct_infreq(species)))) +
  geom_bar() +
  ylab(NULL)
```
]

.center[
<img src="visualizing-amounts_files/figure-html/penguins-bars3-out-1.svg" width="55%" />
]

---

## Display counts by species and sex

.small-font[

```r
ggplot(penguins, aes(sex, fill = species)) +
  geom_bar()
```
]

.center[
<img src="visualizing-amounts_files/figure-html/penguins-sex-species-out-1.svg" width="55%" />
]

---

## Remove missing values (`NA`s)

.tiny-font[

```r
penguins_nomissing <- na.omit(penguins) # remove all rows with any missing values

ggplot(penguins_nomissing, aes(sex, fill = species)) +
  geom_bar()
```
]

.center[
<img src="visualizing-amounts_files/figure-html/penguins-sex-species2-out-1.svg" width="55%" />
]

---

## Positions define how subgroups are shown

`position = "dodge"`: Place bars for subgroups side-by-side

.small-font[

```r
ggplot(penguins_nomissing, aes(sex, fill = species)) +
  geom_bar(position = "dodge")
```
]

.center[
<img src="visualizing-amounts_files/figure-html/penguins-sex-species-dodge-out-1.svg" width="55%" />
]

---

## Positions define how subgroups are shown

`position = "stack"`: Place bars for subgroups on top of each other

.small-font[

```r
ggplot(penguins_nomissing, aes(sex, fill = species)) +
  geom_bar(position = "stack")
```
]

.center[
<img src="visualizing-amounts_files/figure-html/penguins-sex-species-stack-out-1.svg" width="55%" />
]

---

## Positions define how subgroups are shown

`position = "fill"`: Like `"stack"`, but scale to 100%

.small-font[

```r
ggplot(penguins_nomissing, aes(sex, fill = species)) +
  geom_bar(position = "fill")
```
]

.center[
<img src="visualizing-amounts_files/figure-html/penguins-sex-species-fill-out-1.svg" width="55%" />
]

[//]: # "segment ends here"

---

## Further reading

- Fundamentals of Data Visualization: [Chapter 6: Visualizing amounts](https://clauswilke.com/dataviz/visualizing-amounts.html)
- Data Visualization—A Practical Introduction: [Chapter 4.4: Geoms can transform data](https://socviz.co/groupfacettx.html#statfunctions)
- **ggplot2** reference documentation: [`geom_bar()`, `geom_col()`](https://ggplot2.tidyverse.org/reference/geom_bar.html)
- **ggplot2** reference documentation: [`position_stack()`, `position_fill()`](https://ggplot2.tidyverse.org/reference/position_stack.html)
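
---

## Appendix: Counting explicitly, then using `geom_col()`

The `geom_bar()` slides count rows implicitly. As a minimal sketch of the same idea done by hand, one could count with `count()` from **dplyr** and pass the result to `geom_col()`. The name `species_counts` is a hypothetical choice for illustration; `n` is the column name that `count()` produces by default.

.small-font[

```r
library(tidyverse)
library(palmerpenguins)

# count rows per species; count() stores the result in a column named n
species_counts <- count(penguins, species)

# geom_col() draws the precomputed counts as given, without counting again
ggplot(species_counts, aes(n, fct_reorder(species, n))) +
  geom_col() +
  xlab("count") +
  ylab(NULL)
```
]

This variant may be convenient when the counts are also needed elsewhere, for example in a table.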