class: center, middle, title-slide

# Visualizing amounts

### Claus O. Wilke

### last updated: 2021-09-23

---

## We often encounter datasets containing simple amounts

---

## We often encounter datasets containing simple amounts

Example: Highest-grossing movies, Dec. 22-24, 2017

<br>

.center[
<table>
 <thead>
  <tr>
   <th style="text-align:right;"> rank </th>
   <th style="text-align:left;"> title </th>
   <th style="text-align:right;"> amount (million USD) </th>
  </tr>
 </thead>
<tbody>
  <tr>
   <td style="text-align:right;"> 1 </td>
   <td style="text-align:left;"> Star Wars </td>
   <td style="text-align:right;"> 71.57 </td>
  </tr>
  <tr>
   <td style="text-align:right;"> 2 </td>
   <td style="text-align:left;"> Jumanji </td>
   <td style="text-align:right;"> 36.17 </td>
  </tr>
  <tr>
   <td style="text-align:right;"> 3 </td>
   <td style="text-align:left;"> Pitch Perfect 3 </td>
   <td style="text-align:right;"> 19.93 </td>
  </tr>
  <tr>
   <td style="text-align:right;"> 4 </td>
   <td style="text-align:left;"> Greatest Showman </td>
   <td style="text-align:right;"> 8.81 </td>
  </tr>
  <tr>
   <td style="text-align:right;"> 5 </td>
   <td style="text-align:left;"> Ferdinand </td>
   <td style="text-align:right;"> 7.32 </td>
  </tr>
</tbody>
</table>
]

.tiny-font.absolute-bottom-right[
Data source: Box Office Mojo
]

---

## We can visualize amounts with bar plots

<br>

.center.move-up-1em[
data:image/s3,"s3://crabby-images/04422/04422b4b082f7978fe148ea18ab9d0dbb2eb8445" alt=""<!-- -->
]

---

## Bars can also run horizontally

<br>

.center.move-up-1em[
data:image/s3,"s3://crabby-images/c93d1/c93d162975baef19a05a46fe82c2c1314876656f" alt=""<!-- -->
]

---

## Avoid rotated axis labels

.center[
data:image/s3,"s3://crabby-images/a2e42/a2e42771a31ddb1bb3453ead7260dd34b4cf19e0" alt=""<!-- -->
]

---

## Avoid rotated axis labels

<br>

.center.move-up-1em[
data:image/s3,"s3://crabby-images/c93d1/c93d162975baef19a05a46fe82c2c1314876656f" alt=""
]

---

## Pay attention to the order of the bars

<br>

.center.move-up-1em[
data:image/s3,"s3://crabby-images/59628/59628538ac2f2a64ef4e051bcd1cc396d552174a" alt=""<!-- --> ] --- ## Pay attention to the order of the bars <br> .center.move-up-1em[ data:image/s3,"s3://crabby-images/c93d1/c93d162975baef19a05a46fe82c2c1314876656f" alt="" ] --- ## We can use dots instead of bars <br> .center.move-up-1em[ data:image/s3,"s3://crabby-images/876d3/876d33604069fd9c6668b6480488d52cb2d06349" alt=""<!-- --> ] --- ## Dots are preferable if we want to truncate the axes .center.move-up-1em[ data:image/s3,"s3://crabby-images/839d4/839d4e94db70988d1fe176a4dc9485d1f012beae" alt=""<!-- --> ] --- ## Dots are preferable if we want to truncate the axes .center.move-up-1em[ data:image/s3,"s3://crabby-images/59cb6/59cb65de5ceaddd99cd534d9de2fb519cbc2ac99" alt=""<!-- --> ] .absolute-bottom-right[ bar lengths do<br>not accurately<br>represent the<br>data values ] --- ## Dots are preferable if we want to truncate the axes .center.move-up-1em[ data:image/s3,"s3://crabby-images/d3fbc/d3fbcffb3f62de7ee0590f2de8f0b8c9f2df19fa" alt=""<!-- --> ] .absolute-bottom-right[ key features<br>of the data<br>are obscured ] --- ## Dots are preferable if we want to truncate the axes .center.move-up-1em[ data:image/s3,"s3://crabby-images/839d4/839d4e94db70988d1fe176a4dc9485d1f012beae" alt="" ] [//]: # "segment ends here" --- class: center middle ## Grouped bars --- ## We use grouped bars for higher-dimensional datasets -- <br> .center.move-up-1em[ data:image/s3,"s3://crabby-images/f643c/f643c8ef4052fdc342bee7e1b645050ca3e642aa" alt=""<!-- --> ] .absolute-bottom-right[ Data source: United States Census Bureau, 2016 ] --- ## We are free to choose by which variable to group <br> .center[ data:image/s3,"s3://crabby-images/5f07c/5f07c1a58fd93efbb0f78d11ae62e767eb517bc2" alt=""<!-- --> ] .absolute-bottom-right[ Data source: United States Census Bureau, 2016 ] --- ## We can also use multiple plot panels (facets) .center[ 
data:image/s3,"s3://crabby-images/b3029/b302959edc3131703a507e11e3c8253d290fb073" alt=""<!-- --> ] .absolute-bottom-right[ Data source: United States Census Bureau, 2016 ] [//]: # "segment ends here" --- class: center middle ## Making bar plots in **ggplot2** --- ## Dataset: Highest grossing movies Dec. 2017 .tiny-font[ ```r # Data from Box Office Mojo for Dec. 22-24, 2017. boxoffice <- tibble( rank = 1:5, title = c("Star Wars", "Jumanji", "Pitch Perfect 3", "Greatest Showman", "Ferdinand"), amount = c(71.57, 36.17, 19.93, 8.81, 7.32) # million USD ) boxoffice ``` ``` # A tibble: 5 × 3 rank title amount <int> <chr> <dbl> 1 1 Star Wars 71.6 2 2 Jumanji 36.2 3 3 Pitch Perfect 3 19.9 4 4 Greatest Showman 8.81 5 5 Ferdinand 7.32 ``` ] --- ## Visualize as a bar plot .small-font[ ```r ggplot(boxoffice, aes(title, amount)) + geom_col() # "col" stands for column ``` ] -- .center[ data:image/s3,"s3://crabby-images/a6793/a67931eebcbe517be8c071e2dbaa0fe2c71cc5cb" alt=""<!-- --> ] --- ## Order by data value .small-font[ ```r ggplot(boxoffice, aes(fct_reorder(title, amount), amount)) + geom_col() ``` ] .center[ data:image/s3,"s3://crabby-images/53d00/53d00c82980750c2e5daa20a12f109228d778d8d" alt=""<!-- --> ] --- ## Order by data value, descending .small-font[ ```r ggplot(boxoffice, aes(fct_reorder(title, -amount), amount)) + geom_col() + xlab(NULL) # remove x axis label ``` ] .center[ data:image/s3,"s3://crabby-images/5f3c3/5f3c38ec2a398c4675bebe9117eb3d57565edfaf" alt=""<!-- --> ] --- ## Flip x and y, set custom x axis label .small-font[ ```r ggplot(boxoffice, aes(amount, fct_reorder(title, amount))) + geom_col() + xlab("amount (in million USD)") + ylab(NULL) ``` ] .center[ data:image/s3,"s3://crabby-images/76883/76883d738185ab78bec9e2121b12caacc7e37005" alt=""<!-- --> ] --- class: center middle ## Sometimes we need to count before visualization --- background-image: url(https://raw.githubusercontent.com/allisonhorst/palmerpenguins/master/man/figures/logo.png) 
background-position: 95% 5%
background-size: 8%

## Goal: Visualize the number of penguins per species

.small-font[
```r
library(palmerpenguins)

head(penguins)
```

```
# A tibble: 6 × 8
  species island bill_length_mm bill_depth_mm flipper_length_… body_mass_g sex
  <fct>   <fct>           <dbl>         <dbl>            <int>       <int> <fct>
1 Adelie  Torge…           39.1          18.7              181        3750 male
2 Adelie  Torge…           39.5          17.4              186        3800 fema…
3 Adelie  Torge…           40.3          18                195        3250 fema…
4 Adelie  Torge…           NA            NA                 NA          NA <NA>
5 Adelie  Torge…           36.7          19.3              193        3450 fema…
6 Adelie  Torge…           39.3          20.6              190        3650 male
# … with 1 more variable: year <int>
```
]

???

Image credit: [Artwork by @allison_horst](https://github.com/allisonhorst/palmerpenguins/#artwork)

---

## Use `geom_bar()` to count before plotting

.small-font[
```r
ggplot(penguins, aes(y = species)) + # note: no x aesthetic defined
  geom_bar()
```
]

.center[
data:image/s3,"s3://crabby-images/422ad/422adc5e64afee114ac572e3847bfb95da4a7dd7" alt=""<!-- -->
]

---

## Getting the bars into the right order

---

## Getting the bars into the right order

Option 1: Manually, using `fct_relevel()`

.tiny-font[
```r
ggplot(penguins, aes(y = fct_relevel(species, "Chinstrap", "Gentoo", "Adelie"))) +
  geom_bar() +
  ylab(NULL)
```
]

.center[
data:image/s3,"s3://crabby-images/765ca/765caacaa12b847f74cf452b8d7aa2139fffbd57" alt=""<!-- -->
]

---

## Getting the bars into the right order

Option 2: Using `fct_rev()` and `fct_infreq()` from the **forcats** package

.tiny-font[
```r
ggplot(penguins, aes(y = fct_rev(fct_infreq(species)))) +
  geom_bar() +
  ylab(NULL)
```
]

.center[
data:image/s3,"s3://crabby-images/b8a16/b8a16ec130a29c4dab1c1e9ec03d5d325496e05f" alt=""<!-- -->
]

---

## Display counts by species and sex

.small-font[
```r
ggplot(penguins, aes(sex, fill = species)) +
  geom_bar()
```
]

.center[
data:image/s3,"s3://crabby-images/1b4bd/1b4bd5f15f5f8249067f59cb670716729d6fe97a" alt=""<!-- -->
]

---

## Remove missing values (`NA`s)

.tiny-font[
```r
penguins_nomissing <- na.omit(penguins) # remove all rows with any missing values

ggplot(penguins_nomissing, aes(sex, fill = species)) +
  geom_bar()
```
]

.center[
data:image/s3,"s3://crabby-images/06272/062723107daf03d88d72cdefc5c530eb6ae66cf7" alt=""<!-- -->
]

---

## Positions define how subgroups are shown

`position = "dodge"`: Place bars for subgroups side-by-side

.small-font[
```r
ggplot(penguins_nomissing, aes(sex, fill = species)) +
  geom_bar(position = "dodge")
```
]

.center[
data:image/s3,"s3://crabby-images/039d5/039d515b222a9ee59fb41ecd93df9fa593a08d3a" alt=""<!-- -->
]

---

## Positions define how subgroups are shown

`position = "stack"`: Place bars for subgroups on top of each other

.small-font[
```r
ggplot(penguins_nomissing, aes(sex, fill = species)) +
  geom_bar(position = "stack")
```
]

.center[
data:image/s3,"s3://crabby-images/da317/da3174008a47dfe22c451f22a14d191dd9701824" alt=""<!-- -->
]

---

## Positions define how subgroups are shown

`position = "fill"`: Like `"stack"`, but scale to 100%

.small-font[
```r
ggplot(penguins_nomissing, aes(sex, fill = species)) +
  geom_bar(position = "fill")
```
]

.center[
data:image/s3,"s3://crabby-images/58725/5872547f2dacdf020d0e814c1c1feed4a995fff1" alt=""<!-- -->
]

[//]: # "segment ends here"

---

## Further reading

- Fundamentals of Data Visualization: [Chapter 6: Visualizing amounts](https://clauswilke.com/dataviz/visualizing-amounts.html)

- Data Visualization: A Practical Introduction: [Chapter 4.4: Geoms can transform data](https://socviz.co/groupfacettx.html#statfunctions)

- **ggplot2** reference documentation: [`geom_bar()`, `geom_col()`](https://ggplot2.tidyverse.org/reference/geom_bar.html)

- **ggplot2** reference documentation: [`position_stack()`, `position_fill()`](https://ggplot2.tidyverse.org/reference/position_stack.html)
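
---

## Counting explicitly before plotting

The `geom_bar()` slides rely on **ggplot2** tallying rows for us. Below is a minimal sketch of the same tally done by hand, assuming the **tidyverse** and **palmerpenguins** packages are installed; the names `penguin_counts`, `penguin_props`, and `species_order` are illustrative only. Counting first and then plotting with `geom_col()` should reproduce the stacked `geom_bar()` plot, and normalizing within each sex gives the segment heights that `position = "fill"` displays.

.tiny-font[
```r
library(tidyverse)
library(palmerpenguins)

penguins_nomissing <- na.omit(penguins) # drop rows with any missing values

# Tally penguins per sex and species; geom_bar() computes this count internally
penguin_counts <- penguins_nomissing %>%
  count(sex, species)

# geom_col() draws the precomputed counts as-is (stacked by default)
ggplot(penguin_counts, aes(sex, n, fill = species)) +
  geom_col()

# Proportions within each sex: the segment heights shown by position = "fill"
penguin_props <- penguin_counts %>%
  group_by(sex) %>%
  mutate(prop = n / sum(n)) %>%
  ungroup()

# fct_infreq() orders levels by decreasing frequency, fct_rev() reverses them,
# so the most common species is drawn at the top of a horizontal bar plot
species_order <- levels(fct_rev(fct_infreq(penguins$species)))
species_order
```
]

Under these assumptions, `species_order` should come out as Chinstrap, Gentoo, Adelie, the same order that Option 1 sets by hand with `fct_relevel()`.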