class: center, middle, title-slide

# Getting things into the right order
### Claus O. Wilke
### last updated: 2021-09-23

---

## Remember from "Visualizing amounts"

.small-font[
We can use `fct_relevel()` to manually order the bars in a bar plot
]

--

.tiny-font[

```r
ggplot(penguins, aes(y = fct_relevel(species, "Chinstrap", "Gentoo", "Adelie"))) +
  geom_bar()
```

]

.center[
<!-- -->
]

---

## Somewhat cleaner: mutate first, then plot

.tiny-font[

```r
penguins %>%
  mutate(species = fct_relevel(species, "Chinstrap", "Gentoo", "Adelie")) %>%
  ggplot(aes(y = species)) +
  geom_bar()
```

]

.center[
<!-- -->
]

---

## We order things in ggplot with factors

.tiny-font[

```r
penguins %>%
  mutate(species = fct_relevel(species, "Chinstrap", "Gentoo", "Adelie")) %>%
  slice(1:30) %>% # get first 30 rows
  pull(species)   # pull out just the `species` column
```

```
 [1] Adelie Adelie Adelie Adelie Adelie Adelie Adelie Adelie Adelie Adelie
[11] Adelie Adelie Adelie Adelie Adelie Adelie Adelie Adelie Adelie Adelie
[21] Adelie Adelie Adelie Adelie Adelie Adelie Adelie Adelie Adelie Adelie
Levels: Chinstrap Gentoo Adelie
```

]

--

.small-font[
- The column `species` is a factor
]

--

.small-font[
- A factor is a categorical variable with defined categories called levels
]

--

.small-font[
- For factors, ggplot generally places visual elements in the order defined by the levels
]

---

## We order things in ggplot with factors

.tiny-font[

```r
penguins %>%
  mutate(species = fct_relevel(species, "Chinstrap", "Gentoo", "Adelie")) %>%
  slice(1:30) %>% # get first 30 rows
  pull(species)   # pull out just the `species` column
```

```
 [1] Adelie Adelie Adelie Adelie Adelie Adelie Adelie Adelie Adelie Adelie
[11] Adelie Adelie Adelie Adelie Adelie Adelie Adelie Adelie Adelie Adelie
[21] Adelie Adelie Adelie Adelie Adelie Adelie Adelie Adelie Adelie Adelie
Levels: Chinstrap Gentoo Adelie
```

```r
# the order of factor levels is independent of the order of values in the table
penguins %>%
  mutate(species =
    fct_relevel(species, "Chinstrap", "Gentoo", "Adelie"))
```

```
# A tibble: 344 × 8
   species island    bill_length_mm bill_depth_mm flipper_length_mm body_mass_g
   <fct>   <fct>              <dbl>         <dbl>             <int>       <int>
 1 Adelie  Torgersen           39.1          18.7               181        3750
 2 Adelie  Torgersen           39.5          17.4               186        3800
 3 Adelie  Torgersen           40.3          18                 195        3250
 4 Adelie  Torgersen           NA            NA                  NA          NA
 5 Adelie  Torgersen           36.7          19.3               193        3450
 6 Adelie  Torgersen           39.3          20.6               190        3650
 7 Adelie  Torgersen           38.9          17.8               181        3625
 8 Adelie  Torgersen           39.2          19.6               195        4675
 9 Adelie  Torgersen           34.1          18.1               193        3475
10 Adelie  Torgersen           42            20.2               190        4250
# … with 334 more rows, and 2 more variables: sex <fct>, year <int>
```

]

---

## Manual ordering of factor levels: `fct_relevel()`

.tiny-font[

```r
penguins %>%
  mutate(species = fct_relevel(species)) %>%
  slice(1:30) %>% # get first 30 rows
  pull(species)   # pull out just the `species` column
```

```
 [1] Adelie Adelie Adelie Adelie Adelie Adelie Adelie Adelie Adelie Adelie
[11] Adelie Adelie Adelie Adelie Adelie Adelie Adelie Adelie Adelie Adelie
[21] Adelie Adelie Adelie Adelie Adelie Adelie Adelie Adelie Adelie Adelie
Levels: Adelie Chinstrap Gentoo
```

]

.small-font[
Default: alphabetic order
]

---

## Manual ordering of factor levels: `fct_relevel()`

.tiny-font[

```r
penguins %>%
  mutate(species = fct_relevel(species, "Gentoo")) %>%
  slice(1:30) %>% # get first 30 rows
  pull(species)   # pull out just the `species` column
```

```
 [1] Adelie Adelie Adelie Adelie Adelie Adelie Adelie Adelie Adelie Adelie
[11] Adelie Adelie Adelie Adelie Adelie Adelie Adelie Adelie Adelie Adelie
[21] Adelie Adelie Adelie Adelie Adelie Adelie Adelie Adelie Adelie Adelie
Levels: Gentoo Adelie Chinstrap
```

]

.small-font[
Move `"Gentoo"` in front, rest alphabetic
]

---

## Manual ordering of factor levels: `fct_relevel()`

.tiny-font[

```r
penguins %>%
  mutate(species = fct_relevel(species, "Chinstrap", "Gentoo")) %>%
  slice(1:30) %>% # get first 30 rows
  pull(species)   # pull out just the `species` column
```

```
 [1] Adelie Adelie Adelie Adelie Adelie Adelie Adelie Adelie Adelie Adelie
[11] Adelie Adelie Adelie Adelie Adelie Adelie Adelie Adelie Adelie Adelie
[21] Adelie Adelie Adelie Adelie Adelie Adelie Adelie Adelie Adelie Adelie
Levels: Chinstrap Gentoo Adelie
```

]

.small-font[
Move `"Chinstrap"` in front, then `"Gentoo"`, rest alphabetic
]

---

## Manual ordering of factor levels: `fct_relevel()`

.tiny-font[

```r
penguins %>%
  mutate(species = fct_relevel(species, "Chinstrap", "Adelie", "Gentoo")) %>%
  slice(1:30) %>% # get first 30 rows
  pull(species)   # pull out just the `species` column
```

```
 [1] Adelie Adelie Adelie Adelie Adelie Adelie Adelie Adelie Adelie Adelie
[11] Adelie Adelie Adelie Adelie Adelie Adelie Adelie Adelie Adelie Adelie
[21] Adelie Adelie Adelie Adelie Adelie Adelie Adelie Adelie Adelie Adelie
Levels: Chinstrap Adelie Gentoo
```

]

.small-font[
Use order `"Chinstrap"`, `"Adelie"`, `"Gentoo"`
]

---

## Manual ordering of factor levels: `fct_relevel()`

.tiny-font[

```r
penguins %>%
  mutate(species = fct_relevel(species, "Gentoo", "Chinstrap", "Adelie")) %>%
  slice(1:30) %>% # get first 30 rows
  pull(species)   # pull out just the `species` column
```

```
 [1] Adelie Adelie Adelie Adelie Adelie Adelie Adelie Adelie Adelie Adelie
[11] Adelie Adelie Adelie Adelie Adelie Adelie Adelie Adelie Adelie Adelie
[21] Adelie Adelie Adelie Adelie Adelie Adelie Adelie Adelie Adelie Adelie
Levels: Gentoo Chinstrap Adelie
```

]

.small-font[
Use order `"Gentoo"`, `"Chinstrap"`, `"Adelie"`
]

---

## The order of the y axis is from bottom to top

.tiny-font[

```r
penguins %>%
  mutate(species = fct_relevel(species, "Chinstrap", "Gentoo", "Adelie")) %>%
  ggplot(aes(y = species)) +
  geom_bar()
```

]

.center[
<!-- -->
]

---

## Reorder based on frequency: `fct_infreq()`

.tiny-font[

```r
penguins %>%
  mutate(species = fct_infreq(species)) %>%
  slice(1:30) %>% # get first 30 rows
  pull(species)   # pull out just the `species` column
```

```
 [1] Adelie Adelie Adelie Adelie Adelie Adelie Adelie Adelie Adelie Adelie
[11] Adelie Adelie Adelie Adelie Adelie Adelie Adelie Adelie Adelie Adelie
[21] Adelie Adelie Adelie Adelie Adelie Adelie Adelie Adelie Adelie Adelie
Levels: Adelie Gentoo Chinstrap
```

]

--

.small-font[
- Use the order defined by the number of penguins of different species
]

--

.small-font[
- The order is descending, from most frequent to least frequent
]

---

## Reorder based on frequency: `fct_infreq()`

.tiny-font[

```r
penguins %>%
  mutate(species = fct_infreq(species)) %>%
  ggplot(aes(y = species)) + geom_bar()
```

]

.center[
<!-- -->
]

---

## Reverse order: `fct_rev()`

.tiny-font[

```r
penguins %>%
  mutate(species = fct_rev(fct_infreq(species))) %>%
  ggplot(aes(y = species)) + geom_bar()
```

]

.center[
<!-- -->
]

---

## Reorder based on numeric values: `fct_reorder()`

.tiny-font[

```r
penguins %>%
  count(species)
```

```
# A tibble: 3 × 2
  species       n
  <fct>     <int>
1 Adelie      152
2 Chinstrap    68
3 Gentoo      124
```

]

--

.tiny-font[

```r
penguins %>%
  count(species) %>%
  mutate(species = fct_reorder(species, n)) %>%
  pull(species) # pull out just the `species` column
```

```
[1] Adelie    Chinstrap Gentoo
Levels: Chinstrap Gentoo Adelie
```

]

--

.small-font[
The order is ascending, from smallest to largest value
]

---

## Reorder based on numeric values: `fct_reorder()`

.tiny-font[

```r
penguins %>%
  count(species) %>%
  mutate(species = fct_reorder(species, n)) %>%
  ggplot(aes(n, species)) + geom_col()
```

]

.center[
<!-- -->
]

---

## Compare to see the difference

.xtiny-font.pull-left[

```r
penguins %>%
  count(species) %>% # summarize data
  mutate(species = fct_reorder(species, n))
```

```
# A tibble: 3 × 2
  species       n
  <fct>     <int>
1 Adelie      152
2 Chinstrap    68
3 Gentoo      124
```

]

--

.xtiny-font.pull-right[

```r
penguins %>%
  # modify the original dataset, no summary
  mutate(species = fct_infreq(species))
```

```
# A tibble: 344 × 8
   species island    bill_length_mm bill_depth_mm flipper_length_mm body_mass_g
   <fct>   <fct>              <dbl>         <dbl>             <int>       <int>
 1 Adelie  Torgersen           39.1          18.7               181        3750
 2 Adelie  Torgersen           39.5          17.4               186        3800
 3 Adelie  Torgersen           40.3          18                 195        3250
 4 Adelie  Torgersen           NA            NA                  NA          NA
 5 Adelie  Torgersen           36.7          19.3               193        3450
 6 Adelie  Torgersen           39.3          20.6               190        3650
 7 Adelie  Torgersen           38.9          17.8               181        3625
 8 Adelie  Torgersen           39.2          19.6               195        4675
 9 Adelie  Torgersen           34.1          18.1               193        3475
10 Adelie  Torgersen           42            20.2               190        4250
# … with 334 more rows, and 2 more variables: sex <fct>, year <int>
```

]

---

## Compare to see the difference

.xtiny-font.pull-left[

```r
penguins %>%
  count(species) %>% # summarize data
  mutate(species = fct_reorder(species, n)) %>%
  ggplot(aes(n, species)) + geom_col()
```

<!-- -->

]

.xtiny-font.pull-right[

```r
penguins %>%
  # modify the original dataset, no summary
  mutate(species = fct_infreq(species)) %>%
  ggplot(aes(y = species)) + geom_bar()
```

<!-- -->

]

---

## Compare to see the difference

.xtiny-font.pull-left[

```r
penguins %>%
  count(species) %>% # summarize data
  mutate(species = fct_reorder(species, n)) %>%
  ggplot(aes(n, species)) + geom_col()
```

<!-- -->

]

.xtiny-font.pull-right[

```r
penguins %>%
  # modify the original dataset, no summary
  mutate(species = fct_infreq(species)) %>%
  ggplot(aes(y = fct_rev(species))) + geom_bar()
```

<!-- -->

]

[//]: # "segment ends here"

---

class: middle, center

# Ordering other plot elements

---

## The gapminder dataset: Life expectancy data

.tiny-font[

```r
library(gapminder)

gapminder
```

```
# A tibble: 1,704 × 6
   country     continent  year lifeExp      pop gdpPercap
   <fct>       <fct>     <int>   <dbl>    <int>     <dbl>
 1 Afghanistan Asia       1952    28.8  8425333      779.
 2 Afghanistan Asia       1957    30.3  9240934      821.
 3 Afghanistan Asia       1962    32.0 10267083      853.
 4 Afghanistan Asia       1967    34.0 11537966      836.
 5 Afghanistan Asia       1972    36.1 13079460      740.
 6 Afghanistan Asia       1977    38.4 14880372      786.
 7 Afghanistan Asia       1982    39.9 12881816      978.
 8 Afghanistan Asia       1987    40.8 13867957      852.
 9 Afghanistan Asia       1992    41.7 16317921      649.
10 Afghanistan Asia       1997    41.8 22227415      635.
# … with 1,694 more rows
```

]

---

## Life expectancy in the Americas in 2007

.tiny-font.pull-left[

```r
gapminder %>%
  filter(
    year == 2007,
    continent == "Americas"
  ) %>%
  ggplot(aes(lifeExp, country)) +
  geom_point()
```

]

.pull-right[
<!-- -->
]

---

## Life expectancy in the Americas in 2007

.pull-left[
.tiny-font[

```r
gapminder %>%
  filter(
    year == 2007,
    continent == "Americas"
  ) %>%
  ggplot(aes(lifeExp, country)) +
  geom_point()
```

]

.small-font[
Reminder: Default order is alphabetic, from bottom to top
]]

.pull-right[
<!-- -->
]

---

## Life expectancy, ordered from highest to lowest

.pull-left[.tiny-font[

```r
gapminder %>%
  filter(
    year == 2007,
    continent == "Americas"
  ) %>%
  mutate(
    country = fct_reorder(country, lifeExp)
  ) %>%
  ggplot(aes(lifeExp, country)) +
  geom_point()
```

]

.small-font[
Order is ascending from bottom to top
]]

.pull-right[
<!-- -->
]

---

## We can also order facets

.tiny-font[

```r
gapminder %>%
  filter(country %in% c("Norway", "Portugal", "Spain", "Austria")) %>%
  ggplot(aes(year, lifeExp)) + geom_line() +
  facet_wrap(vars(country), nrow = 1)
```

]

.center[
<!-- -->
]

--

.small-font[
- Default ordering is alphabetic; there's no good reason for this ordering
]

---

## We can also order facets

.tiny-font[

```r
gapminder %>%
  filter(country %in% c("Norway", "Portugal", "Spain", "Austria")) %>%
  ggplot(aes(year, lifeExp)) + geom_line() +
  facet_wrap(vars(country), nrow = 1)
```

]

.center[
<!-- -->
]

.small-font[
- Let's apply `fct_reorder()` and see what happens
]

---

## We can also order facets

.tiny-font[

```r
gapminder %>%
  filter(country %in% c("Norway", "Portugal", "Spain", "Austria")) %>%
  mutate(country = fct_reorder(country, lifeExp)) %>% # default: order by median
  ggplot(aes(year, lifeExp)) + geom_line() +
  facet_wrap(vars(country), nrow = 1)
```

]

.center[
<!-- -->
]

--

.small-font[
- When the levels of a factor occur more than once, `fct_reorder()` applies a summary function
]

---

## We can also order facets

.tiny-font[

```r
gapminder %>%
  filter(country %in% c("Norway", "Portugal", "Spain", "Austria")) %>%
  mutate(country = fct_reorder(country, lifeExp)) %>% # default: order by median
  ggplot(aes(year, lifeExp)) + geom_line() +
  facet_wrap(vars(country), nrow = 1)
```

]

.center[
<!-- -->
]

.small-font[
- The default summary function is `median()`
]

---

## We can also order facets

.tiny-font[

```r
gapminder %>%
  filter(country %in% c("Norway", "Portugal", "Spain", "Austria")) %>%
  mutate(country = fct_reorder(country, lifeExp, median)) %>% # order by median
  ggplot(aes(year, lifeExp)) + geom_line() +
  facet_wrap(vars(country), nrow = 1)
```

]

.center[
<!-- -->
]

.small-font[
- We can also set the summary function explicitly
]

---

## Alternative orderings: By smallest value per facet

.tiny-font[

```r
gapminder %>%
  filter(country %in% c("Norway", "Portugal", "Spain", "Austria")) %>%
  mutate(country = fct_reorder(country, lifeExp, min)) %>% # order by minimum
  ggplot(aes(year, lifeExp)) + geom_line() +
  facet_wrap(vars(country), nrow = 1)
```

]

.center[
<!-- -->
]

---

## Alternative orderings: By largest value per facet

.tiny-font[

```r
gapminder %>%
  filter(country %in% c("Norway", "Portugal", "Spain", "Austria")) %>%
  mutate(country = fct_reorder(country, lifeExp, max)) %>% # order by maximum
  ggplot(aes(year, lifeExp)) + geom_line() +
  facet_wrap(vars(country), nrow = 1)
```

]

.center[
<!-- -->
]

---

## Alternative orderings: By smallest difference

.tiny-font[

```r
gapminder %>%
  filter(country %in% c("Norway", "Portugal", "Spain", "Austria")) %>%
  # order by custom function: here, difference between max and min
  mutate(country = fct_reorder(country, lifeExp, function(x) { max(x) - min(x) })) %>%
  ggplot(aes(year, lifeExp)) + geom_line() +
  facet_wrap(vars(country), nrow = 1)
```

]

.center[
<!-- -->
]

---

## Alternative orderings: By largest difference

.tiny-font[

```r
gapminder %>%
  filter(country %in% c("Norway", "Portugal", "Spain", "Austria")) %>%
  # order by custom function: here, difference between min and max
  mutate(country = fct_reorder(country, lifeExp, function(x) { min(x) - max(x) })) %>%
  ggplot(aes(year, lifeExp)) + geom_line() +
  facet_wrap(vars(country), nrow = 1)
```

]

.center[
<!-- -->
]

---

## Final example: Lumping factor levels together

--

.small-font[
Dataset: Flights out of New York City in 2013
]

.tiny-font[

```r
library(nycflights13)

flight_data <- flights %>% # take data on individual flights
  left_join(airlines) %>%  # add in full-length airline names
  select(name, carrier, flight, year, month, day, origin, dest) # pick columns of interest
```

```
Joining, by = "carrier"
```

]

--

.tiny-font[

```r
flight_data
```

```
# A tibble: 336,776 × 8
   name                     carrier flight  year month   day origin dest
   <chr>                    <chr>    <int> <int> <int> <int> <chr>  <chr>
 1 United Air Lines Inc.    UA        1545  2013     1     1 EWR    IAH
 2 United Air Lines Inc.    UA        1714  2013     1     1 LGA    IAH
 3 American Airlines Inc.   AA        1141  2013     1     1 JFK    MIA
 4 JetBlue Airways          B6         725  2013     1     1 JFK    BQN
 5 Delta Air Lines Inc.     DL         461  2013     1     1 LGA    ATL
 6 United Air Lines Inc.    UA        1696  2013     1     1 EWR    ORD
 7 JetBlue Airways          B6         507  2013     1     1 EWR    FLL
 8 ExpressJet Airlines Inc. EV        5708  2013     1     1 LGA    IAD
 9 JetBlue Airways          B6          79  2013     1     1 JFK    MCO
10 American Airlines Inc.   AA         301  2013     1     1 LGA    ORD
# … with 336,766 more rows
```

]

---

## Flights out of New York City in 2013

.pull-left.tiny-font[

```r
flight_data %>%
  ggplot(aes(y = name)) +
  geom_bar()
```

]

.pull-right[
<!-- -->
]

--

.small-font[
As (almost) always, the default alphabetic ordering is terrible
]

---

## Flights out of New York City in 2013

.pull-left.tiny-font[

```r
flight_data %>%
  mutate(
    name = fct_infreq(name)
  ) %>%
  ggplot(aes(y = fct_rev(name))) +
  geom_bar()
```

]

.pull-right[
<!-- -->
]

--

.small-font[
Ordering by frequency is better, but do we need to show all airlines?
]

---

## Flights out of New York City in 2013, with lumping

.pull-left.tiny-font[

```r
flight_data %>%
  mutate(
    # keep only the 7 most common airlines
    name = fct_lump_n(name, 7)
  ) %>%
  ggplot(aes(y = fct_rev(name))) +
  geom_bar()
```

]

.pull-right[
<!-- -->
]

--

.small-font[
Now the ordering is again alphabetic...
]

---

## Flights out of New York City in 2013, with lumping

.pull-left.tiny-font[

```r
flight_data %>%
  mutate(
    # order after lumping
    name = fct_infreq(fct_lump_n(name, 7))
  ) %>%
  ggplot(aes(y = fct_rev(name))) +
  geom_bar()
```

]

.pull-right[
<!-- -->
]

---

## Flights out of New York City in 2013, with lumping

.pull-left.tiny-font[

```r
flight_data %>%
  mutate(
    # order before lumping
    name = fct_lump_n(fct_infreq(name), 7)
  ) %>%
  ggplot(aes(y = fct_rev(name))) +
  geom_bar()
```

]

.pull-right[
<!-- -->
]

--

.small-font[
In most cases, you will want to order before lumping
]

---

## Flights out of New York City in 2013, with lumping

.pull-left.tiny-font[

```r
flight_data %>%
  mutate(
    # order before lumping
    name = fct_lump_n(fct_infreq(name), 7)
  ) %>%
  ggplot(aes(y = fct_rev(name))) +
  geom_bar()
```

]

.pull-right[
<!-- -->
]

.small-font[
Can we visually separate the "Other" category?
]

---

## Flights out of New York City in 2013, with lumping

.pull-left.tiny-font[

```r
flight_data %>%
  mutate(
    name = fct_lump_n(fct_infreq(name), 7),
    # Use `fct_other()` to manually lump all
    # levels not called "Other" into "Named"
    highlight = fct_other(
      name,
      keep = "Other", other_level = "Named"
    )
  ) %>%
  ggplot() +
  aes(
    y = fct_rev(name),
    fill = highlight
  ) +
  geom_bar()
```

]

.pull-right[
<!-- -->
]

--

.small-font[
One annoying issue: The legend is in the wrong order
]

---

## Flights out of New York City in 2013, with lumping

.pull-left.tiny-font[

```r
flight_data %>%
  mutate(
    name = fct_lump_n(fct_infreq(name), 7),
    # Use `fct_other()` to manually lump all
    # levels not called "Other" into "Named"
    highlight = fct_other(
      name,
      keep = "Other", other_level = "Named"
    )
  ) %>%
  ggplot() +
  aes(
    y = fct_rev(name),
    # reverse fill aesthetic
    fill = fct_rev(highlight)
  ) +
  geom_bar()
```

]

.pull-right[
<!-- -->
]

---

## Flights out of New York City in 2013, final tweaks

.pull-left.xtiny-font[

```r
flight_data %>%
  mutate(
    name = fct_lump_n(fct_infreq(name), 7),
    highlight = fct_other(
      name,
      keep = "Other", other_level = "Named"
    )
  ) %>%
  ggplot() +
  aes(y = fct_rev(name), fill = highlight) +
  geom_bar() +
  scale_x_continuous(
    name = "Number of flights",
    expand = expansion(mult = c(0, 0.07))
  ) +
  scale_y_discrete(name = NULL) +
  scale_fill_manual(
    values = c(
      Named = "gray50",
      Other = "#98545F"
    ),
    guide = "none"
  ) +
  theme_minimal_vgrid()
```

]

.pull-right[
<!-- -->
]

---

## Summary of key factor manipulation functions

.small-font.center[

Function | Use case | Documentation
:----------- | :---------- | :----------:
`fct_relevel()` | Change order of factor levels manually | [click here](https://forcats.tidyverse.org/reference/fct_relevel.html)
`fct_infreq()` | Put levels in descending order of how frequently each level occurs in the data | [click here](https://forcats.tidyverse.org/reference/fct_inorder.html)
`fct_rev()` | Reverse the order of factor levels | [click here](https://forcats.tidyverse.org/reference/fct_rev.html)
`fct_reorder()` | Put levels in ascending order determined by a numeric variable or summary function | [click here](https://forcats.tidyverse.org/reference/fct_reorder.html)
`fct_lump_n()` | Retain the *n* most frequent levels and lump all others into `"Other"` | [click here](https://forcats.tidyverse.org/reference/fct_lump.html)
`fct_other()` | Manually group some factor levels into `"Other"` | [click here](https://forcats.tidyverse.org/reference/fct_other.html)

]

For more options, check out the [reference documentation](https://forcats.tidyverse.org/reference/index.html) of the **forcats** package

[//]: # "segment ends here"

---

## Further reading

- Fundamentals of Data Visualization: [Chapter 6: Visualizing amounts](https://clauswilke.com/dataviz/visualizing-amounts.html)
- **forcats** documentation: [Introduction to forcats](https://forcats.tidyverse.org/articles/forcats.html)
- **forcats** reference documentation: [Change order of levels](https://forcats.tidyverse.org/reference/index.html#section-change-order-of-levels)
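
---

## Sketch: ordering before vs. after lumping

.small-font[
A minimal sketch of why the lumping slides suggest ordering before lumping. The `airline` vector is hypothetical toy data made up for this illustration (it is not taken from **nycflights13**), and the variable names are assumptions; `fct_infreq()` and `fct_lump_n()` are the same **forcats** functions used in the flights example.
]

```r
library(forcats)

# Hypothetical toy data: five made-up airline codes with counts
# B = 4, C = 3, A = 2, D = 2, E = 2
airline <- factor(rep(c("B", "C", "A", "D", "E"), times = c(4, 3, 2, 2, 2)))

# Order by frequency first, then keep the 2 most common levels.
# fct_lump_n() places the new "Other" level after the retained levels.
ordered_then_lumped <- fct_lump_n(fct_infreq(airline), 2)
levels(ordered_then_lumped)
# "B" "C" "Other"

# Lump first, then order by frequency.
# "Other" now holds 6 values, so fct_infreq() moves it to the front.
lumped_then_ordered <- fct_infreq(fct_lump_n(airline, 2))
levels(lumped_then_ordered)
# "Other" "B" "C"
```

.small-font[
With this toy data, ordering first keeps `"Other"` last, whereas ordering after lumping treats `"Other"` as just another level and sorts it by its combined count.
]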