Getting things into the right order

Claus O. Wilke

2025-02-17

Remember from “Visualizing amounts”

We can use fct_relevel() to manually order the bars in a bar plot

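library(tidyverse)       # ggplot2, dplyr, and forcats
library(palmerpenguins)  # assumed source of the `penguins` dataset

ggplot(penguins) +
  aes(y = fct_relevel(species, "Chinstrap", "Gentoo", "Adelie")) +
  geom_bar() + ylab(NULL)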

 

Somewhat cleaner: mutate first, then plot

penguins |>
  mutate(species = fct_relevel(species, "Chinstrap", "Gentoo", "Adelie")) |>
  ggplot(aes(y = species)) +
  geom_bar()

 

We order things in ggplot with factors

penguins |>
  mutate(species = fct_relevel(species, "Chinstrap", "Gentoo", "Adelie")) |>
  slice(1:30) |>    # get first 30 rows
  pull(species)     # pull out just the `species` column
 [1] Adelie Adelie Adelie Adelie Adelie Adelie Adelie Adelie Adelie Adelie
[11] Adelie Adelie Adelie Adelie Adelie Adelie Adelie Adelie Adelie Adelie
[21] Adelie Adelie Adelie Adelie Adelie Adelie Adelie Adelie Adelie Adelie
Levels: Chinstrap Gentoo Adelie
  • The column species is a factor
  • A factor is a categorical variable with defined categories called levels
  • For factors, ggplot generally places visual elements in the order defined by the levels
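The bullets above can be checked without any plotting. The following is a minimal base R sketch on a made-up vector (`sizes` is hypothetical, not part of the penguins data); it shows that the levels are stored separately from the values and that changing the level order leaves the values alone.

```r
# Toy data (hypothetical): values appear in this order in the vector
sizes <- factor(c("small", "large", "medium", "small"))

levels(sizes)      # "large" "medium" "small": alphabetic by default
as.integer(sizes)  # 3 1 2 3: each value is an index into the levels

# Changing the level order leaves the values themselves untouched
sizes2 <- factor(sizes, levels = c("small", "medium", "large"))
levels(sizes2)        # "small" "medium" "large"
as.character(sizes2)  # "small" "large" "medium" "small"
```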

The order of factor levels is independent of the order of values in the table:

penguins |>
  mutate(species = fct_relevel(species, "Gentoo", "Adelie", "Chinstrap")) |>
  slice(1:30) |> pull(species)  
 [1] Adelie Adelie Adelie Adelie Adelie Adelie Adelie Adelie Adelie Adelie
[11] Adelie Adelie Adelie Adelie Adelie Adelie Adelie Adelie Adelie Adelie
[21] Adelie Adelie Adelie Adelie Adelie Adelie Adelie Adelie Adelie Adelie
Levels: Gentoo Adelie Chinstrap

Manual ordering of factor levels: fct_relevel()

penguins |>
  mutate(species = fct_relevel(species)) |>
  slice(1:30) |> pull(species)
 [1] Adelie Adelie Adelie Adelie Adelie Adelie Adelie Adelie Adelie Adelie
[11] Adelie Adelie Adelie Adelie Adelie Adelie Adelie Adelie Adelie Adelie
[21] Adelie Adelie Adelie Adelie Adelie Adelie Adelie Adelie Adelie Adelie
Levels: Adelie Chinstrap Gentoo

No levels named: the order is unchanged (here the default, alphabetic)

Manual ordering of factor levels: fct_relevel()

penguins |>
  mutate(species = fct_relevel(species, "Gentoo")) |>
  slice(1:30) |> pull(species)
 [1] Adelie Adelie Adelie Adelie Adelie Adelie Adelie Adelie Adelie Adelie
[11] Adelie Adelie Adelie Adelie Adelie Adelie Adelie Adelie Adelie Adelie
[21] Adelie Adelie Adelie Adelie Adelie Adelie Adelie Adelie Adelie Adelie
Levels: Gentoo Adelie Chinstrap

Move "Gentoo" to the front; the remaining levels keep their existing (here alphabetic) order

Manual ordering of factor levels: fct_relevel()

penguins |>
  mutate(species = fct_relevel(species, "Chinstrap", "Gentoo")) |>
  slice(1:30) |> pull(species)
 [1] Adelie Adelie Adelie Adelie Adelie Adelie Adelie Adelie Adelie Adelie
[11] Adelie Adelie Adelie Adelie Adelie Adelie Adelie Adelie Adelie Adelie
[21] Adelie Adelie Adelie Adelie Adelie Adelie Adelie Adelie Adelie Adelie
Levels: Chinstrap Gentoo Adelie

Move "Chinstrap" to the front, then "Gentoo"; the remaining levels keep their existing (here alphabetic) order

Manual ordering of factor levels: fct_relevel()

penguins |>
  mutate(species = fct_relevel(species, "Chinstrap", "Adelie", "Gentoo")) |>
  slice(1:30) |> pull(species)
 [1] Adelie Adelie Adelie Adelie Adelie Adelie Adelie Adelie Adelie Adelie
[11] Adelie Adelie Adelie Adelie Adelie Adelie Adelie Adelie Adelie Adelie
[21] Adelie Adelie Adelie Adelie Adelie Adelie Adelie Adelie Adelie Adelie
Levels: Chinstrap Adelie Gentoo

Use order "Chinstrap", "Adelie", "Gentoo"
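The fct_relevel() variants shown on these slides can be reproduced on a small toy factor. This is only a sketch: the vector `species` below is made up for illustration and merely reuses the three species names. The last step suggests that levels not named in the call keep their current relative order rather than being re-sorted.

```r
library(forcats)

# Toy factor (hypothetical data); levels default to alphabetic order
species <- factor(c("Adelie", "Gentoo", "Chinstrap", "Adelie"))
levels(species)    # "Adelie" "Chinstrap" "Gentoo"

# Named levels move to the front, in the order given
front_one <- fct_relevel(species, "Gentoo")
front_two <- fct_relevel(species, "Chinstrap", "Gentoo")
levels(front_one)  # "Gentoo" "Adelie" "Chinstrap"
levels(front_two)  # "Chinstrap" "Gentoo" "Adelie"

# Levels that are not named keep their current relative order
custom <- factor(species, levels = c("Gentoo", "Chinstrap", "Adelie"))
moved <- fct_relevel(custom, "Adelie")
levels(moved)      # "Adelie" "Gentoo" "Chinstrap"
```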

In ggplot, the order of the y axis is from bottom to top

penguins |>
  mutate(species = fct_relevel(species, "Chinstrap", "Gentoo", "Adelie")) |>
  ggplot(aes(y = species)) +
  geom_bar()

 

Reorder based on frequency: fct_infreq()

penguins |>
  mutate(species = fct_infreq(species)) |>
  slice(1:30) |> pull(species)
 [1] Adelie Adelie Adelie Adelie Adelie Adelie Adelie Adelie Adelie Adelie
[11] Adelie Adelie Adelie Adelie Adelie Adelie Adelie Adelie Adelie Adelie
[21] Adelie Adelie Adelie Adelie Adelie Adelie Adelie Adelie Adelie Adelie
Levels: Adelie Gentoo Chinstrap
  • Use the order defined by the number of penguins of different species
  • The order is descending, from most frequent to least frequent
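To see the frequency ordering in isolation, here is a small sketch on a made-up vector (`colors` is hypothetical). It counts the values with table() and then compares the levels before and after fct_infreq().

```r
library(forcats)

# Toy data (hypothetical): blue occurs 3 times, red 2 times, green once
colors <- factor(c("red", "blue", "blue", "green", "blue", "red"))

table(colors)              # counts per level
levels(colors)             # "blue" "green" "red": alphabetic default
by_freq <- fct_infreq(colors)
levels(by_freq)            # "blue" "red" "green": most frequent first
```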

Reorder based on frequency: fct_infreq()

penguins |>
  mutate(species = fct_infreq(species)) |>
  ggplot(aes(y = species)) + 
  geom_bar()

 

Reverse order: fct_rev()

penguins |>
  mutate(species = fct_rev(fct_infreq(species))) |>
  ggplot(aes(y = species)) + 
  geom_bar()
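
Because the y axis runs from bottom to top, reversing the frequency order is what puts the most frequent level at the top of the plot. A minimal sketch of what fct_rev() does to the levels, using a made-up vector `x`:

```r
library(forcats)

# Toy data (hypothetical): "c" occurs 3 times, "b" twice, "a" once
x <- factor(c("b", "a", "c", "c", "b", "c"))

by_freq <- fct_infreq(x)
levels(by_freq)     # "c" "b" "a": first level is drawn at the bottom

reversed <- fct_rev(by_freq)
levels(reversed)    # "a" "b" "c": last level "c" is drawn at the top
```

Only the level order changes; the values in the vector stay the same.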

 

Reorder based on numeric values: fct_reorder()

penguins |>
  count(species)
# A tibble: 3 × 2
  species       n
  <fct>     <int>
1 Adelie      152
2 Chinstrap    68
3 Gentoo      124
penguins |>
  count(species) |>
  mutate(species = fct_reorder(species, n)) |>
  pull(species)
[1] Adelie    Chinstrap Gentoo   
Levels: Chinstrap Gentoo Adelie

The order is ascending, from smallest to largest value
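The same reordering can be sketched without dplyr by passing two plain vectors to fct_reorder(). The counts below are copied from the count(species) output above; the vector names are made up for illustration. The `.desc = TRUE` argument of fct_reorder() flips the result to descending order.

```r
library(forcats)

# Species and counts as shown in the count() output above
species <- factor(c("Adelie", "Chinstrap", "Gentoo"))
n <- c(152, 68, 124)

ascending <- fct_reorder(species, n)
levels(ascending)    # "Chinstrap" "Gentoo" "Adelie"

# Descending order instead
descending <- fct_reorder(species, n, .desc = TRUE)
levels(descending)   # "Adelie" "Gentoo" "Chinstrap"
```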

Reorder based on numeric values: fct_reorder()

penguins |>
  count(species) |>
  mutate(species = fct_reorder(species, n)) |>
  ggplot(aes(n, species)) + geom_col()

 

Compare to see the difference

penguins |>
  count(species) |> # summarize data
  mutate(species = fct_reorder(species, n))
# A tibble: 3 × 2
  species       n
  <fct>     <int>
1 Adelie      152
2 Chinstrap    68
3 Gentoo      124
penguins |>
  # modify the original dataset, no summary
  mutate(species = fct_infreq(species))
# A tibble: 344 × 8
   species island    bill_length_mm bill_depth_mm flipper_length_mm body_mass_g
   <fct>   <fct>              <dbl>         <dbl>             <int>       <int>
 1 Adelie  Torgersen           39.1          18.7               181        3750
 2 Adelie  Torgersen           39.5          17.4               186        3800
 3 Adelie  Torgersen           40.3          18                 195        3250
 4 Adelie  Torgersen           NA            NA                  NA          NA
 5 Adelie  Torgersen           36.7          19.3               193        3450
 6 Adelie  Torgersen           39.3          20.6               190        3650
 7 Adelie  Torgersen           38.9          17.8               181        3625
 8 Adelie  Torgersen           39.2          19.6               195        4675
 9 Adelie  Torgersen           34.1          18.1               193        3475
10 Adelie  Torgersen           42            20.2               190        4250
# ℹ 334 more rows
# ℹ 2 more variables: sex <fct>, year <int>

Compare to see the difference

penguins |>
  count(species) |> # summarize data
  mutate(species = fct_reorder(species, n)) |>
  ggplot(aes(n, species)) + geom_col()

 
penguins |>
  # modify the original dataset, no summary
  mutate(species = fct_infreq(species)) |>
  ggplot(aes(y = species)) + geom_bar()

 

Compare to see the difference

penguins |>
  count(species) |> # summarize data
  mutate(species = fct_reorder(species, n)) |>
  ggplot(aes(n, species)) + geom_col()

 
penguins |>
  # modify the original dataset, no summary
  mutate(species = fct_infreq(species)) |>
  ggplot(aes(y = fct_rev(species))) + geom_bar()
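
The two approaches end up with opposite level orders, which is why the second plot needs fct_rev() to match the first. A small sketch on a made-up vector `x`, using base R table() in place of count(); it assumes there are no ties in the counts.

```r
library(forcats)

# Toy data (hypothetical): "a" once, "b" twice, "c" three times
x <- factor(c("b", "a", "c", "c", "b", "c"))

# Summarize first, then order by the counts (ascending)
counts <- table(x)
by_count <- fct_reorder(factor(names(counts)), as.integer(counts))
levels(by_count)   # "a" "b" "c"

# Order the raw data by frequency (descending)
by_freq <- fct_infreq(x)
levels(by_freq)    # "c" "b" "a"
```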

 

Ordering other plot elements

The gapminder dataset: Life expectancy data

library(gapminder)

gapminder
# A tibble: 1,704 × 6
   country     continent  year lifeExp      pop gdpPercap
   <fct>       <fct>     <int>   <dbl>    <int>     <dbl>
 1 Afghanistan Asia       1952    28.8  8425333      779.
 2 Afghanistan Asia       1957    30.3  9240934      821.
 3 Afghanistan Asia       1962    32.0 10267083      853.
 4 Afghanistan Asia       1967    34.0 11537966      836.
 5 Afghanistan Asia       1972    36.1 13079460      740.
 6 Afghanistan Asia       1977    38.4 14880372      786.
 7 Afghanistan Asia       1982    39.9 12881816      978.
 8 Afghanistan Asia       1987    40.8 13867957      852.
 9 Afghanistan Asia       1992    41.7 16317921      649.
10 Afghanistan Asia       1997    41.8 22227415      635.
# ℹ 1,694 more rows

Life expectancy in the Americas in 2007

gapminder |>
  filter(
    year == 2007,
    continent == "Americas"
  ) |>
  ggplot(aes(lifeExp, country)) + 
  geom_point()

 

Reminder: Default order is alphabetic, from bottom to top

Life expectancy, ordered from highest to lowest

gapminder |>
  filter(
    year == 2007,
    continent == "Americas"
  ) |>
  mutate(
    country = fct_reorder(country, lifeExp)
  ) |>
  ggplot(aes(lifeExp, country)) + 
  geom_point()

 

Order is ascending from bottom to top

We can also order facets

gapminder |>
  filter(country %in% c("Norway", "Portugal", "Spain", "Austria")) |>
  ggplot(aes(year, lifeExp)) + geom_line() +
  facet_wrap(vars(country), nrow = 1)

 
  • Default ordering is alphabetic; there’s no good reason for this ordering
  • Let’s apply fct_reorder() and see what happens

We can also order facets

gapminder |>
  filter(country %in% c("Norway", "Portugal", "Spain", "Austria")) |>
  mutate(country = fct_reorder(country, lifeExp)) |> # default: order by median
  ggplot(aes(year, lifeExp)) + geom_line() +
  facet_wrap(vars(country), nrow = 1)

 
  • When the levels of a factor occur more than once, fct_reorder() applies a summary function
  • The default summary function is median()

We can also order facets

gapminder |>
  filter(country %in% c("Norway", "Portugal", "Spain", "Austria")) |>
  mutate(country = fct_reorder(country, lifeExp, median)) |> # order by median
  ggplot(aes(year, lifeExp)) + geom_line() +
  facet_wrap(vars(country), nrow = 1)

 
  • When the levels of a factor occur more than once, fct_reorder() applies a summary function
  • The default summary function is median()
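How much the summary function matters can be seen on a small made-up example, where each level of `group` has three values (both vectors are hypothetical). The same data produce three different level orders depending on the function passed as the third argument.

```r
library(forcats)

# Toy data (hypothetical): three values per group
group <- factor(c("a", "a", "a", "b", "b", "b", "c", "c", "c"))
value <- c(1, 5, 9,   4, 4.5, 5,   2, 3, 11)

# Default: order by median (a = 5, b = 4.5, c = 3)
by_median <- fct_reorder(group, value)
levels(by_median)   # "c" "b" "a"

# Order by minimum (a = 1, b = 4, c = 2)
by_min <- fct_reorder(group, value, min)
levels(by_min)      # "a" "c" "b"

# Order by range, max minus min (a = 8, b = 1, c = 9)
by_range <- fct_reorder(group, value, \(x) max(x) - min(x))
levels(by_range)    # "b" "a" "c"
```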

Alternative orderings: By smallest value per facet

gapminder |>
  filter(country %in% c("Norway", "Portugal", "Spain", "Austria")) |>
  mutate(country = fct_reorder(country, lifeExp, min)) |> # order by minimum
  ggplot(aes(year, lifeExp)) + geom_line() +
  facet_wrap(vars(country), nrow = 1)

 

Alternative orderings: By largest value per facet

gapminder |>
  filter(country %in% c("Norway", "Portugal", "Spain", "Austria")) |>
  mutate(country = fct_reorder(country, lifeExp, max)) |> # order by maximum
  ggplot(aes(year, lifeExp)) + geom_line() +
  facet_wrap(vars(country), nrow = 1)

 

Alternative orderings: By smallest difference

gapminder |>
  filter(country %in% c("Norway", "Portugal", "Spain", "Austria")) |>
  mutate(country = fct_reorder(country, lifeExp, \(x) { max(x) - min(x) })) |> 
  ggplot(aes(year, lifeExp)) + geom_line() +
  facet_wrap(vars(country), nrow = 1)

 

Alternative orderings: By largest difference

gapminder |>
  filter(country %in% c("Norway", "Portugal", "Spain", "Austria")) |>
  mutate(country = fct_reorder(country, lifeExp, \(x) { min(x) - max(x) })) |> 
  ggplot(aes(year, lifeExp)) + geom_line() +
  facet_wrap(vars(country), nrow = 1)

 

Final example: Lumping factor levels together

Dataset: Flights out of New York City in 2013

library(nycflights13)

flight_data <- flights |> # take data on individual flights
  left_join(airlines, by = "carrier") |>  # add in full-length airline names
  select(name, carrier, flight, year, month, day, origin, dest) # pick columns of interest

flight_data
# A tibble: 336,776 × 8
   name                     carrier flight  year month   day origin dest 
   <chr>                    <chr>    <int> <int> <int> <int> <chr>  <chr>
 1 United Air Lines Inc.    UA        1545  2013     1     1 EWR    IAH  
 2 United Air Lines Inc.    UA        1714  2013     1     1 LGA    IAH  
 3 American Airlines Inc.   AA        1141  2013     1     1 JFK    MIA  
 4 JetBlue Airways          B6         725  2013     1     1 JFK    BQN  
 5 Delta Air Lines Inc.     DL         461  2013     1     1 LGA    ATL  
 6 United Air Lines Inc.    UA        1696  2013     1     1 EWR    ORD  
 7 JetBlue Airways          B6         507  2013     1     1 EWR    FLL  
 8 ExpressJet Airlines Inc. EV        5708  2013     1     1 LGA    IAD  
 9 JetBlue Airways          B6          79  2013     1     1 JFK    MCO  
10 American Airlines Inc.   AA         301  2013     1     1 LGA    ORD  
# ℹ 336,766 more rows

Flights out of New York City in 2013

flight_data |>
  ggplot(aes(y = name)) + 
  geom_bar()

 

As (almost) always, the default alphabetic ordering is terrible

Flights out of New York City in 2013

flight_data |>
  mutate(
    name = fct_infreq(name)
  ) |>
  ggplot(aes(y = fct_rev(name))) + 
  geom_bar()

 

Ordering by frequency is better, but do we need to show all airlines?

Flights out of New York City in 2013, with lumping

flight_data |>
  mutate(
    # keep the 7 most common airlines, lump all others into "Other"
    name = fct_lump_n(name, 7)
  ) |>
  ggplot(aes(y = fct_rev(name))) + 
  geom_bar()

 

Now the ordering is again alphabetic…

Flights out of New York City in 2013, with lumping

flight_data |>
  mutate(
    # order after lumping
    name = fct_infreq(fct_lump_n(name, 7))
  ) |>
  ggplot(aes(y = fct_rev(name))) + 
  geom_bar()

 

Flights out of New York City in 2013, with lumping

flight_data |>
  mutate(
    # order before lumping
    name = fct_lump_n(fct_infreq(name), 7)
  ) |>
  ggplot(aes(y = fct_rev(name))) + 
  geom_bar()

 

In most cases, you will want to order before lumping
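The difference between the two orders of operation shows up directly in the factor levels. A minimal sketch on made-up data (the single-letter carriers in `x` are hypothetical), keeping the 2 most common levels:

```r
library(forcats)

# Toy data (hypothetical): A x5, B x3, C x2, D x1, E x1
x <- factor(rep(c("D", "B", "A", "C", "E"), times = c(1, 3, 5, 2, 1)))

# Order after lumping: "Other" (4 values) is sorted in like any other level
after <- fct_infreq(fct_lump_n(x, 2))
levels(after)    # "A" "Other" "B"

# Order before lumping: "Other" stays at the end
before <- fct_lump_n(fct_infreq(x), 2)
levels(before)   # "A" "B" "Other"
```

Ordering first keeps the catch-all category out of the ranking, which is usually what a reader expects.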

Can we visually separate the “Other” category?

Flights out of New York City in 2013, with lumping

flight_data |>
  mutate(
    name = fct_lump_n(fct_infreq(name), 7),
    # Use `fct_other()` to manually lump all
    # levels not called "Other" into "Named"
    highlight = fct_other(
      name,
      keep = "Other", other_level = "Named"
    )
  ) |>
  ggplot() +
  aes(
    y = fct_rev(name),
    fill = highlight
  ) + 
  geom_bar()

 

One annoying issue: The legend is in the wrong order
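The legend order follows from the levels that fct_other() produces. In this sketch on a made-up factor `name` (hypothetical values), the kept level "Other" comes first and the new catch-all level "Named" is placed last, so reversing the factor puts "Named" first.

```r
library(forcats)

# Toy data (hypothetical): two named airlines plus an "Other" category
name <- factor(
  c("A", "B", "Other", "A", "Other"),
  levels = c("A", "B", "Other")
)

# Keep "Other"; everything else becomes "Named"
highlight <- fct_other(name, keep = "Other", other_level = "Named")
as.character(highlight)      # "Named" "Named" "Other" "Named" "Other"
levels(highlight)            # "Other" "Named"
levels(fct_rev(highlight))   # "Named" "Other"
```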

Flights out of New York City in 2013, with lumping

flight_data |>
  mutate(
    name = fct_lump_n(fct_infreq(name), 7),
    # Use `fct_other()` to manually lump all
    # levels not called "Other" into "Named"
    highlight = fct_other(
      name,
      keep = "Other", other_level = "Named"
    )
  ) |>
  ggplot() +
  aes(
    y = fct_rev(name),
    # reverse fill aesthetic
    fill = fct_rev(highlight)
  ) + 
  geom_bar()

 

Flights out of New York City in 2013, final tweaks

flight_data |>
  mutate(
    name = fct_lump_n(fct_infreq(name), 7),
    highlight = fct_other(
      name, keep = "Other", other_level = "Named"
    )
  ) |>
  ggplot() +
  aes(y = fct_rev(name), fill = highlight) + 
  geom_bar() +
  scale_x_continuous(
    name = "Number of flights",
    expand = expansion(mult = c(0, 0.07))
  ) +
  scale_y_discrete(name = NULL) +
  scale_fill_manual(
    values = c(
      Named = "gray50", Other = "#98545F"
    ),
    guide = "none"
  ) +
  cowplot::theme_minimal_vgrid()  # theme from the cowplot package

 

Summary of key factor manipulation functions

Function        Use case
fct_relevel()   Change order of factor levels manually
fct_infreq()    Put levels in descending order of how frequently each level occurs in the data
fct_rev()       Reverse the order of factor levels
fct_reorder()   Put levels in ascending order determined by a numeric variable or summary function
fct_lump_n()    Retain the n most frequent levels and lump all others into "Other"
fct_other()     Manually group some factor levels into "Other"

For more options, check out the reference documentation of the forcats package

Further reading