Visualizing amounts

Claus O. Wilke

2024-12-26

Many datasets contain simple amounts


Example: Highest-grossing movies, Dec. 22-24, 2017

rank  title             amount (million USD)
1     Star Wars         71.57
2     Jumanji           36.17
3     Pitch Perfect 3   19.93
4     Greatest Showman   8.81
5     Ferdinand          7.32

Data source: Box Office Mojo

We can visualize amounts with bar plots

 

Bars can also run horizontally

 

Avoid rotated axis labels

 


Pay attention to the order of the bars

 


We can use dots instead of bars
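A minimal ggplot2 sketch of the dot version, assuming the tidyverse is installed. It reuses the box office amounts from the table above; geom_point() takes the place of geom_col(), and fct_reorder() sorts the movies by amount.

```r
library(tidyverse)  # ggplot2, tibble, and forcats

# Box office amounts from the table above (million USD)
boxoffice <- tibble(
  title = c(
    "Star Wars", "Jumanji", "Pitch Perfect 3",
    "Greatest Showman", "Ferdinand"
  ),
  amount = c(71.57, 36.17, 19.93, 8.81, 7.32)
)

# One dot per movie instead of one bar; movies sorted by amount
p_dots <- ggplot(boxoffice, aes(amount, fct_reorder(title, amount))) +
  geom_point(size = 3) +
  xlab("amount (in million USD)") +
  ylab(NULL)

p_dots
```

The largest amount (Star Wars) ends up at the top, because fct_reorder() sorts levels in ascending order and the first level is drawn at the bottom of the y axis.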

 

Dots are preferable if we want to truncate the axes

 

If we truncate the axis of a bar plot, bar lengths no longer accurately represent the data values.

If we instead let the bars start at zero, key features of the data are obscured.
 

Dots are preferable if we want to truncate the axes
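In ggplot2, bars drawn with geom_col() are anchored at zero, so the position scale is always extended to include zero, while the axis range of a dot plot comes from the data alone. The sketch below checks this with the penguins dataset, assuming the palmerpenguins package is installed; the mean flipper length per species is just an illustrative summary.

```r
library(tidyverse)
library(palmerpenguins)  # assumption: installed; provides `penguins`

# Mean flipper length per species, ignoring missing measurements
flippers <- penguins |>
  drop_na(flipper_length_mm) |>
  group_by(species) |>
  summarize(mean_flipper = mean(flipper_length_mm))

# Dots: the x range is taken from the data, so the axis does not start at zero
p_dot <- ggplot(flippers, aes(mean_flipper, species)) +
  geom_point(size = 3)

# Bars: every bar starts at zero, so the x scale is extended to include zero
p_bar <- ggplot(flippers, aes(mean_flipper, species)) +
  geom_col()

# Compare the x-axis limits ggplot2 computed for each plot
layer_scales(p_dot)$x$get_limits()
layer_scales(p_bar)$x$get_limits()
```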

 

Grouped bars

We use grouped bars when amounts are broken down by two categorical variables

 

Data source: United States Census Bureau, 2016
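The census data behind this figure is not included here, so this sketch illustrates grouped bars with counts from the penguins dataset as a stand-in (assuming the palmerpenguins package is installed). count() produces one row per island and species, and position = "dodge" places the species bars side by side within each island.

```r
library(tidyverse)
library(palmerpenguins)  # assumption: installed; provides `penguins`

# One row per island/species combination, number of penguins in column `n`
island_counts <- count(penguins, island, species)

# Grouped bars: islands along x, species as side-by-side subgroups
p_grouped <- ggplot(island_counts, aes(island, n, fill = species)) +
  geom_col(position = "dodge")

p_grouped
```

Torgersen has only Adelie penguins, so by default its single bar spans the full group width; position_dodge(preserve = "single") keeps all bars the same width.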

We are free to choose which variable to group by

 

Data source: United States Census Bureau, 2016
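Regrouping only requires swapping the aesthetics. A sketch using penguin counts as a stand-in for the census data (assuming the palmerpenguins package is installed): exchanging the variables mapped to x and fill changes which variable defines the groups.

```r
library(tidyverse)
library(palmerpenguins)  # assumption: installed; provides `penguins`

island_counts <- count(penguins, island, species)

# Group by island: islands on x, species as colored subgroups
p_by_island <- ggplot(island_counts, aes(island, n, fill = species)) +
  geom_col(position = "dodge")

# Group by species instead: swap the x and fill mappings
p_by_species <- ggplot(island_counts, aes(species, n, fill = island)) +
  geom_col(position = "dodge")

p_by_species
```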

We can also use multiple plot panels (facets)

 

Data source: United States Census Bureau, 2016
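With facets, each value of one variable gets its own panel. A sketch with penguin counts as a stand-in for the census data (assuming the palmerpenguins package is installed), using facet_wrap():

```r
library(tidyverse)
library(palmerpenguins)  # assumption: installed; provides `penguins`

island_counts <- count(penguins, island, species)

# One panel per island; within each panel, one bar per species
p_facets <- ggplot(island_counts, aes(species, n)) +
  geom_col() +
  facet_wrap(vars(island))

p_facets
```

By default all panels share the same axes, so bar heights are directly comparable across islands; facet_wrap(vars(island), scales = "free_x") would instead drop the empty species slots in each panel.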

Making bar plots in ggplot2

Dataset: Highest-grossing movies, Dec. 2017

library(tidyverse) # provides tibble(), ggplot2, and forcats

# Data from Box Office Mojo for Dec. 22-24, 2017.
boxoffice <- tibble(
  rank = 1:5,
  title = c(
    "Star Wars", "Jumanji", "Pitch Perfect 3",
    "Greatest Showman", "Ferdinand"
  ),
  amount = c(71.57, 36.17, 19.93, 8.81, 7.32) # million USD
)

boxoffice
# A tibble: 5 × 3
   rank title            amount
  <int> <chr>             <dbl>
1     1 Star Wars         71.6 
2     2 Jumanji           36.2 
3     3 Pitch Perfect 3   19.9 
4     4 Greatest Showman   8.81
5     5 Ferdinand          7.32

Visualize as a bar plot

ggplot(boxoffice, aes(title, amount)) +
  geom_col()  # "col" stands for column

 

Order by data value

ggplot(boxoffice, aes(fct_reorder(title, amount), amount)) +
  geom_col()  # "col" stands for column
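When the reordering happens inside aes(), the default axis title is the expression text fct_reorder(title, amount). One alternative, sketched here rather than prescribed, is to reorder the factor in the data first with mutate(), which keeps the plot code short and leaves the axis title as just "title".

```r
library(tidyverse)

boxoffice <- tibble(
  title = c(
    "Star Wars", "Jumanji", "Pitch Perfect 3",
    "Greatest Showman", "Ferdinand"
  ),
  amount = c(71.57, 36.17, 19.93, 8.81, 7.32)  # million USD
)

# Turn title into a factor whose levels are sorted by amount (ascending)
boxoffice_sorted <- mutate(boxoffice, title = fct_reorder(title, amount))

# Levels now run from Ferdinand (smallest) to Star Wars (largest)
levels(boxoffice_sorted$title)

p_sorted <- ggplot(boxoffice_sorted, aes(title, amount)) +
  geom_col()

p_sorted
```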

 

Order by data value, descending

ggplot(boxoffice, aes(fct_reorder(title, -amount), amount)) +
  geom_col() + 
  xlab(NULL) # remove x axis label

 

Flip x and y, set custom x axis label

ggplot(boxoffice, aes(amount, fct_reorder(title, amount))) +
  geom_col() +
  xlab("amount (in million USD)") +
  ylab(NULL)

 

Sometimes we need to count before visualization

Example: Visualize the number of penguins per species

library(palmerpenguins) # provides the penguins dataset

penguins

# A tibble: 344 × 8
   species island    bill_length_mm bill_depth_mm flipper_length_mm body_mass_g
   <fct>   <fct>              <dbl>         <dbl>             <int>       <int>
 1 Adelie  Torgersen           39.1          18.7               181        3750
 2 Adelie  Torgersen           39.5          17.4               186        3800
 3 Adelie  Torgersen           40.3          18                 195        3250
 4 Adelie  Torgersen           NA            NA                  NA          NA
 5 Adelie  Torgersen           36.7          19.3               193        3450
 6 Adelie  Torgersen           39.3          20.6               190        3650
 7 Adelie  Torgersen           38.9          17.8               181        3625
 8 Adelie  Torgersen           39.2          19.6               195        4675
 9 Adelie  Torgersen           34.1          18.1               193        3475
10 Adelie  Torgersen           42            20.2               190        4250
# ℹ 334 more rows
# ℹ 2 more variables: sex <fct>, year <int>

Use geom_bar() to count before plotting

ggplot(penguins, aes(y = species)) + # no x aesthetic needed
  geom_bar()
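geom_bar() counts the rows in each category for us. For comparison, this sketch (assuming the palmerpenguins package is installed) counts explicitly with count() and then draws the precomputed counts with geom_col(); both approaches yield the same bar lengths.

```r
library(tidyverse)
library(palmerpenguins)  # assumption: installed; provides `penguins`

# Explicit counting: one row per species, number of penguins in column `n`
species_counts <- count(penguins, species)
species_counts
# Adelie 152, Chinstrap 68, Gentoo 124

# Plot the precomputed counts; geom_col() uses the values as given
p_counted <- ggplot(species_counts, aes(n, species)) +
  geom_col()

# Same plot, letting geom_bar() do the counting
p_geom_bar <- ggplot(penguins, aes(y = species)) +
  geom_bar()
```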

 

Getting the bars into the right order

Option 1: Manually, using fct_relevel()

ggplot(penguins, aes(y = fct_relevel(species, "Chinstrap", "Gentoo", "Adelie"))) +
  geom_bar() +
  ylab(NULL)

 

Getting the bars into the right order

Option 2: Using fct_rev() and fct_infreq() from the forcats package

ggplot(penguins, aes(y = fct_rev(fct_infreq(species)))) +
  geom_bar() +
  ylab(NULL)

 

Display counts by species and sex

ggplot(penguins, aes(sex, fill = species)) +
  geom_bar()

 

Remove missing values (NAs)

penguins2 <- na.omit(penguins) # remove all rows with any missing values

ggplot(penguins2, aes(sex, fill = species)) +
  geom_bar()
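na.omit() drops a row if any of its columns is missing. If only missing sex values matter for the plot, drop_na() from tidyr can target that column alone. A sketch, assuming the palmerpenguins package is installed; for this dataset both approaches keep the same 333 rows, but on other data the targeted version may keep more.

```r
library(tidyverse)
library(palmerpenguins)  # assumption: installed; provides `penguins`

# Drop only rows where `sex` is missing; other columns may still contain NAs
penguins_sex <- drop_na(penguins, sex)

nrow(penguins)           # 344 rows in total
nrow(penguins_sex)       # 333 rows with a recorded sex
nrow(na.omit(penguins))  # 333 complete rows

p_sex <- ggplot(penguins_sex, aes(sex, fill = species)) +
  geom_bar()
```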

 

Positions define how subgroups are shown

position = "dodge": Place bars for subgroups side by side

ggplot(penguins2, aes(sex, fill = species)) +
  geom_bar(position = "dodge")

 

Positions define how subgroups are shown

position = "stack": Place bars for subgroups on top of each other

ggplot(penguins2, aes(sex, fill = species)) +
  geom_bar(position = "stack")

 

Positions define how subgroups are shown

position = "fill": Like "stack", but scale each bar to 100%

ggplot(penguins2, aes(sex, fill = species)) +
  geom_bar(position = "fill")
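To see what position = "fill" computes, the proportions can be calculated by hand: count penguins per sex and species, then divide each count by the total for that sex. A sketch, assuming the palmerpenguins package is installed; stacking these proportions gives one bar of height 1 per sex.

```r
library(tidyverse)
library(palmerpenguins)  # assumption: installed; provides `penguins`

props <- penguins |>
  drop_na(sex) |>
  count(sex, species) |>
  group_by(sex) |>
  mutate(prop = n / sum(n)) |>  # share of each species within each sex
  ungroup()

# Stacking precomputed proportions: each sex gets a bar of total height 1,
# matching what geom_bar(position = "fill") draws from the raw data
p_props <- ggplot(props, aes(sex, prop, fill = species)) +
  geom_col()

p_props
```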

 

Further reading