Visualizing amounts

Author

Claus O. Wilke

Introduction

In this worksheet, we will discuss how to visualize amounts in ggplot by drawing them as bars.

First we need to load the required R packages. Please wait a moment until the live R session is fully set up and all packages are loaded.

Next we set up the data.

We will be working with two datasets. First, box-office gross results for Dec. 22-24, 2017:

Second, data on individual penguins in Antarctica. Note that rows with missing values have been removed:

penguins2
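
As a minimal sketch of what "missing values have been removed" means, the following example applies the base R function na.omit() to a small data frame. The data frame penguins_raw and all of its values are made up for illustration; they are not the actual penguin data, and the worksheet may prepare penguins2 differently.

```r
# Hypothetical toy data standing in for raw penguin measurements;
# all values are made up for illustration.
penguins_raw <- data.frame(
  species = c("Adelie", "Adelie", "Gentoo", "Chinstrap"),
  sex = c("female", NA, "male", "female"),
  body_mass_g = c(3450, 3700, NA, 3800)
)

# na.omit() drops every row that contains at least one missing value.
penguins_clean <- na.omit(penguins_raw)

nrow(penguins_raw)   # number of rows before cleaning
nrow(penguins_clean) # number of rows after cleaning
```

Only rows that are complete in every column survive, so later counts are based on the same set of penguins regardless of which column we look at.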

Drawing numerical values as bars

For the boxoffice dataset, we want to draw the amount (Weekend gross, in million USD) for each movie as a bar. Somewhat confusingly, the ggplot geom that does this is called geom_col(). (There is also a geom_bar(), but it works differently. We’ll get to that later in this tutorial.) Make a bar plot of amount versus title. This means amount goes on the y axis and title on the x axis.

Hint
ggplot(boxoffice, aes(x = ___, y = ___)) +
  geom_col()
Solution
ggplot(boxoffice, aes(x = title, y = amount)) +
  geom_col()

Now flip which column you map onto x and which onto y.

Solution
ggplot(boxoffice, aes(x = amount, y = title)) +
  geom_col()

The x-axis label should specify that the amount is in million USD, and the y axis doesn’t need the word “title”. Use xlab() and ylab() to make these changes to the plot.

Hint
ggplot(boxoffice, aes(x = amount, y = title)) +
  geom_col() +
  xlab(___) +
  ylab(___)
Solution
ggplot(boxoffice, aes(x = amount, y = title)) +
  geom_col() +
  xlab("weekend gross (million USD)") +
  ylab(NULL) # NULL means nothing, don't show a y label

Getting bars into the right order

Whenever we are making bar plots, we need to think about the correct order of the bars. By default, ggplot uses alphabetic ordering, but that is rarely appropriate. If there is no inherent ordering (such as, for example, a temporal progression), then it is usually best to order by the magnitude of the values, i.e., sort the bars by length.

We can do this with the fct_reorder() function, which takes two arguments: the categorical variable we want to reorder, and the values by which we want to order it. Here, the categorical variable is the column title and the values are in the column amount. We can apply the fct_reorder() function right inside the aes() statement. Repeat the previous plot, but now with the bars ordered by amount.

Hint
ggplot(boxoffice, aes(x = amount, y = fct_reorder(___, ___))) +
  geom_col() +
  xlab("weekend gross (million USD)") +
  ylab(NULL)
Solution
ggplot(boxoffice, aes(x = amount, y = fct_reorder(title, amount))) +
  geom_col() +
  xlab("weekend gross (million USD)") +
  ylab(NULL)
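
To see what fct_reorder() does on its own, it can help to inspect the factor levels directly. The sketch below uses a made-up data frame called toy (hypothetical titles and amounts, not the real box-office numbers).

```r
library(forcats)

# Hypothetical stand-in for the boxoffice data; titles and amounts are made up.
toy <- data.frame(
  title = c("Movie A", "Movie B", "Movie C"),
  amount = c(5, 20, 10)
)

# Without reordering, factor levels are alphabetical.
levels(factor(toy$title))

# fct_reorder() sorts the levels by amount, smallest first.
reordered <- fct_reorder(toy$title, toy$amount)
levels(reordered)
```

ggplot places the first level at the bottom of the y axis, so ordering from smallest to largest puts the longest bar at the top. Setting .desc = TRUE in fct_reorder() reverses the order.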

Try the following additional experiments in the above code:

  • What happens when you run the above code without the ylab(NULL) statement?
  • Can you make the bars blue?
  • Can you color the bars by amount or by title?
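
One possible approach to the coloring experiments is sketched below. It again uses a made-up data frame toy in place of boxoffice. The key distinction is that a constant color is set as an argument to geom_col(), outside aes(), whereas a data-driven color is mapped inside aes().

```r
library(ggplot2)
library(forcats)

# Hypothetical stand-in for the boxoffice data; titles and amounts are made up.
toy <- data.frame(
  title = c("Movie A", "Movie B", "Movie C"),
  amount = c(5, 20, 10)
)

# Constant color: set fill as a parameter of the geom, outside aes().
p_blue <- ggplot(toy, aes(x = amount, y = fct_reorder(title, amount))) +
  geom_col(fill = "blue") +
  xlab("weekend gross (million USD)") +
  ylab(NULL)

# Data-driven color: map a column onto fill inside aes().
p_mapped <- ggplot(toy, aes(x = amount, y = fct_reorder(title, amount), fill = amount)) +
  geom_col() +
  xlab("weekend gross (million USD)") +
  ylab(NULL)

p_blue
p_mapped
```

Note that bars are colored via fill; the color aesthetic only affects the outline of each bar.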

Drawing bars based on a count

The boxoffice dataset contains individual values, the dollar amounts, that we wanted to visualize with bars. Often, however, we encounter a slightly different scenario: A dataset doesn’t contain the numeric amounts directly, but instead contains observations we want to count. This is the case in the penguins2 dataset (see above).

It contains one row per penguin. If we want to make a bar plot of the number of penguins of each species (Adelie, Chinstrap, Gentoo), we cannot use geom_col() as before, because the dataset doesn’t have a column that contains these counts.

The solution here is to use geom_bar(), which performs a count and then displays the result of that count. Because geom_bar() counts automatically, you only have to provide it with a single aesthetic, which specifies the data column in which you are counting.

Try this out. Make a bar plot of the number of penguins per species. Map the penguin species onto the x axis.

Hint
ggplot(penguins2, aes(x = ___)) +
  geom_bar()
Solution
ggplot(penguins2, aes(x = species)) +
  geom_bar()
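
To make the counting step explicit, the following sketch compares geom_bar() with counting by hand and then using geom_col(). The data frame toy_penguins is a made-up stand-in for penguins2, and the use of table() here is just one of several ways to compute the counts.

```r
library(ggplot2)

# Hypothetical stand-in for penguins2: one row per penguin, values made up.
toy_penguins <- data.frame(
  species = c("Adelie", "Adelie", "Adelie", "Chinstrap", "Gentoo", "Gentoo")
)

# geom_bar() counts the rows per species internally.
p_bar <- ggplot(toy_penguins, aes(x = species)) +
  geom_bar()

# The same counts computed by hand, stored in a column named n.
counts <- as.data.frame(table(species = toy_penguins$species), responseName = "n")
counts

# With the counts available as a column, geom_col() draws the same bars.
p_col <- ggplot(counts, aes(x = species, y = n)) +
  geom_col()

p_bar
p_col
```

Both plots show the same bar heights. The only visible difference is the y-axis label, which is "count" for geom_bar() and "n" for the hand-made version.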

Try the following additional modifications in the above code:

  • Map penguin species onto the y axis.
  • Remove the axis label that says “species”.
  • Change the order of the bars manually, using fct_relevel() (see slides).
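
As a hedged sketch of how these modifications might be combined, the example below uses fct_relevel() to list the species in a manually chosen order. The data frame toy_penguins is made up, and the particular order shown is arbitrary.

```r
library(ggplot2)
library(forcats)

# Hypothetical stand-in for penguins2: one row per penguin, values made up.
toy_penguins <- data.frame(
  species = c("Adelie", "Adelie", "Adelie", "Chinstrap", "Gentoo", "Gentoo")
)

# fct_relevel() moves the listed levels to the front, in the order given.
manual_order <- fct_relevel(toy_penguins$species, "Gentoo", "Chinstrap", "Adelie")
levels(manual_order)

# Species on the y axis, manual bar order, no axis label.
p <- ggplot(toy_penguins, aes(y = fct_relevel(species, "Gentoo", "Chinstrap", "Adelie"))) +
  geom_bar() +
  ylab(NULL)

p
```

Because the first level is drawn at the bottom of the y axis, this order places Gentoo at the bottom and Adelie at the top.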

Counting subgroups

geom_bar() automatically counts how many cases there are in each unique combination of different categorical aesthetics. In the previous example, we had only one categorical aesthetic, species. But we can add a second one, for example sex. Then geom_bar() counts the number of cases in each unique combination of species and sex and draws separate bars for each. Try this out by mapping the sex column onto the fill aesthetic.

Solution
ggplot(penguins2, aes(x = species, fill = sex)) +
  geom_bar()

By default, the bars for different fill values but identical x values will be drawn on top of one another. But there are other possibilities, which are controlled by the position argument to geom_bar(). For example, try to set the position to "dodge".

Hint
ggplot(penguins2, aes(x = species, fill = sex)) +
  geom_bar(position = ___)
Solution
ggplot(penguins2, aes(x = species, fill = sex)) +
  geom_bar(position = "dodge")

In the above code, also try positions "stack" and "fill".
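
The two remaining positions can be sketched as follows, using a made-up data frame toy_penguins in place of penguins2. "stack" is the default for geom_bar(), so it reproduces the first plot in this section, while "fill" rescales every stack to the same total height of 1 so that the bars show proportions rather than counts.

```r
library(ggplot2)

# Hypothetical stand-in for penguins2: one row per penguin, values made up.
toy_penguins <- data.frame(
  species = c("Adelie", "Adelie", "Adelie", "Chinstrap", "Chinstrap",
              "Gentoo", "Gentoo", "Gentoo"),
  sex = c("female", "female", "male", "female", "male",
          "female", "male", "male")
)

# "stack": bars for the two sexes are placed on top of one another (the default).
p_stack <- ggplot(toy_penguins, aes(x = species, fill = sex)) +
  geom_bar(position = "stack")

# "fill": each stack is rescaled to height 1, showing proportions within a species.
p_fill <- ggplot(toy_penguins, aes(x = species, fill = sex)) +
  geom_bar(position = "fill") +
  ylab("proportion")

p_stack
p_fill
```

With position = "fill" the default axis label would still read "count", which is why the sketch relabels the y axis.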