Color scales

Author

Claus O. Wilke

Introduction

In this worksheet, we will discuss how to change and customize color scales.

First we need to load the required R packages.
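
The loading step might look like the following minimal sketch. The package list is an assumption inferred from the functions used later in this worksheet (ggplot2 for plotting, dplyr for data manipulation, colorspace for additional scales and swatchplot()); the original setup may load a different set.

```r
# Assumed package list, inferred from the functions used in this worksheet
library(ggplot2)     # ggplot(), geom_tile(), scale_fill_*()
library(dplyr)       # group_by(), summarize()
library(colorspace)  # scale_fill_continuous_sequential(), swatchplot()
```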

Next we set up the data.

We will be working with the dataset temperatures that we have used in previous worksheets. This dataset contains the average temperature for each day of the year for four different locations.

We will also be working with an aggregated version of this dataset called temps_months, which contains the mean temperature for each month for the same locations.
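
To make the relationship between the two datasets concrete, here is a rough sketch of how a monthly aggregate can be computed from daily values. It uses a tiny stand-in table with made-up numbers (temperatures_toy is a hypothetical name, not the worksheet's data), and it assumes the daily data has a month column.

```r
library(dplyr)

# Toy stand-in for `temperatures`: made-up values, NOT the worksheet's data
temperatures_toy <- tibble(
  location = rep(c("Chicago", "Houston"), each = 4),
  month = rep(c("Jan", "Jan", "Feb", "Feb"), times = 2),
  temperature = c(20, 30, 25, 35, 50, 60, 55, 65)
)

# One row per location and month, holding the mean temperature
temps_months_toy <- temperatures_toy %>%
  group_by(location, month) %>%
  summarize(mean = mean(temperature), .groups = "drop")

temps_months_toy
```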

Built-in ggplot2 color scales

We will start with the built-in ggplot2 color scales, which require no additional packages. The scale functions are always named scale_color_*() or scale_fill_*(), depending on whether they apply to the color or fill aesthetic. The * stands for one or more words specifying the type of the scale, for example scale_color_brewer() for discrete scales and scale_color_distiller() for continuous scales from the ColorBrewer project. All available built-in scales are listed in the Scales section of the ggplot2 reference documentation.

Now consider the following plot:
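
The plot in question can be reconstructed from the solutions below by leaving out the scale function, so that ggplot2 falls back to its default continuous fill scale. The fallback data frame in this sketch contains made-up values and exists only so the snippet runs on its own; in the worksheet, temps_months is already defined.

```r
library(ggplot2)

# Fallback with made-up values so the snippet is self-contained.
# Skipped when `temps_months` already exists, as it does in the worksheet.
if (!exists("temps_months")) {
  temps_months <- data.frame(
    month = factor(
      rep(c("Jan", "Apr", "Jul", "Oct"), times = 2),
      levels = c("Jan", "Apr", "Jul", "Oct")
    ),
    location = rep(c("Chicago", "Houston"), each = 4),
    mean = c(25, 50, 75, 55, 53, 70, 85, 72)
  )
}

# Heatmap of mean temperature with the default fill scale
p <- ggplot(temps_months, aes(x = month, y = location, fill = mean)) +
  geom_tile() +
  coord_fixed(expand = FALSE)
p
```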

If you wanted to change the color scale to one from the ColorBrewer project, which scale function would you have to add? Think about this and then try it out.

Solution
ggplot(temps_months, aes(x = month, y = location, fill = mean)) + 
  geom_tile() + 
  coord_fixed(expand = FALSE) +
  scale_fill_distiller()

Most color scale functions have additional customizations. How to use them depends on the specific scale function. For the ColorBrewer scales you can set direction = 1 or direction = -1 to set the direction of the scale (light to dark or dark to light). You can also set the palette via a numeric argument, e.g. palette = 1, palette = 2, palette = 3 etc.

Try this out by setting the direction of the scale from light to dark and using palette #4.

Hint
ggplot(temps_months, aes(x = month, y = location, fill = mean)) + 
  geom_tile() + 
  coord_fixed(expand = FALSE) +
  scale_fill_distiller(direction = ___, palette = ___)

Solution
ggplot(temps_months, aes(x = month, y = location, fill = mean)) + 
  geom_tile() + 
  coord_fixed(expand = FALSE) +
  scale_fill_distiller(direction = 1, palette = 4)

The viridis scales are a popular choice. They are provided by scale_*_viridis_c() for continuous data and scale_*_viridis_d() for discrete data. Change the above plot to use a viridis scale.

Solution
ggplot(temps_months, aes(x = month, y = location, fill = mean)) + 
  geom_tile() + 
  coord_fixed(expand = FALSE) +
  scale_fill_viridis_c()

The viridis scales can be customized with direction (as before), option (a letter from "A" to "E"; recent versions of ggplot2 also accept "F", "G", and "H", as well as palette names such as "magma" or "viridis"), and begin and end, which are numerical values between 0 and 1 indicating where in the color scale the data should begin or end. For example, begin = 0.2 means that the lowest data value is mapped to the color located 20% of the way along the scale.
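
The effect of begin is easiest to see on the raw colors. The following sketch uses viridis() from the viridisLite package, which is assumed to be installed (it normally comes along with ggplot2), and compares the full scale with one that starts 20% of the way in.

```r
library(viridisLite)

full    <- viridis(7)               # colors spanning the whole scale, 0 to 1
trimmed <- viridis(7, begin = 0.2)  # same scale, but starting at 0.2

# Keep only the "#RRGGBB" part in case an alpha suffix is attached
full    <- substr(full, 1, 7)
trimmed <- substr(trimmed, 1, 7)

# The darkest purple tones are missing from the trimmed version
colorspace::swatchplot("full scale" = full, "begin = 0.2" = trimmed)
```

Both vectors end at the same color because end is left at its default of 1; only the starting point differs.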

Try different choices for option, begin, and end to see how they change the plot.

Solution
ggplot(temps_months, aes(x = month, y = location, fill = mean)) + 
  geom_tile() + 
  coord_fixed(expand = FALSE) +
  scale_fill_viridis_c(option = "B", begin = 0.15)

Customizing scale title and labels

In a previous worksheet, we used arguments such as name, breaks, labels, and limits to customize the axis. For color scales, instead of an axis we have a legend, and we can use the same arguments inside the scale function to customize how the legend looks.

Try this out. Set the scale limits from 10 to 110 and set the name of the scale and the breaks as you wish.

Hint
ggplot(temps_months, aes(x = month, y = location, fill = mean)) + 
  geom_tile() + 
  coord_fixed(expand = FALSE) +
  scale_fill_viridis_c(
    name = ___,
    breaks = ___,
    limits = ___
  )

Solution
ggplot(temps_months, aes(x = month, y = location, fill = mean)) + 
  geom_tile() + 
  coord_fixed(expand = FALSE) +
  scale_fill_viridis_c(
    name = "temperature (F)",
    breaks = c(25, 50, 75, 100),
    limits = c(10, 110)
  )
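
The labels argument, which the solution above does not use, works the same way as for position scales. As a sketch, it can be given a function that receives the break values and returns the text to display; the unit suffix chosen here is only an illustration. The fallback data frame contains made-up values and is skipped in the worksheet, where temps_months already exists.

```r
library(ggplot2)

# Fallback with made-up values so the snippet is self-contained
if (!exists("temps_months")) {
  temps_months <- data.frame(
    month = factor(
      rep(c("Jan", "Apr", "Jul", "Oct"), times = 2),
      levels = c("Jan", "Apr", "Jul", "Oct")
    ),
    location = rep(c("Chicago", "Houston"), each = 4),
    mean = c(25, 50, 75, 55, 53, 70, 85, 72)
  )
}

p <- ggplot(temps_months, aes(x = month, y = location, fill = mean)) +
  geom_tile() +
  coord_fixed(expand = FALSE) +
  scale_fill_viridis_c(
    name = "temperature",
    breaks = c(25, 50, 75, 100),
    labels = function(x) paste(x, "F"),  # append the unit to each break
    limits = c(10, 110)
  )
p
```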

Note: Color scales ignore the expand argument, so you cannot use it to expand the scale beyond the data values as you can for position scales.

Binned scales

Research into human perception has shown that continuous coloring can be difficult to interpret. Therefore, it is often preferable to use a small number of discrete colors to indicate ranges of data values. You can do this in ggplot2 with binned scales. For example, scale_fill_viridis_b() provides a binned version of the viridis scale. Try this out.

Hint
ggplot(temps_months, aes(x = month, y = location, fill = mean)) + 
  geom_tile() + 
  coord_fixed(expand = FALSE) +
  ___

Solution
ggplot(temps_months, aes(x = month, y = location, fill = mean)) + 
  geom_tile() + 
  coord_fixed(expand = FALSE) +
  scale_fill_viridis_b()

You can provide bin breaks directly with the breaks argument. Try this out.

Hint
ggplot(temps_months, aes(x = month, y = location, fill = mean)) + 
  geom_tile() + 
  coord_fixed(expand = FALSE) +
  scale_fill_viridis_b(
    breaks = ___
  )

Solution
ggplot(temps_months, aes(x = month, y = location, fill = mean)) + 
  geom_tile() + 
  coord_fixed(expand = FALSE) +
  scale_fill_viridis_b(
    breaks = c(40, 60, 80, 100)
  )
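
To build intuition for what the breaks do, the binning can be imitated in base R with cut(). This is only an analogy with made-up values: ggplot2 performs its own binning internally and may treat values that fall exactly on a break differently. The point is that four breaks inside the data range split it into five bins, so the legend shows five colors.

```r
breaks <- c(40, 60, 80, 100)
values <- c(35, 55, 75, 95, 105)  # made-up temperatures, one per bin

# Open-ended outer bins catch everything below 40 and above 100
bins <- cut(values, breaks = c(-Inf, breaks, Inf))

data.frame(values, bins)
nlevels(bins)  # number of bins
```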

Scales from the colorspace package

The color scales provided by the colorspace package follow a simple naming scheme of the form scale_<aesthetic>_<datatype>_<colorscale>(), where <aesthetic> is the name of the aesthetic (fill, color, colour), <datatype> indicates the type of variable plotted (discrete, continuous, binned), and <colorscale> stands for the type of the color scale (qualitative, sequential, diverging, divergingx).
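
The naming scheme can be spelled out mechanically. The sketch below builds every combination of the three parts and keeps those that the installed colorspace package actually exports. Not every combination necessarily exists, which is why the code checks rather than assumes.

```r
# All combinations of the three name components
combos <- expand.grid(
  aesthetic  = c("fill", "color", "colour"),
  datatype   = c("discrete", "continuous", "binned"),
  colorscale = c("qualitative", "sequential", "diverging", "divergingx"),
  stringsAsFactors = FALSE
)

candidates <- paste(
  "scale", combos$aesthetic, combos$datatype, combos$colorscale,
  sep = "_"
)

# Keep only the names that colorspace really provides
exported <- candidates[candidates %in% getNamespaceExports("colorspace")]
exported
```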

For the mean temperature plot we have been using throughout this worksheet, which color scale(s) from the colorspace package is/are appropriate? Think about this and then try it out.

Hint

Two alternative options are appropriate. Can you think of both?

Solution
# Option 1: Continuous scale
ggplot(temps_months, aes(x = month, y = location, fill = mean)) + 
  geom_tile() + 
  coord_fixed(expand = FALSE) +
  scale_fill_continuous_sequential()

# Option 2: Binned scale
ggplot(temps_months, aes(x = month, y = location, fill = mean)) + 
  geom_tile() + 
  coord_fixed(expand = FALSE) +
  scale_fill_binned_sequential()

You can customize the colorspace scales with the palette argument, which takes the name of a palette (e.g., "Inferno", "BluYl", "Lajolla"). Try this out. Also try reversing the scale direction with rev = TRUE or rev = FALSE. (The colorspace scales use rev instead of direction.) You can find the names of all supported palettes in the colorspace documentation on HCL-based color palettes (consider specifically the single-hue and multi-hue sequential palettes).
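
The palette names can also be looked up from within R. The following sketch uses hcl_palettes() from colorspace, which returns a table with one row per named palette. It assumes that this table has a type column whose values start with "Sequential" for the sequential palettes; if your colorspace version organizes the table differently, inspect the printed output of hcl_palettes() instead.

```r
library(colorspace)

pals <- hcl_palettes()

# Row names are the palette names; keep the sequential ones
# (assumes a `type` column, see the note above)
is_sequential <- grepl("^Sequential", pals$type)
seq_names <- rownames(pals)[is_sequential]
seq_names
```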

Hint
ggplot(temps_months, aes(x = month, y = location, fill = mean)) + 
  geom_tile() + 
  coord_fixed(expand = FALSE) +
  scale_fill_continuous_sequential(
    palette = ___
  )

Solution
ggplot(temps_months, aes(x = month, y = location, fill = mean)) + 
  geom_tile() + 
  coord_fixed(expand = FALSE) +
  scale_fill_continuous_sequential(
    palette = "Lajolla"
  )

You can also use begin and end just like in the viridis scales.

Manual scales

For discrete data with a small number of categories, it’s usually best to set colors manually. This can be done with the scale functions scale_*_manual(). These functions take an argument values that specifies the color values to use.

To see how this works, let’s go back to this plot of temperatures over time for four locations:
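
The plot can be reconstructed from the hint further below by leaving out the manual scale, so that ggplot2 uses its default discrete colors. The fallback data frame contains a handful of made-up values and exists only so the snippet runs on its own; in the worksheet, temperatures is already defined.

```r
library(ggplot2)

# Fallback with made-up values so the snippet is self-contained.
# Skipped when `temperatures` already exists, as it does in the worksheet.
if (!exists("temperatures")) {
  temperatures <- data.frame(
    location = rep(
      c("Death Valley", "Houston", "San Diego", "Chicago"),
      each = 3
    ),
    day_of_year = rep(c(1, 183, 365), times = 4),
    temperature = c(52, 101, 52, 53, 84, 53, 57, 70, 57, 25, 75, 25)
  )
}

# One line per location, colored with the default discrete scale
p <- ggplot(temperatures, aes(day_of_year, temperature, color = location)) +
  geom_line(linewidth = 1)
p
```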

Let’s use the following four colors: "gold2", "firebrick", "blue3", "springgreen4". We can visualize this palette using the function swatchplot() from the colorspace package.
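
A swatch plot of these four colors can be produced as follows. The variable name palette4 is chosen here for illustration only.

```r
library(colorspace)

# The four colors named above, in the order given
palette4 <- c("gold2", "firebrick", "blue3", "springgreen4")

# Draws one colored rectangle per entry
swatchplot(palette4)
```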

Now apply this color palette to the temperatures plot, by using the manual color scale.

Hint
ggplot(temperatures, aes(day_of_year, temperature, color = location)) +
  geom_line(linewidth = 1) +
  scale_color_manual(
    values = ___
  )

Solution
ggplot(temperatures, aes(day_of_year, temperature, color = location)) +
  geom_line(linewidth = 1) +
  scale_color_manual(
    values = c("gold2", "firebrick", "blue3", "springgreen4")
  )

One problem with this approach is that we can’t easily control which data value gets assigned to which color. What if we wanted San Diego to be shown in green and Chicago in blue? The simplest way to resolve this issue is to use a named vector. A named vector in R is a vector where each value has a name. Named vectors are created by writing c(name1 = value1, name2 = value2, ...). See the following example.
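
A minimal illustration of the difference, with placeholder values chosen only for this sketch: the first vector is unnamed, the second is named.

```r
# First example: a regular vector without names
unnamed <- c("cat", "mouse", "house")
unnamed

# Second example: a named vector with names A, B, and C
named <- c(A = "cat", B = "mouse", C = "house")
named

# Elements of a named vector can be looked up by name
named["B"]
```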

The names in the second example are A, B, and C. Notice that the names are not in quotes. However, if you need a name containing a space (such as Death Valley), you need to enclose the name in backticks. Thus, our named vector of colors could be written like so:
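
Using the four colors and the four location names from this worksheet, such a vector might look as follows. The particular pairing of colors and locations is one possible choice; note the backticks around the two names that contain a space.

```r
colorvector <- c(
  `Death Valley` = "gold2",
  Houston = "firebrick",
  Chicago = "blue3",
  `San Diego` = "springgreen4"
)

# The names are stored without backticks
names(colorvector)
```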

Now try to use this color vector in the figure showing temperatures over time.

Hint
colorvector <- c(___)

ggplot(temperatures, aes(day_of_year, temperature, color = location)) +
  geom_line(linewidth = 1) +
  scale_color_manual(
    values = ___
  )

Solution
colorvector <- c(
  `Death Valley` = "gold2",
  Houston = "firebrick",
  Chicago = "blue3",
  `San Diego` = "springgreen4"
)

ggplot(temperatures, aes(day_of_year, temperature, color = location)) +
  geom_line(linewidth = 1) +
  scale_color_manual(
    values = colorvector
  )

Try some other colors also. For example, you could use the Okabe-Ito colors:
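
The Okabe-Ito palette is commonly written as the following eight hex codes. The variable name okabe_ito and the ordering are choices made for this sketch.

```r
# Okabe-Ito colors: orange, sky blue, bluish green, yellow,
# blue, vermilion, reddish purple, black
okabe_ito <- c(
  "#E69F00", "#56B4E9", "#009E73", "#F0E442",
  "#0072B2", "#D55E00", "#CC79A7", "#000000"
)

colorspace::swatchplot(okabe_ito)
```

Any four of these can be placed in the named color vector in place of the named colors used above.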

Alternatively, you can pick from R's named colors. Run the command colors() in your R console to get a list of all available color names.
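
Since the full list is long, it can help to search it. A short sketch using only base R:

```r
# How many named colors are there?
length(colors())

# The first few names
head(colors())

# All names containing "firebrick", including numbered variants
grep("firebrick", colors(), value = TRUE)
```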

Hint: It’s a good idea to never use the colors "red", "green", "blue", "cyan", "magenta", "yellow". They are extreme points in the RGB color space and tend to look unnatural and cheap. Try this by making a swatch plot of these colors, and compare for example to the color scale containing the colors "firebrick", "springgreen4", "blue3", "turquoise3", "darkorchid2", "gold2". Do you see the difference?

Solution
colorspace::swatchplot(c("red", "green", "blue", "cyan", "magenta", "yellow"))
colorspace::swatchplot(c("firebrick", "springgreen4", "blue3", "turquoise3", "darkorchid2", "gold2"))