Color scales
Introduction
In this worksheet, we will discuss how to change and customize color scales.
First we need to load the required R packages. Please wait a moment until the live R session is fully set up and all packages are loaded.
Next we set up the data.
We will be working with the dataset temperatures
that we have used in previous worksheets. This dataset contains the average temperature for each day of the year for four different locations.
We will also be working with an aggregated version of this dataset called temps_months
, which contains the mean temperature for each month for the same locations.
Built in ggplot2 color scales
We will start with built-in ggplot2 color scales, which require no additional packages. The scale functions are always named scale_color_*()
or scale_fill_*()
, depending on whether they apply to the color
or fill
aesthetic. The *
indicates some other words specifying the type of the scale, for example scale_color_brewer()
or scale_color_distiller()
for discrete or continuous scales from the ColorBrewer project, respectively. You can find all available built-in scales here.
Now consider the following plot:
If you wanted to change the color scale to one from the ColorBrewer project, which scale function would you have to add? Think about this and then try it out.
ggplot(temps_months, aes(x = month, y = location, fill = mean)) +
geom_tile() +
coord_fixed(expand = FALSE) +
scale_fill_distiller()
Most color scale functions have additional customizations. How to use them depends on the specific scale function. For the ColorBrewer scales you can set direction = 1
or direction = -1
to set the direction of the scale (light to dark or dark to light). You can also set the palette via a numeric argument, e.g. palette = 1
, palette = 2
, palette = 3
etc.
Try this out by setting the direction of the scale from light to dark and using palette #4.
ggplot(temps_months, aes(x = month, y = location, fill = mean)) +
geom_tile() +
coord_fixed(expand = FALSE) +
scale_fill_distiller(direction = ___, palette = ___)
ggplot(temps_months, aes(x = month, y = location, fill = mean)) +
geom_tile() +
coord_fixed(expand = FALSE) +
scale_fill_distiller(direction = 1, palette = 4)
A popular set of scales are the viridis scales, which are provided by scale_*_viridis_c()
for continuous data and scale_*_viridis_d()
for discrete data. Change the above plot to use a viridis scale.
ggplot(temps_months, aes(x = month, y = location, fill = mean)) +
geom_tile() +
coord_fixed(expand = FALSE) +
scale_fill_viridis_c()
The viridis scales can be customized with direction
(as before), option
(which can be "A"
, "B"
, "C"
, "D"
, or "E"
), and begin
and end
which are numerical values between 0 and 1 indicating where in the color scale the data should begin or end. For example, begin = 0.2
means that the lowest data value is mapped to the 20th percentile in the scale.
Try different choices for option
, begin
, and end
to see how they change the plot.
ggplot(temps_months, aes(x = month, y = location, fill = mean)) +
geom_tile() +
coord_fixed(expand = FALSE) +
scale_fill_viridis_c(option = "B", begin = 0.15)
Customizing scale title and labels
In a previous worksheet, we used arguments such as name
, breaks
, labels
, and limits
to customize the axis. For color scales, instead of an axis we have a legend, and we can use the same arguments inside the scale function to customize how the legend looks.
Try this out. Set the scale limits from 10 to 110 and set the name of the scale and the breaks as you wish.
ggplot(temps_months, aes(x = month, y = location, fill = mean)) +
geom_tile() +
coord_fixed(expand = FALSE) +
scale_fill_viridis_c(
name = ___,
breaks = ___,
limits = ___
)
ggplot(temps_months, aes(x = month, y = location, fill = mean)) +
geom_tile() +
coord_fixed(expand = FALSE) +
scale_fill_viridis_c(
name = "temperature (F)",
breaks = c(25, 50, 75, 100),
limits = c(10, 110)
)
Note: Color scales ignore the expand
argument, so you cannot use it to expand the scale beyond the data values as you can for position scales.
Binned scales
Research into human perception has shown that continuous coloring can be difficult to interpret. Therefore, it is often preferable to use a small number of discrete colors to indicate ranges of data values. You can do this in ggplot with binned scales. For example, scale_fill_viridis_b()
provides a binned version of the viridis scale. Try this out.
ggplot(temps_months, aes(x = month, y = location, fill = mean)) +
geom_tile() +
coord_fixed(expand = FALSE) +
___
ggplot(temps_months, aes(x = month, y = location, fill = mean)) +
geom_tile() +
coord_fixed(expand = FALSE) +
scale_fill_viridis_b()
You can provide bin breaks directly with the breaks
argument. Try this out.
ggplot(temps_months, aes(x = month, y = location, fill = mean)) +
geom_tile() +
coord_fixed(expand = FALSE) +
scale_fill_viridis_b(
breaks = ___
)
ggplot(temps_months, aes(x = month, y = location, fill = mean)) +
geom_tile() +
coord_fixed(expand = FALSE) +
scale_fill_viridis_b(
breaks = c(40, 60, 80, 100)
)
Scales from the colorspace package
The color scales provided by the colorspace package follow a simple naming scheme of the form scale_<aesthetic>_<datatype>_<colorscale>()
, where <aesthetic>
is the name of the aesthetic (fill
, color
, colour
), <datatype>
indicates the type of variable plotted (discrete
, continuous
, binned
), and colorscale
stands for the type of the color scale (qualitative
, sequential
, diverging
, divergingx
).
For the mean temperature plot we have been using throughout this worksheet, which color scale(s) from the colorspace package is/are appropriate? Think about this and then try it out.
Two alternative options are appropriate. Can you think of both?
# Option 1: Continuous scale
ggplot(temps_months, aes(x = month, y = location, fill = mean)) +
geom_tile() +
coord_fixed(expand = FALSE) +
scale_fill_continuous_sequential()
# Option 2: Binned scale
ggplot(temps_months, aes(x = month, y = location, fill = mean)) +
geom_tile() +
coord_fixed(expand = FALSE) +
scale_fill_binned_sequential()
You can customize the colorspace scales with the palette
argument, which takes the name of a palette (e.g., "Inferno"
, "BluYl"
, "Lajolla"
). Try this out. Also try reversing the scale direction with rev = TRUE
or rev = FALSE
. (The colorspace scales use rev
instead of direction
.) You can find the names of all supported scales here (consider specifically single-hue and multi-hue sequential palettes).
ggplot(temps_months, aes(x = month, y = location, fill = mean)) +
geom_tile() +
coord_fixed(expand = FALSE) +
scale_fill_continuous_sequential(
palette = ___
)
ggplot(temps_months, aes(x = month, y = location, fill = mean)) +
geom_tile() +
coord_fixed(expand = FALSE) +
scale_fill_continuous_sequential(
palette = "Lajolla"
)
You can also use begin
and end
just like in the viridis scales.
Manual scales
For discrete data with a small number of categories, it’s usually best to set colors manually. This can be done with the scale functions scale_*_manual()
. These functions take an argument values
that specifies the color values to use.
To see how this works, let’s go back to this plot of temperatures over time for four locations:
Let’s use the following four colors: "gold2"
, "firebrick"
, "blue3"
, "springgreen4"
. We can visualize this palette using the function swatchplot()
from the colorspace package.
Now apply this color palette to the temperatures plot, by using the manual color scale.
ggplot(temperatures, aes(day_of_year, temperature, color = location)) +
geom_line(linewidth = 1) +
scale_color_manual(
values = ___
)
ggplot(temperatures, aes(day_of_year, temperature, color = location)) +
geom_line(linewidth = 1) +
scale_color_manual(
values = c("gold2", "firebrick", "blue3", "springgreen4")
)
One problem with this approach is that we can’t easily control which data value gets assigned to which color. What if we wanted San Diego to be shown in green and Chicago in blue? The simplest way to resolve this issue is to use a named vector. A named vector in R is a vector where each value has a name. Named vectors are created by writing c(name1 = value1, name2 = value2, ...)
. See the following example.
The names in the second example are A
, B
, and C
. Notice that the names are not in quotes. However, if you need a name containing a space (such as Death Valley
), you need to enclose the name in backticks. Thus, our named vector of colors could be written like so:
Now try to use this color vector in the figure showing temperatures over time.
<- c(___)
colorvector
ggplot(temperatures, aes(day_of_year, temperature, color = location)) +
geom_line(linewidth = 1) +
scale_color_manual(
values = ___
)
<- c(
colorvector `Death Valley` = "gold2",
Houston = "firebrick",
Chicago = "blue3",
`San Diego` = "springgreen4"
)
ggplot(temperatures, aes(day_of_year, temperature, color = location)) +
geom_line(linewidth = 1) +
scale_color_manual(
values = colorvector
)
Try some other colors also. For example, you could use the Okabe-Ito colors:
Alternatively, you can find a list of all named colors here. You can also run the command colors()
in your R console to get a list of all available color names.
Hint: It’s a good idea to never use the colors "red"
, "green"
, "blue"
, "cyan"
, "magenta"
, "yellow"
. They are extreme points in the RGB color space and tend to look unnatural and cheap. Try this by making a swatch plot of these colors, and compare for example to the color scale containing the colors "firebrick"
, "springgreen4"
, "blue3"
, "turquoise3"
, "darkorchid2"
, "gold2"
. Do you see the difference?
::swatchplot(c("red", "green", "blue", "cyan", "magenta", "yellow"))
colorspace::swatchplot(c("firebrick", "springgreen4", "blue3", "turquoise3", "darkorchid2", "gold2")) colorspace