---
#############################################################
# #
# Click on "Run Document" in RStudio to run this worksheet. #
# #
#############################################################
title: "Handling overlapping points"
author: "Claus O. Wilke"
output: learnr::tutorial
runtime: shiny_prerendered
---
```{r setup, include=FALSE}
library(learnr)
library(tidyverse)
library(palmerpenguins)
library(colorspace)
knitr::opts_chunk$set(echo = FALSE, comment = "")
blue_jays <- read_csv("https://wilkelab.org/SDS375/datasets/blue_jays.csv")
```
## Introduction
In this worksheet, we will discuss how to make 2D density plots and 2D histograms. Both are effective alternatives to scatter plots in which many points lie on top of one another.
We will be using the R package **tidyverse**, which includes `ggplot()` and related functions. We will also be using the package **palmerpenguins** for the `penguins` dataset.
```{r library-calls, echo = TRUE, eval = FALSE}
# load required libraries
library(tidyverse)
library(palmerpenguins)
```
We will be working with the dataset `penguins`, which contains data on individual penguins in Antarctica.
```{r temperatures, echo = TRUE}
penguins
```
## 2D density plots
2D density plots are an alternative to scatter plots. They visualize the density of points in the 2D plane, and they are particularly useful when the point density is so high that many points lie on top of one another. We will demonstrate 2D density plots for the `penguins` dataset, specifically for a scatter plot of bill length versus body mass.
```{r penguins-scatter, echo = TRUE}
ggplot(penguins, aes(body_mass_g, bill_length_mm)) +
geom_point(na.rm = TRUE) +
theme_bw()
```
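Before adding contour lines, it can help to see what they summarize. The following is a minimal sketch, assuming the contours are level sets of a 2D kernel density estimate like the one returned by `MASS::kde2d()`. The grid size `n = 50` and the variable names are arbitrary choices for illustration.

```{r kde2d-sketch, echo = TRUE}
library(palmerpenguins)

# keep only rows where both variables are observed
ok <- !is.na(penguins$body_mass_g) & !is.na(penguins$bill_length_mm)
mass <- penguins$body_mass_g[ok]
bill <- penguins$bill_length_mm[ok]

# kernel density estimate on a 50 x 50 grid (grid size chosen for illustration)
dens <- MASS::kde2d(mass, bill, n = 50)

# dens$x and dens$y hold the grid coordinates, dens$z the estimated density
dim(dens$z)
```

Each contour line in a 2D density plot connects grid locations with the same estimated density.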
To create a 2D density plot, we simply add `geom_density_2d()`. Try this out.
```{r density-lines, exercise = TRUE}
```
```{r density-lines-hint}
ggplot(penguins, aes(body_mass_g, bill_length_mm)) +
___ +
geom_point(na.rm = TRUE) +
theme_bw()
```
```{r density-lines-solution}
ggplot(penguins, aes(body_mass_g, bill_length_mm)) +
geom_density_2d(na.rm = TRUE) +
geom_point(na.rm = TRUE) +
theme_bw()
```
You can change the number of contour lines shown by providing the `bins` argument to `geom_density_2d()`. For example, try `bins = 5`.
```{r density-lines2, exercise = TRUE}
```
```{r density-lines2-hint}
ggplot(penguins, aes(body_mass_g, bill_length_mm)) +
geom_density_2d(na.rm = TRUE, bins = ___) +
geom_point(na.rm = TRUE) +
theme_bw()
```
```{r density-lines2-solution}
ggplot(penguins, aes(body_mass_g, bill_length_mm)) +
geom_density_2d(na.rm = TRUE, bins = 5) +
geom_point(na.rm = TRUE) +
theme_bw()
```
Also try other values for `bins`.
The plots we have made so far did not consider different penguin species, but we know from earlier worksheets that there are three penguin species with quite different values for body mass and bill length. Modify the above plot so that both points and contour lines are colored by species.
```{r density-lines-colors, exercise = TRUE}
```
```{r density-lines-colors-solution}
ggplot(penguins, aes(body_mass_g, bill_length_mm, color = species)) +
geom_density_2d(na.rm = TRUE, bins = 5) +
geom_point(na.rm = TRUE) +
theme_bw()
```
Also try faceting by species.
```{r density-lines-facets, exercise = TRUE}
```
```{r density-lines-facets-solution}
ggplot(penguins, aes(body_mass_g, bill_length_mm, color = species)) +
geom_density_2d(na.rm = TRUE, bins = 5) +
geom_point(na.rm = TRUE) +
theme_bw() +
facet_wrap(vars(species))
```
Finally, you can use `geom_density_2d_filled()` to draw filled contour bands instead of contour lines. Try this out.
**Hints:**
- You cannot map penguin species to color when drawing contour bands, because the fill color already encodes the density level.
- It helps to set alpha transparency for contour bands, e.g. `alpha = 0.5`.
- You may want to make the point size smaller so you can see the contour bands underneath the points.
```{r density-bands, exercise = TRUE}
```
```{r density-bands-hint}
ggplot(penguins, aes(body_mass_g, bill_length_mm)) +
geom_density_2d_filled(___) +
geom_point(na.rm = TRUE, size = ___) +
theme_bw() +
facet_wrap(vars(species))
```
```{r density-bands-solution}
ggplot(penguins, aes(body_mass_g, bill_length_mm)) +
geom_density_2d_filled(na.rm = TRUE, bins = 5, alpha = 0.5) +
geom_point(na.rm = TRUE, size = 0.2) +
theme_bw() +
facet_wrap(vars(species))
```
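To see why species cannot be mapped to the fill color of contour bands, you can inspect the data that the layer computes. This is a rough sketch using `layer_data()` from **ggplot2**; the object names `p_bands` and `band_data` are made up for this example.

```{r density-bands-levels, echo = TRUE}
library(ggplot2)
library(palmerpenguins)

p_bands <- ggplot(penguins, aes(body_mass_g, bill_length_mm)) +
  geom_density_2d_filled(na.rm = TRUE, bins = 5)

# extract the computed data for the first (and only) layer
band_data <- layer_data(p_bands)

# each band corresponds to one density level, and fill is mapped to that level
levels(band_data$level)
```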
## 2D histograms
2D histograms are very similar to 2D density plots. They are generated by simply subdividing the 2D plane into regularly shaped regions (rectangles or hexagons), counting how many data points fall into each region, and then coloring each region by its count.
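The counting step described above can be sketched by hand. The example below is only an illustration, assuming five equal-width intervals per axis created with `cut()`; the bin boundaries chosen by ggplot2 may differ, and the column names `mass_bin` and `bill_bin` are made up for this example.

```{r manual-binning, echo = TRUE}
library(dplyr)
library(palmerpenguins)

bin_counts <- penguins %>%
  # drop rows where either variable is missing
  filter(!is.na(body_mass_g), !is.na(bill_length_mm)) %>%
  # assign each penguin to one of 5 x 5 rectangular regions
  mutate(
    mass_bin = cut(body_mass_g, breaks = 5),
    bill_bin = cut(bill_length_mm, breaks = 5)
  ) %>%
  # count how many penguins fall into each occupied region
  count(mass_bin, bill_bin)

bin_counts
```

A 2D histogram draws one colored tile per row of such a table, with the fill color given by the count `n`.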
To make rectangular 2D histograms, you can use `geom_bin2d()`. Try this out.
**Hint:** Set `bins = 5` to get a reasonable number of bins for this dataset.
```{r bins2d-facets, exercise = TRUE}
ggplot(penguins, aes(body_mass_g, bill_length_mm)) +
___ +
geom_point(na.rm = TRUE, size = 0.2) +
theme_bw() +
facet_wrap(vars(species))
```
```{r bins2d-facets-hint}
ggplot(penguins, aes(body_mass_g, bill_length_mm)) +
geom_bin2d(___) +
geom_point(na.rm = TRUE, size = 0.2) +
theme_bw() +
facet_wrap(vars(species))
```
```{r bins2d-facets-solution}
ggplot(penguins, aes(body_mass_g, bill_length_mm)) +
geom_bin2d(na.rm = TRUE, bins = 5) +
geom_point(na.rm = TRUE, size = 0.2) +
theme_bw() +
facet_wrap(vars(species))
```
It helps to make the bins partially transparent (e.g., set `alpha = 0.5`). Also try a different sequential color scale.
```{r bins2d-facets2, exercise = TRUE}
```
```{r bins2d-facets2-hint}
ggplot(penguins, aes(body_mass_g, bill_length_mm)) +
geom_bin2d(na.rm = TRUE, bins = 5, ___) +
geom_point(na.rm = TRUE, size = 0.2) +
theme_bw() +
facet_wrap(vars(species)) +
scale_fill____
```
```{r bins2d-facets2-solution}
ggplot(penguins, aes(body_mass_g, bill_length_mm)) +
geom_bin2d(na.rm = TRUE, bins = 5, alpha = 0.5) +
geom_point(na.rm = TRUE, size = 0.2) +
theme_bw() +
facet_wrap(vars(species)) +
scale_fill_viridis_c()
```
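Other sequential scales work the same way. As one possible variation, the sketch below uses `scale_fill_continuous_sequential()` from the **colorspace** package; the palette `"Blues"` and the object name `p_blues` are arbitrary choices for illustration.

```{r bins2d-colorspace, echo = TRUE}
library(ggplot2)
library(palmerpenguins)
library(colorspace)

p_blues <- ggplot(penguins, aes(body_mass_g, bill_length_mm)) +
  geom_bin2d(na.rm = TRUE, bins = 5, alpha = 0.5) +
  geom_point(na.rm = TRUE, size = 0.2) +
  theme_bw() +
  facet_wrap(vars(species)) +
  # sequential palette from colorspace instead of viridis
  scale_fill_continuous_sequential(palette = "Blues")

p_blues
```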
You can control bins in a more fine-grained manner by setting the `binwidth` argument. It takes a vector of two numbers, where the first is the width of the bins (in data units) and the second is the height (also in data units). Make bins that are 10000 units wide and 5 units tall.
```{r bins2d-facets3, exercise = TRUE}
```
```{r bins2d-facets3-hint}
ggplot(penguins, aes(body_mass_g, bill_length_mm)) +
geom_bin2d(
na.rm = TRUE,
binwidth = ___,
alpha = 0.5
) +
geom_point(na.rm = TRUE, size = 0.2) +
theme_bw() +
facet_wrap(vars(species)) +
scale_fill_viridis_c()
```
```{r bins2d-facets3-solution}
ggplot(penguins, aes(body_mass_g, bill_length_mm)) +
geom_bin2d(
na.rm = TRUE,
binwidth = c(10000, 5),
alpha = 0.5
) +
geom_point(na.rm = TRUE, size = 0.2) +
theme_bw() +
facet_wrap(vars(species)) +
scale_fill_viridis_c()
```
Also try making bins that are 2500 units wide and 30 units tall.
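When choosing a `binwidth`, it can help to compare it against the range of the data. The following is a minimal sketch in base R; the candidate values 1000 and 5 are illustrative only, and the resulting numbers are approximate because bin boundaries need not start at the data minimum.

```{r binwidth-count, echo = TRUE}
library(palmerpenguins)

# observed range of each variable, ignoring missing values
mass_range <- range(penguins$body_mass_g, na.rm = TRUE)
bill_range <- range(penguins$bill_length_mm, na.rm = TRUE)

# approximate number of bins along each axis for binwidth = c(1000, 5)
bins_x <- diff(mass_range) / 1000
bins_y <- diff(bill_range) / 5
c(bins_x, bins_y)
```

If one of these numbers is below 1, all points fall into a single row or column of bins along that axis.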
Instead of rectangular bins, you can also make hexbins, with `geom_hex()`. It mostly works the same as `geom_bin2d()`. Try it out.
**Hint:** You will need to set the argument `bins` to an appropriate value for the hexbins to look good.
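Note that `geom_hex()` relies on the **hexbin** package, which is not attached by **tidyverse**. A quick way to check whether it is available is sketched below; the variable name `has_hexbin` is made up for this example.

```{r hexbin-check, echo = TRUE}
# TRUE if the hexbin package can be loaded, FALSE otherwise
has_hexbin <- requireNamespace("hexbin", quietly = TRUE)

if (!has_hexbin) {
  message("Package 'hexbin' is missing; install it with install.packages(\"hexbin\").")
}

has_hexbin
```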
```{r hex-facets, exercise = TRUE}
```
```{r hex-facets-hint}
ggplot(penguins, aes(body_mass_g, bill_length_mm)) +
geom_hex(
___
) +
geom_point(na.rm = TRUE, size = 0.2) +
theme_bw() +
facet_wrap(vars(species)) +
scale_fill_viridis_c()
```
```{r hex-facets-solution}
ggplot(penguins, aes(body_mass_g, bill_length_mm)) +
geom_hex(
na.rm = TRUE,
bins = 5,
alpha = 0.5
) +
geom_point(na.rm = TRUE, size = 0.2) +
theme_bw() +
facet_wrap(vars(species)) +
scale_fill_viridis_c()
```
Just as with `geom_bin2d()`, you can provide an argument `binwidth` consisting of two values, one controlling the width and the other the height. Try this out.
```{r hex-facets2, exercise = TRUE}
```
```{r hex-facets2-hint}
ggplot(penguins, aes(body_mass_g, bill_length_mm)) +
geom_hex(
na.rm = TRUE,
binwidth = ___,
alpha = 0.5
) +
geom_point(na.rm = TRUE, size = 0.2) +
theme_bw() +
facet_wrap(vars(species)) +
scale_fill_viridis_c()
```
```{r hex-facets2-solution}
ggplot(penguins, aes(body_mass_g, bill_length_mm)) +
geom_hex(
na.rm = TRUE,
binwidth = c(1000, 5),
alpha = 0.5
) +
geom_point(na.rm = TRUE, size = 0.2) +
theme_bw() +
facet_wrap(vars(species)) +
scale_fill_viridis_c()
```