Enter your name and EID here

This homework is due on Feb. 8, 2021 at 11:00pm. Please submit as a pdf file on Canvas.

Problem 1: (5 pts) We will work again with the iris dataset built into R. It was previously introduced in Homework 2.

Make two different strip charts of sepal length versus species, the first one without horizontal jitter and second one with horizontal jitter. Explain in 1-2 sentences why the plot without jitter is highly misleading.

Hint: Make sure you do not accidentally apply vertical jitter. This is a common mistake many people make.

# without jitter
ggplot(iris, aes(Species, Sepal.Length)) +
  geom_point(position = position_jitter(width = 0, height = 0))

# with jitter
ggplot(iris, aes(Species, Sepal.Length)) +
  geom_point(position = position_jitter(width = 0.2, height = 0))

The sepal lengths in the iris dataset are rounded to one decimal and therefore many values appear more than once. This causes points to fall exactly on top of one another, and therefore the plot without jitter appears to have many fewer points than there actually are.

Problem 2: (5 pts) For this problem, we will be working with the Aus_athletes dataset that comes with the ggridges package:

head(Aus_athletes)
##    rcc wcc   hc   hg ferr   bmi   ssf pcBfat   lbm height weight sex      sport
## 1 3.96 7.5 37.5 12.3   60 20.56 109.1  19.75 63.32  195.9   78.9   f basketball
## 2 4.41 8.3 38.2 12.7   68 20.67 102.8  21.30 58.55  189.7   74.4   f basketball
## 3 4.14 5.0 36.4 11.6   21 21.86 104.6  19.88 55.36  177.8   69.1   f basketball
## 4 4.11 5.3 37.3 12.6   69 21.88 126.4  23.66 57.18  185.0   74.9   f basketball
## 5 4.45 6.8 41.5 14.0   29 18.96  80.3  17.64 53.20  184.6   64.6   f basketball
## 6 4.10 4.4 37.4 12.5   42 21.04  75.2  15.58 53.77  174.0   63.7   f basketball

This dataset contains various physiological measurements made on athletes competing in different sports. Here, we are only interested in the columns height, indicating the athleete’s height in cm, sex, indicating whether an athlete is male or female, and sport, indicating the sport the athlete competes in.

Visualize the distribution of athletes’ heights by sex and sport with (i) boxplots and (ii) ridgelines. Make one plot per geom and do not use faceting. In both cases, put height on the x axis and sport on the y axis. Use color to indicate the athlete’s sex.

The boxplot ggplot generates will have a problem. Explain what the problem is. (You do not have to solve it.)

ggplot(Aus_athletes, aes(height, sport, fill = sex)) +
  geom_boxplot()

ggplot(Aus_athletes, aes(height, sport, fill = sex)) +
  geom_density_ridges()
## Picking joint bandwidth of 2.8

For three sports (water polo, netball, gymnastics), we only have data for either male or female athletes. The resulting boxplot is twice as wide as for the other sports and sits in the wrong location (centered rather than dodged).