This homework is due on Feb. 8, 2024 at 11:00pm. Please submit it as a PDF file on Canvas.

Problem 1: (6 pts) For Problems 1 and 2, we will work with the dataset OH_pop, which contains demographic data for Ohio counties and was derived from the midwest dataset provided by ggplot2. See here for details of the original dataset: https://ggplot2.tidyverse.org/reference/midwest.html. OH_pop has two columns, county and poptotal (the county's total population), and it only includes counties with at least 100,000 inhabitants.
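The exact preprocessing used to build OH_pop is not shown here. One plausible way to derive it from midwest is sketched below, assuming dplyr is available and that "at least 100,000 inhabitants" means poptotal >= 100000. The descending sort is an assumption chosen to match the row order in the printout.

```r
library(ggplot2)  # provides the midwest dataset
library(dplyr)    # filter(), select(), arrange()

# Assumption: keep Ohio counties with at least 100,000 inhabitants,
# retain only the two columns of interest, and sort from most to
# least populous so the row order matches the printout.
OH_pop <- midwest %>%
  filter(state == "OH", poptotal >= 100000) %>%
  select(county, poptotal) %>%
  arrange(desc(poptotal))

OH_pop
```

Note that arrange() only sets the row order of the data frame; it does not by itself control the order in which ggplot2 places the values of a character column along an axis.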

OH_pop
## # A tibble: 25 × 2
##    county     poptotal
##    <chr>         <int>
##  1 CUYAHOGA    1412140
##  2 FRANKLIN     961437
##  3 HAMILTON     866228
##  4 MONTGOMERY   573809
##  5 SUMMIT       514990
##  6 LUCAS        462361
##  7 STARK        367585
##  8 BUTLER       291479
##  9 LORAIN       271126
## 10 MAHONING     264806
## # ℹ 15 more rows
  1. Use ggplot to make a scatter plot of county vs. total population (column poptotal), with the counties ordered by total population.

  2. Rename the axes and set appropriate limits, breaks and labels. Note: Do not use xlab() or ylab() to label the axes.

# your code goes here

Problem 2: (6 pts)

  1. Modify the plot from Problem 1 by changing the scale for poptotal to logarithmic.

  2. Adjust the limits, breaks and labels for the logarithmic scale.

# your code goes here

Problem 3: (8 pts) For this problem, we will be working with the Aus_athletes dataset that comes with the ggridges package:

head(Aus_athletes)
##    rcc wcc   hc   hg ferr   bmi   ssf pcBfat   lbm height weight sex      sport
## 1 3.96 7.5 37.5 12.3   60 20.56 109.1  19.75 63.32  195.9   78.9   f basketball
## 2 4.41 8.3 38.2 12.7   68 20.67 102.8  21.30 58.55  189.7   74.4   f basketball
## 3 4.14 5.0 36.4 11.6   21 21.86 104.6  19.88 55.36  177.8   69.1   f basketball
## 4 4.11 5.3 37.3 12.6   69 21.88 126.4  23.66 57.18  185.0   74.9   f basketball
## 5 4.45 6.8 41.5 14.0   29 18.96  80.3  17.64 53.20  184.6   64.6   f basketball
## 6 4.10 4.4 37.4 12.5   42 21.04  75.2  15.58 53.77  174.0   63.7   f basketball

This dataset contains various physiological measurements made on athletes competing in different sports. Here, we are only interested in the columns height, indicating the athlete's height in cm; sex, indicating whether an athlete is male or female; and sport, indicating the sport the athlete competes in.

Visualize the distribution of athletes’ heights by sex and sport with (i) boxplots and (ii) ridgelines. Make one plot per geom and do not use faceting. In both cases, put height on the x axis and sport on the y axis. Use color to indicate the athlete’s sex.

Do you see anything noteworthy about the boxplots for water polo, netball, and gymnastics? Do they visually match the boxplots for the other sports?

# your boxplot code goes here
# your ridgelines code goes here