This homework is due on Feb. 8, 2024 at 11:00pm. Please submit it as a PDF file on Canvas.
Problem 1: (6 pts) For problems 1 and 2, we will work with the dataset OH_pop that contains Ohio state demographics and has been derived from the midwest dataset provided by ggplot2. See here for details of the original dataset: https://ggplot2.tidyverse.org/reference/midwest.html. OH_pop contains two columns: county and poptotal (the county’s total population), and it only contains counties with at least 100,000 inhabitants.
OH_pop
## # A tibble: 25 × 2
## county poptotal
## <chr> <int>
## 1 CUYAHOGA 1412140
## 2 FRANKLIN 961437
## 3 HAMILTON 866228
## 4 MONTGOMERY 573809
## 5 SUMMIT 514990
## 6 LUCAS 462361
## 7 STARK 367585
## 8 BUTLER 291479
## 9 LORAIN 271126
## 10 MAHONING 264806
## # ℹ 15 more rows
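The assignment does not show how OH_pop was built. The following is a minimal sketch of one way a table like it could be derived from midwest. It assumes that Ohio rows carry the state code "OH", that the cutoff is applied to poptotal as "at least 100,000", and that rows are sorted in descending order as in the printout above. The name OH_pop_sketch is hypothetical and is used only to avoid overwriting the provided OH_pop.

```r
library(ggplot2)  # provides the midwest dataset
library(dplyr)    # filter(), select(), arrange(), and the %>% pipe

# Hypothetical reconstruction; the course's actual derivation may differ.
OH_pop_sketch <- midwest %>%
  filter(state == "OH", poptotal >= 100000) %>%  # Ohio counties with at least 100,000 inhabitants
  select(county, poptotal) %>%                   # keep only the two columns described above
  arrange(desc(poptotal))                        # largest county first, as in the printout

OH_pop_sketch
```

If the assumptions hold, the first rows should match the output shown above; use the provided OH_pop for the problems themselves.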
Use ggplot to make a scatter plot of county vs total population (column poptotal) and order the counties by the total population.
Rename the axes and set appropriate limits, breaks and labels.
Note: Do not use xlab() or ylab() to label the axes.
# your code goes here
Problem 2: (6 pts)
Modify the plot from Problem 1 by changing the scale for poptotal to logarithmic.
Adjust the limits, breaks and labels for the logarithmic scale.
# your code goes here
Problem 3: (8 pts) For this problem, we will be working with the Aus_athletes dataset that comes with the ggridges package:
head(Aus_athletes)
## rcc wcc hc hg ferr bmi ssf pcBfat lbm height weight sex sport
## 1 3.96 7.5 37.5 12.3 60 20.56 109.1 19.75 63.32 195.9 78.9 f basketball
## 2 4.41 8.3 38.2 12.7 68 20.67 102.8 21.30 58.55 189.7 74.4 f basketball
## 3 4.14 5.0 36.4 11.6 21 21.86 104.6 19.88 55.36 177.8 69.1 f basketball
## 4 4.11 5.3 37.3 12.6 69 21.88 126.4 23.66 57.18 185.0 74.9 f basketball
## 5 4.45 6.8 41.5 14.0 29 18.96 80.3 17.64 53.20 184.6 64.6 f basketball
## 6 4.10 4.4 37.4 12.5 42 21.04 75.2 15.58 53.77 174.0 63.7 f basketball
This dataset contains various physiological measurements made on athletes competing in different sports. Here, we are only interested in the columns height, indicating the athlete’s height in cm, sex, indicating whether an athlete is male or female, and sport, indicating the sport the athlete competes in.
Visualize the distribution of athletes’ heights by sex and sport with (i) boxplots and (ii) ridgelines. Make one plot per geom and do not use faceting. In both cases, put height on the x axis and sport on the y axis. Use color to indicate the athlete’s sex.
Do you see anything noteworthy about the boxplots for water polo, netball, and gymnastics? Do they visually match the boxplots for the other sports?
# your boxplot code goes here
# your ridgelines code goes here