Enter your name and EID here

This homework is due on Feb. 1, 2021 at 11:00pm. Please submit it as a PDF file on Canvas.

In this homework you will be working with the iris dataset built into R. This dataset contains measurements of flowers (sepal length, sepal width, petal length, petal width) for three different Iris species (I. setosa, I. versicolor, I. virginica).

head(iris)
##   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## 1          5.1         3.5          1.4         0.2  setosa
## 2          4.9         3.0          1.4         0.2  setosa
## 3          4.7         3.2          1.3         0.2  setosa
## 4          4.6         3.1          1.5         0.2  setosa
## 5          5.0         3.6          1.4         0.2  setosa
## 6          5.4         3.9          1.7         0.4  setosa

Problem 1: (6 pts) Use ggplot to make a histogram of the Sepal.Length column. Manually choose appropriate values for binwidth and center. Explain your choice of values in 2-3 sentences.

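library(ggplot2)  # provides ggplot() and geom_histogram()
ggplot(iris, aes(Sepal.Length)) +
  # center = binwidth / 2 puts the bin edges on multiples of 0.2
  geom_histogram(binwidth = 0.2, center = 0.1)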

Sepal lengths range from about 4 to about 8, so a binwidth of 0.2 gives roughly 20 bins: enough to show the overall shape of the distribution, but not so many that many bins are empty or nearly empty. Setting center to half the binwidth (0.1) places the bin edges on multiples of 0.2 (4.0, 4.2, 4.4, ...), so the bins align with round numbers.
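The reasoning about bin count and alignment can be checked numerically. The following is a minimal base R sketch, not ggplot2's internal binning algorithm: it assumes that bin edges lie at (center - binwidth / 2) plus whole multiples of binwidth, and the variable names (origin, first_edge, last_edge, edges, n_bins) are chosen here for illustration only.

```r
# Sketch: approximate the bin edges implied by binwidth = 0.2, center = 0.1.
binwidth <- 0.2
center <- 0.1

# Assumed edge positions: (center - binwidth / 2) + k * binwidth for integer k.
origin <- center - binwidth / 2  # 0 here, so edges are multiples of 0.2

# Observed range of the data
rng <- range(iris$Sepal.Length)

# First edge at or below the minimum, last edge at or above the maximum
first_edge <- origin + floor((rng[1] - origin) / binwidth) * binwidth
last_edge <- origin + ceiling((rng[2] - origin) / binwidth) * binwidth

edges <- seq(first_edge, last_edge, by = binwidth)
n_bins <- length(edges) - 1

print(rng)     # actual minimum and maximum sepal length
print(edges)   # approximate bin edges
print(n_bins)  # number of bins needed to cover the data
```

Changing binwidth in this sketch (for example to 0.1 or 0.5) shows how quickly the bin count grows or shrinks, which can help when justifying a particular choice.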

Problem 2: (4 pts) Modify the plot from Problem 1 to show one panel per species. Hint: Use facet_wrap(). See Slide 14 from Class 2.

ggplot(iris, aes(Sepal.Length)) +
  geom_histogram(binwidth = 0.2, center = 0.1) +
  facet_wrap(vars(Species))
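
An optional variation, not required by the problem: when the goal is to compare where each species falls along the Sepal.Length axis, the panels can be stacked in a single column so that they line up over a common x-axis. A minimal sketch, assuming ggplot2 is installed; ncol is an argument of facet_wrap(), and the object name p is only illustrative.

```r
library(ggplot2)

# Same histogram as in Problem 2, with the species panels stacked vertically
p <- ggplot(iris, aes(Sepal.Length)) +
  geom_histogram(binwidth = 0.2, center = 0.1) +
  facet_wrap(vars(Species), ncol = 1)

print(p)
```

With the default layout the three panels sit side by side, which makes bar heights easy to compare; the single-column layout trades that for easier comparison of positions along the x-axis.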