Enter your name and EID here
This homework is due on April 12, 2021 at 11:00pm. Please submit as a pdf file on Canvas.
For all problems in this homework, we will work with the
penguins_clean dataset, which is a cleaned-up version of the
penguins dataset from the palmerpenguins package.
Note: This homework is about the contents of the plots. Don’t worry about styling. It’s OK to use the default theme and plot labeling.
library(palmerpenguins) penguins_clean <- penguins %>% select(-year) %>% # remove the year column as it is distracting here na.omit() # remove any rows with missing values penguins_clean
## # A tibble: 333 x 7 ## species island bill_length_mm bill_depth_mm flipper_length_… body_mass_g ## <fct> <fct> <dbl> <dbl> <int> <int> ## 1 Adelie Torge… 39.1 18.7 181 3750 ## 2 Adelie Torge… 39.5 17.4 186 3800 ## 3 Adelie Torge… 40.3 18 195 3250 ## 4 Adelie Torge… 36.7 19.3 193 3450 ## 5 Adelie Torge… 39.3 20.6 190 3650 ## 6 Adelie Torge… 38.9 17.8 181 3625 ## 7 Adelie Torge… 39.2 19.6 195 4675 ## 8 Adelie Torge… 41.1 17.6 182 3200 ## 9 Adelie Torge… 38.6 21.2 191 3800 ## 10 Adelie Torge… 34.6 21.1 198 4400 ## # … with 323 more rows, and 1 more variable: sex <fct>
Problem 1: (2 pts)
Perform a PCA of the
penguins_clean dataset and make two plots: 1. A rotation plot of components 1 and 2; 2. A plot of the eigenvalues, showing the amount of variance explained by the various components.
# perform PCA pca_fit <- penguins_clean %>% select(where(is.numeric)) %>% # retain only numeric columns scale() %>% # scale to zero mean and unit variance prcomp() # make rotation plot arrow_style <- arrow( angle = 20, length = grid::unit(8, "pt"), ends = "first", type = "closed" ) pca_fit %>% # extract rotation matrix tidy(matrix = "rotation") %>% pivot_wider( names_from = "PC", values_from = "value", names_prefix = "PC" ) %>% ggplot(aes(PC1, PC2)) + geom_segment( xend = 0, yend = 0, arrow = arrow_style ) + geom_text(aes(label = column), hjust = 0, vjust = 1) + coord_fixed(xlim = c(-0.5, 1.2), ylim = c(-.9, .1))
# make variance explained plot pca_fit %>% # extract eigenvalues tidy(matrix = "eigenvalues") %>% ggplot(aes(PC, percent)) + geom_col() + scale_x_continuous( # create one axis tick per PC breaks = 1:4 ) + scale_y_continuous( name = "variance explained", # format y axis ticks as percent values label = scales::label_percent(accuracy = 1) )
Problem 2: (4 pts) Make a scatter plot of PC 2 versus PC 1 and color by penguin species. Then use the rotation plot from Problem 1 to describe the physical characteristics by which the different penguin species differ. Finally, make one more scatter plot of the raw data that can support your interpretation of the PC analysis.
pca_fit %>% # add PCs to the original dataset augment(penguins_clean) %>% ggplot(aes(.fittedPC1, .fittedPC2)) + geom_point(aes(color = species))
All three species separate along PC 1. Penguins with high PC 1 have on average higher body mass, longer flippers, and longer but narrower bills. It approximately represents the overall size of the penguins. Therefore, Gentoo penguins are substantially larger than the other two species. However, Chinstrap don’t differ from Adelie in body mass. Instead, they have longer bills.
penguins_clean %>% ggplot(aes(body_mass_g, bill_length_mm)) + geom_point(aes(color = species))