## Lab Worksheet 5

In 1898, Hermon Bumpus, an American biologist working at Brown University, collected data on one of the first examples of natural selection directly observed in nature. Immediately following a bad winter storm, he collected 136 English house sparrows, Passer domesticus, and brought them indoors. Of these birds, 64 had died during the storm, but 72 recovered and survived. By comparing measurements of physical traits, Bumpus demonstrated physical differences between the dead and living birds. He interpreted this finding as evidence for natural selection as a result of this storm:

``bumpus <- read_csv("http://wilkelab.org/classes/SDS348/data_sets/bumpus_full.csv")``
``````## Parsed with column specification:
## cols(
##   Sex = col_character(),
##   Age = col_character(),
##   Survival = col_character(),
##   Length = col_double(),
##   Weight = col_double(),
##   Skull_Length = col_double(),
##   Humerus_Length = col_double(),
##   Femur_Length = col_double(),
##   Tarsus_Length = col_double(),
##   Sternum_Length = col_double(),
##   Skull_Width = col_double()
## )``````
``head(bumpus)``
``````## # A tibble: 6 x 12
##   Sex   Age   Survival Length Wingspread Weight Skull_Length Humerus_Length
##   <chr> <chr> <chr>     <dbl>      <dbl>  <dbl>        <dbl>          <dbl>
## 1 Male  Adult Alive       154        241   24.5         31.2           17.4
## 2 Male  Adult Alive       160        252   26.9         30.8           18.7
## 3 Male  Adult Alive       155        243   26.9         30.6           18.6
## 4 Male  Adult Alive       154        245   24.3         31.7           18.8
## 5 Male  Adult Alive       156        247   24.1         31.5           18.2
## 6 Male  Adult Alive       161        253   26.5         31.8           19.8
## # â€¦ with 4 more variables: Femur_Length <dbl>, Tarsus_Length <dbl>,
## #   Sternum_Length <dbl>, Skull_Width <dbl>``````

The data set has three categorical variables (`Sex`, with levels `Male` and `Female`, `Age`, with levels `Adult` and `Young`, and `Survival`, with levels `Alive` and `Dead`) and nine numerical variables that hold various aspects of the birdsâ€™ anatomy, such as wingspread, weight, etc.

Question 1: Perform a PCA on the numerical columns of this data set. Then make three plots potting the data as PC2 vs.Â PC1, colored by (i) sex, (ii) age, (iii) survival.

``````# do PCA
pca <- bumpus %>%
select(-Sex, -Age, -Survival) %>%
scale() %>%
prcomp()

pca``````
``````## Standard deviations (1, .., p=9):
## [1] 2.3106280 1.0027180 0.8144054 0.7377845 0.6802200 0.6403176 0.5074763
## [8] 0.4459542 0.3447601
##
## Rotation (n x k) = (9 x 9):
##                      PC1         PC2         PC3         PC4         PC5
## Length         0.3113182 -0.48175513  0.02229922 -0.41815857  0.19719494
## Wingspread     0.3519669 -0.28971389  0.32621226 -0.21858032  0.25216077
## Weight         0.3140960 -0.33493529 -0.21062751 -0.15604080 -0.72876548
## Skull_Length   0.3328845  0.13688694 -0.31044883  0.30038515 -0.29893156
## Humerus_Length 0.3851414  0.21428192  0.24369845 -0.04486213  0.05060431
## Femur_Length   0.3630277  0.41484276  0.16164946 -0.03621275  0.10202003
## Tarsus_Length  0.3392712  0.46086907  0.18672729 -0.12495881 -0.16734831
## Sternum_Length 0.2962715 -0.34123813  0.19778895  0.80186706  0.14007820
## Skull_Width    0.2944947  0.07876015 -0.76996029 -0.02926814  0.46526730
##                        PC6         PC7         PC8          PC9
## Length         -0.38191614 -0.51623648  0.15746650  0.139991140
## Wingspread     -0.01722238  0.58909210 -0.38269685 -0.283946038
## Weight          0.40902438  0.03600302  0.09143718 -0.111320803
## Skull_Length   -0.73394265  0.23323278 -0.03753325 -0.001420609
## Humerus_Length  0.14570457  0.30000698  0.40887962  0.680713192
## Femur_Length    0.04554965 -0.15236298  0.47981073 -0.634632805
## Tarsus_Length   0.08364904 -0.39844161 -0.63887140  0.141767330
## Sternum_Length  0.18447218 -0.23988512 -0.04919212  0.007200079
## Skull_Width     0.28903106  0.03545977 -0.10488329  0.033222346``````
``pca\$scale  # information on how the variables in the dataset were scaled``
``````##         Length     Wingspread         Weight   Skull_Length Humerus_Length
##      3.5608315      5.5122709      1.4773290      0.7054777      0.5904685
##   Femur_Length  Tarsus_Length Sternum_Length    Skull_Width
##      0.6094387      1.0296262      1.0045593      0.3805969``````
``pca\$center  # information on how the variables were centered around 0``
``````##         Length     Wingspread         Weight   Skull_Length Humerus_Length
##   3.281280e-15   1.200531e-15  -1.119802e-15  -1.963299e-15  -2.657737e-15
##   Femur_Length  Tarsus_Length Sternum_Length    Skull_Width
##  -1.976360e-15   1.238631e-15  -7.000119e-17  -1.787173e-15``````
``pca\$rotation   # resulting rotation matrix containing eigenvectors``
``````##                      PC1         PC2         PC3         PC4         PC5
## Length         0.3113182 -0.48175513  0.02229922 -0.41815857  0.19719494
## Wingspread     0.3519669 -0.28971389  0.32621226 -0.21858032  0.25216077
## Weight         0.3140960 -0.33493529 -0.21062751 -0.15604080 -0.72876548
## Skull_Length   0.3328845  0.13688694 -0.31044883  0.30038515 -0.29893156
## Humerus_Length 0.3851414  0.21428192  0.24369845 -0.04486213  0.05060431
## Femur_Length   0.3630277  0.41484276  0.16164946 -0.03621275  0.10202003
## Tarsus_Length  0.3392712  0.46086907  0.18672729 -0.12495881 -0.16734831
## Sternum_Length 0.2962715 -0.34123813  0.19778895  0.80186706  0.14007820
## Skull_Width    0.2944947  0.07876015 -0.76996029 -0.02926814  0.46526730
##                        PC6         PC7         PC8          PC9
## Length         -0.38191614 -0.51623648  0.15746650  0.139991140
## Wingspread     -0.01722238  0.58909210 -0.38269685 -0.283946038
## Weight          0.40902438  0.03600302  0.09143718 -0.111320803
## Skull_Length   -0.73394265  0.23323278 -0.03753325 -0.001420609
## Humerus_Length  0.14570457  0.30000698  0.40887962  0.680713192
## Femur_Length    0.04554965 -0.15236298  0.47981073 -0.634632805
## Tarsus_Length   0.08364904 -0.39844161 -0.63887140  0.141767330
## Sternum_Length  0.18447218 -0.23988512 -0.04919212  0.007200079
## Skull_Width     0.28903106  0.03545977 -0.10488329  0.033222346``````
``head(pca\$x)  # these were the values that were rotated; I've wrapped it with the head() function to keep knit from printing all 136 rows``
``````##             PC1        PC2        PC3        PC4        PC5        PC6
## [1,] -3.8856548 -1.3120204 -0.5218476  1.1129461 -0.1905058 -0.2336805
## [2,]  0.8224474 -0.3998802  0.7775967 -0.8865862 -0.2099604  1.2483988
## [3,] -0.6426387  0.2143840  0.1084914  0.1627538 -0.6951568  1.9420813
## [4,] -1.2227485  0.7674720  1.0376675  0.8433835 -0.5152572 -0.2262541
## [5,] -1.6146988  0.3682892  1.3968497  0.2316305 -0.3493214 -0.6703105
## [6,]  2.7783623 -0.1149411  1.0431605  0.4265468  0.3594881  0.5671766
##              PC7        PC8         PC9
## [1,]  0.96374801  0.1525549 -0.57521792
## [2,]  0.02392220 -1.0953482 -0.04083636
## [3,] -0.01924521 -0.3374622  0.10601678
## [4,]  0.88126091 -0.7067789  0.74563716
## [5,]  0.56087560 -0.5365881 -0.44692206
## [6,]  0.68771676  0.7389925  0.20013441``````
``````bumpus_pca <- data.frame(bumpus, pca\$x) # combine original data frame with PCs

# plot by sex
ggplot(bumpus_pca, aes(x = PC1, y = PC2, color = Sex)) +
geom_point()``````

``````# plot by age
ggplot(bumpus_pca, aes(x = PC1, y = PC2, color = Age)) +
geom_point()``````

``````# plot by survival
ggplot(bumpus_pca, aes(x = PC1, y = PC2, color = Survival)) +
geom_point()``````

Question 2: Now visualize the rotation matrix of the PCA obtained under Question 1.

Hint: Look at yesterdayâ€™s in class worksheet to find the code for visualizing the rotation matrix with arrows.

``````# capture the rotation matrix in a data frame
rotation_data <- data.frame(pca\$rotation) %>%
rownames_to_column(var = "bird_feature")

# define a pleasing arrow style
arrow_style <- arrow(
length = unit(0.05, "inches"),
type = "closed"
)

# now plot, using geom_segment() for arrows and geom_text for labels
ggplot(rotation_data) +
geom_segment(aes(xend = PC1, yend = PC2), x = 0, y = 0, arrow = arrow_style) +
geom_text(aes(x = PC1, y = PC2, label = bird_feature), hjust = 0, size = 3, color = "red") +
xlim(-1., 1.25) +
ylim(-1., 1.) +
coord_fixed() # fix aspect ratio to 1:1``````