In 1898, Hermon Bumpus, an American biologist working at Brown University, collected data on one of the first examples of natural selection directly observed in nature. Immediately following a bad winter storm, he collected 136 English house sparrows, Passer domesticus, and brought them indoors. Of these birds, 64 had died during the storm, but 72 recovered and survived. By comparing measurements of physical traits, Bumpus demonstrated physical differences between the dead and living birds. He interpreted this finding as evidence for natural selection as a result of this storm:
bumpus <- read.csv("http://wilkelab.org/classes/SDS348/data_sets/bumpus_full.csv")
head(bumpus)
## Sex Age Survival Length Wingspread Weight Skull_Length Humerus_Length
## 1 Male Adult Alive 154 241 24.5 31.2 17.4
## 2 Male Adult Alive 160 252 26.9 30.8 18.7
## 3 Male Adult Alive 155 243 26.9 30.6 18.6
## 4 Male Adult Alive 154 245 24.3 31.7 18.8
## 5 Male Adult Alive 156 247 24.1 31.5 18.2
## 6 Male Adult Alive 161 253 26.5 31.8 19.8
## Femur_Length Tarsus_Length Sternum_Length Skull_Width
## 1 17.0 26.0 21.1 14.9
## 2 18.0 30.0 21.4 15.3
## 3 17.9 29.2 21.5 15.3
## 4 17.5 29.1 21.3 14.8
## 5 17.9 28.7 20.9 14.6
## 6 18.9 29.1 22.7 15.4
The data set has three categorical variables (Sex, with levels Male and Female; Age, with levels Adult and Young; and Survival, with levels Alive and Dead) and nine numerical variables that record various aspects of the birds’ anatomy, such as wingspread and weight.
Randomly split the bumpus data set into a training set and a test set. Use 70% of the data as the training set.
set.seed(13) # set the seed to make your partition reproducible
# your R code here
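As a starting point, a 70/30 random partition can be sketched in base R with sample(). The data frame toy below is a simulated stand-in invented for illustration (its values are not Bumpus's measurements), and the names toy, train_idx, train, and test are assumptions; the same indexing pattern should carry over to bumpus.

```r
# Simulated stand-in with the same number of rows as the sparrow data
# (hypothetical values, for illustration only)
set.seed(13)
n <- 136
toy <- data.frame(
  Survival = factor(sample(c("Alive", "Dead"), n, replace = TRUE)),
  Weight   = rnorm(n, mean = 25, sd = 1.5)
)

# Draw 70% of the row indices without replacement for the training set
train_idx <- sample(seq_len(n), size = floor(0.7 * n))

# Rows that were drawn go to training; all remaining rows go to test
train <- toy[train_idx, ]
test  <- toy[-train_idx, ]

nrow(train)  # 95
nrow(test)   # 41
```

Because the test set is defined by negative indexing on the same index vector, no row can end up in both sets.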
Fit a logistic regression model on the training data set, then predict the survival on the test data set, and plot the resulting ROC curves.
# model to use:
# Survival ~ Sex + Length + Weight + Humerus_Length + Sternum_Length
# your R code here
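The fit, predict, and plot workflow can be sketched in base R as follows. This is a minimal illustration on simulated data, not a solution for the model above: the data frame toy, its two predictors, the simulated effect sizes, and the helper roc_points() are all assumptions made for the example. Note that glm() with a factor response treats the first factor level as "failure", so the level order decides which outcome the fitted probabilities refer to.

```r
# Simulated stand-in data (hypothetical values, for illustration only)
set.seed(13)
n <- 200
toy <- data.frame(
  Weight         = rnorm(n, mean = 25, sd = 1.5),
  Humerus_Length = rnorm(n, mean = 18.5, sd = 0.6)
)
# Arbitrary simulated relationship between the predictors and survival
p_alive <- plogis(-0.8 * (toy$Weight - 25) + 1.5 * (toy$Humerus_Length - 18.5))
toy$Survival <- factor(ifelse(runif(n) < p_alive, "Alive", "Dead"),
                       levels = c("Dead", "Alive"))  # model P(Alive)

# 70/30 random split
train_idx <- sample(seq_len(n), size = floor(0.7 * n))
train <- toy[train_idx, ]
test  <- toy[-train_idx, ]

# Fit on the training set, then predict probabilities for both sets
fit <- glm(Survival ~ Weight + Humerus_Length, data = train, family = binomial)
train_prob <- predict(fit, newdata = train, type = "response")
test_prob  <- predict(fit, newdata = test, type = "response")

# Hypothetical helper: true and false positive rates at every cutoff
roc_points <- function(prob, truth, positive) {
  cutoffs <- sort(unique(c(-Inf, prob, Inf)), decreasing = TRUE)
  is_pos <- truth == positive
  data.frame(
    fpr = sapply(cutoffs, function(k) mean(prob[!is_pos] >= k)),
    tpr = sapply(cutoffs, function(k) mean(prob[is_pos] >= k))
  )
}

roc_train <- roc_points(train_prob, train$Survival, "Alive")
roc_test  <- roc_points(test_prob, test$Survival, "Alive")

# Draw both ROC curves on one set of axes
plot(roc_train$fpr, roc_train$tpr, type = "s", col = "blue",
     xlab = "False positive rate", ylab = "True positive rate")
lines(roc_test$fpr, roc_test$tpr, type = "s", col = "red")
abline(0, 1, lty = 2)  # reference line for a random classifier
legend("bottomright", legend = c("train", "test"),
       col = c("blue", "red"), lty = 1)
```

A test curve that sits well below the training curve would suggest that the model fits the training data better than it generalizes.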
Calculate the area under the training and test ROC curves for the following model.
# model to use:
# Survival ~ Weight + Humerus_Length
set.seed(13) # set the seed to make your partition reproducible
# your R code here
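One way to compute the area under a ROC curve without extra packages is the rank-based (Mann-Whitney) formula: the AUC equals the fraction of positive/negative pairs in which the positive case receives the higher predicted probability. The function auc() and the small vectors below are hypothetical, chosen so the result can be checked by hand.

```r
# Hypothetical helper: rank-based AUC
# prob  = predicted probabilities of the positive class
# truth = logical vector, TRUE for positive cases
auc <- function(prob, truth) {
  r <- rank(prob)  # average ranks handle tied probabilities
  n_pos <- sum(truth)
  n_neg <- sum(!truth)
  (sum(r[truth]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

# Toy check: 8 of the 9 positive/negative pairs are ordered correctly
prob  <- c(0.9, 0.8, 0.7, 0.6, 0.4, 0.2)
truth <- c(TRUE, TRUE, FALSE, TRUE, FALSE, FALSE)
auc(prob, truth)  # 8/9, about 0.889
```

Applied once to the training-set predictions and once to the test-set predictions of a fitted model, this gives the two areas to compare.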
set.seed() to create reproducibility in the way we partition the dataset?
# your R code here (if the above was easy)
Your answer here.
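The effect of set.seed() can be seen in a small experiment: resetting the seed to the same value before calling sample() reproduces the same draw, so the same rows land in the training set on every run. The variable names below are assumptions for illustration.

```r
# Same seed, same draw
set.seed(13)
first <- sample(1:136, size = 5)

set.seed(13)
second <- sample(1:136, size = 5)

identical(first, second)  # TRUE

# Without resetting the seed, the generator continues its stream,
# so this draw will generally differ from the first two
third <- sample(1:136, size = 5)
```

Without a fixed seed, each run would produce a different partition, and quantities such as the test-set AUC would change from run to run.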