Enter your name and EID here
Please submit both this completed Rmarkdown document and its knitted HTML, converted to PDF, on Canvas no later than 4:00 pm on April 2nd, 2019. These two documents will be graded jointly, so they must be consistent (as in, don’t change the Rmarkdown file without also updating the knitted HTML!).
All results presented must have corresponding code. Any answers/results given without the corresponding R code that generated the result will be considered absent. All code reported in your final project document should work properly. Please bear in mind that you will lose points for the following:
For this project, you will work with a dataset that was extracted from the 1974 Motor Trend US magazine. It contains information about fuel consumption and 10 aspects of automobile design and performance for 32 automobiles.
head(mtcars)
##                    mpg cyl disp  hp drat    wt  qsec vs am gear carb
## Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
## Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
## Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
## Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
## Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
## Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1
The column contents are as follows:
Problem 1: (20 points) Make a logistic regression model that predicts transmission type (am) from gross horsepower (hp) and miles per gallon (mpg). Make another logistic regression model that predicts transmission type from gross horsepower alone. Show the summary (using summary) of each model below. Make a plot with two ROC curves, and explain which model better predicts transmission type. For this analysis, use the entire dataset as training data, and do not evaluate the model on test data.
# R code goes here
Your answer goes here.
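One possible way to approach Problem 1 is sketched below in base R, without any add-on packages. The helper roc_points and the object names glm_both and glm_hp are hypothetical names chosen for this illustration; they are not part of the assignment, and other approaches (for example with a plotting or ROC package) are equally valid.

```r
# Fit the two logistic regression models on the full mtcars data.
glm_both <- glm(am ~ hp + mpg, data = mtcars, family = binomial)
glm_hp   <- glm(am ~ hp, data = mtcars, family = binomial)

# Show the model summaries.
summary(glm_both)
summary(glm_hp)

# Hypothetical helper (not from the assignment): compute ROC points from a
# linear predictor and 0/1 labels by sweeping the cutoff over all observed
# predictor values, from strictest (Inf) to most lenient (the minimum).
roc_points <- function(predictor, known_truth) {
  cutoffs <- c(Inf, sort(unique(predictor), decreasing = TRUE))
  data.frame(
    false_pos = sapply(cutoffs, function(k) mean(predictor[known_truth == 0] >= k)),
    true_pos  = sapply(cutoffs, function(k) mean(predictor[known_truth == 1] >= k))
  )
}

# predict() on a glm returns the linear predictor by default.
roc_both <- roc_points(predict(glm_both), mtcars$am)
roc_hp   <- roc_points(predict(glm_hp), mtcars$am)

# Draw both ROC curves in one plot.
plot(roc_both$false_pos, roc_both$true_pos, type = "s",
     xlab = "False positive rate", ylab = "True positive rate")
lines(roc_hp$false_pos, roc_hp$true_pos, type = "s", lty = 2)
abline(0, 1, col = "grey")
legend("bottomright", legend = c("hp + mpg", "hp"), lty = c(1, 2))
```

The curve that lies closer to the top-left corner belongs to the model that separates the two transmission types better on this data.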
Problem 2: (40 points) We have now divided the mtcars dataset into a training and a test data set (train_data and test_data):
train_fraction <- 0.5 # fraction of data for training purposes
set.seed(123) # set the seed to make the partition reproducible
train_size <- floor(train_fraction * nrow(mtcars)) # number of observations in training set
train_indices <- sample(1:nrow(mtcars), size = train_size)
train_data <- mtcars[train_indices, ] # get training data
test_data <- mtcars[-train_indices, ] # get test data
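Before fitting anything, it can help to sanity-check the partition. The sketch below repeats the split so that it runs on its own, then confirms the set sizes, checks that no car appears in both sets, and tabulates transmission type in each half. The variable name shared is a hypothetical name introduced for this illustration.

```r
# Recreate the partition from the assignment.
train_fraction <- 0.5
set.seed(123)
train_size <- floor(train_fraction * nrow(mtcars))
train_indices <- sample(1:nrow(mtcars), size = train_size)
train_data <- mtcars[train_indices, ]
test_data <- mtcars[-train_indices, ]

# Each half should contain 16 of the 32 cars.
dim(train_data)
dim(test_data)

# No car should appear in both sets.
shared <- intersect(rownames(train_data), rownames(test_data))
length(shared)

# Count automatic (0) and manual (1) transmissions in each half.
table(train_data$am)
table(test_data$am)
```

Note that the specific rows selected with a given seed can differ between R versions, because R 3.6.0 changed the default algorithm used by sample().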
Fit a logistic regression model to predict transmission type on the training data set. Use the predictors hp and mpg to predict transmission type (am). Your code should be appropriately commented with high-level statements about the code’s function. Using your model, predict the outcome on the test data set, and plot and discuss your results.
You should have two final plots: a plot with two ROC curves, one for the training and one for the test data set, and a density plot that shows how the linear predictor separates the two transmission types in the test data. Your discussion should, at least, cover the differences and similarities in model performance on the training vs. test data (including AUC) as well as a clear interpretation of each plot. Please limit your discussion to a maximum of 10 sentences.
# R code goes here
Your answer goes here.
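For the AUC comparison asked for in Problem 2, one option that needs no add-on packages is the rank-based (Mann-Whitney) formula. The sketch below is only an illustration under that assumption; the helper auc and the names glm_train, auc_train, and auc_test are hypothetical and not part of the assignment. The split is repeated so that the block runs on its own.

```r
# Recreate the partition from the assignment.
set.seed(123)
train_indices <- sample(1:nrow(mtcars), size = floor(0.5 * nrow(mtcars)))
train_data <- mtcars[train_indices, ]
test_data <- mtcars[-train_indices, ]

# Fit the model on the training data only.
glm_train <- glm(am ~ hp + mpg, data = train_data, family = binomial)

# Hypothetical helper: area under the ROC curve from the rank-sum statistic.
# rank() assigns average ranks to ties, which matches the usual AUC definition.
auc <- function(predictor, known_truth) {
  r <- rank(predictor)
  n1 <- sum(known_truth == 1)
  n0 <- sum(known_truth == 0)
  (sum(r[known_truth == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)
}

# Linear predictor on the training data and on the held-out test data.
auc_train <- auc(predict(glm_train), train_data$am)
auc_test  <- auc(predict(glm_train, newdata = test_data), test_data$am)
auc_train
auc_test
```

With only 16 cars in each half, the fit may produce warnings about fitted probabilities of 0 or 1, and both AUC values will be sensitive to which cars land in which set.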
Problem 3: (40 points) Think of one conceptual question to ask about the dataset mtcars. You are welcome to use either the training, test, or full data set for this part. For your question, perform an exploratory statistical analysis (PCA, clustering, logistic regression, linear regression, ANOVA, etc.) with two corresponding figures. The analysis and plots must be multivariate (include at least three of the data columns). Discuss your findings, in particular how the results of your analysis reveal (or don’t reveal) an answer to your proposed question. Please limit your discussion to a maximum of 15 sentences.
To receive full credit for Problem 3, you will have to do the following:
Conceptual question: Please write your question here.
Please briefly describe your planned analysis and plots before doing them (5 sentences max).
# R code for your question goes here
Discussion and answer of your question goes here (15 sentences max).
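As one illustration of the kind of multivariate exploratory analysis Problem 3 allows, the sketch below runs a PCA on four columns of mtcars and colors the cars by number of cylinders. The choice of columns and the names pca_cols and pca are assumptions made for this example only; any other question and method from the list above would work just as well.

```r
# Columns chosen for illustration only.
pca_cols <- c("mpg", "hp", "wt", "disp")

# Scale the columns so that no variable dominates because of its units.
pca <- prcomp(mtcars[, pca_cols], scale. = TRUE)

# Proportion of variance explained by each component.
summary(pca)

# First figure: cars in the space of the first two components,
# colored by number of cylinders (4, 6, or 8).
plot(pca$x[, 1], pca$x[, 2], col = factor(mtcars$cyl), pch = 19,
     xlab = "PC1", ylab = "PC2")
legend("topright", legend = levels(factor(mtcars$cyl)),
       col = seq_along(levels(factor(mtcars$cyl))), pch = 19, title = "cyl")

# Second figure: how each original variable loads on PC1.
barplot(pca$rotation[, 1], ylab = "Loading on PC1")
```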