Homework 3

Enter your name and EID here

This homework is due on Feb. 6, 2018 at 7:00pm. Please submit as a PDF file on Canvas.

In this homework, you are asked to evaluate two data sets and determine whether each one is tidy. We are referring to a very specific definition of “tidy”, so if this term is unfamiliar to you, please review the lecture materials.
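As a reminder of the general idea, here is a minimal sketch on a made-up table. The data, the column names, and the objects `wide` and `long` are hypothetical and are not part of the homework data sets. The sketch assumes the common definition in which each variable forms a column and each observation forms a row; if the wording used in lecture differs, the lecture definition takes precedence.

```r
# Hypothetical toy data: test scores for two students in two years.
# Wide layout: the variable "year" is hidden in the column names.
wide <- data.frame(
  student = c("A", "B"),
  score_2016 = c(80, 90),
  score_2017 = c(85, 95)
)

# Long layout: one column per variable (student, year, score)
# and one row per observation.
long <- data.frame(
  student = rep(wide$student, times = 2),
  year = rep(c(2016, 2017), each = nrow(wide)),
  score = c(wide$score_2016, wide$score_2017)
)

print(long)
```

Both objects hold the same numbers; only the arrangement into rows and columns differs.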

Problem 1: (2 pts) The dataset WorldPhones built into R contains the number of telephones (in thousands) in various regions of the world for the years 1951 and 1956-1961. You can run ?WorldPhones to learn more about this data set.

WorldPhones
##      N.Amer Europe Asia S.Amer Oceania Africa Mid.Amer
## 1951  45939  21574 2876   1815    1646     89      555
## 1956  60423  29990 4708   2568    2366   1411      733
## 1957  64721  32510 5230   2695    2526   1546      773
## 1958  68484  35218 6662   2845    2691   1663      836
## 1959  71799  37598 6856   3000    2868   1769      911
## 1960  76036  40341 8220   3145    3054   1905     1008
## 1961  79831  43173 9053   3338    3224   2005     1076

Explain the variables present in this dataset. Using the variables in this dataset and the formal definition of tidy data that we learned in lecture, is this data set tidy? Explain why or why not.

Your answer goes here.

The second dataset, ToothGrowth, is also built into R and contains data on the effect of vitamin C on tooth growth in 60 guinea pigs. You can run ?ToothGrowth to learn more about this data set.

head(ToothGrowth)
##    len supp dose
## 1  4.2   VC  0.5
## 2 11.5   VC  0.5
## 3  7.3   VC  0.5
## 4  5.8   VC  0.5
## 5  6.4   VC  0.5
## 6 10.0   VC  0.5

Explain the variables present in this dataset. Using the variables in this dataset and the formal definition of tidy data that we learned in lecture, is this data set tidy? Explain why or why not.

Your answer goes here.

Problem 2: (2 pts) The MedGPA dataset contains information about medical school admission. The dataset has 55 observations and 11 columns. It contains information on acceptance status (Accept) with levels A for accepted and D for denied, an indicator for acceptance status (Acceptance) with levels 1 for accepted and 0 for denied, the sex of the student (Sex), the Biology/Chemistry/Physics/Math grade point average (BCPM), the college grade point average (GPA), the MCAT exam’s verbal reasoning score (VR), the MCAT exam’s physical sciences score (PS), the MCAT exam’s writing sample score (WS), the MCAT exam’s biological science score (BS), the MCAT exam’s total score (MCAT, the sum of VR+PS+WS+BS), and the number of medical schools the student applied to (Apps).

MedGPA <- read.csv("http://wilkelab.org/classes/SDS348/data_sets/MedGPA.csv")
head(MedGPA)
##   Accept Acceptance Sex BCPM  GPA VR PS WS BS MCAT Apps
## 1      D          0   F 3.59 3.62 11  9  9  9   38    5
## 2      A          1   M 3.75 3.84 12 13  8 12   45    3
## 3      A          1   F 3.24 3.23  9 10  5  9   33   19
## 4      A          1   F 3.74 3.69 12 11  7 10   40    5
## 5      A          1   F 3.53 3.38  9 11  4 11   35   11
## 6      A          1   M 3.59 3.72 10  9  7 10   36    5

What are the mean GPA and the mean MCAT exam score for students who were accepted and for students who were denied? State your answer in a sentence.

# R code goes here

Your answer goes here. 1-2 sentences only.

Problem 3: (3 pts) For female students who were accepted, what were the minimum and the maximum number of medical schools the students applied to? HINT: Use the functions max() and min() to determine the maximum and the minimum number of schools applied to.
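To illustrate the hint, here is a minimal sketch of how min() and max() behave on a filtered subset. The data frame `toy`, its columns, and the object `kept` are hypothetical and unrelated to MedGPA; base R indexing is used here as one possible way to filter, and other approaches work as well.

```r
# Hypothetical toy data, unrelated to MedGPA.
toy <- data.frame(
  group = c("x", "x", "y", "y", "y"),
  passed = c(TRUE, FALSE, TRUE, TRUE, FALSE),
  count = c(3, 7, 2, 9, 5)
)

# Keep only the rows that satisfy both conditions.
kept <- toy[toy$group == "y" & toy$passed, ]

# Smallest and largest value of one column in the filtered rows.
min(kept$count)
max(kept$count)
```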

# R code goes here

Your answer goes here. 1-2 sentences only.

Problem 4: (3 pts) Ask a question about the MedGPA data set. Your question should not repeat the questions in problems 2 or 3. Describe in 1-2 sentences how you would answer this question with an analysis or a graph.

Your answer goes here. 2-3 sentences only.