Enter your name and EID here
This homework is due on Feb. 12, 2019 at 4:00pm. Please submit as a PDF file on Canvas.
In this homework, you are asked to evaluate two data sets and determine if they are tidy data sets. We are referring to a very specific definition of “tidy”, so if this term is unfamiliar to you, please review the lecture materials.
Problem 1: (2 pts) The dataset USAccDeaths
built into R contains accidental deaths in the US 1973-1978. You can run ?USAccDeaths
to learn more about this data set.
USAccDeaths
## Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov
## 1973 9007 8106 8928 9137 10017 10826 11317 10744 9713 9938 9161
## 1974 7750 6981 8038 8422 8714 9512 10120 9823 8743 9129 8710
## 1975 8162 7306 8124 7870 9387 9556 10093 9620 8285 8466 8160
## 1976 7717 7461 7767 7925 8623 8945 10078 9179 8037 8488 7874
## 1977 7792 6957 7726 8106 8890 9299 10625 9302 8314 8850 8265
## 1978 7836 6892 7791 8192 9115 9434 10484 9827 9110 9070 8633
## Dec
## 1973 8927
## 1974 8680
## 1975 8034
## 1976 8647
## 1977 8796
## 1978 9240
Explain the variables present in this dataset. Using the variables in this dataset and the formal definition of tidy data that we learned in lecture, is this data set tidy? Explain why or why not.
Your answer goes here. 2-3 sentences only.
The dataset CO2
built into R contains data on the carbon dioxide uptake in grass plants. You can run ?CO2
to learn more about this data set.
head(CO2)
## Plant Type Treatment conc uptake
## 1 Qn1 Quebec nonchilled 95 16.0
## 2 Qn1 Quebec nonchilled 175 30.4
## 3 Qn1 Quebec nonchilled 250 34.8
## 4 Qn1 Quebec nonchilled 350 37.2
## 5 Qn1 Quebec nonchilled 500 35.3
## 6 Qn1 Quebec nonchilled 675 39.2
Explain the variables present in this dataset. Using the variables in this dataset and the formal definition of tidy data that we learned in lecture, is this data set tidy? Explain why or why not.
Your answer goes here. 2-3 sentences only.
Problem 2: (5 pts) Listed below are three examples of code that violate the rules in section 2 of the tidyverse style guide. Which tidyverse style guidelines are violated in these example?
iris %>% filter(Species=="versicolor") %>% head()
Your answer goes here. 1-2 sentences only.
iris[50,]
Your answer goes here. 1-2 sentences only.
boxplot (len ~ dose, data = ToothGrowth, range = 1, width = c(2, 2, 2), varwidth = TRUE, notch = FALSE, outline = TRUE)
Your answer goes here. 1-2 sentences only.
Problem 3: (3 pts) The NCbirths
contains 1409 birth records from North Carolina in 2001. The column contents are as follows:
NCbirths <- read.csv("http://wilkelab.org/classes/SDS348/data_sets/NCbirths.csv")
head(NCbirths)
## Plural Sex MomAge Weeks Gained Smoke BirthWeightGm Low Premie Marital
## 1 1 1 32 40 38 0 3146.85 0 0 0
## 2 1 2 32 37 34 0 3288.60 0 0 0
## 3 1 1 27 39 12 0 3912.30 0 0 0
## 4 1 1 27 39 15 0 3855.60 0 0 0
## 5 1 1 25 39 32 0 3430.35 0 0 0
## 6 1 1 28 43 32 0 3316.95 0 0 0
For single births, what are the max completed weeks of gestation and the mean birth weight for babies that were born prematurely and for babies that were carried to term? State your answer in a sentence. HINT: Use the function max()
to determine the maximum completed weeks of gestation.
# R code goes here
Your answer goes here. 2-3 sentences only.