Homework 3

Enter your name and EID here

This homework is due on Feb. 10, 2019 at 12:00pm. Please submit as a PDF file on Canvas.

In this homework, you are asked to evaluate two data sets and determine whether each one is tidy. We are referring to a very specific definition of “tidy”, so if this term is unfamiliar to you, please review the lecture materials.

Problem 1: (3 pts) The data set ldeaths, built into R, is a time series giving the monthly deaths from bronchitis, emphysema, and asthma in the UK (1974-1979). You can run ?ldeaths to learn more about this data set. Using the variables in this data set and the formal definition of tidy data that we learned in lecture, is this data set tidy? Explain why or why not.

ldeaths
##       Jan  Feb  Mar  Apr  May  Jun  Jul  Aug  Sep  Oct  Nov  Dec
## 1974 3035 2552 2704 2554 2014 1655 1721 1524 1596 2074 2199 2512
## 1975 2933 2889 2938 2497 1870 1726 1607 1545 1396 1787 2076 2837
## 1976 2787 3891 3179 2011 1636 1580 1489 1300 1356 1653 2013 2823
## 1977 3102 2294 2385 2444 1748 1554 1498 1361 1346 1564 1640 2293
## 1978 2815 3137 2679 1969 1870 1633 1529 1366 1357 1570 1535 2491
## 1979 3084 2605 2573 2143 1693 1504 1461 1354 1333 1492 1781 1915

Your answer here.

The data set airquality, built into R, contains daily air quality measurements in New York from May to September 1973. You can run ?airquality to learn more about this data set. Using the variables in this data set and the formal definition of tidy data that we learned in lecture, is this data set tidy? Explain why or why not.

head(airquality)
##   Ozone Solar.R Wind Temp Month Day
## 1    41     190  7.4   67     5   1
## 2    36     118  8.0   72     5   2
## 3    12     149 12.6   74     5   3
## 4    18     313 11.5   62     5   4
## 5    NA      NA 14.3   56     5   5
## 6    28      NA 14.9   66     5   6

Your answer here.

Problem 2: (3 pts) Listed below are three examples of code that violate the rules in section 2 of the tidyverse style guide. Name at least one style violation in each example.

ToothGrowth %>% filter(supp=="OJ") %>% head()

Your answer here.

ToothGrowth[,1]

Your answer here.

boxplot ( len ~ dose, data = ToothGrowth, range = 1, width = c(2, 2, 2), varwidth = TRUE, notch = FALSE, outline = TRUE )

Your answer here.

Problem 3: (4 pts) The NCbirths data set contains 1409 birth records from North Carolina in 2001. Its first few rows and column names are shown below:

NCbirths <- read.csv("http://wilkelab.org/classes/SDS348/data_sets/NCbirths.csv")
head(NCbirths)
##   Plural Sex MomAge Weeks Gained Smoke BirthWeightGm Low Premie Marital
## 1      1   1     32    40     38     0       3146.85   0      0       0
## 2      1   2     32    37     34     0       3288.60   0      0       0
## 3      1   1     27    39     12     0       3912.30   0      0       0
## 4      1   1     27    39     15     0       3855.60   0      0       0
## 5      1   1     25    39     32     0       3430.35   0      0       0
## 6      1   1     28    43     32     0       3316.95   0      0       0

Using some of the analysis functions we’ve discussed in class (e.g., mutate(), filter(), group_by(), summarize()), write code that outputs the answer to the following question:

For premature births, what is the maximum age of mothers and the mean birth weight for single babies, twins and triplets? Using the computed results, answer the question in 1-2 sentences. HINT: Use the function max() to determine the maximum age of mothers.
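As a reminder of how these functions can be chained, here is a minimal sketch on the built-in mtcars data set rather than on NCbirths. It assumes the dplyr package is installed. The name mtcars_summary and the summary columns max_hp and mean_mpg are made up for this illustration. The columns you need for this problem are different, so treat this only as a pattern.

```r
library(dplyr)

# mtcars is built into R; it is used here only to illustrate the pattern
# and is unrelated to the NCbirths data.
mtcars_summary <- mtcars %>%
  filter(am == 1) %>%        # keep only rows that meet a condition
  group_by(cyl) %>%          # form one group per value of cyl
  summarize(
    max_hp = max(hp),        # largest value of hp within each group
    mean_mpg = mean(mpg)     # average value of mpg within each group
  )

mtcars_summary
```

The result has one row per group and one column per summary, which is the shape of output the question above asks for.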

# Your R code here

Your answer here.