In-class worksheet 6

Feb 1, 2018

In this worksheet, we will continue to work with the tidyverse libraries:

library(tidyverse)

1. The msleep dataset

The msleep dataset, provided with ggplot2, contains information about sleep and awake times of different mammals:

msleep
## # A tibble: 83 x 11
##                          name       genus  vore        order conservation
##                         <chr>       <chr> <chr>        <chr>        <chr>
##  1                    Cheetah    Acinonyx carni    Carnivora           lc
##  2                 Owl monkey       Aotus  omni     Primates         <NA>
##  3            Mountain beaver  Aplodontia herbi     Rodentia           nt
##  4 Greater short-tailed shrew     Blarina  omni Soricomorpha           lc
##  5                        Cow         Bos herbi Artiodactyla domesticated
##  6           Three-toed sloth    Bradypus herbi       Pilosa         <NA>
##  7          Northern fur seal Callorhinus carni    Carnivora           vu
##  8               Vesper mouse     Calomys  <NA>     Rodentia         <NA>
##  9                        Dog       Canis carni    Carnivora domesticated
## 10                   Roe deer   Capreolus herbi Artiodactyla           lc
## # ... with 73 more rows, and 6 more variables: sleep_total <dbl>,
## #   sleep_rem <dbl>, sleep_cycle <dbl>, awake <dbl>, brainwt <dbl>,
## #   bodywt <dbl>

Verify that the sum of total sleep time (column sleep_total) and total awake time (column awake) adds up to 24h for all animals in the msleep dataset.

# R code goes here.
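
One possible approach, sketched rather than given as the reference solution: add the two columns, then summarize the range of the sums and pull out any rows that deviate from 24. The names total_hours, day_check, and off_rows are made up for this sketch, and the small tolerance is an assumption to guard against floating-point round-off.

```r
library(dplyr)
library(ggplot2)  # provides the msleep dataset

# Sum of sleep and awake time for every animal
msleep_totals <- msleep %>%
  mutate(total_hours = sleep_total + awake)

# Smallest and largest sum; both should be (close to) 24
day_check <- msleep_totals %>%
  summarize(min_total = min(total_hours), max_total = max(total_hours))
day_check

# Rows whose sum deviates from 24 by more than a small tolerance
off_rows <- msleep_totals %>%
  filter(abs(total_hours - 24) > 1e-8) %>%
  select(name, sleep_total, awake, total_hours)
off_rows
```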

Make a list of all the domesticated species in the msleep dataset, in alphabetical order. Hint: domesticated species have the entry “domesticated” in the column conservation.

# R code goes here.
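
A minimal sketch of one way to do this, combining filter(), select(), and arrange(). The object name domesticated is an assumption of this sketch. Note that filter() silently drops rows where conservation is NA.

```r
library(dplyr)
library(ggplot2)  # provides the msleep dataset

# Keep only domesticated species and sort their names alphabetically
domesticated <- msleep %>%
  filter(conservation == "domesticated") %>%
  select(name) %>%
  arrange(name)
domesticated
```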

For the different vore classifications, tally how many species are awake for at least 18 hours. Hint: use the function tally().

# R code goes here.
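
One way this could look: filter first, then group by vore and count the remaining rows with tally(). The object name awake_counts is hypothetical. Animals with a missing vore value end up in their own NA group.

```r
library(dplyr)
library(ggplot2)  # provides the msleep dataset

# Count, per vore class, the species that are awake for 18 hours or more
awake_counts <- msleep %>%
  filter(awake >= 18) %>%
  group_by(vore) %>%
  tally()
awake_counts
```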

Using the function top_n(), identify the top-10 least-awake animals, and list them from least awake to most awake. Explain why this analysis gives you 11 results instead of 10.

# R code goes here.
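
A sketch of one possible call, assuming that a negative n in top_n() selects the rows with the smallest values of the weighting column. The object name least_awake is made up here. top_n() keeps every row that is tied at the cutoff value, which is the likely reason the result can be longer than requested.

```r
library(dplyr)
library(ggplot2)  # provides the msleep dataset

# The 10 smallest values of awake (ties included), sorted ascending
least_awake <- msleep %>%
  top_n(-10, awake) %>%
  select(name, awake) %>%
  arrange(awake)
least_awake
```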

Considering only carnivores and herbivores, make a plot of the percent of time each animal is in REM sleep (out of the total sleep time) vs. the animal’s total sleep time. Hint: Use the operator | to indicate logical OR in the filter() function.

# R code goes here.
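
A minimal sketch, assuming the vore codes "carni" and "herbi" shown in the table above and a scatter plot as the plot type. The names rem_data, rem_percent, and p are hypothetical. Animals without a recorded sleep_rem value cannot be drawn, so ggplot2 will warn about removed rows.

```r
library(dplyr)
library(ggplot2)  # provides the msleep dataset

# Keep carnivores and herbivores; compute REM sleep as a percent of total sleep
rem_data <- msleep %>%
  filter(vore == "carni" | vore == "herbi") %>%
  mutate(rem_percent = 100 * sleep_rem / sleep_total)

# Percent REM sleep vs. total sleep time, colored by vore class
p <- ggplot(rem_data, aes(x = sleep_total, y = rem_percent, color = vore)) +
  geom_point()
p
```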

2. The diamonds dataset

The diamonds dataset, provided with ggplot2, contains information about the quality and price of 53,940 diamonds:

head(diamonds)
## # A tibble: 6 x 10
##   carat       cut color clarity depth table price     x     y     z
##   <dbl>     <ord> <ord>   <ord> <dbl> <dbl> <int> <dbl> <dbl> <dbl>
## 1  0.23     Ideal     E     SI2  61.5    55   326  3.95  3.98  2.43
## 2  0.21   Premium     E     SI1  59.8    61   326  3.89  3.84  2.31
## 3  0.23      Good     E     VS1  56.9    65   327  4.05  4.07  2.31
## 4  0.29   Premium     I     VS2  62.4    58   334  4.20  4.23  2.63
## 5  0.31      Good     J     SI2  63.3    58   335  4.34  4.35  2.75
## 6  0.24 Very Good     J    VVS2  62.8    57   336  3.94  3.96  2.48

The best cuts of diamonds are “Very Good”, “Premium”, and “Ideal”. Make a table that selects only those diamonds, and find the minimum, median, and maximum price for each cut. Hint: The operator %in% is helpful for selecting the diamond cuts.

# R code goes here.
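
One possible version, sketched with %in% for the selection and summarize() for the three statistics. The names best_cuts, min_price, median_price, and max_price are assumptions of this sketch.

```r
library(dplyr)
library(ggplot2)  # provides the diamonds dataset

# Price summary for the three best cuts
best_cuts <- diamonds %>%
  filter(cut %in% c("Very Good", "Premium", "Ideal")) %>%
  group_by(cut) %>%
  summarize(
    min_price = min(price),
    median_price = median(price),
    max_price = max(price)
  )
best_cuts
```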

For each of the different diamond cuts, calculate the mean carat level among the diamonds whose price falls within 10% of the price of the most expensive diamond of that cut.

# R code goes here.
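
A sketch under one reading of the task: "within 10%" is taken to mean a price of at least 90% of the maximum price for that cut. Because the data are grouped, max(price) inside filter() is evaluated separately for each cut. The names near_max and mean_carat are hypothetical.

```r
library(dplyr)
library(ggplot2)  # provides the diamonds dataset

# Mean carat among diamonds priced at 90% or more of their cut's maximum
near_max <- diamonds %>%
  group_by(cut) %>%
  filter(price >= 0.9 * max(price)) %>%
  summarize(mean_carat = mean(carat))
near_max
```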

For each of the different diamond cuts, calculate the mean carat level among the top-10% most expensive diamonds.

# R code goes here.
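
A sketch under one interpretation: the top 10% is determined within each cut, using the 90th percentile of price as the cutoff. Other readings, such as ranking with percent_rank() or taking the top 10% across the whole dataset, would give somewhat different numbers. The names top_decile and mean_carat are made up for this sketch.

```r
library(dplyr)
library(ggplot2)  # provides the diamonds dataset

# Mean carat among diamonds at or above the 90th price percentile of their cut
top_decile <- diamonds %>%
  group_by(cut) %>%
  filter(price >= quantile(price, 0.9)) %>%
  summarize(mean_carat = mean(carat))
top_decile
```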

Make a table that contains the median price for each combination of cut and clarity, and arrange the final table in descending order of median price.

# R code goes here.
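
One way this might be written, grouping by both columns before summarizing. The names price_table and median_price are assumptions, and the .groups argument assumes dplyr 1.0 or later.

```r
library(dplyr)
library(ggplot2)  # provides the diamonds dataset

# Median price per cut/clarity combination, most expensive first
price_table <- diamonds %>%
  group_by(cut, clarity) %>%
  summarize(median_price = median(price), .groups = "drop") %>%
  arrange(desc(median_price))
price_table
```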

Now arrange the same table first by cut and then within each cut group by median price.

# R code goes here.
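
A sketch of the two-level sort: arrange() sorts by its first argument and breaks ties with the following ones. The task does not state a direction for the price, so ascending order is assumed here; wrapping median_price in desc() would reverse it. The name price_by_cut is hypothetical, and .groups assumes dplyr 1.0 or later.

```r
library(dplyr)
library(ggplot2)  # provides the diamonds dataset

# Median price per cut/clarity, sorted by cut and then by price within each cut
price_by_cut <- diamonds %>%
  group_by(cut, clarity) %>%
  summarize(median_price = median(price), .groups = "drop") %>%
  arrange(cut, median_price)
price_by_cut
```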

3. If this was easy

For the diamonds dataset, using the function do(), fit a linear model of price vs. carat separately for each cut. Then make a table that holds the resulting intercepts and slopes.

# R code goes here.
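
A minimal sketch of one way to use do() here: inside do(), the placeholder . stands for the rows of the current group, and returning a one-row data frame per group yields the coefficient table directly. The names model_table, intercept, and slope are made up for this sketch, and other formulations (for example, storing the fitted models in a list column first) are equally valid.

```r
library(dplyr)
library(ggplot2)  # provides the diamonds dataset

# Fit price ~ carat within each cut and collect the two coefficients
model_table <- diamonds %>%
  group_by(cut) %>%
  do({
    fit <- lm(price ~ carat, data = .)
    data.frame(
      intercept = coef(fit)[["(Intercept)"]],
      slope = coef(fit)[["carat"]]
    )
  })
model_table
```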