
In-class worksheet 6

Feb 6, 2020

In this worksheet, we will continue to work with the tidyverse libraries:

library(tidyverse)

1. The msleep dataset

The msleep dataset, provided with ggplot2, contains information about sleep and awake times of different mammals:

msleep

## # A tibble: 83 x 11
##    name  genus vore  order conservation sleep_total sleep_rem sleep_cycle awake
##    <chr> <chr> <chr> <chr> <chr>              <dbl>     <dbl>       <dbl> <dbl>
##  1 Chee… Acin… carni Carn… lc                  12.1      NA        NA      11.9
##  2 Owl … Aotus omni  Prim… <NA>                17         1.8      NA       7  
##  3 Moun… Aplo… herbi Rode… nt                  14.4       2.4      NA       9.6
##  4 Grea… Blar… omni  Sori… lc                  14.9       2.3       0.133   9.1
##  5 Cow   Bos   herbi Arti… domesticated         4         0.7       0.667  20  
##  6 Thre… Brad… herbi Pilo… <NA>                14.4       2.2       0.767   9.6
##  7 Nort… Call… carni Carn… vu                   8.7       1.4       0.383  15.3
##  8 Vesp… Calo… <NA>  Rode… <NA>                 7        NA        NA      17  
##  9 Dog   Canis carni Carn… domesticated        10.1       2.9       0.333  13.9
## 10 Roe … Capr… herbi Arti… lc                   3        NA        NA      21  
## # … with 73 more rows, and 2 more variables: brainwt <dbl>, bodywt <dbl>

Verify that total sleep time (column sleep_total) and total awake time (column awake) add up to 24 hours for all animals in the msleep dataset.

# R code goes here.
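One possible sketch, assuming the tidyverse is loaded as above. The names `sleep_check`, `total_hours`, and `adds_to_24` are illustrative choices. Because the columns are doubles, the comparison uses dplyr's `near()` rather than `==`, so that tiny floating-point rounding error is not reported as a mismatch.

```r
library(tidyverse)

# Add sleep and awake hours for each animal. near() compares doubles
# with a small tolerance instead of testing exact equality.
sleep_check <- msleep %>%
  mutate(
    total_hours = sleep_total + awake,
    adds_to_24 = near(total_hours, 24)
  ) %>%
  select(name, sleep_total, awake, total_hours, adds_to_24)

# TRUE only if every animal's hours add up to 24
all(sleep_check$adds_to_24)

# List any animals whose hours do not add up to 24
sleep_check %>%
  filter(!adds_to_24)
```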

Make a list of all the domesticated species in the msleep dataset, in alphabetical order. Hint: Domesticated species have the entry “domesticated” in the column conservation.

# R code goes here.
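A minimal sketch of one approach; the variable name `domesticated` is illustrative. `filter()` drops rows where the condition is `NA`, so animals with a missing conservation status are excluded automatically.

```r
library(tidyverse)

# Keep only rows marked "domesticated", sort by name, and extract the
# names as a character vector
domesticated <- msleep %>%
  filter(conservation == "domesticated") %>%
  arrange(name) %>%
  pull(name)

domesticated
```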

For the different vore classifications, tally how many species are awake for at least 18 hours. Hint: Use the function tally().

# R code goes here.
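One possible sketch, reading "at least 18 hours" as `awake >= 18`; the name `awake_18` is illustrative. On grouped data, `tally()` adds a column `n` holding the number of rows in each group, and animals with a missing `vore` value form their own `NA` group.

```r
library(tidyverse)

# Keep animals awake 18 h or more, then count them per vore category
awake_18 <- msleep %>%
  filter(awake >= 18) %>%
  group_by(vore) %>%
  tally()

awake_18
```

`count(vore)` after the `filter()` step is a shorthand that gives the same counts.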

Using the function top_n(), identify the top-10 least-awake animals and list them from least awake to most awake. Explain why this analysis gives you 11 results instead of 10. Hint: Before calling top_n(), use the function select() to extract the two columns name and sleep_total, in that order.

# R code goes here.
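A sketch that follows the hint; the name `least_awake` is illustrative. The least-awake animals are the ones that sleep the most, so the ranking uses `sleep_total`. The weighting column is passed explicitly here, although `top_n()` would default to the last column (`sleep_total`) after the `select()`. Note that `top_n()` keeps every row tied at the cutoff value, so it can return more than `n` rows.

```r
library(tidyverse)

# Rank by sleep_total: most sleep means least awake
least_awake <- msleep %>%
  select(name, sleep_total) %>%
  top_n(10, sleep_total) %>%
  arrange(desc(sleep_total))   # least awake (most sleep) first

least_awake
```

In dplyr 1.0.0 and later, `top_n()` is superseded by `slice_max()`, which also keeps ties by default (`with_ties = TRUE`).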

Considering only carnivores and herbivores, make a plot of the percent of time each animal is in REM sleep (out of the total sleep time) vs. the animal’s total sleep time. Hint: Use the operator | to indicate logical OR in the filter() function.

# R code goes here.
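One possible sketch; the names `rem_data` and `rem_percent` and the axis labels are illustrative. REM time is expressed as a percentage of total sleep, and color distinguishes the two diet groups.

```r
library(tidyverse)

# Keep carnivores OR herbivores, then express REM sleep as a
# percentage of total sleep time
rem_data <- msleep %>%
  filter(vore == "carni" | vore == "herbi") %>%
  mutate(rem_percent = 100 * sleep_rem / sleep_total)

# Animals with missing sleep_rem get rem_percent = NA; ggplot()
# drops those points with a warning
ggplot(rem_data, aes(x = sleep_total, y = rem_percent, color = vore)) +
  geom_point() +
  labs(x = "Total sleep time (hours)", y = "REM sleep (% of total sleep)")
```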

2. The diamonds dataset

The diamonds dataset, provided with ggplot2, contains information about the quality and price of 53,940 diamonds:

head(diamonds)

## # A tibble: 6 x 10
##   carat cut       color clarity depth table price     x     y     z
##   <dbl> <ord>     <ord> <ord>   <dbl> <dbl> <int> <dbl> <dbl> <dbl>
## 1 0.23  Ideal     E     SI2      61.5    55   326  3.95  3.98  2.43
## 2 0.21  Premium   E     SI1      59.8    61   326  3.89  3.84  2.31
## 3 0.23  Good      E     VS1      56.9    65   327  4.05  4.07  2.31
## 4 0.290 Premium   I     VS2      62.4    58   334  4.2   4.23  2.63
## 5 0.31  Good      J     SI2      63.3    58   335  4.34  4.35  2.75
## 6 0.24  Very Good J     VVS2     62.8    57   336  3.94  3.96  2.48

The best cuts of diamonds are “Very Good”, “Premium”, and “Ideal”. Make a table that selects only those diamonds, and find the minimum, median, and maximum price for each cut. Hint: The operator %in% is helpful for selecting the diamond cuts.

# R code goes here.
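A minimal sketch; the name `best_cuts` and the summary column names are illustrative. `%in%` keeps rows whose `cut` matches any of the three labels, and grouping then yields one summary row per remaining cut.

```r
library(tidyverse)

best_cuts <- diamonds %>%
  filter(cut %in% c("Very Good", "Premium", "Ideal")) %>%
  group_by(cut) %>%
  summarize(
    min_price = min(price),
    median_price = median(price),
    max_price = max(price)
  )

best_cuts
```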

For each of the different diamond cuts, calculate the mean carat level among the diamonds whose price falls within 10% of the most expensive diamond for that cut.

# R code goes here.
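One possible sketch, reading "within 10% of the most expensive diamond" as a price of at least 90% of that cut's maximum; the name `near_max` is illustrative. Because `filter()` runs on grouped data, `max(price)` is evaluated separately for each cut.

```r
library(tidyverse)

# Within each cut, keep diamonds priced at least 90% of that cut's
# maximum price, then average their carat values
near_max <- diamonds %>%
  group_by(cut) %>%
  filter(price >= 0.9 * max(price)) %>%
  summarize(mean_carat = mean(carat))

near_max
```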

For each of the different diamond cuts, calculate the mean carat level among the top-10% most expensive diamonds.

# R code goes here.
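A sketch under the assumption that "top 10%" is computed within each cut. Unlike the previous question, which sets a threshold on the price value, this one selects by rank: diamonds at or above the 90th percentile of price for their cut. The name `top_10pct` is illustrative.

```r
library(tidyverse)

# Within each cut, keep diamonds at or above the 90th percentile of
# price (the top 10% by price), then average their carat values
top_10pct <- diamonds %>%
  group_by(cut) %>%
  filter(price >= quantile(price, 0.9)) %>%
  summarize(mean_carat = mean(carat))

top_10pct
```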

Make a table that contains the median price for each combination of cut and clarity, and arrange the final table in descending order of median price.

# R code goes here.
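A minimal sketch; the name `median_prices` is illustrative. The `.groups = "drop"` argument, available in dplyr 1.0.0 and later, returns an ungrouped table, so `arrange()` sorts the whole table.

```r
library(tidyverse)

# One row per cut/clarity combination, sorted by median price
median_prices <- diamonds %>%
  group_by(cut, clarity) %>%
  summarize(median_price = median(price), .groups = "drop") %>%
  arrange(desc(median_price))

median_prices
```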

Now arrange the same table first by cut and then, within each cut, by median price.

# R code goes here.
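A sketch that assumes descending median price within each cut, matching the order of the previous table; the names `median_prices` and `by_cut` are illustrative. Because `cut` is an ordered factor, `arrange(cut)` sorts by level order (Fair to Ideal) rather than alphabetically.

```r
library(tidyverse)

median_prices <- diamonds %>%
  group_by(cut, clarity) %>%
  summarize(median_price = median(price), .groups = "drop")

# Sort by cut level, then by descending median price within each cut
by_cut <- median_prices %>%
  arrange(cut, desc(median_price))

by_cut
```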

3. If this was easy

For the diamonds dataset, separately for each diamond cut, calculate the percentage of diamonds with a price above $10,000, and the median carat value for diamonds priced $10,000 or more.

# R code goes here.
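One possible sketch that follows the two thresholds as stated (`> 10000` for the percentage, `>= 10000` for the median carat); the name `expensive` and its column names are illustrative. Taking `mean()` of a logical vector gives the fraction of `TRUE` values, and `carat[price >= 10000]` subsets carat within each cut group.

```r
library(tidyverse)

expensive <- diamonds %>%
  group_by(cut) %>%
  summarize(
    pct_above_10k = 100 * mean(price > 10000),
    median_carat_10k = median(carat[price >= 10000])
  )

expensive
```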