Feb 7, 2019
In this worksheet, we will continue to work with the tidyverse libraries:
library(tidyverse)
The msleep dataset, provided with ggplot2, contains information about sleep and awake times of different mammals:
msleep
## # A tibble: 83 x 11
## name genus vore order conservation sleep_total sleep_rem sleep_cycle
## <chr> <chr> <chr> <chr> <chr> <dbl> <dbl> <dbl>
## 1 Chee… Acin… carni Carn… lc 12.1 NA NA
## 2 Owl … Aotus omni Prim… <NA> 17 1.8 NA
## 3 Moun… Aplo… herbi Rode… nt 14.4 2.4 NA
## 4 Grea… Blar… omni Sori… lc 14.9 2.3 0.133
## 5 Cow Bos herbi Arti… domesticated 4 0.7 0.667
## 6 Thre… Brad… herbi Pilo… <NA> 14.4 2.2 0.767
## 7 Nort… Call… carni Carn… vu 8.7 1.4 0.383
## 8 Vesp… Calo… <NA> Rode… <NA> 7 NA NA
## 9 Dog Canis carni Carn… domesticated 10.1 2.9 0.333
## 10 Roe … Capr… herbi Arti… lc 3 NA NA
## # … with 73 more rows, and 3 more variables: awake <dbl>, brainwt <dbl>,
## # bodywt <dbl>
Verify that the sum of total sleep time (column sleep_total) and total awake time (column awake) adds up to 24h for all animals in the msleep dataset.
# R code goes here.
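One possible way to approach this check is sketched below. The names total, day_check, min_total, max_total, and n_off are illustrative, and the tolerance of 0.1 h is an assumption; a tolerance is used because floating-point sums of decimal values may not equal 24 exactly.

```r
library(tidyverse)

# Add the two columns and summarize how far the sums stray from 24 h.
day_check <- msleep %>%
  mutate(total = sleep_total + awake) %>%
  summarize(
    min_total = min(total),
    max_total = max(total),
    n_off = sum(abs(total - 24) > 0.1)  # rows outside the assumed tolerance
  )

day_check
```

If min_total and max_total are both close to 24 and n_off is 0, every animal's day adds up as expected within the chosen tolerance.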
Make a list of all the domesticated species in the msleep dataset, in alphabetical order. Hint: Domesticated species have the entry “domesticated” in the column conservation.
# R code goes here.
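A minimal sketch of one way to do this, following the hint; the result name domesticated is illustrative.

```r
library(tidyverse)

# Keep only domesticated species, then sort their names alphabetically.
domesticated <- msleep %>%
  filter(conservation == "domesticated") %>%
  select(name) %>%
  arrange(name)

domesticated
```

Rows with a missing conservation entry are dropped by filter(), because a comparison with NA does not evaluate to TRUE.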
For the different vore classifications, tally how many species are awake for at least 18 hours. Hint: Use the function tally().
# R code goes here.
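The following sketch shows one way to combine filter(), group_by(), and tally() for this task; the result name awake_18 is illustrative.

```r
library(tidyverse)

# Keep species awake for at least 18 hours, then count them per vore class.
awake_18 <- msleep %>%
  filter(awake >= 18) %>%
  group_by(vore) %>%
  tally()

awake_18
```

tally() stores the counts in a column named n. Species with a missing vore value form their own NA group.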
Using the function top_n(), identify the top-10 least-awake animals and list them from least awake to most awake. Explain why this analysis gives you 11 results instead of 10. Hint: Before calling top_n(), use the function select() to extract the two columns name and sleep_total, in that order.
# R code goes here.
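One possible approach is sketched below. The least-awake animals are the ones with the largest sleep_total, so the ranking is done on that column. The sketch passes sleep_total to top_n() explicitly; with the column order from the hint, calling top_n(10) alone would pick the last column by default. The result name least_awake is illustrative.

```r
library(tidyverse)

least_awake <- msleep %>%
  select(name, sleep_total) %>%
  top_n(10, sleep_total) %>%     # keeps all rows tied with the 10th-largest value
  arrange(desc(sleep_total))     # most sleep first, i.e. least awake first

least_awake
```

Because top_n() keeps ties, it can return more than the requested number of rows. In newer dplyr versions top_n() is superseded by slice_max(), which exposes the same behavior through its with_ties argument.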
Considering only carnivores and herbivores, make a plot of the percent of time each animal is in REM sleep (out of the total sleep time) vs. the animal’s total sleep time. Hint: Use the operator | to indicate logical OR in the filter() function.
# R code goes here.
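A sketch of one way to build this plot follows. The names rem_data and rem_percent, the axis labels, and the choice to color points by vore are illustrative assumptions.

```r
library(tidyverse)

# Keep carnivores and herbivores and compute REM sleep as a percent of total sleep.
rem_data <- msleep %>%
  filter(vore == "carni" | vore == "herbi") %>%
  mutate(rem_percent = 100 * sleep_rem / sleep_total)

# Scatter plot of percent REM sleep against total sleep time.
ggplot(rem_data, aes(x = sleep_total, y = rem_percent, color = vore)) +
  geom_point() +
  labs(x = "total sleep time (h)", y = "REM sleep (% of total sleep)")
```

Animals with a missing sleep_rem value have no rem_percent, so geom_point() drops them with a warning.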
The diamonds dataset, provided with ggplot2, contains information about the quality and price of 53,940 diamonds:
head(diamonds)
## # A tibble: 6 x 10
## carat cut color clarity depth table price x y z
## <dbl> <ord> <ord> <ord> <dbl> <dbl> <int> <dbl> <dbl> <dbl>
## 1 0.23 Ideal E SI2 61.5 55 326 3.95 3.98 2.43
## 2 0.21 Premium E SI1 59.8 61 326 3.89 3.84 2.31
## 3 0.23 Good E VS1 56.9 65 327 4.05 4.07 2.31
## 4 0.290 Premium I VS2 62.4 58 334 4.2 4.23 2.63
## 5 0.31 Good J SI2 63.3 58 335 4.34 4.35 2.75
## 6 0.24 Very Good J VVS2 62.8 57 336 3.94 3.96 2.48
The best cuts of diamonds are “Very Good”, “Premium”, and “Ideal”. Make a table that selects only those diamonds, and find the minimum, median, and maximum price for each cut. Hint: The operator %in% is helpful for selecting the diamond cuts.
# R code goes here.
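The hint can be applied as in the following sketch; the names best_cuts, min_price, median_price, and max_price are illustrative.

```r
library(tidyverse)

# Select the three best cuts, then summarize the price within each cut.
best_cuts <- diamonds %>%
  filter(cut %in% c("Very Good", "Premium", "Ideal")) %>%
  group_by(cut) %>%
  summarize(
    min_price = min(price),
    median_price = median(price),
    max_price = max(price)
  )

best_cuts
```

x %in% y is TRUE for each element of x that appears in y, which is more compact than chaining three comparisons with |.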
For each of the different diamond cuts, calculate the mean carat level among the diamonds whose price falls within 10% of the most expensive diamond for that cut.
# R code goes here.
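One reading of "within 10% of the most expensive diamond" is a price of at least 90% of the maximum price for that cut. The sketch below assumes that reading; the names near_max and mean_carat are illustrative.

```r
library(tidyverse)

# Within each cut, keep diamonds priced at 90% or more of that cut's maximum price.
near_max <- diamonds %>%
  group_by(cut) %>%
  filter(price >= 0.9 * max(price)) %>%
  summarize(mean_carat = mean(carat))

near_max
```

Because filter() is applied to grouped data, max(price) is evaluated separately for each cut.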
For each of the different diamond cuts, calculate the mean carat level among the top-10% most expensive diamonds.
# R code goes here.
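This task differs from the previous one in that it selects by rank rather than by distance from the maximum price. The sketch below assumes the top 10% is taken separately within each cut and uses the 90th price percentile as the cutoff; the names top_decile and mean_carat are illustrative.

```r
library(tidyverse)

# Within each cut, keep diamonds at or above the 90th percentile of price.
top_decile <- diamonds %>%
  group_by(cut) %>%
  filter(price >= quantile(price, 0.9)) %>%
  summarize(mean_carat = mean(carat))

top_decile
```

Because of ties at the cutoff price, the filtered groups can contain slightly more than 10% of the diamonds in a cut.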
Make a table that contains the median price for each combination of cut and clarity, and arrange the final table in descending order of median price.
# R code goes here.
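Grouping by two columns at once yields one row per combination, as in this sketch; the names median_by_group and median_price are illustrative.

```r
library(tidyverse)

# One row per cut/clarity combination, sorted from highest to lowest median price.
median_by_group <- diamonds %>%
  group_by(cut, clarity) %>%
  summarize(median_price = median(price)) %>%
  ungroup() %>%
  arrange(desc(median_price))

median_by_group
```

The explicit ungroup() removes the grouping that summarize() leaves in place after a two-column group_by(), so later steps operate on the whole table.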
Now arrange the same table first by cut and then within each cut group by median price.
# R code goes here.
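Passing several columns to arrange() sorts by the first and breaks ties with the next. The sketch below assumes descending median price within each cut, which the task leaves open; the names median_by_cut and median_price are illustrative.

```r
library(tidyverse)

# Sort by cut first, then by descending median price within each cut.
median_by_cut <- diamonds %>%
  group_by(cut, clarity) %>%
  summarize(median_price = median(price)) %>%
  ungroup() %>%
  arrange(cut, desc(median_price))

median_by_cut
```

Since cut is an ordered factor, arrange() sorts it by factor level (Fair through Ideal) rather than alphabetically.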
For the diamonds dataset, separately for each diamond cut, calculate the percentage of diamonds with a price above $10,000, and the median carat value for diamonds priced $10,000 or more.
# R code goes here.
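Both quantities can be computed in a single summarize() call, as in this sketch. The names expensive, pct_above_10k, and median_carat_10k are illustrative. The code follows the wording of the task: strictly above $10,000 for the percentage, and $10,000 or more for the median carat.

```r
library(tidyverse)

expensive <- diamonds %>%
  group_by(cut) %>%
  summarize(
    # The mean of a logical vector is the fraction of TRUE values.
    pct_above_10k = 100 * mean(price > 10000),
    # Subset carat inside summarize() to the diamonds priced at $10,000 or more.
    median_carat_10k = median(carat[price >= 10000])
  )

expensive
```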