In-class worksheet 8

Feb 13, 2020

In this worksheet, we will use the library tidyverse:

library(tidyverse)

1. Making wide tables longer

Consider the following data set, which contains information about income and religious affiliation in the US:

pew <- read_csv("http://wilkelab.org/classes/SDS348/data_sets/pew.csv")

## Parsed with column specification:
## cols(
##   religion = col_character(),
##   below10k = col_double(),
##   from10to20k = col_double(),
##   from20to30k = col_double(),
##   from30to40k = col_double(),
##   from40to50k = col_double(),
##   from50to75k = col_double(),
##   from75to100k = col_double(),
##   from100to150k = col_double(),
##   above150k = col_double(),
##   no_answer = col_double()
## )

head(pew)

## # A tibble: 6 x 11
##   religion below10k from10to20k from20to30k from30to40k from40to50k from50to75k
##   <chr>       <dbl>       <dbl>       <dbl>       <dbl>       <dbl>       <dbl>
## 1 Agnostic       27          34          60          81          76         137
## 2 Atheist        12          27          37          52          35          70
## 3 Buddhist       27          21          30          34          33          58
## 4 Catholic      418         617         732         670         638        1116
## 5 Don't k…       15          14          15          11          10          35
## 6 Evangel…      575         869        1064         982         881        1486
## # … with 4 more variables: from75to100k <dbl>, from100to150k <dbl>,
## #   above150k <dbl>, no_answer <dbl>

This table is not tidy, because income levels are used as column headers rather than as levels of an income variable.
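As a warm-up, here is a minimal sketch of the same kind of reshaping on a small made-up table. The tibble `scores_wide` and its column names are hypothetical and are not part of the worksheet's data sets; the sketch only illustrates how `pivot_longer()` moves column headers into a variable.

```r
library(tidyverse)

# Hypothetical toy data: quiz scores stored with one column per quiz,
# so the quiz identity lives in the column headers.
scores_wide <- tibble(
  student = c("A", "B"),
  quiz1 = c(8, 6),
  quiz2 = c(9, 7)
)

# Gather every column except `student` into name/value pairs:
# the old headers go into `quiz`, the cell contents into `score`.
scores_long <- scores_wide %>%
  pivot_longer(-student, names_to = "quiz", values_to = "score")

scores_long
```

The result has one row per student and quiz combination, with `quiz` now an ordinary variable.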

Use pivot_longer() to turn this table into a table with three columns, one for religion, one for income (called income), and one for the count of people with the respective combination of income and religion (called count).

# R code goes here.

Now call the income column income_level and the count column number_of_people.

# R code goes here.

Now, instead of using data from all columns, use only the data from columns below10k, from20to30k, and from50to75k, such that your final data frame contains only these three income levels. Sort your final data frame according to religion and then income_level.

# R code goes here.

2. Making long tables wider

Consider the following data set, which contains information about the sex, weight, and height of 200 individuals:

persons <- read_csv("http://wilkelab.org/classes/SDS348/data_sets/persons.csv")

## Parsed with column specification:
## cols(
##   subject = col_double(),
##   indicator = col_character(),
##   value = col_character()
## )

head(persons)

## # A tibble: 6 x 3
##   subject indicator value
##     <dbl> <chr>     <chr>
## 1       1 sex       M    
## 2       1 weight    77   
## 3       1 height    182  
## 4       2 sex       F    
## 5       2 weight    58   
## 6       2 height    161

Is this data set tidy? And can you rearrange it so that you have one column for subject, one for sex, one for weight, and one for height?
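The general pattern can be sketched on a small made-up table. The tibble `cities_long` and its columns are hypothetical. Note that when text and numbers share a single value column, everything is stored as character, so numeric columns may need converting after widening.

```r
library(tidyverse)

# Hypothetical toy data: one row per (city, measure) pair,
# with all values stored as text in a single column.
cities_long <- tibble(
  city = c("X", "X", "Y", "Y"),
  measure = c("region", "population", "region", "population"),
  value = c("north", "1200", "south", "800")
)

# Spread the measures into their own columns, then convert the
# column that holds numbers from character to numeric.
cities_wide <- cities_long %>%
  pivot_wider(names_from = measure, values_from = value) %>%
  mutate(population = as.numeric(population))

cities_wide
```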

# R code goes here.

For the data set diamonds from the ggplot2 package, create a table displaying the mean price for each combination of cut and clarity. Then use pivot_wider() to rearrange this table into a wide format, such that there is a column of mean prices for each cut level (Fair, Good, Very Good, etc.).
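A similar summarize-then-widen workflow can be sketched on the built-in `mtcars` data set, used here only as a stand-in. The names `mpg_means`, `mpg_wide`, and `mean_mpg` are made up for this illustration.

```r
library(tidyverse)

# Mean miles per gallon for each combination of cylinders and gears.
mpg_means <- mtcars %>%
  group_by(cyl, gear) %>%
  summarize(mean_mpg = mean(mpg)) %>%
  ungroup()

# One row per cylinder count, one column of means per gear count.
# Combinations that do not occur in the data are filled with NA.
mpg_wide <- mpg_means %>%
  pivot_wider(names_from = gear, values_from = mean_mpg)

mpg_wide
```

Because the new column names come from numeric values, they need backticks when referenced, for example `` mpg_wide$`4` ``.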

# R code goes here.

3. If this was easy

Take the sepal lengths from the iris dataset and put them into a wide table so that there is one data column per species. You might be tempted to do this with the following code, which, however, doesn’t work. Can you explain why?

# If you remove the # signs in the lines below you will get an error; this code doesn't work
# iris %>% 
#   select(Sepal.Length, Species) %>%
#   pivot_wider(names_from = "Species", values_from = "Sepal.Length")

Explanation goes here.

# R code goes here.