Feb 4, 2020
Is the iris dataset tidy? Explain why or why not.
head(iris)
##   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## 1          5.1         3.5          1.4         0.2  setosa
## 2          4.9         3.0          1.4         0.2  setosa
## 3          4.7         3.2          1.3         0.2  setosa
## 4          4.6         3.1          1.5         0.2  setosa
## 5          5.0         3.6          1.4         0.2  setosa
## 6          5.4         3.9          1.7         0.4  setosa
Your answer goes here.
Is the HairEyeColor dataset tidy? Explain why or why not.
HairEyeColor
## , , Sex = Male
##
##        Eye
## Hair    Brown Blue Hazel Green
##   Black    32   11    10     3
##   Brown    53   50    25    15
##   Red      10   10     7     7
##   Blond     3   30     5     8
##
## , , Sex = Female
##
##        Eye
## Hair    Brown Blue Hazel Green
##   Black    36    9     5     2
##   Brown    66   34    29    14
##   Red      16    7     7     7
##   Blond     4   64     5     8
Your answer goes here.
All subsequent code relies on the tidyverse package, so we first have to load it:
library(tidyverse)
Now, using the function filter(), pick all the rows in the iris dataset that pertain to species setosa, and store them in a new table called iris_setosa.
# R code goes here.
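One possible approach, offered as a sketch rather than the expected answer: filter() keeps the rows for which a logical condition is true, so comparing the Species column against the string "setosa" selects that species. Only dplyr is loaded here; library(tidyverse) attaches it as well.

```r
library(dplyr)  # filter() comes from dplyr, which library(tidyverse) also attaches

# Keep only the rows whose Species column equals "setosa"
iris_setosa <- filter(iris, Species == "setosa")

# Inspect the first few rows of the result
head(iris_setosa)
```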
Pick all the rows in the iris dataset that belong to species virginica and have a sepal length > 7.
# R code goes here.
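As a sketch of how two conditions might be combined: filter() accepts several comma-separated conditions and keeps only the rows that satisfy all of them. The name iris_virginica_long is made up for this illustration.

```r
library(dplyr)

# Comma-separated conditions are combined with a logical AND
# (iris_virginica_long is a hypothetical name, not part of the exercise)
iris_virginica_long <- filter(iris, Species == "virginica", Sepal.Length > 7)

iris_virginica_long
```

Writing the condition as Species == "virginica" & Sepal.Length > 7 should give the same rows.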
Are there any cases in the iris dataset for which the ratio of sepal length to sepal width exceeds the ratio of petal length to petal width? Use filter() to find out.
# R code goes here.
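A condition inside filter() can be any expression built from the columns, including arithmetic. The sketch below compares the two ratios directly; ratio_cases is a hypothetical name, and the number of rows it contains answers the question.

```r
library(dplyr)

# Compute both ratios on the fly and keep the rows where the sepal ratio is larger
# (ratio_cases is a hypothetical name used only for this illustration)
ratio_cases <- filter(iris, Sepal.Length / Sepal.Width > Petal.Length / Petal.Width)

# Zero rows would mean there are no such cases
nrow(ratio_cases)
```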
Create a pared-down table which contains only data for species setosa and which only has the columns Sepal.Length and Sepal.Width. Store the result in a table called iris_pared.
# R code goes here.
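One way this might be done is to chain filter() and select() with the pipe operator. The order matters in this sketch: the rows are filtered first, because select() drops the Species column that the filter condition needs.

```r
library(dplyr)

iris_pared <- iris %>%
  filter(Species == "setosa") %>%      # keep setosa rows while Species is still available
  select(Sepal.Length, Sepal.Width)    # then keep only the two sepal columns

head(iris_pared)
```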
Using the function mutate(), create a new data column that holds the ratio of sepal length to sepal width. Store the resulting table in a variable called iris_ratio.
# R code goes here.
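A minimal sketch, assuming the new column is called sepal_ratio (the exercise does not prescribe a name): mutate() adds a column computed from existing ones and leaves the other columns unchanged.

```r
library(dplyr)

# Add a column holding sepal length divided by sepal width
# (sepal_ratio is an assumed column name)
iris_ratio <- mutate(iris, sepal_ratio = Sepal.Length / Sepal.Width)

head(iris_ratio)
```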
Order the iris_ratio table by species name and by increasing values of sepal length-to-width ratio.
# R code goes here.
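Sorting on more than one column could look like the sketch below. arrange() sorts by its first argument and breaks ties with the following ones, in increasing order by default. The block rebuilds iris_ratio so that it runs on its own, and the column name sepal_ratio and the result name iris_ordered are assumptions.

```r
library(dplyr)

# Rebuild the table from the previous step (sepal_ratio is an assumed column name)
iris_ratio <- mutate(iris, sepal_ratio = Sepal.Length / Sepal.Width)

# Sort by species first, then by increasing ratio within each species
# (iris_ordered is a hypothetical name)
iris_ordered <- arrange(iris_ratio, Species, sepal_ratio)

head(iris_ordered)
```

Wrapping a column in desc(), as in desc(sepal_ratio), would sort it in decreasing order instead.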
Calculate the mean and standard deviation of the sepal lengths for each species. Do this by first creating a table grouped by species, which you call iris_grouped. Then run summarize() on that table.
# R code goes here.
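The two-step pattern described above might be sketched as follows. group_by() only marks the groups; summarize() then collapses each group to a single row. The names iris_summary, mean_length and sd_length are chosen for illustration.

```r
library(dplyr)

# Step 1: mark the rows as belonging to groups defined by Species
iris_grouped <- group_by(iris, Species)

# Step 2: compute one summary row per group
# (iris_summary, mean_length and sd_length are assumed names)
iris_summary <- summarize(
  iris_grouped,
  mean_length = mean(Sepal.Length),
  sd_length = sd(Sepal.Length)
)

iris_summary
```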
Use the function n() to count the number of observations for each species.
# R code goes here.
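n() takes no arguments and only works inside verbs such as summarize() or mutate(), where it returns the number of rows in the current group. A sketch, with iris_counts and count as hypothetical names:

```r
library(dplyr)

# Count the rows in each species group
# (iris_counts and count are hypothetical names)
iris_counts <- iris %>%
  group_by(Species) %>%
  summarize(count = n())

iris_counts
```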
For each species, calculate the percentage of cases with sepal length > 5.5.
# R code goes here.
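One way to get a percentage, shown here as a sketch: the mean of a logical vector is the fraction of TRUE values, so multiplying it by 100 gives the percentage of cases that meet the condition. The names iris_pct and pct_long are assumptions.

```r
library(dplyr)

# mean() of a logical vector is the share of TRUE values
# (iris_pct and pct_long are assumed names)
iris_pct <- iris %>%
  group_by(Species) %>%
  summarize(pct_long = 100 * mean(Sepal.Length > 5.5))

iris_pct
```

An equivalent formulation would be 100 * sum(Sepal.Length > 5.5) / n().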
Take the iris_ratio data set you have created and plot the distribution of sepal length-to-width ratios for the three species.
# R code goes here.
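Several plot types can show a distribution. The sketch below uses overlaid density curves from ggplot2, which library(tidyverse) also attaches; this is one choice among several, not necessarily the intended one. The block rebuilds iris_ratio so that it runs on its own, with sepal_ratio as an assumed column name.

```r
library(dplyr)
library(ggplot2)

# Rebuild the table from the earlier step (sepal_ratio is an assumed column name)
iris_ratio <- mutate(iris, sepal_ratio = Sepal.Length / Sepal.Width)

# One semi-transparent density curve per species
ratio_plot <- ggplot(iris_ratio, aes(x = sepal_ratio, fill = Species)) +
  geom_density(alpha = 0.5)

print(ratio_plot)
```

Replacing geom_density() with geom_histogram(), or mapping Species to x and the ratio to y with geom_boxplot(), would give alternative views of the same data.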
Now plot sepal length-to-width ratios vs. sepal lengths. Does it look like there is a relationship between the length-to-width ratios and the lengths? Does it matter whether you consider each species individually or all together? How could you find out?
# R code goes here.
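A scatter plot with species mapped to color is one way to look at both questions at once, since it shows the overall cloud and the per-species clusters in the same panel. This is a sketch only; the column name sepal_ratio is an assumption, and interpreting the plot is left to the reader.

```r
library(dplyr)
library(ggplot2)

# Rebuild the table from the earlier step (sepal_ratio is an assumed column name)
iris_ratio <- mutate(iris, sepal_ratio = Sepal.Length / Sepal.Width)

# Ratio on the y axis against sepal length on the x axis, colored by species
scatter_plot <- ggplot(iris_ratio, aes(x = Sepal.Length, y = sepal_ratio, color = Species)) +
  geom_point()

print(scatter_plot)
```

Adding facet_wrap(~Species) would draw one panel per species, which may make it easier to compare the per-species pattern with the pooled one.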