In-class worksheet 2

Jan 23, 2020

1. t test

We will try the t test on the built-in data set PlantGrowth. The data set is stored in long format, with one row per measurement, so we first reformat it with the function unstack() into one column per group. We store the reformatted data set in a variable plants:

head(PlantGrowth)

##   weight group
## 1   4.17  ctrl
## 2   5.58  ctrl
## 3   5.18  ctrl
## 4   6.11  ctrl
## 5   4.50  ctrl
## 6   4.61  ctrl

plants <- unstack(PlantGrowth)
head(plants)

##   ctrl trt1 trt2
## 1 4.17 4.81 6.31
## 2 5.58 4.17 5.12
## 3 5.18 4.41 5.54
## 4 6.11 3.59 5.50
## 5 4.50 5.87 5.37
## 6 4.61 3.83 5.29

The data set contains plant growth yield (dry weight) under one control and two treatment conditions:

boxplot(plants)

Question: Is the mean control weight significantly different from the mean weight under treatment 1? Is the mean weight under treatment 1 significantly different from the mean weight under treatment 2? Use the function t.test() to find out.

# R code goes here.
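
One possible approach to the question above, as a minimal sketch. It assumes the default Welch two-sample test (unequal variances) is acceptable. The names ctrl_vs_trt1 and trt1_vs_trt2 are illustrative only.

```r
# Reformat PlantGrowth into one column per condition, as in the worksheet.
plants <- unstack(PlantGrowth)

# Welch two-sample t test (R's default, var.equal = FALSE):
# control vs. treatment 1.
ctrl_vs_trt1 <- t.test(plants$ctrl, plants$trt1)

# Same test: treatment 1 vs. treatment 2.
trt1_vs_trt2 <- t.test(plants$trt1, plants$trt2)

print(ctrl_vs_trt1)
print(trt1_vs_trt2)

# The p-values can also be extracted from the returned objects.
ctrl_vs_trt1$p.value
trt1_vs_trt2$p.value
```

Compare each p-value to your chosen significance level (commonly 0.05) to decide whether the two means differ significantly.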

2. Correlation

We will try the correlation test on the built-in data set cars. The data set contains the speed of cars (in mph) and the distances taken to stop (in ft), measured in the 1920s:

head(cars)

##   speed dist
## 1     4    2
## 2     4   10
## 3     7    4
## 4     7   22
## 5     8   16
## 6     9   10

Is there a relationship between speed and stopping distance? Use the function cor.test() to find out. Then make a scatterplot of speed vs. stopping distance, using the function plot().

# R code goes here.
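
A minimal sketch of one way to do this, assuming the default Pearson correlation is the intended method. The variable name speed_dist_test is illustrative, and the axis labels use the units given in the R documentation for cars.

```r
# Pearson correlation test (the default method) between speed
# and stopping distance.
speed_dist_test <- cor.test(cars$speed, cars$dist)
print(speed_dist_test)

# Scatterplot with speed on the x axis and stopping distance on the y axis.
plot(cars$speed, cars$dist,
     xlab = "Speed (mph)",
     ylab = "Stopping distance (ft)")
```

The printed output reports the correlation estimate, its confidence interval, and the p-value for the null hypothesis of zero correlation.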

3. Regression

We will do a regression analysis on the data set cabbages from the R package MASS. The data set contains the weight (HeadWt), vitamin C content (VitC), the cultivar (Cult), and the planting date (Date) for 60 cabbage heads:

data(cabbages, package = "MASS") # make the dataset available
head(cabbages)

##   Cult Date HeadWt VitC
## 1  c39  d16    2.5   51
## 2  c39  d16    2.2   55
## 3  c39  d16    3.1   45
## 4  c39  d16    4.3   42
## 5  c39  d16    2.5   53
## 6  c39  d16    4.3   50

Use a multiple regression (one response variable, several predictors) to find out whether weight and cultivar have an effect on the vitamin C content. You will need to use the functions lm() and summary().

# R code goes here.
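
One way to set up this model, as a sketch. It assumes an additive model without an interaction term between weight and cultivar. The name vitc_fit is illustrative.

```r
data(cabbages, package = "MASS")

# Vitamin C content modeled as a function of head weight and cultivar.
# Cult is a factor, so lm() encodes it as a dummy variable, with the
# first level (c39) as the reference.
vitc_fit <- lm(VitC ~ HeadWt + Cult, data = cabbages)

# Coefficient estimates, standard errors, and p-values.
summary(vitc_fit)
```

In the summary table, the row for HeadWt tests the effect of weight, and the row for Cultc52 tests the difference between cultivar c52 and the reference cultivar c39.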

4. If this was easy

Look into the function predict(). Can you use it to estimate the vitamin C content of a c52 cultivar with a weight of 4? Can you use it to calculate the residuals of the regression model?

# R code goes here.
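
A possible sketch for both questions. It refits the additive model VitC ~ HeadWt + Cult so that the block is self-contained. The names vitc_fit, new_cabbage, predicted_vitc, and manual_resid are illustrative.

```r
data(cabbages, package = "MASS")
vitc_fit <- lm(VitC ~ HeadWt + Cult, data = cabbages)

# New observation: cultivar c52 with a head weight of 4.
# The factor is built with the same levels as in the original data.
new_cabbage <- data.frame(
  HeadWt = 4,
  Cult = factor("c52", levels = levels(cabbages$Cult))
)
predicted_vitc <- predict(vitc_fit, newdata = new_cabbage)
predicted_vitc

# Residuals are observed minus fitted values; predict() without
# newdata returns the fitted values for the original data.
manual_resid <- cabbages$VitC - predict(vitc_fit)

# Check against the built-in residuals() extractor.
all.equal(unname(manual_resid), unname(residuals(vitc_fit)))
```

Computing the residuals by hand with predict() is mainly a way to see what residuals() does; in practice the extractor function is the shorter route.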