In-class worksheet, class 18

1. Bar plots

The MASS package contains a data set bacteria which contains data from tests of the presence of the bacterium H. influenzae in children with otitis media in the Northern Territory of Australia. We are interested in two columns of this data set: y reports presence (y) or absence (n) of the bacterium. trt reports the treatment, which was placebo, drug, or drug+ (drug plus high adherence).

library(MASS)
head(bacteria)
##   y ap hilo week  ID     trt
## 1 y  p   hi    0 X01 placebo
## 2 y  p   hi    2 X01 placebo
## 3 y  p   hi    4 X01 placebo
## 4 y  p   hi   11 X01 placebo
## 5 y  a   hi    0 X02   drug+
## 6 y  a   hi    2 X02   drug+

Make a bar plot that shows the absolute number of cases with or without the bacterium, stacked on top of each other, for each treatment.

# R code goes here.

Now modify the plot so that bars representing the absolute number of cases with or without the bacterium are shown side-by-side (position='dodge').

# R code goes here.

Now modify the plot so that bars represent the relative number of cases with or without the bacterium. What is the appropriate position option in geom_bar() to achieve this effect?

# R code goes here.

2. Histograms and density plots

Make a histogram plot of sepal lengths in the iris data set, using the default histogram settings. Then make two more such plots, with different bin widths. Use geom_histogram()

# R code goes here.

Instead of geom_histogram(), now use geom_density() and fill the area under the curves by species identity.

# R code goes here.

Now make the areas under the curve partially transparent, so the overlap of the various distributions becomes clearly visible.

# R code goes here.

3. Scales

Using the movies data set provided by ggplot2, make a scatter plot of the number of votes (votes) vs. the length of the movie (length). Use a log scale for both the x and the y axis.

# R code goes here.

Now color the points by year and use a color gradient that goes from yellow to blue. You can change the color scale using scale_color_gradient().

# R code goes here.

Now zoom in to movies that are between 1 and 20 minutes long, using xlim() instead of scale_x_log10().

# R code goes here.

4. If this was easy

Take the log-log plot of votes vs. length from the movies data set and plot only movies that are between 1 and 20 minutes long, but keep the log scale.

# R code goes here.

Make a log-log plot of votes vs. length from the movies data set, faceted by year. Plot a trend line onto each facet, without confidence band.

# R code goes here.

Make a bar plot of the number of movies per year in the dataset.

# R code goes here.

Go back to the bacteria dataset, make a bar plot that shows the total number of cases within each treatment, and plot the number of such cases on top of each bar.

# R code goes here.