Homework 2

Enter your name and EID here

This homework is due on Feb. 3, 2020 at 12:00pm. Please submit as a PDF file on Canvas.

This homework uses the Cars93 data set. Each observation (row) in the data frame describes one passenger car model from 1993. The data frame is wide, with 27 columns. We are interested in the manufacturer (Manufacturer), car model (Model), type of car (Type), midrange price in $1000 (Price), maximum horsepower (Horsepower), drivetrain (DriveTrain), and city MPG (miles per US gallon, MPG.city).

library(ggplot2)  # provides ggplot() and the geoms used in the plots below
Cars93 <- read.csv("http://wilkelab.org/classes/SDS348/data_sets/Cars93.csv")
head(Cars93)
##   Manufacturer   Model    Type Min.Price Price Max.Price MPG.city
## 1        Acura Integra   Small      12.9  15.9      18.8       25
## 2        Acura  Legend Midsize      29.2  33.9      38.7       18
## 3         Audi      90 Compact      25.9  29.1      32.3       20
## 4         Audi     100 Midsize      30.8  37.7      44.6       19
## 5          BMW    535i Midsize      23.7  30.0      36.2       22
## 6        Buick Century Midsize      14.2  15.7      17.3       22
##   MPG.highway            AirBags DriveTrain Cylinders EngineSize
## 1          31               None      Front         4        1.8
## 2          25 Driver & Passenger      Front         6        3.2
## 3          26        Driver only      Front         6        2.8
## 4          26 Driver & Passenger      Front         6        2.8
## 5          30        Driver only       Rear         4        3.5
## 6          31        Driver only      Front         4        2.2
##   Horsepower  RPM Rev.per.mile Man.trans.avail Fuel.tank.capacity
## 1        140 6300         2890             Yes               13.2
## 2        200 5500         2335             Yes               18.0
## 3        172 5500         2280             Yes               16.9
## 4        172 5500         2535             Yes               21.1
## 5        208 5700         2545             Yes               21.1
## 6        110 5200         2565              No               16.4
##   Passengers Length Wheelbase Width Turn.circle Rear.seat.room
## 1          5    177       102    68          37           26.5
## 2          5    195       115    71          38           30.0
## 3          5    180       102    67          37           28.0
## 4          6    193       106    70          37           31.0
## 5          4    186       109    69          39           27.0
## 6          6    189       105    69          41           28.0
##   Luggage.room Weight  Origin          Make
## 1           11   2705 non-USA Acura Integra
## 2           15   3560 non-USA  Acura Legend
## 3           14   3375 non-USA       Audi 90
## 4           17   3405 non-USA      Audi 100
## 5           13   3640 non-USA      BMW 535i
## 6           16   2880     USA Buick Century

Problem 1: (3 pts) Use ggplot2 to create a scatter plot of maximum horsepower versus car price. In the same plot, fit a curve to these data using geom_smooth(). In 1-2 sentences, what broad trend do you observe in horsepower for different car prices? HINT: Plot maximum horsepower on the y-axis and price on the x-axis.

ggplot(Cars93, aes(x = Price, y = Horsepower)) +
  geom_point() +
  geom_smooth()
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'

Horsepower generally increases with price up to about $40,000 and then appears to level off. However, only a few cars cost more than $40,000, so the trend in that price range is uncertain.
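The remark about how few cars cost more than $40,000 can be checked with a quick count. This is a minimal sketch, not part of the graded answer. It assumes that the Cars93 copy shipped with the MASS package matches the CSV file loaded above (an assumption made here so the snippet runs without a network connection). Price is recorded in $1000, so the threshold is 40.

```r
# Assumption: MASS::Cars93 holds the same data as the CSV used in this homework.
data(Cars93, package = "MASS")

# Price is in $1000, so $40,000 corresponds to Price > 40.
n_total <- nrow(Cars93)
n_expensive <- sum(Cars93$Price > 40)

cat(n_expensive, "of", n_total, "cars cost more than $40,000\n")
```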

Problem 2: (3 pts) Next, create a scatter plot of horsepower against city MPG, faceted by type of drivetrain. In 1-2 sentences, make one observation about the data from this plot.

ggplot(Cars93, aes(x = Horsepower, y = MPG.city)) +
  geom_point() +
  facet_wrap(~DriveTrain)

Possible observations (any one is sufficient):

  1. City MPG and horsepower are negatively correlated (i.e., as city MPG goes up, horsepower goes down).

  2. Rear-wheel-drive cars have the lowest city MPG.

  3. Front-wheel-drive cars have the highest city MPG.

  4. All three types of drivetrains have similar ranges of maximum horsepower (100-300 HP).
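The observations above are read off the plot, but they can also be checked numerically. The following is a rough sketch under the assumption that the Cars93 copy in the MASS package matches the CSV file used in this homework. The choice of the median as the per-drivetrain summary is ours for illustration and is not prescribed by the problem.

```r
# Assumption: MASS::Cars93 holds the same data as the CSV used in this homework.
data(Cars93, package = "MASS")

# Observation 1: overall correlation between horsepower and city MPG.
r_overall <- cor(Cars93$Horsepower, Cars93$MPG.city)
print(r_overall)

# Observations 2 and 3: median city MPG for each drivetrain.
mpg_by_drive <- aggregate(MPG.city ~ DriveTrain, data = Cars93, FUN = median)
print(mpg_by_drive)

# Observation 4: minimum and maximum horsepower for each drivetrain.
hp_range_by_drive <- aggregate(Horsepower ~ DriveTrain, data = Cars93, FUN = range)
print(hp_range_by_drive)
```

A negative value of r_overall is consistent with observation 1. The two summary tables give one row per drivetrain to compare against observations 2 through 4.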

Problem 3: (2 pts) Plot the distribution of car price, once using geom_histogram() and once using geom_density().

ggplot(Cars93, aes(x = Price)) +
  geom_histogram()
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

ggplot(Cars93, aes(x = Price)) +
  geom_density()

Problem 4: (2 pts) What does the y-axis in your histogram represent? In your density plot, what is the total area under the curve? Please give a single number as your answer. HINT: You do not need to do any additional calculations to determine the area under the curve. Use Google to find the answer.

The y-axis of the histogram shows the count of cars whose price falls within each bin (range of x-values). The total area under the curve of a density plot is always 1.
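Both statements can be verified numerically, although the problem does not require it. The sketch below uses base R's hist() and density() rather than ggplot2, so the bins differ from the 30 bins in the plot above. It also assumes that the Cars93 copy in the MASS package matches the CSV file used in this homework. The area is computed with the trapezoidal rule, so it is only approximately 1.

```r
# Assumption: MASS::Cars93 holds the same data as the CSV used in this homework.
data(Cars93, package = "MASS")

# Histogram: bin heights are counts, so they add up to the number of cars.
h <- hist(Cars93$Price, plot = FALSE)
total_count <- sum(h$counts)

# Density: integrate the estimated curve with the trapezoidal rule.
d <- density(Cars93$Price)
area <- sum(diff(d$x) * (head(d$y, -1) + tail(d$y, -1)) / 2)

cat("sum of histogram counts:", total_count, "\n")
cat("area under density curve:", round(area, 3), "\n")
```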