Enter your name and EID here
This homework is due on Feb. 3, 2020 at 12:00pm. Please submit as a PDF file on Canvas.
This homework uses the Cars93 data set. Each observation in the data frame contains information on one passenger car from 1993. This is a big data frame with 27 columns. We are interested in the information on manufacturer (Manufacturer), car model (Model), type of car (Type), midrange price in $1000 (Price), maximum horsepower (Horsepower), drivetrain (DriveTrain), and city MPG (miles per US gallon, MPG.city).
# Load ggplot2, which the plotting code in the problems below relies on
library(ggplot2)
Cars93 <- read.csv("http://wilkelab.org/classes/SDS348/data_sets/Cars93.csv")
head(Cars93)
## Manufacturer Model Type Min.Price Price Max.Price MPG.city
## 1 Acura Integra Small 12.9 15.9 18.8 25
## 2 Acura Legend Midsize 29.2 33.9 38.7 18
## 3 Audi 90 Compact 25.9 29.1 32.3 20
## 4 Audi 100 Midsize 30.8 37.7 44.6 19
## 5 BMW 535i Midsize 23.7 30.0 36.2 22
## 6 Buick Century Midsize 14.2 15.7 17.3 22
## MPG.highway AirBags DriveTrain Cylinders EngineSize
## 1 31 None Front 4 1.8
## 2 25 Driver & Passenger Front 6 3.2
## 3 26 Driver only Front 6 2.8
## 4 26 Driver & Passenger Front 6 2.8
## 5 30 Driver only Rear 4 3.5
## 6 31 Driver only Front 4 2.2
## Horsepower RPM Rev.per.mile Man.trans.avail Fuel.tank.capacity
## 1 140 6300 2890 Yes 13.2
## 2 200 5500 2335 Yes 18.0
## 3 172 5500 2280 Yes 16.9
## 4 172 5500 2535 Yes 21.1
## 5 208 5700 2545 Yes 21.1
## 6 110 5200 2565 No 16.4
## Passengers Length Wheelbase Width Turn.circle Rear.seat.room
## 1 5 177 102 68 37 26.5
## 2 5 195 115 71 38 30.0
## 3 5 180 102 67 37 28.0
## 4 6 193 106 70 37 31.0
## 5 4 186 109 69 39 27.0
## 6 6 189 105 69 41 28.0
## Luggage.room Weight Origin Make
## 1 11 2705 non-USA Acura Integra
## 2 15 3560 non-USA Acura Legend
## 3 14 3375 non-USA Audi 90
## 4 17 3405 non-USA Audi 100
## 5 13 3640 non-USA BMW 535i
## 6 16 2880 USA Buick Century
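Because only seven of the 27 columns are used in this homework, it can help to look at just those columns. The following is a minimal sketch, assuming the copy of Cars93 that ships with the MASS package has the same columns as the course CSV; the names cols_of_interest and cars_small are hypothetical and used only for illustration.

```r
# Assumption: MASS::Cars93 has the same columns as the course CSV.
library(MASS)

# Columns named in the introduction above
cols_of_interest <- c("Manufacturer", "Model", "Type", "Price",
                      "Horsepower", "DriveTrain", "MPG.city")

# cars_small is a hypothetical name used only for this illustration
cars_small <- Cars93[, cols_of_interest]

# Show the type and first few values of each selected column
str(cars_small)
```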
Problem 1: (3 pts) Use ggplot2 to create a scatter plot of maximum horsepower versus car price. In the same plot, fit a curve to these data using geom_smooth(). In 1-2 sentences, what broad trend do you observe in horsepower for different car prices? HINT: Plot maximum horsepower on the y-axis and price on the x-axis.
ggplot(Cars93, aes(x = Price, y = Horsepower)) +
  geom_point() +
  geom_smooth()
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
Horsepower generally increases with price up to about $40,000 and then appears to level off. However, there are few data points for cars costing more than $40,000, so the trend in that range is uncertain.
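One way to back up the remark about sparse data at the high end is to count the cars above that price. This is a small sketch, assuming MASS::Cars93 matches the course CSV; n_expensive and n_total are hypothetical names.

```r
library(MASS)  # assumption: MASS::Cars93 matches the course CSV

# Price is in units of $1000, so 40 corresponds to $40,000
n_expensive <- sum(Cars93$Price > 40)
n_total <- nrow(Cars93)

cat(n_expensive, "of", n_total, "cars cost more than $40,000\n")
```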
Problem 2: (3 pts) Next, create a scatter plot of horsepower against city MPG, facetted by type of drivetrain. In 1-2 sentences, make one observation about the data from this plot.
ggplot(Cars93, aes(x = Horsepower, y = MPG.city)) +
  geom_point() +
  facet_wrap(~DriveTrain)
Observations:
City MPG and horsepower are negatively correlated (i.e., cars with higher horsepower tend to have lower city MPG).
Rear-wheel drive cars have the lowest city MPG.
Front-wheel drive cars have the highest city MPG.
All three types of drivetrains have similar ranges of maximum horsepower (100-300HP).
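Observations like these can be cross-checked numerically. The sketch below, which again assumes MASS::Cars93 matches the course CSV, computes the overall correlation and per-drivetrain summaries with base R; the variable names are hypothetical.

```r
library(MASS)  # assumption: MASS::Cars93 matches the course CSV

# Overall Pearson correlation between horsepower and city MPG
r_overall <- cor(Cars93$Horsepower, Cars93$MPG.city)

# Median city MPG and horsepower range within each drivetrain type
mpg_by_drive <- tapply(Cars93$MPG.city, Cars93$DriveTrain, median)
hp_range_by_drive <- tapply(Cars93$Horsepower, Cars93$DriveTrain, range)

print(r_overall)
print(mpg_by_drive)
print(hp_range_by_drive)
```

Medians are used here rather than means so that a few unusually efficient cars do not dominate the comparison.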
Problem 3: (2 pts) Plot the distribution of car price, once using geom_histogram() and once using geom_density().
ggplot(Cars93, aes(x = Price)) +
  geom_histogram()
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
ggplot(Cars93, aes(x = Price)) +
  geom_density()
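The stat_bin() message printed with the histogram suggests choosing a bin width explicitly. A possible way to do that is sketched below; the value 5 (that is, $5,000 per bar) is an arbitrary choice for illustration, not part of the assignment, and the data come from MASS::Cars93 on the assumption that it matches the course CSV.

```r
library(ggplot2)
library(MASS)  # assumption: MASS::Cars93 matches the course CSV

# binwidth = 5 makes each bar span $5,000 (Price is in $1000);
# boundary = 0 aligns the bin edges with multiples of 5
p_hist <- ggplot(Cars93, aes(x = Price)) +
  geom_histogram(binwidth = 5, boundary = 0)

print(p_hist)
```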
Problem 4: (2 pts) What does the y-axis in your histogram represent? In your density plot, what is the total area under the curve? Please give a single number as your answer. HINT: You do not need to do any additional calculations to determine the area under the curve. Use Google to find the answer.
The y-axis in a histogram represents the number of observations that fall into each bin (range of x-values). The total area under the curve of a density plot is always 1.
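Both statements can be checked numerically. The sketch below uses base R's density() and hist() rather than ggplot2, and assumes MASS::Cars93 matches the course CSV. It integrates the density estimate with the trapezoidal rule and sums the histogram counts.

```r
library(MASS)  # assumption: MASS::Cars93 matches the course CSV

# Kernel density estimate of price, evaluated on a regular grid
d <- density(Cars93$Price)

# Trapezoidal rule: sum of (grid step) * (average of adjacent heights)
area <- sum(diff(d$x) * (head(d$y, -1) + tail(d$y, -1)) / 2)
print(area)  # close to 1, up to numerical error

# Histogram bin counts add up to the number of observations
h <- hist(Cars93$Price, plot = FALSE)
print(sum(h$counts) == nrow(Cars93))
```

The computed area is only approximately 1 because the estimate is evaluated on a finite grid.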