2025-06-22
Scale transformations are applied before statistical transformations
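A minimal sketch of what this ordering means in practice, assuming the ggplot2 package is installed. The toy data frame `df` is hypothetical and not part of the lecture data. Because `scale_y_log10()` is a scale transformation, the linear fit inside `geom_smooth()` is computed on log10(y) rather than on the raw values.

```r
library(ggplot2)

# Hypothetical toy data: exact exponential growth, y = 10^x
df <- data.frame(x = 1:5, y = 10^(1:5))

# The scale transformation happens first, so the "lm" fit sees log10(y)
p_scale <- ggplot(df, aes(x, y)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE) +
  scale_y_log10()

# Data of the smoothing layer (layer 2), as computed by the statistic
fit_scale <- layer_data(p_scale, 2)

# Fitted values are in log10 units and fall on a straight line
range(fit_scale$y)
```

A linear fit to the raw `y` values would not pass through these points. It would even predict negative values at small `x`.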
Housing prices follow long-term exponential growth, overlaid with boom/bust cycles
House Price Index (HPI) for California. Source: Freddie Mac
Did housing prices in California decline substantially from 1990 to 1998? Yes.
Did housing prices in West Virginia recover by 2020? No.
US States House Price Index (HPI). Source: Freddie Mac
Two choices:
It is critical to make the correct choice for the dataset at hand
Any type of growth or decay process (change is proportional to present value) must be analyzed in log space
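The reasoning can be sketched in base R. If change is proportional to the present value, the series grows exponentially, and its logarithm is a straight line whose slope gives the growth rate. The series below is simulated and purely hypothetical (5% yearly growth with a multiplicative ten-year cycle). It is not the Freddie Mac data.

```r
# Hypothetical index: 5% growth per year, with a boom/bust cycle multiplied on top
year  <- 0:39
cycle <- 1 + 0.2 * sin(2 * pi * year / 10)
index <- 100 * 1.05^year * cycle

# In log space, constant percentage growth is a straight line
fit <- lm(log(index) ~ year)
growth_rate <- exp(coef(fit)[["year"]]) - 1  # estimated yearly growth rate

# Detrending in log space: residuals are log ratios of index to trend
rel_dev <- exp(residuals(fit)) - 1           # relative deviation from the trend
```

Here `rel_dev` expresses each year's value as a fraction above or below the long-term trend, which makes early and late cycles comparable even though their absolute sizes differ greatly.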
US States House Price Index (HPI). Source: Freddie Mac
CO2 abundance in the atmosphere over time. Source: NOAA Global Monitoring Laboratory
We can use STL to decompose a time series into:
long-term trend
seasonal effect
remainder (noise)
Magnitude of remainder should be small compared to magnitude of seasonal fluctuations
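A minimal sketch of such a decomposition, using the `stl()` function from base R. As a stand-in for the NOAA series it uses `co2`, the monthly Mauna Loa record that ships with R. The choice `s.window = "periodic"` is an assumption; other settings give somewhat different components.

```r
# co2: monthly Mauna Loa CO2 concentrations (ppm), shipped with base R
fit   <- stl(co2, s.window = "periodic")
parts <- fit$time.series  # columns: seasonal, trend, remainder

# The three components add back up to the original series
recomposed <- rowSums(parts)

# Compare the size of the remainder with the size of the seasonal swing
seasonal_spread  <- diff(range(parts[, "seasonal"]))
remainder_spread <- diff(range(parts[, "remainder"]))
```

Calling `plot(fit)` draws the data and the three components as stacked panels, which makes the comparison of magnitudes easy to check by eye.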
Simpler approaches:
More complex approaches:
All of these are beyond the scope of this class
First dataset: blue_jays
# A tibble: 123 × 8
   bird_id    sex   bill_depth_mm bill_width_mm bill_length_mm head_length_mm
   <chr>      <chr>         <dbl>         <dbl>          <dbl>          <dbl>
 1 0000-00000 M              8.26          9.21           25.9           56.6
 2 1142-05901 M              8.54          8.76           25.0           56.4
 3 1142-05905 M              8.39          8.78           26.1           57.3
 4 1142-05907 F              7.78          9.3            23.5           53.8
 5 1142-05909 M              8.71          9.84           25.5           57.3
 6 1142-05911 F              7.28          9.3            22.2           52.2
 7 1142-05912 M              8.74          9.28           25.4           57.1
 8 1142-05914 M              8.72          9.94           30             60.7
 9 1142-05917 F              8.2           9.01           22.8           52.8
10 1142-05920 F              7.67          9.31           24.6           54.9
# ℹ 113 more rows
# ℹ 2 more variables: body_mass_g <dbl>, skull_size_mm <dbl>
Second dataset: cars93
# A tibble: 93 × 27
   Manufacturer Model      Type   Min.Price Price Max.Price MPG.city MPG.highway
   <chr>        <chr>      <chr>      <dbl> <dbl>     <dbl>    <dbl>       <dbl>
 1 Acura        Integra    Small       12.9  15.9      18.8       25          31
 2 Acura        Legend     Midsi…      29.2  33.9      38.7       18          25
 3 Audi         90         Compa…      25.9  29.1      32.3       20          26
 4 Audi         100        Midsi…      30.8  37.7      44.6       19          26
 5 BMW          535i       Midsi…      23.7  30        36.2       22          30
 6 Buick        Century    Midsi…      14.2  15.7      17.3       22          31
 7 Buick        LeSabre    Large       19.9  20.8      21.7       19          28
 8 Buick        Roadmaster Large       22.6  23.7      24.9       16          25
 9 Buick        Riviera    Midsi…      26.3  26.3      26.3       19          27
10 Cadillac     DeVille    Large       33    34.7      36.3       16          25
# ℹ 83 more rows
# ℹ 19 more variables: AirBags <chr>, DriveTrain <chr>, Cylinders <chr>,
#   EngineSize <dbl>, Horsepower <dbl>, RPM <dbl>, Rev.per.mile <dbl>,
#   Man.trans.avail <chr>, Fuel.tank.capacity <dbl>, Passengers <dbl>,
#   Length <dbl>, Wheelbase <dbl>, Width <dbl>, Turn.circle <dbl>,
#   Rear.seat.room <dbl>, Luggage.room <dbl>, Weight <dbl>, Origin <chr>,
#   Make <chr>
geom_smooth()
Scatter plot only
Scatter plot with loess smooth
Scatter plot with linear regression
Scatter plot with linear regression, no confidence band
Scatter plot with linear regression by sex
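The five variants listed above could be produced roughly as follows, assuming ggplot2 is installed. To stay self-contained, the sketch uses only the ten blue_jays rows printed earlier. Plotting head length against bill length is an assumption; the original plots may use different columns.

```r
library(ggplot2)

# First ten rows of blue_jays, copied from the printed tibble
jays <- data.frame(
  sex            = c("M", "M", "M", "F", "M", "F", "M", "M", "F", "F"),
  bill_length_mm = c(25.9, 25.0, 26.1, 23.5, 25.5, 22.2, 25.4, 30, 22.8, 24.6),
  head_length_mm = c(56.6, 56.4, 57.3, 53.8, 57.3, 52.2, 57.1, 60.7, 52.8, 54.9)
)

base <- ggplot(jays, aes(bill_length_mm, head_length_mm)) + geom_point()

# Scatter plot only
p_points <- base
# Loess smooth with confidence band
p_loess <- base + geom_smooth(method = "loess", formula = y ~ x)
# Linear regression with confidence band
p_lm <- base + geom_smooth(method = "lm", formula = y ~ x)
# Linear regression, no confidence band
p_lm_no_band <- base + geom_smooth(method = "lm", formula = y ~ x, se = FALSE)
# Separate regression line for each sex (color defines the groups)
p_lm_by_sex <- ggplot(jays, aes(bill_length_mm, head_length_mm, color = sex)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE)
```

Printing any of these objects, for example `p_lm_by_sex`, renders the plot.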
Do more expensive cars have a larger fuel tank?
Caution: Exact shape of smoothing line depends on method details
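To illustrate how much the line depends on method details, the sketch below overlays three smooths on the same data. It assumes ggplot2 is installed and that `cars93` corresponds to the `Cars93` data frame in the MASS package (the column names match). The `span` values are arbitrary choices for illustration.

```r
library(ggplot2)

cars <- MASS::Cars93

p <- ggplot(cars, aes(Price, Fuel.tank.capacity)) +
  geom_point(color = "grey50") +
  # Small span: the loess line follows local wiggles
  geom_smooth(method = "loess", formula = y ~ x, span = 0.3,
              se = FALSE, color = "red") +
  # Large span: the loess line is much stiffer
  geom_smooth(method = "loess", formula = y ~ x, span = 1.0,
              se = FALSE, color = "blue") +
  # Straight line for comparison
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE, color = "black")

# Fitted values of the two loess layers, evaluated on the same x grid
wiggly <- layer_data(p, 2)$y
stiff  <- layer_data(p, 3)$y
```

All three lines are defensible summaries of the same scatter plot, so conclusions should not rest on small features of any one of them.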
Smoothing lines are particularly unreliable near their endpoints
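One way to see this is to compare the standard error of a loess fit at the ends of the data range and in the middle. The sketch uses base R with default loess settings and again assumes that `cars93` corresponds to `MASS::Cars93`.

```r
cars <- MASS::Cars93

# Loess fit with default settings
fit <- loess(Fuel.tank.capacity ~ Price, data = cars)

# Evaluate the fit at the lowest, median, and highest price
grid <- data.frame(
  Price = c(min(cars$Price), median(cars$Price), max(cars$Price))
)
pred <- predict(fit, newdata = grid, se = TRUE)

# Standard error of the fitted line at each position
se_by_position <- setNames(pred$se.fit, c("low end", "middle", "high end"))
```

Few cars sit at the expensive end of the range, so the fit there is supported by little nearby data and its standard error is correspondingly larger than in the middle.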