Visualizing Trends

Claus O. Wilke

2025-02-23

Reminder: The Grammar-of-Graphics pipeline

Scale transformations are applied before statistical transformations
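A minimal sketch of what this ordering means in practice, assuming a small synthetic data frame (the names `d`, `x`, and `y` are made up for illustration): with a log-scaled x axis, the linear model inside `geom_smooth()` is fit to `log10(x)`, not to `x`.

```r
library(ggplot2)

# hypothetical data: y depends linearly on log10(x)
set.seed(1)
d <- data.frame(x = 10^seq(0, 3, length.out = 50))
d$y <- 2 + 3 * log10(d$x) + rnorm(50, sd = 0.2)

p <- ggplot(d, aes(x, y)) +
  geom_point() +
  # the scale transformation is applied first ...
  scale_x_log10() +
  # ... so the linear model below sees log10(x), not x
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE)

# the x values computed by the smoothing stat are in log10 space
smooth_data <- layer_data(p, 2)
range(smooth_data$x)
```

The fitted line therefore appears straight on the log-scaled axis, even though the relationship is curved in the original units.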

Detrending: Removing the underlying trend

Raw time series can be misleading

 

Did housing prices in California decline substantially from 1990 to 1998?

Did housing prices in West Virginia recover by 2020?

US States House Price Index (HPI). Source: Freddie Mac

Comparing the raw time series to the trendline helps

 

Did housing prices in California decline substantially from 1990 to 1998?

Did housing prices in West Virginia recover by 2020?

US States House Price Index (HPI). Source: Freddie Mac

Even better: Remove underlying trend

 

Did housing prices in California decline substantially from 1990 to 1998? — yes

Did housing prices in West Virginia recover by 2020? — no

US States House Price Index (HPI). Source: Freddie Mac
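One way to remove a trend is sketched below under stated assumptions: the series is simulated (the `hpi` values are made up, not Freddie Mac data), and the long-term trend is taken to be a straight line on the log scale. This illustrates the idea and is not necessarily how the figures above were produced.

```r
# simulated price index: exponential growth plus a slow wiggle (made-up data)
year <- seq(1980, 2020, by = 0.25)
wiggle <- 0.15 * sin(2 * pi * (year - 1980) / 15)
hpi <- 100 * exp(0.04 * (year - 1980) + wiggle)

# fit the long-term trend as a straight line in log space
fit <- lm(log(hpi) ~ year)
trend <- exp(fitted(fit))

# detrended series: ratio of the observed value to the trend
# (values above 1 are above trend, values below 1 are below trend)
detrended <- hpi / trend
summary(detrended)
```

Plotting `detrended` against `year` on a log-scaled y axis shows the deviations from the trend without the distraction of the overall growth.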

Housing-price analysis in linear space looks wrong

 

US States House Price Index (HPI). Source: Freddie Mac

Accounting for seasonal fluctuations

Many time series show regular seasonal fluctuations

 

CO2 abundance in the atmosphere over time. Source: NOAA Global Monitoring Laboratory

Seasonal Decomposition of Time Series by Loess (STL)

We can use STL to decompose a time series into:

  1. long-term trend

  2. seasonal effect

  3. remainder (noise)

 

Magnitude of remainder should be small compared to magnitude of seasonal fluctuations
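A short sketch of an STL decomposition using `stl()` from base R. It uses the `co2` series that ships with R (monthly Mauna Loa readings), which is an assumption made for convenience and is not the NOAA dataset shown in the figures. The choice `s.window = "periodic"` is likewise just one reasonable setting.

```r
# decompose the monthly CO2 series into seasonal, trend, and remainder
fit <- stl(co2, s.window = "periodic")

# the three components are columns of a time-series matrix
components <- fit$time.series
head(components)

# the decomposition is additive: the components sum to the original series
reconstructed <- rowSums(components)

# compare the spread of the remainder to the spread of the seasonal effect
sd(components[, "remainder"])
sd(components[, "seasonal"])
```

If the remainder were as large as the seasonal component, the seasonal pattern would be hard to distinguish from noise.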

 

Other strategies for adjusting for seasonality

Simpler approaches:

  • Fit model with fixed or random effects for specific seasons

More complex approaches:

  • Perform Fourier or wavelet decomposition

All of these are beyond the scope of this class
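For orientation only, the simpler approach might look roughly like the sketch below. It assumes the `co2` series that ships with R, a quadratic long-term trend, and one fixed effect per calendar month; all of these are illustrative choices.

```r
# monthly CO2 series shipped with base R (not the NOAA data from the figures)
d <- data.frame(
  co2 = as.numeric(co2),
  time = as.numeric(time(co2)),
  month = factor(cycle(co2))
)

# quadratic long-term trend plus one fixed effect per calendar month
fit <- lm(co2 ~ poly(time, 2) + month, data = d)

# estimated seasonal contribution for each observation
seasonal <- predict(fit, type = "terms")[, "month"]

# seasonally adjusted series
adjusted <- d$co2 - seasonal
head(adjusted)
```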

Creating trendlines in ggplot2

Getting the data

First dataset: blue_jays

library(tidyverse)

blue_jays <- read_csv("https://wilkelab.org/SDS366/datasets/blue_jays.csv")
blue_jays
# A tibble: 123 × 8
   bird_id    sex   bill_depth_mm bill_width_mm bill_length_mm head_length_mm
   <chr>      <chr>         <dbl>         <dbl>          <dbl>          <dbl>
 1 0000-00000 M              8.26          9.21           25.9           56.6
 2 1142-05901 M              8.54          8.76           25.0           56.4
 3 1142-05905 M              8.39          8.78           26.1           57.3
 4 1142-05907 F              7.78          9.3            23.5           53.8
 5 1142-05909 M              8.71          9.84           25.5           57.3
 6 1142-05911 F              7.28          9.3            22.2           52.2
 7 1142-05912 M              8.74          9.28           25.4           57.1
 8 1142-05914 M              8.72          9.94           30             60.7
 9 1142-05917 F              8.2           9.01           22.8           52.8
10 1142-05920 F              7.67          9.31           24.6           54.9
# ℹ 113 more rows
# ℹ 2 more variables: body_mass_g <dbl>, skull_size_mm <dbl>

Getting the data

Second dataset: cars93

cars93 <- read_csv("https://wilkelab.org/SDS366/datasets/cars93.csv")
cars93
# A tibble: 93 × 27
   Manufacturer Model      Type   Min.Price Price Max.Price MPG.city MPG.highway
   <chr>        <chr>      <chr>      <dbl> <dbl>     <dbl>    <dbl>       <dbl>
 1 Acura        Integra    Small       12.9  15.9      18.8       25          31
 2 Acura        Legend     Midsi…      29.2  33.9      38.7       18          25
 3 Audi         90         Compa…      25.9  29.1      32.3       20          26
 4 Audi         100        Midsi…      30.8  37.7      44.6       19          26
 5 BMW          535i       Midsi…      23.7  30        36.2       22          30
 6 Buick        Century    Midsi…      14.2  15.7      17.3       22          31
 7 Buick        LeSabre    Large       19.9  20.8      21.7       19          28
 8 Buick        Roadmaster Large       22.6  23.7      24.9       16          25
 9 Buick        Riviera    Midsi…      26.3  26.3      26.3       19          27
10 Cadillac     DeVille    Large       33    34.7      36.3       16          25
# ℹ 83 more rows
# ℹ 19 more variables: AirBags <chr>, DriveTrain <chr>, Cylinders <chr>,
#   EngineSize <dbl>, Horsepower <dbl>, RPM <dbl>, Rev.per.mile <dbl>,
#   Man.trans.avail <chr>, Fuel.tank.capacity <dbl>, Passengers <dbl>,
#   Length <dbl>, Wheelbase <dbl>, Width <dbl>, Turn.circle <dbl>,
#   Rear.seat.room <dbl>, Luggage.room <dbl>, Weight <dbl>, Origin <chr>,
#   Make <chr>

We add trend lines with geom_smooth()

ggplot(blue_jays) +
  aes(body_mass_g, head_length_mm) + 
  geom_point() + 
  theme_bw()

 

Scatter plot only

We add trend lines with geom_smooth()

ggplot(blue_jays) +
  aes(body_mass_g, head_length_mm) + 
  geom_point() + 
  geom_smooth() +
  theme_bw()

 

Scatter plot with loess smooth

We add trend lines with geom_smooth()

ggplot(blue_jays) +
  aes(body_mass_g, head_length_mm) + 
  geom_point() + 
  geom_smooth(
    # smooth using linear model
    method = "lm"
  ) +
  theme_bw()

 

Scatter plot with linear regression
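To make the connection to ordinary regression explicit, the sketch below checks that the line drawn by `geom_smooth(method = "lm")` matches the predictions of a model fit directly with `lm()`. It uses `mtcars`, which ships with R, as a stand-in for `blue_jays` so that the example runs without downloading data.

```r
library(ggplot2)

# mtcars stands in for the blue_jays data here
p <- ggplot(mtcars, aes(wt, mpg)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE)

# the same model, fit directly
fit <- lm(mpg ~ wt, data = mtcars)
coef(fit)

# points along the line that geom_smooth() draws
smooth_data <- layer_data(p, 2)

# predictions of the direct fit at the same x positions
direct <- predict(fit, newdata = data.frame(wt = smooth_data$x))
all.equal(smooth_data$y, unname(direct))
```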

We add trend lines with geom_smooth()

ggplot(blue_jays) +
  aes(body_mass_g, head_length_mm) + 
  geom_point() + 
  geom_smooth(
    # smooth using linear model
    method = "lm",
    # suppress confidence band
    se = FALSE
  ) +
  theme_bw()

 

Scatter plot with linear regression, no confidence band

We add trend lines with geom_smooth()

ggplot(blue_jays) +
  aes(
    body_mass_g, head_length_mm,
    color = sex
  ) + 
  geom_point() + 
  geom_smooth(
    # smooth using linear model
    method = "lm",
    # suppress confidence band
    se = FALSE
  ) +
  theme_bw()

 

Scatter plot with linear regression by sex
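Mapping a variable to `color` splits the data into groups, and each group gets its own regression line. A small sketch of this behavior, again using `mtcars` as a stand-in, with `factor(am)` playing the role of `sex`:

```r
library(ggplot2)

# mtcars stands in for blue_jays; factor(am) plays the role of sex
p <- ggplot(mtcars, aes(wt, mpg, color = factor(am))) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE)

# the smoothing layer contains one fitted line per group
smooth_data <- layer_data(p, 2)
table(smooth_data$group)

# equivalent per-group fits done by hand
fits <- lapply(
  split(mtcars, mtcars$am),
  function(g) coef(lm(mpg ~ wt, data = g))
)
fits
```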

Linear regression can be nonsensical

Example: Fuel-tank capacity versus price in cars

ggplot(cars93) +
  aes(x = Price, y = Fuel.tank.capacity) + 
  geom_point() + 
  geom_smooth(
    # smooth using linear model
    method = "lm",
    # suppress confidence band
    se = FALSE
  ) +
  theme_bw()

 

Do more expensive cars have a larger fuel tank?

Example: Fuel-tank capacity versus price in cars

ggplot(cars93) +
  aes(x = Price, y = Fuel.tank.capacity) + 
  geom_point() +
  # default: loess smoothing
  geom_smooth(
    se = FALSE
  ) +
  theme_bw()

 

Example: Fuel-tank capacity versus price in cars

ggplot(cars93) +
  aes(x = Price, y = Fuel.tank.capacity) + 
  geom_point() +
  # loess smoothing
  geom_smooth(
    se = FALSE,
    method = "loess",
    formula = y ~ x
  ) +
  theme_bw()

 

Caution: Exact shape of smoothing line depends on method details

Example: Fuel-tank capacity versus price in cars

ggplot(cars93) +
  aes(x = Price, y = Fuel.tank.capacity) + 
  geom_point() +
  # loess smoothing
  geom_smooth(
    se = FALSE,
    method = "loess",
    formula = y ~ x,
    span = 0.25
  ) +
  theme_bw()

 

Caution: Exact shape of smoothing line depends on method details

Example: Fuel-tank capacity versus price in cars

ggplot(cars93) +
  aes(x = Price, y = Fuel.tank.capacity) + 
  geom_point() +
  # loess smoothing
  geom_smooth(
    se = FALSE,
    method = "loess",
    formula = y ~ x,
    span = 0.75 # default value
  ) +
  theme_bw()

 

Caution: Exact shape of smoothing line depends on method details

Example: Fuel-tank capacity versus price in cars

ggplot(cars93) +
  aes(x = Price, y = Fuel.tank.capacity) + 
  geom_point() +
  # loess smoothing
  geom_smooth(
    se = FALSE,
    method = "loess",
    formula = y ~ x,
    span = 1.0
  ) +
  theme_bw()

 

Caution: Exact shape of smoothing line depends on method details

Example: Fuel-tank capacity versus price in cars

ggplot(cars93) +
  aes(x = Price, y = Fuel.tank.capacity) + 
  geom_point() +
  # loess smoothing
  geom_smooth(
    se = FALSE,
    method = "loess",
    formula = y ~ x,
    span = 1.5
  ) +
  theme_bw()

 

Caution: Exact shape of smoothing line depends on method details
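The effect of `span` can also be examined outside of ggplot2 by calling `loess()` directly, which is the function `geom_smooth(method = "loess")` relies on. The sketch below uses synthetic data (a noisy sine wave, not `cars93`) and compares how closely a small and a large span follow the points.

```r
# synthetic data: a sine wave plus noise
set.seed(42)
x <- seq(0, 1, length.out = 100)
y <- sin(2 * pi * x) + rnorm(100, sd = 0.2)

# fit loess with a small and a large span
fit_small <- loess(y ~ x, span = 0.25)
fit_large <- loess(y ~ x, span = 1.5)

# residual sum of squares for each fit
rss_small <- sum(residuals(fit_small)^2)
rss_large <- sum(residuals(fit_large)^2)
c(span_0.25 = rss_small, span_1.5 = rss_large)
```

A small span tracks the data closely and can chase noise, whereas a large span produces a stiffer curve that can miss real structure. Neither choice is correct in general.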

Example: Fuel-tank capacity versus price in cars

ggplot(cars93) +
  aes(x = Price, y = Fuel.tank.capacity) + 
  geom_point() +
  # cubic spline, 5 knots
  geom_smooth(
    se = FALSE,
    method = "gam",
    formula = y ~ s(x, k = 5, bs = 'cr')
  ) +
  theme_bw()

 

Caution: Exact shape of smoothing line depends on method details

Example: Fuel-tank capacity versus price in cars

ggplot(cars93) +
  aes(x = Price, y = Fuel.tank.capacity) + 
  geom_point() +
  # thin-plate spline, 3 knots
  geom_smooth(
    se = FALSE,
    method = "gam",
    formula = y ~ s(x, k = 3)
  ) +
  theme_bw()

 

Caution: Exact shape of smoothing line depends on method details

Example: Fuel-tank capacity versus price in cars

ggplot(cars93) +
  aes(x = Price, y = Fuel.tank.capacity) + 
  geom_point() +
  # Gaussian process spline, 6 knots
  geom_smooth(
    se = FALSE,
    method = "gam",
    formula = y ~ s(x, k = 6, bs = 'gp')
  ) +
  theme_bw()

 

Caution: Exact shape of smoothing line depends on method details
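The spline fits can be reproduced outside of ggplot2 with `gam()` from the mgcv package, which `geom_smooth(method = "gam")` relies on. The sketch below uses synthetic data rather than `cars93`. It also illustrates that `k` sets the dimension of the spline basis: after the intercept is accounted for, the smooth term has `k - 1` coefficients.

```r
library(mgcv)

# synthetic data: a sine wave plus noise
set.seed(42)
d <- data.frame(x = seq(0, 1, length.out = 100))
d$y <- sin(2 * pi * d$x) + rnorm(100, sd = 0.2)

# cubic regression spline with basis dimension k = 5
fit_cr <- gam(y ~ s(x, k = 5, bs = "cr"), data = d)

# thin-plate spline with k = 3 (bs = "tp" is the default basis)
fit_tp <- gam(y ~ s(x, k = 3), data = d)

# number of coefficients: intercept plus k - 1 for the smooth term
length(coef(fit_cr))
length(coef(fit_tp))
```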

Smoothing lines are particularly unreliable near their endpoints
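One way to see this, sketched with synthetic and evenly spaced data, is to look at the standard error of a loess fit along the x axis. Near the endpoints the smoother has data on only one side, so the estimate is less certain there.

```r
# synthetic, evenly spaced data
set.seed(42)
x <- seq(0, 1, length.out = 101)
y <- sin(2 * pi * x) + rnorm(101, sd = 0.2)

fit <- loess(y ~ x)

# standard error of the fitted curve at each x
se <- predict(fit, newdata = data.frame(x = x), se = TRUE)$se.fit

# compare the left endpoint, the middle, and the right endpoint
se[c(1, 51, 101)]
```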

Further reading