class: center, middle, title-slide

.title[
# Visualizing trends
]
.author[
### Claus O. Wilke
]
.date[
### last updated: 2023-03-20
]

---

## We visualize linear trends with regression lines

.center[
<!-- -->
]

???

Figure redrawn after [Claus O. Wilke. Fundamentals of Data Visualization. O'Reilly, 2019.](https://clauswilke.com/dataviz)

---

## Exponential trends are linear trends on a log scale

.center[
<!-- -->
]

???

Figure redrawn after [Claus O. Wilke. Fundamentals of Data Visualization.
O'Reilly, 2019.](https://clauswilke.com/dataviz)

---

## Detrending: Removing the underlying trend

.center[
<!-- -->
]

--

.small-font[
Did housing prices in California decline substantially from 1990 to 1998?
]

--

.small-font[
Did housing prices in West Virginia recover by 2017?
]

???

Figure redrawn after [Claus O. Wilke. Fundamentals of Data Visualization. O'Reilly, 2019.](https://clauswilke.com/dataviz)

---

## Detrending: Removing the underlying trend

.center[
<!-- -->
]

.small-font[
Did housing prices in California decline substantially from 1990 to 1998? Yes.
]

.small-font[
Did housing prices in West Virginia recover by 2017? No.
]

???

Figure redrawn after [Claus O. Wilke. Fundamentals of Data Visualization.
O'Reilly, 2019.](https://clauswilke.com/dataviz)

[//]: # "segment ends here"

---

class: middle, center

# Creating trendlines in **ggplot2**

---

## Getting the data

First dataset: `blue_jays`

.tiny-font[

```r
library(tidyverse)  # provides read_csv() and ggplot2

blue_jays <- read_csv("https://wilkelab.org/DSC385/datasets/blue_jays.csv")
blue_jays
```

```
# A tibble: 123 × 8
   bird_id    sex   bill_depth_mm bill_width_mm bill_l…¹ head_…² body_…³ skull…⁴
   <chr>      <chr>         <dbl>         <dbl>    <dbl>   <dbl>   <dbl>   <dbl>
 1 0000-00000 M              8.26          9.21     25.9    56.6    73.3    30.7
 2 1142-05901 M              8.54          8.76     25.0    56.4    75.1    31.4
 3 1142-05905 M              8.39          8.78     26.1    57.3    70.2    31.2
 4 1142-05907 F              7.78          9.3      23.5    53.8    65.5    30.3
 5 1142-05909 M              8.71          9.84     25.5    57.3    74.9    31.8
 6 1142-05911 F              7.28          9.3      22.2    52.2    63.9    30
 7 1142-05912 M              8.74          9.28     25.4    57.1    75.1    31.8
 8 1142-05914 M              8.72          9.94     30      60.7    78.1    30.7
 9 1142-05917 F              8.2           9.01     22.8    52.8    64      30.0
10 1142-05920 F              7.67          9.31     24.6    54.9    67.3    30.3
# … with 113 more rows, and abbreviated variable names ¹bill_length_mm,
#   ²head_length_mm, ³body_mass_g, ⁴skull_size_mm
```
]

---

## Getting the data

Second dataset: `cars93`

.tiny-font[

```r
cars93 <- read_csv("https://wilkelab.org/DSC385/datasets/cars93.csv")
cars93
```

```
# A tibble: 93 × 27
   Manufactu…¹ Model Type  Min.P…² Price Max.P…³ MPG.c…⁴ MPG.h…⁵ AirBags Drive…⁶
   <chr>       <chr> <chr>   <dbl> <dbl>   <dbl>   <dbl>   <dbl> <chr>   <chr>
 1 Acura       Inte… Small    12.9  15.9    18.8      25      31 None    Front
 2 Acura       Lege… Mids…    29.2  33.9    38.7      18      25 Driver… Front
 3 Audi        90    Comp…    25.9  29.1    32.3      20      26 Driver… Front
 4 Audi        100   Mids…    30.8  37.7    44.6      19      26 Driver… Front
 5 BMW         535i  Mids…    23.7  30      36.2      22      30 Driver… Rear
 6 Buick       Cent… Mids…    14.2  15.7    17.3      22      31 Driver… Front
 7 Buick       LeSa… Large    19.9  20.8    21.7      19      28 Driver… Front
 8 Buick       Road… Large    22.6  23.7    24.9      16      25 Driver… Rear
 9 Buick       Rivi… Mids…    26.3  26.3    26.3      19      27 Driver… Front
10 Cadillac    DeVi… Large    33    34.7    36.3      16      25 Driver… Front
# … with 83 more rows, 17 more variables: Cylinders <chr>, EngineSize <dbl>,
#   Horsepower <dbl>, RPM <dbl>, Rev.per.mile <dbl>, Man.trans.avail <chr>,
#   Fuel.tank.capacity <dbl>, Passengers <dbl>, Length <dbl>, Wheelbase <dbl>,
#   Width <dbl>, Turn.circle <dbl>, Rear.seat.room <dbl>, Luggage.room <dbl>,
#   Weight <dbl>, Origin <chr>, Make <chr>, and abbreviated variable names
#   ¹Manufacturer, ²Min.Price, ³Max.Price, ⁴MPG.city, ⁵MPG.highway, ⁶DriveTrain
```
]

---

## We add trend lines with `geom_smooth()`

.tiny-font.pull-left[

```r
ggplot(blue_jays) +
  aes(body_mass_g, head_length_mm) +
  geom_point() +
  theme_bw(14)
```
]

.xtiny-font.pull-right[
<!-- -->
]

.small-font[
Scatter plot only
]

---

## We add trend lines with `geom_smooth()`

.tiny-font.pull-left[

```r
ggplot(blue_jays) +
  aes(body_mass_g, head_length_mm) +
  geom_point() +
  theme_bw(14) +
  geom_smooth()
```
]

.xtiny-font.pull-right[

```
`geom_smooth()` using method = 'loess' and formula = 'y ~ x'
```

<!-- -->
]

.small-font[
Scatter plot with loess smooth
]

---

## We add trend lines with `geom_smooth()`

.tiny-font.pull-left[

```r
ggplot(blue_jays) +
  aes(body_mass_g, head_length_mm) +
  geom_point() +
  theme_bw(14) +
  geom_smooth(
    # smooth using linear model
    method = "lm"
  )
```
]

.xtiny-font.pull-right[

```
`geom_smooth()` using formula = 'y ~ x'
```

<!-- -->
]

.small-font[
Scatter plot with linear regression
]

---

## We add trend lines with `geom_smooth()`

.tiny-font.pull-left[

```r
ggplot(blue_jays) +
  aes(body_mass_g, head_length_mm) +
  geom_point() +
  theme_bw(14) +
  geom_smooth(
    # smooth using linear model
    method = "lm",
    # suppress confidence band
    se = FALSE
  )
```
]

.xtiny-font.pull-right[

```
`geom_smooth()` using formula = 'y ~ x'
```

<!-- -->
]

.small-font[
Scatter plot with linear regression, no confidence band
]

---

## We add trend lines with `geom_smooth()`

.tiny-font.pull-left[

```r
ggplot(blue_jays) +
  aes(
    body_mass_g,
    head_length_mm,
    color = sex
  ) +
  geom_point() +
  theme_bw(14) +
  geom_smooth(
    # smooth using linear model
    method = "lm",
    # suppress confidence band
    se = FALSE
  )
```
]

.xtiny-font.pull-right[

```
`geom_smooth()` using formula = 'y ~ x'
```
<!-- -->
]

.small-font[
Scatter plot with linear regression by sex
]

---

class: middle, center

# Linear regression can be nonsensical

---

## Example: Fuel-tank capacity versus price in cars

.tiny-font.pull-left[

```r
ggplot(cars93) +
  aes(x = Price, y = Fuel.tank.capacity) +
  geom_point() +
  theme_bw(14) +
  geom_smooth(method = "lm")
```
]

.xtiny-font.pull-right[

```
`geom_smooth()` using formula = 'y ~ x'
```

<!-- -->
]

--

.small-font[
Do more expensive cars have a larger fuel tank?
]

---

## Example: Fuel-tank capacity versus price in cars

.tiny-font.pull-left[

```r
ggplot(cars93) +
  aes(x = Price, y = Fuel.tank.capacity) +
  geom_point() +
  theme_bw(14) +
  # default: loess smoothing
  geom_smooth(
    se = FALSE
  )
```
]

.xtiny-font.pull-right[

```
`geom_smooth()` using method = 'loess' and formula = 'y ~ x'
```

<!-- -->
]

---

## Example: Fuel-tank capacity versus price in cars

.tiny-font.pull-left[

```r
ggplot(cars93) +
  aes(x = Price, y = Fuel.tank.capacity) +
  geom_point() +
  theme_bw(14) +
  # loess smoothing
  geom_smooth(
    se = FALSE,
    method = "loess",
    formula = y ~ x
  )
```
]

.xtiny-font.pull-right[
<!-- -->
]

--

.small-font[
Caution: Exact shape of smoothing line depends on method details
]

---

## Example: Fuel-tank capacity versus price in cars

.tiny-font.pull-left[

```r
ggplot(cars93) +
  aes(x = Price, y = Fuel.tank.capacity) +
  geom_point() +
  theme_bw(14) +
  # loess smoothing
  geom_smooth(
    se = FALSE,
    method = "loess",
    formula = y ~ x,
    span = 0.25
  )
```
]

.xtiny-font.pull-right[
<!-- -->
]

.small-font[
Caution: Exact shape of smoothing line depends on method details
]

---

## Example: Fuel-tank capacity versus price in cars

.tiny-font.pull-left[

```r
ggplot(cars93) +
  aes(x = Price, y = Fuel.tank.capacity) +
  geom_point() +
  theme_bw(14) +
  # loess smoothing
  geom_smooth(
    se = FALSE,
    method = "loess",
    formula = y ~ x,
    span = 0.75 # default value
  )
```
]

.xtiny-font.pull-right[
<!-- -->
]

.small-font[
Caution: Exact shape of smoothing line depends on method details
]

---

## Example: Fuel-tank capacity versus price in cars

.tiny-font.pull-left[

```r
ggplot(cars93) +
  aes(x = Price, y = Fuel.tank.capacity) +
  geom_point() +
  theme_bw(14) +
  # loess smoothing
  geom_smooth(
    se = FALSE,
    method = "loess",
    formula = y ~ x,
    span = 1.0
  )
```
]

.xtiny-font.pull-right[
<!-- -->
]

.small-font[
Caution: Exact shape of smoothing line depends on method details
]

---

## Example: Fuel-tank capacity versus price in cars

.tiny-font.pull-left[

```r
ggplot(cars93) +
  aes(x = Price, y = Fuel.tank.capacity) +
  geom_point() +
  theme_bw(14) +
  # loess smoothing
  geom_smooth(
    se = FALSE,
    method = "loess",
    formula = y ~ x,
    span = 1.5
  )
```
]

.xtiny-font.pull-right[
<!-- -->
]

.small-font[
Caution: Exact shape of smoothing line depends on method details
]

---

## Example: Fuel-tank capacity versus price in cars

.tiny-font.pull-left[

```r
ggplot(cars93) +
  aes(x = Price, y = Fuel.tank.capacity) +
  geom_point() +
  theme_bw(14) +
  # cubic spline, 5 knots
  geom_smooth(
    se = FALSE,
    method = "gam",
    formula = y ~ s(x, k = 5, bs = 'cr')
  )
```
]

.xtiny-font.pull-right[
<!-- -->
]

.small-font[
Caution: Exact shape of smoothing line depends on method details
]

---

## Example: Fuel-tank capacity versus price in cars

.tiny-font.pull-left[

```r
ggplot(cars93) +
  aes(x = Price, y = Fuel.tank.capacity) +
  geom_point() +
  theme_bw(14) +
  # thin-plate spline, 3 knots
  geom_smooth(
    se = FALSE,
    method = "gam",
    formula = y ~ s(x, k = 3)
  )
```
]

.xtiny-font.pull-right[
<!-- -->
]

.small-font[
Caution: Exact shape of smoothing line depends on method details
]

---

## Example: Fuel-tank capacity versus price in cars

.tiny-font.pull-left[

```r
ggplot(cars93) +
  aes(x = Price, y = Fuel.tank.capacity) +
  geom_point() +
  theme_bw(14) +
  # Gaussian process spline, 6 knots
  geom_smooth(
    se = FALSE,
    method = "gam",
    formula = y ~ s(x, k = 6, bs = 'gp')
  )
```
]

.xtiny-font.pull-right[
<!-- -->
]

.small-font[
Caution: Exact shape of smoothing line depends on method details
]

--

.small-font[
Smoothing lines are particularly
unreliable near their endpoints
]

[//]: # "segment ends here"

---

## Further reading

- Fundamentals of Data Visualization: [Chapter 14: Visualizing trends](https://clauswilke.com/dataviz/visualizing-trends.html)
- Data Visualization: A Practical Introduction: [Chapter 6: Work with models](https://socviz.co/modeling.html)
- **ggplot2** reference documentation: [`geom_smooth()`](https://ggplot2.tidyverse.org/reference/geom_smooth.html)
- **mgcv** reference documentation (for gam smoothing): [pdf document](https://cran.r-project.org/web/packages/mgcv/mgcv.pdf)
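
---

## Sketch: exponential trend and detrending in code

.small-font[
The slides on exponential trends and detrending show figures only, so here is a minimal sketch of one way to compute both. Everything in it is an assumption for illustration: the data are made up, the names `housing`, `price`, `trend`, and `detrended` are hypothetical, and dividing by a log-linear fit is just one possible detrending approach, not necessarily the one behind the figures.
]

.tiny-font[

```r
library(ggplot2)

# Made-up data (assumption): 5% annual growth plus a cyclical wobble
years <- 1980:2019
t <- years - 1980
housing <- data.frame(
  year = years,
  price = 100 * exp(0.05 * t) * (1 + 0.1 * sin(t))
)

# An exponential trend is a straight line in log(price)
fit <- lm(log(price) ~ year, data = housing)
growth_rate <- unname(coef(fit)["year"])

# Detrend: divide the observed price by the fitted trend
housing$trend <- exp(fitted(fit))
housing$detrended <- housing$price / housing$trend

# Plot the ratio on a log scale; 1 means "exactly on trend"
p <- ggplot(housing) +
  aes(year, detrended) +
  geom_hline(yintercept = 1, linetype = "dashed") +
  geom_line() +
  scale_y_log10() +
  theme_bw(14)
p
```
]

.small-font[
Values of `detrended` above 1 lie above the long-term trend, values below 1 lie below it. `growth_rate` is the fitted slope on the log scale, which should come out close to the 0.05 used to generate the data.
]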