class: center, middle, title-slide

.title[
# Visualizing trends
]
.author[
### Claus O. Wilke
]
.date[
### last updated: 2024-03-03
]

---

## We visualize linear trends with regression lines

.center[
<img src="visualizing-trends_files/figure-html/blue-jays-scatter-1.svg" width="55%" />
]

???

Figure redrawn after [Claus O. Wilke. Fundamentals of Data Visualization. O'Reilly, 2019.](https://clauswilke.com/dataviz)

---

## We visualize linear trends with regression lines

.center[
<img src="visualizing-trends_files/figure-html/blue-jays-scatter-lm-1.svg" width="55%" />
]

???

Figure redrawn after [Claus O. Wilke. Fundamentals of Data Visualization. O'Reilly, 2019.](https://clauswilke.com/dataviz)

---

## We visualize linear trends with regression lines

.center[
<img src="visualizing-trends_files/figure-html/blue-jays-scatter-line-sex-1.svg" width="55%" />
]

???

Figure redrawn after [Claus O. Wilke. Fundamentals of Data Visualization. O'Reilly, 2019.](https://clauswilke.com/dataviz)

---

## We visualize linear trends with regression lines

.center[
<img src="visualizing-trends_files/figure-html/blue-jays-scatter-female-1.svg" width="55%" />
]

???

Figure redrawn after [Claus O. Wilke. Fundamentals of Data Visualization. O'Reilly, 2019.](https://clauswilke.com/dataviz)

---

## We visualize linear trends with regression lines

.center[
<img src="visualizing-trends_files/figure-html/blue-jays-scatter-line-female-1.svg" width="55%" />
]

???

Figure redrawn after [Claus O. Wilke. Fundamentals of Data Visualization. O'Reilly, 2019.](https://clauswilke.com/dataviz)

---

## We visualize linear trends with regression lines

.center[
<img src="visualizing-trends_files/figure-html/blue-jays-scatter-male-1.svg" width="55%" />
]

???

Figure redrawn after [Claus O. Wilke. Fundamentals of Data Visualization. O'Reilly, 2019.](https://clauswilke.com/dataviz)

---

## We visualize linear trends with regression lines

.center[
<img src="visualizing-trends_files/figure-html/blue-jays-scatter-line-male-1.svg" width="55%" />
]

???

Figure redrawn after [Claus O. Wilke. Fundamentals of Data Visualization. O'Reilly, 2019.](https://clauswilke.com/dataviz)

---

## We visualize linear trends with regression lines

.center[
<img src="visualizing-trends_files/figure-html/blue-jays-scatter-sex-line-1.svg" width="55%" />
]

???

Figure redrawn after [Claus O. Wilke. Fundamentals of Data Visualization. O'Reilly, 2019.](https://clauswilke.com/dataviz)

---

## Exponential trends are linear trends on a log scale

.center[
<img src="visualizing-trends_files/figure-html/biorxiv-nofit-1.svg" width="65%" />
]

???

Figure redrawn after [Claus O. Wilke. Fundamentals of Data Visualization. O'Reilly, 2019.](https://clauswilke.com/dataviz)

---

## Exponential trends are linear trends on a log scale

.center[
<img src="visualizing-trends_files/figure-html/biorxiv-expfit-1.svg" width="65%" />
]

???

Figure redrawn after [Claus O. Wilke. Fundamentals of Data Visualization. O'Reilly, 2019.](https://clauswilke.com/dataviz)

---

## Exponential trends are linear trends on a log scale

.center[
<img src="visualizing-trends_files/figure-html/biorxiv-expfit-logscale-1.svg" width="65%" />
]

???

Figure redrawn after [Claus O. Wilke. Fundamentals of Data Visualization. O'Reilly, 2019.](https://clauswilke.com/dataviz)

---

## Exponential trends are linear trends on a log scale

.center[
<img src="visualizing-trends_files/figure-html/biorxiv-logscale-1.svg" width="65%" />
]

???

Figure redrawn after [Claus O. Wilke. Fundamentals of Data Visualization. O'Reilly, 2019.](https://clauswilke.com/dataviz)

---

## Exponential trends are linear trends on a log scale

.center[
<img src="visualizing-trends_files/figure-html/biorxiv-logscale-doublefit-1.svg" width="65%" />
]

???

Figure redrawn after [Claus O. Wilke. Fundamentals of Data Visualization. O'Reilly, 2019.](https://clauswilke.com/dataviz)

---

## Detrending: Removing the underlying trend

.center[
<img src="visualizing-trends_files/figure-html/hpi-no-trendline-1.svg" width="55%" />
]

--

.small-font[
Did housing prices in California decline substantially from 1990 to 1998?
]

--

.small-font[
Did housing prices in West Virginia recover by 2017?
]

???

Figure redrawn after [Claus O. Wilke. Fundamentals of Data Visualization. O'Reilly, 2019.](https://clauswilke.com/dataviz)

---

## Detrending: Removing the underlying trend

.center[
<img src="visualizing-trends_files/figure-html/hpi-trends-1.svg" width="55%" />
]

.small-font[
Did housing prices in California decline substantially from 1990 to 1998?
]

.small-font[
Did housing prices in West Virginia recover by 2017?
]

???

Figure redrawn after [Claus O. Wilke. Fundamentals of Data Visualization. O'Reilly, 2019.](https://clauswilke.com/dataviz)

---

## Detrending: Removing the underlying trend

.center[
<img src="visualizing-trends_files/figure-html/hpi-detrended-1.svg" width="55%" />
]

.small-font[
Did housing prices in California decline substantially from 1990 to 1998? Yes.
]

.small-font[
Did housing prices in West Virginia recover by 2017? No.
]

???

Figure redrawn after [Claus O. Wilke. Fundamentals of Data Visualization.
O'Reilly, 2019.](https://clauswilke.com/dataviz)

[//]: # "segment ends here"

---
class: middle, center

# Creating trendlines in **ggplot2**

---

## Getting the data

First dataset: `blue_jays`

.tiny-font[

```r
# tidyverse provides read_csv() and ggplot(), used throughout these slides
library(tidyverse)

blue_jays <- read_csv("https://wilkelab.org/SDS375/datasets/blue_jays.csv")
blue_jays
```

```
# A tibble: 123 × 8
   bird_id    sex   bill_depth_mm bill_width_mm bill_length_mm head_length_mm
   <chr>      <chr>         <dbl>         <dbl>          <dbl>          <dbl>
 1 0000-00000 M              8.26          9.21           25.9           56.6
 2 1142-05901 M              8.54          8.76           25.0           56.4
 3 1142-05905 M              8.39          8.78           26.1           57.3
 4 1142-05907 F              7.78          9.3            23.5           53.8
 5 1142-05909 M              8.71          9.84           25.5           57.3
 6 1142-05911 F              7.28          9.3            22.2           52.2
 7 1142-05912 M              8.74          9.28           25.4           57.1
 8 1142-05914 M              8.72          9.94           30             60.7
 9 1142-05917 F              8.2           9.01           22.8           52.8
10 1142-05920 F              7.67          9.31           24.6           54.9
# ℹ 113 more rows
# ℹ 2 more variables: body_mass_g <dbl>, skull_size_mm <dbl>
```
]

---

## Getting the data

Second dataset: `cars93`

.tiny-font[

```r
cars93 <- read_csv("https://wilkelab.org/SDS375/datasets/cars93.csv")
cars93
```

```
# A tibble: 93 × 27
   Manufacturer Model      Type   Min.Price Price Max.Price MPG.city MPG.highway
   <chr>        <chr>      <chr>      <dbl> <dbl>     <dbl>    <dbl>       <dbl>
 1 Acura        Integra    Small       12.9  15.9      18.8       25          31
 2 Acura        Legend     Midsi…      29.2  33.9      38.7       18          25
 3 Audi         90         Compa…      25.9  29.1      32.3       20          26
 4 Audi         100        Midsi…      30.8  37.7      44.6       19          26
 5 BMW          535i       Midsi…      23.7  30        36.2       22          30
 6 Buick        Century    Midsi…      14.2  15.7      17.3       22          31
 7 Buick        LeSabre    Large       19.9  20.8      21.7       19          28
 8 Buick        Roadmaster Large       22.6  23.7      24.9       16          25
 9 Buick        Riviera    Midsi…      26.3  26.3      26.3       19          27
10 Cadillac     DeVille    Large       33    34.7      36.3       16          25
# ℹ 83 more rows
# ℹ 19 more variables: AirBags <chr>, DriveTrain <chr>, Cylinders <chr>,
#   EngineSize <dbl>, Horsepower <dbl>, RPM <dbl>, Rev.per.mile <dbl>,
#   Man.trans.avail <chr>, Fuel.tank.capacity <dbl>, Passengers <dbl>,
#   Length <dbl>, Wheelbase <dbl>, Width <dbl>, Turn.circle <dbl>,
#   Rear.seat.room <dbl>, Luggage.room <dbl>, Weight <dbl>, Origin <chr>,
#   Make <chr>
```
]

---

## We add trend lines with `geom_smooth()`

.tiny-font.pull-left[

```r
ggplot(blue_jays) +
  aes(body_mass_g, head_length_mm) +
  geom_point() +
  theme_bw(14)
```
]

.xtiny-font.pull-right[
<img src="visualizing-trends_files/figure-html/blue-jays-scatter-gg-out-1.svg" width="100%" />
]

.small-font[
Scatter plot only
]

---

## We add trend lines with `geom_smooth()`

.tiny-font.pull-left[

```r
ggplot(blue_jays) +
  aes(body_mass_g, head_length_mm) +
  geom_point() +
  theme_bw(14) +
  geom_smooth()
```
]

.xtiny-font.pull-right[

```
`geom_smooth()` using method = 'loess' and formula = 'y ~ x'
```

<img src="visualizing-trends_files/figure-html/blue-jays-scatter-gg-smooth-out-1.svg" width="100%" />
]

.small-font[
Scatter plot with loess smooth
]

---

## We add trend lines with `geom_smooth()`

.tiny-font.pull-left[

```r
ggplot(blue_jays) +
  aes(body_mass_g, head_length_mm) +
  geom_point() +
  theme_bw(14) +
  geom_smooth(
    # smooth using linear model
    method = "lm"
  )
```
]

.xtiny-font.pull-right[

```
`geom_smooth()` using formula = 'y ~ x'
```

<img src="visualizing-trends_files/figure-html/blue-jays-scatter-gg-lm-out-1.svg" width="100%" />
]

.small-font[
Scatter plot with linear regression
]

---

## We add trend lines with `geom_smooth()`

.tiny-font.pull-left[

```r
ggplot(blue_jays) +
  aes(body_mass_g, head_length_mm) +
  geom_point() +
  theme_bw(14) +
  geom_smooth(
    # smooth using linear model
    method = "lm",
    # suppress confidence band
    se = FALSE
  )
```
]

.xtiny-font.pull-right[

```
`geom_smooth()` using formula = 'y ~ x'
```

<img src="visualizing-trends_files/figure-html/blue-jays-scatter-gg-lm-nose-out-1.svg" width="100%" />
]

.small-font[
Scatter plot with linear regression, no confidence band
]

---

## We add trend lines with `geom_smooth()`

.tiny-font.pull-left[

```r
ggplot(blue_jays) +
  aes(
    body_mass_g, head_length_mm,
    color = sex
  ) +
  geom_point() +
  theme_bw(14) +
  geom_smooth(
    # smooth using linear model
    method = "lm",
    # suppress confidence band
    se = FALSE
  )
```
]

.xtiny-font.pull-right[

```
`geom_smooth()` using formula = 'y ~ x'
```

<img src="visualizing-trends_files/figure-html/blue-jays-scatter-gg-sex-out-1.svg" width="100%" />
]

.small-font[
Scatter plot with linear regression by sex
]

---
class: middle, center

# Linear regression can be nonsensical

---

## Example: Fuel-tank capacity versus price in cars

.tiny-font.pull-left[

```r
ggplot(cars93) +
  aes(x = Price, y = Fuel.tank.capacity) +
  geom_point() +
  theme_bw(14) +
  geom_smooth(method = "lm")
```
]

.xtiny-font.pull-right[

```
`geom_smooth()` using formula = 'y ~ x'
```

<img src="visualizing-trends_files/figure-html/cars-lm-out-1.svg" width="100%" />
]

--

.small-font[
Do more expensive cars have a larger fuel tank?
]

---

## Example: Fuel-tank capacity versus price in cars

.tiny-font.pull-left[

```r
ggplot(cars93) +
  aes(x = Price, y = Fuel.tank.capacity) +
  geom_point() +
  theme_bw(14) +
  # default: loess smoothing
  geom_smooth(
    se = FALSE
  )
```
]

.xtiny-font.pull-right[

```
`geom_smooth()` using method = 'loess' and formula = 'y ~ x'
```

<img src="visualizing-trends_files/figure-html/cars-loess-out-1.svg" width="100%" />
]

---

## Example: Fuel-tank capacity versus price in cars

.tiny-font.pull-left[

```r
ggplot(cars93) +
  aes(x = Price, y = Fuel.tank.capacity) +
  geom_point() +
  theme_bw(14) +
  # loess smoothing
  geom_smooth(
    se = FALSE,
    method = "loess",
    formula = y ~ x
  )
```
]

.xtiny-font.pull-right[
<img src="visualizing-trends_files/figure-html/cars-loess2-out-1.svg" width="100%" />
]

--

.small-font[
Caution: Exact shape of smoothing line depends on method details
]

---

## Example: Fuel-tank capacity versus price in cars

.tiny-font.pull-left[

```r
ggplot(cars93) +
  aes(x = Price, y = Fuel.tank.capacity) +
  geom_point() +
  theme_bw(14) +
  # loess smoothing
  geom_smooth(
    se = FALSE,
    method = "loess",
    formula = y ~ x,
    span = 0.25
  )
```
]

.xtiny-font.pull-right[
<img src="visualizing-trends_files/figure-html/cars-loess3-out-1.svg" width="100%" />
]

.small-font[
Caution: Exact shape of smoothing line depends on method details
]

---

## Example: Fuel-tank capacity versus price in cars

.tiny-font.pull-left[

```r
ggplot(cars93) +
  aes(x = Price, y = Fuel.tank.capacity) +
  geom_point() +
  theme_bw(14) +
  # loess smoothing
  geom_smooth(
    se = FALSE,
    method = "loess",
    formula = y ~ x,
    span = 0.75 # default value
  )
```
]

.xtiny-font.pull-right[
<img src="visualizing-trends_files/figure-html/cars-loess4-out-1.svg" width="100%" />
]

.small-font[
Caution: Exact shape of smoothing line depends on method details
]

---

## Example: Fuel-tank capacity versus price in cars

.tiny-font.pull-left[

```r
ggplot(cars93) +
  aes(x = Price, y = Fuel.tank.capacity) +
  geom_point() +
  theme_bw(14) +
  # loess smoothing
  geom_smooth(
    se = FALSE,
    method = "loess",
    formula = y ~ x,
    span = 1.0
  )
```
]

.xtiny-font.pull-right[
<img src="visualizing-trends_files/figure-html/cars-loess5-out-1.svg" width="100%" />
]

.small-font[
Caution: Exact shape of smoothing line depends on method details
]

---

## Example: Fuel-tank capacity versus price in cars

.tiny-font.pull-left[

```r
ggplot(cars93) +
  aes(x = Price, y = Fuel.tank.capacity) +
  geom_point() +
  theme_bw(14) +
  # loess smoothing
  geom_smooth(
    se = FALSE,
    method = "loess",
    formula = y ~ x,
    span = 1.5
  )
```
]

.xtiny-font.pull-right[
<img src="visualizing-trends_files/figure-html/cars-loess6-out-1.svg" width="100%" />
]

.small-font[
Caution: Exact shape of smoothing line depends on method details
]

---

## Example: Fuel-tank capacity versus price in cars

.tiny-font.pull-left[

```r
ggplot(cars93) +
  aes(x = Price, y = Fuel.tank.capacity) +
  geom_point() +
  theme_bw(14) +
  # cubic spline, 5 knots
  geom_smooth(
    se = FALSE,
    method = "gam",
    formula = y ~ s(x, k = 5, bs = 'cr')
  )
```
]

.xtiny-font.pull-right[
<img src="visualizing-trends_files/figure-html/cars-gam-out-1.svg" width="100%" />
]

.small-font[
Caution: Exact shape of smoothing line depends on method details
]

---

## Example: Fuel-tank capacity versus price in cars

.tiny-font.pull-left[

```r
ggplot(cars93) +
  aes(x = Price, y = Fuel.tank.capacity) +
  geom_point() +
  theme_bw(14) +
  # thin-plate spline, 3 knots
  geom_smooth(
    se = FALSE,
    method = "gam",
    formula = y ~ s(x, k = 3)
  )
```
]

.xtiny-font.pull-right[
<img src="visualizing-trends_files/figure-html/cars-gam2-out-1.svg" width="100%" />
]

.small-font[
Caution: Exact shape of smoothing line depends on method details
]

---

## Example: Fuel-tank capacity versus price in cars

.tiny-font.pull-left[

```r
ggplot(cars93) +
  aes(x = Price, y = Fuel.tank.capacity) +
  geom_point() +
  theme_bw(14) +
  # Gaussian process spline, 6 knots
  geom_smooth(
    se = FALSE,
    method = "gam",
    formula = y ~ s(x, k = 6, bs = 'gp')
  )
```
]

.xtiny-font.pull-right[
<img src="visualizing-trends_files/figure-html/cars-gam3-out-1.svg" width="100%" />
]

.small-font[
Caution: Exact shape of smoothing line depends on method details
]

--

.small-font[
Smoothing lines are particularly unreliable near their endpoints
]

[//]: # "segment ends here"

---

## Further reading

- Fundamentals of Data Visualization: [Chapter 14: Visualizing trends](https://clauswilke.com/dataviz/visualizing-trends.html)
- Data Visualization: A Practical Introduction: [Chapter 6: Work with models](https://socviz.co/modeling.html)
- **ggplot2** reference documentation: [`geom_smooth()`](https://ggplot2.tidyverse.org/reference/geom_smooth.html)
- **mgcv** reference documentation (for gam smoothing): [pdf document](https://cran.r-project.org/web/packages/mgcv/mgcv.pdf)
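
---

## Appendix: A minimal detrending sketch

.small-font[
The detrending slides earlier in this deck show only finished figures. The sketch below is one possible way to reproduce the idea and is not the code behind those figures. It assumes a hypothetical, simulated data frame `toy`. It fits a straight line to the log-transformed values, which corresponds to an exponential trend, and then divides the observed values by the fitted trend. All names (`toy`, `index`, `trend`, `detrended`) are illustrative.
]

.tiny-font[

```r
# Hypothetical toy data: an index growing about 4% per year, with noise
set.seed(1)
toy <- data.frame(year = 1980:2017)
toy$index <- 100 * exp(0.04 * (toy$year - 1980) + rnorm(nrow(toy), sd = 0.1))

# A linear fit on the log scale corresponds to an exponential trend
fit <- lm(log(index) ~ year, data = toy)

# Back-transform the fitted values to get the trend on the original scale
toy$trend <- exp(fitted(fit))

# Detrended series: observed value relative to the long-term trend
toy$detrended <- toy$index / toy$trend

# Plot only if ggplot2 is installed (the rest of the sketch is base R)
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(toy) +
    aes(year, detrended) +
    geom_hline(yintercept = 1, linetype = 2) +  # 1 = exactly on trend
    geom_line() +
    scale_y_log10() +
    theme_bw(14)
  print(p)
}
```
]

.small-font[
A detrended value above 1 means the series sits above its long-term trend in that year; a value below 1 means it sits below. Dividing (rather than subtracting) is a design choice that matches a trend fitted on the log scale.
]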