class: center, middle, title-slide

# Dealing with issues of overplotting

### Claus O. Wilke

### last updated: 2022-04-25

---

## Be aware of points plotted exactly on top of one another

.center[
![](overplotting_files/figure-html/mpg-cty-displ-solid-1.svg)<!-- -->
]

???

Figure redrawn from [Claus O. Wilke. Fundamentals of Data Visualization. O'Reilly, 2019.](https://clauswilke.com/dataviz)

--

Technical term for this problem: overplotting

---

## Partial transparency helps highlight overlapping points

.center[
![](overplotting_files/figure-html/mpg-cty-displ-transp-1.svg)<!-- -->
]

???

Figure redrawn from [Claus O. Wilke. Fundamentals of Data Visualization. O'Reilly, 2019.](https://clauswilke.com/dataviz)

---

## A little jitter shows overlaps even more clearly

.center[
![](overplotting_files/figure-html/mpg-cty-displ-jitter-1.svg)<!-- -->
]

???

Figure redrawn from [Claus O. Wilke. Fundamentals of Data Visualization. O'Reilly, 2019.](https://clauswilke.com/dataviz)

---

## But don't jitter too much

.center[
![](overplotting_files/figure-html/mpg-cty-displ-jitter-extreme-1.svg)<!-- -->
]

???

Figure redrawn from [Claus O. Wilke. Fundamentals of Data Visualization. O'Reilly, 2019.](https://clauswilke.com/dataviz)

---
class: center middle

## 2D density plots: Contour lines and contour bands

---

## Contour lines are the 2D version of density plots

.center[
![](overplotting_files/figure-html/blue-jays-contour-1.svg)<!-- -->
]

???

Figure redrawn from [Claus O. Wilke. Fundamentals of Data Visualization. O'Reilly, 2019.](https://clauswilke.com/dataviz)

---

## We can vary shading for added effect

.center[
![](overplotting_files/figure-html/blue-jays-contour-filled-1.svg)<!-- -->
]

???

Figure redrawn from [Claus O. Wilke. Fundamentals of Data Visualization. O'Reilly, 2019.](https://clauswilke.com/dataviz)

---

## What do we do when there are multiple groups?

.center[
![](overplotting_files/figure-html/blue-jays-no-contour-by-sex-1.svg)<!-- -->
]

???

Figure redrawn from [Claus O.
Wilke. Fundamentals of Data Visualization. O'Reilly, 2019.](https://clauswilke.com/dataviz)

---

## Colored contour lines can work for 2 to 3 groups

.center[
![](overplotting_files/figure-html/blue-jays-contour-by-sex-1.svg)<!-- -->
]

???

Figure redrawn from [Claus O. Wilke. Fundamentals of Data Visualization. O'Reilly, 2019.](https://clauswilke.com/dataviz)

---

## What if there are multiple groups intermingled?

.center[
![](overplotting_files/figure-html/diamonds-points-1.svg)<!-- -->
]

???

Figure redrawn from [Claus O. Wilke. Fundamentals of Data Visualization. O'Reilly, 2019.](https://clauswilke.com/dataviz)

---

## Don't make plots that look like spaghetti

.center[
![](overplotting_files/figure-html/diamonds-contours-1.svg)<!-- -->
]

???

Figure redrawn from [Claus O. Wilke. Fundamentals of Data Visualization. O'Reilly, 2019.](https://clauswilke.com/dataviz)

---

## Contour lines work well with small multiples (facets)

.center[
![](overplotting_files/figure-html/diamonds-contour-facets-1.svg)<!-- -->
]

???

Figure redrawn from [Claus O. Wilke. Fundamentals of Data Visualization.
O'Reilly, 2019.](https://clauswilke.com/dataviz)

---
class: center middle

## 2D histograms: Rectangular and hex bins

---

## 2D histograms: rectangular bins

.center[
![](overplotting_files/figure-html/diamonds-bin2d-facets-1.svg)<!-- -->
]

--

.absolute-bottom-left[
We need to choose a bin size as in regular histograms
]

---

## 2D histograms: rectangular bins

.center[
![](overplotting_files/figure-html/diamonds-bin2d-facets2-1.svg)<!-- -->
]

.absolute-bottom-left[
We need to choose a bin size as in regular histograms
]

---

## 2D histograms: hex bins

.center[
![](overplotting_files/figure-html/diamonds-hexbin-facets-1.svg)<!-- -->
]

---

## 2D histograms: hex bins

.center[
![](overplotting_files/figure-html/diamonds-hexbin2-facets-1.svg)<!-- -->
]

---

## Choosing the right color scale

.center[
![](overplotting_files/figure-html/diamonds-hexbin2-facets-1.svg)
]

.absolute-bottom-left[
palette: SunsetDark
]

---

## Choosing the right color scale

.center[
![](overplotting_files/figure-html/diamonds-hexbin3-facets-1.svg)<!-- -->
]

.absolute-bottom-left[
palette: Batlow
]

---

## Choosing the right color scale

.center[
![](overplotting_files/figure-html/diamonds-hexbin4-facets-1.svg)<!-- -->
]

.absolute-bottom-left[
palette: YlOrRd
]

---

## Choosing the right color scale

.center[
![](overplotting_files/figure-html/diamonds-hexbin5-facets-1.svg)<!-- -->
]

.absolute-bottom-left[
palette: BluYl
]

---

## Choosing the right color scale

.center[
![](overplotting_files/figure-html/diamonds-hexbin6-facets-1.svg)<!-- -->
]

.absolute-bottom-left[
palette: Heat
]

---

## Choosing the right color scale

.center[
![](overplotting_files/figure-html/diamonds-hexbin7-facets-1.svg)<!-- -->
]

.absolute-bottom-left[
palette: ag_GrnYl
]

---

## Choosing the right color scale

.center[
![](overplotting_files/figure-html/diamonds-hexbin2-facets-1.svg)
]

.absolute-bottom-left[
palette: SunsetDark
]

[//]: # "segment ends here"

---
class: center middle

## Creating 2D density plots and histograms in R

---
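## Transparency and jitter

The transparency and jitter fixes from the first part of these slides can be sketched as follows. This is a minimal sketch and not the code behind the figures: it assumes the built-in `mpg` data from **ggplot2** (`cty` versus `displ`), and the `alpha`, `width`, `height`, and `seed` values are illustrative guesses.

.tiny-font[
```r
library(ggplot2)

# Assumption: city fuel economy (cty) versus engine displacement (displ)
# from ggplot2's built-in `mpg` data.
p_base <- ggplot(mpg, aes(displ, cty)) + theme_bw(14)

# Partial transparency: points stacked on top of one another appear darker.
p_transp <- p_base + geom_point(alpha = 0.3)

# A little jitter: small random offsets separate identical points.
# `width` and `height` are in data units; a fixed `seed` makes the
# random offsets reproducible.
p_jitter <- p_base +
  geom_point(
    alpha = 0.3,
    position = position_jitter(width = 0.05, height = 0.2, seed = 1)
  )

p_transp
p_jitter
```
]

Keeping `width` and `height` small relative to the spacing between data values avoids the problem of jittering too much.

---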
## Contour lines

Data preparation:

.tiny-font[
```r
blue_jays <- read_csv("https://wilkelab.org/SDS375/datasets/blue_jays.csv")
```

```
── Column specification ────────────────────────────────────────────────────────
cols(
  bird_id = col_character(),
  sex = col_character(),
  bill_depth_mm = col_double(),
  bill_width_mm = col_double(),
  bill_length_mm = col_double(),
  head_length_mm = col_double(),
  body_mass_g = col_double(),
  skull_size_mm = col_double()
)
```
]

---

## Contour lines

.tiny-font.pull-left[
```r
blue_jays %>%
  ggplot(aes(body_mass_g, head_length_mm)) +
  geom_point() +
  theme_bw(14)
```
]

.pull-right[
![](overplotting_files/figure-html/scatter-demo-out-1.svg)<!-- -->
]

---

## Contour lines

.tiny-font.pull-left[
```r
blue_jays %>%
  ggplot(aes(body_mass_g, head_length_mm)) +
* geom_density_2d() +
  geom_point() +
  theme_bw(14)
```
]

.pull-right[
![](overplotting_files/figure-html/contour-lines-demo-out-1.svg)<!-- -->
]

---

## Contour lines

.tiny-font.pull-left[
```r
blue_jays %>%
  ggplot(aes(body_mass_g, head_length_mm)) +
* geom_density_2d(bins = 5) +
  geom_point() +
  theme_bw(14)
```
]

.pull-right[
![](overplotting_files/figure-html/contour-lines-demo2-out-1.svg)<!-- -->
]

---

## Contour bands

.tiny-font.width-70.move-up-1em[
```r
ggplot(blue_jays, aes(body_mass_g, head_length_mm)) +
* geom_density_2d_filled(bins = 5) +
  # geom_point() +
  theme_bw(14)
```
]

.center[
![](overplotting_files/figure-html/contour-bands-demo-out-1.svg)<!-- -->
]

---

## Contour bands

.tiny-font.width-70.move-up-1em[
```r
ggplot(blue_jays, aes(body_mass_g, head_length_mm)) +
* geom_density_2d_filled(bins = 5, alpha = 0.5) +
  # geom_point() +
  theme_bw(14)
```
]

.center[
![](overplotting_files/figure-html/contour-bands-demo2-out-1.svg)<!-- -->
]

---

## Contour bands

.tiny-font.width-70.move-up-1em[
```r
ggplot(blue_jays, aes(body_mass_g, head_length_mm)) +
  geom_density_2d_filled(bins = 5, alpha = 0.5) +
* geom_density_2d(bins = 5, color = "black", size = 0.2) +
  geom_point() +
  theme_bw(14)
```
]
.center[
![](overplotting_files/figure-html/contour-bands-demo3-out-1.svg)<!-- -->
]

---

## 2D histograms

.tiny-font.width-70.move-up-1em[
```r
ggplot(blue_jays, aes(body_mass_g, head_length_mm)) +
* geom_bin2d() +
  theme_bw(14)
```
]

.center[
![](overplotting_files/figure-html/bins2d-demo-out-1.svg)<!-- -->
]

---

## 2D histograms

.tiny-font.width-70.move-up-1em[
```r
ggplot(blue_jays, aes(body_mass_g, head_length_mm)) +
* geom_bin2d(binwidth = c(3, 3)) +
  theme_bw(14)
```
]

.center[
![](overplotting_files/figure-html/bins2d-demo2-out-1.svg)<!-- -->
]

---

## 2D histograms

.tiny-font.width-70.move-up-1em[
```r
ggplot(blue_jays, aes(body_mass_g, head_length_mm)) +
* geom_bin2d(binwidth = c(1, 5)) +
  theme_bw(14)
```
]

.center[
![](overplotting_files/figure-html/bins2d-demo3-out-1.svg)<!-- -->
]

---

## 2D histograms

.tiny-font.width-70.move-up-1em[
```r
ggplot(blue_jays, aes(body_mass_g, head_length_mm)) +
* geom_bin2d(binwidth = c(5, 1)) +
  theme_bw(14)
```
]

.center[
![](overplotting_files/figure-html/bins2d-demo4-out-1.svg)<!-- -->
]

---

## Hex bins

.tiny-font.width-70.move-up-1em[
```r
ggplot(blue_jays, aes(body_mass_g, head_length_mm)) +
* geom_hex() +
  theme_bw(14)
```
]

.center[
![](overplotting_files/figure-html/hex-demo-out-1.svg)<!-- -->
]

---

## Hex bins

.tiny-font.width-70.move-up-1em[
```r
ggplot(blue_jays, aes(body_mass_g, head_length_mm)) +
* geom_hex(bins = 15) +
  theme_bw(14)
```
]

.center[
![](overplotting_files/figure-html/hex-demo2-out-1.svg)<!-- -->
]

---

## Hex bins

.tiny-font.width-70.move-up-1em[
```r
ggplot(blue_jays, aes(body_mass_g, head_length_mm)) +
* geom_hex(bins = 10) +
  theme_bw(14)
```
]

.center[
![](overplotting_files/figure-html/hex-demo3-out-1.svg)<!-- -->
]

[//]: # "segment ends here"

---

## Further reading

- Fundamentals of Data Visualization: [Chapter 18: Handling overlapping points](https://clauswilke.com/dataviz/overlapping-points.html)

- **ggplot2** reference documentation:
[`geom_density_2d()`](https://ggplot2.tidyverse.org/reference/geom_density_2d.html)

- **ggplot2** reference documentation: [`geom_bin2d()`](https://ggplot2.tidyverse.org/reference/geom_bin2d.html)

- **ggplot2** reference documentation: [`geom_hex()`](https://ggplot2.tidyverse.org/reference/geom_hex.html)
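
---

## Extra: faceted hex bins with a sequential color scale

The faceted hex-bin figures and the palette comparison earlier in these slides come without code. The sketch below shows one way such a plot could be built. It is not the code behind the figures: it assumes the built-in `diamonds` data from **ggplot2** (`carat` versus `price`, one panel per `cut`), takes the palette names to refer to the sequential HCL palettes in the **colorspace** package, and uses an illustrative `bins` value.

.tiny-font[
```r
library(ggplot2)     # drawing geom_hex() also requires the hexbin package
library(colorspace)  # provides scale_fill_continuous_sequential()

# Assumption: carat versus price from ggplot2's built-in `diamonds` data,
# split into small multiples by cut.
p_hex <- ggplot(diamonds, aes(carat, price)) +
  geom_hex(bins = 50) +
  facet_wrap(vars(cut)) +
  # Swap the palette name to compare color scales,
  # e.g. "Batlow", "YlOrRd", "BluYl", "Heat", or "ag_GrnYl".
  scale_fill_continuous_sequential(palette = "SunsetDark") +
  theme_bw(14)

p_hex
```
]

Faceting keeps the groups apart, so a single sequential fill scale can encode the count in every panel.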