class: center, middle, title-slide

# Dealing with issues of overplotting

### Claus O. Wilke

### last updated: 2022-04-25

---

## Be aware of points plotted exactly on top of one another

.center[
<!-- -->
]

???

Figure redrawn from [Claus O. Wilke. Fundamentals of Data Visualization. O'Reilly, 2019.](

--

Technical term for this problem: overplotting

---

## Partial transparency helps highlight overlapping points

.center[
<!-- -->
]

???

Figure redrawn from [Claus O. Wilke. Fundamentals of Data Visualization. O'Reilly, 2019.](

---

## A little jitter shows overlaps even more clearly

.center[
<!-- -->
]

???

Figure redrawn from [Claus O. Wilke. Fundamentals of Data Visualization. O'Reilly, 2019.](

---

## But don't jitter too much

.center[
<!-- -->
]

???

Figure redrawn from [Claus O. Wilke. Fundamentals of Data Visualization. O'Reilly, 2019.](

---

class: center middle

## 2D density plots: Contour lines and contour bands

---

## Contour lines are the 2D version of density plots

.center[
<!-- -->
]

???

Figure redrawn from [Claus O. Wilke. Fundamentals of Data Visualization. O'Reilly, 2019.](

---

## We can vary shading for added effect

.center[
<!-- -->
]

???

Figure redrawn from [Claus O. Wilke. Fundamentals of Data Visualization. O'Reilly, 2019.](

---

## What do we do when there are multiple groups?

.center[
<!-- -->
]

???

Figure redrawn from [Claus O. Wilke. Fundamentals of Data Visualization. O'Reilly, 2019.](

---

## Colored contour lines can work for 2 to 3 groups

.center[
<!-- -->
]

???

Figure redrawn from [Claus O. Wilke. Fundamentals of Data Visualization. O'Reilly, 2019.](

---

## What if there are multiple groups intermingled?

.center[
<!-- -->
]

???

Figure redrawn from [Claus O. Wilke. Fundamentals of Data Visualization. O'Reilly, 2019.](

---

## Don't make plots that look like spaghetti

.center[
<!-- -->
]

???

Figure redrawn from [Claus O. Wilke. Fundamentals of Data Visualization.
O'Reilly, 2019.](

---

## Contour lines work well with small multiples (facets)

.center[
<!-- -->
]

???

Figure redrawn from [Claus O. Wilke. Fundamentals of Data Visualization. O'Reilly, 2019.](

---

class: center middle

## 2D histograms: Rectangular and hex bins

---

## 2D histograms: rectangular bins

.center[
<!-- -->
]

--

.absolute-bottom-left[
We need to choose a bin size as in regular histograms
]

---

## 2D histograms: rectangular bins

.center[
<!-- -->
]

.absolute-bottom-left[
We need to choose a bin size as in regular histograms
]

---

## 2D histograms: hex bins

.center[
<!-- -->
]

---

## 2D histograms: hex bins

.center[
<!-- -->
]

---

## Choosing the right color scale

.center[

]

.absolute-bottom-left[
palette: SunsetDark
]

---

## Choosing the right color scale

.center[
<!-- -->
]

.absolute-bottom-left[
palette: Batlow
]

---

## Choosing the right color scale

.center[
<!-- -->
]

.absolute-bottom-left[
palette: YlOrRd
]

---

## Choosing the right color scale

.center[
<!-- -->
]

.absolute-bottom-left[
palette: BluYl
]

---

## Choosing the right color scale

.center[
<!-- -->
]

.absolute-bottom-left[
palette: Heat
]

---

## Choosing the right color scale

.center[
<!-- -->
]

.absolute-bottom-left[
palette: ag_GrnYl
]

---

## Choosing the right color scale

.center[

]

.absolute-bottom-left[
palette: SunsetDark
]

[//]: # "segment ends here"

---

class: center middle

## Creating 2D density plots and histograms in R

---

## Contour lines

Data preparation:

.tiny-font[
```r
blue_jays <- read_csv("")
```

```
── Column specification ────────────────────────────────────────────────────────
cols(
  bird_id = col_character(),
  sex = col_character(),
  bill_depth_mm = col_double(),
  bill_width_mm = col_double(),
  bill_length_mm = col_double(),
  head_length_mm = col_double(),
  body_mass_g = col_double(),
  skull_size_mm = col_double()
)
```
]

---

## Contour lines

.tiny-font.pull-left[
```r
blue_jays %>%
  ggplot(aes(body_mass_g, head_length_mm)) +
  geom_point() +
  theme_bw(14)
```
]

.pull-right[
<!--
-->
]

---

## Contour lines

.tiny-font.pull-left[
```r
blue_jays %>%
  ggplot(aes(body_mass_g, head_length_mm)) +
* geom_density_2d() +
  geom_point() +
  theme_bw(14)
```
]

.pull-right[
<!-- -->
]

---

## Contour lines

.tiny-font.pull-left[
```r
blue_jays %>%
  ggplot(aes(body_mass_g, head_length_mm)) +
* geom_density_2d(bins = 5) +
  geom_point() +
  theme_bw(14)
```
]

.pull-right[
<!-- -->
]

---

## Contour bands

.tiny-font.width-70.move-up-1em[
```r
ggplot(blue_jays, aes(body_mass_g, head_length_mm)) +
* geom_density_2d_filled(bins = 5) +
  # geom_point() +
  theme_bw(14)
```
]

.center[
<!-- -->
]

---

## Contour bands

.tiny-font.width-70.move-up-1em[
```r
ggplot(blue_jays, aes(body_mass_g, head_length_mm)) +
* geom_density_2d_filled(bins = 5, alpha = 0.5) +
  # geom_point() +
  theme_bw(14)
```
]

.center[
<!-- -->
]

---

## Contour bands

.tiny-font.width-70.move-up-1em[
```r
ggplot(blue_jays, aes(body_mass_g, head_length_mm)) +
  geom_density_2d_filled(bins = 5, alpha = 0.5) +
* geom_density_2d(bins = 5, color = "black", size = 0.2) +
  geom_point() +
  theme_bw(14)
```
]

.center[
<!-- -->
]

---

## 2D histograms

.tiny-font.width-70.move-up-1em[
```r
ggplot(blue_jays, aes(body_mass_g, head_length_mm)) +
* geom_bin2d() +
  theme_bw(14)
```
]

.center[
<!-- -->
]

---

## 2D histograms

.tiny-font.width-70.move-up-1em[
```r
ggplot(blue_jays, aes(body_mass_g, head_length_mm)) +
* geom_bin2d(binwidth = c(3, 3)) +
  theme_bw(14)
```
]

.center[
<!-- -->
]

---

## 2D histograms

.tiny-font.width-70.move-up-1em[
```r
ggplot(blue_jays, aes(body_mass_g, head_length_mm)) +
* geom_bin2d(binwidth = c(1, 5)) +
  theme_bw(14)
```
]

.center[
<!-- -->
]

---

## 2D histograms

.tiny-font.width-70.move-up-1em[
```r
ggplot(blue_jays, aes(body_mass_g, head_length_mm)) +
* geom_bin2d(binwidth = c(5, 1)) +
  theme_bw(14)
```
]

.center[
<!-- -->
]

---

## Hex bins

.tiny-font.width-70.move-up-1em[
```r
ggplot(blue_jays, aes(body_mass_g, head_length_mm)) +
* geom_hex() +
  theme_bw(14)
```
]

.center[
<!-- -->
]

---

## Hex bins
.tiny-font.width-70.move-up-1em[
```r
ggplot(blue_jays, aes(body_mass_g, head_length_mm)) +
* geom_hex(bins = 15) +
  theme_bw(14)
```
]

.center[
<!-- -->
]

---

## Hex bins

.tiny-font.width-70.move-up-1em[
```r
ggplot(blue_jays, aes(body_mass_g, head_length_mm)) +
* geom_hex(bins = 10) +
  theme_bw(14)
```
]

.center[
<!-- -->
]

[//]: # "segment ends here"

---

## Further reading

- Fundamentals of Data Visualization: [Chapter 18: Handling overlapping points](
- **ggplot2** reference documentation: [`geom_density_2d()`](
- **ggplot2** reference documentation: [`geom_bin2d()`](
- **ggplot2** reference documentation: [`geom_hex()`](
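
---

## Partial transparency and jitter

The first part of this deck recommends partial transparency and a small amount of jitter, but shows no code for either. The sketch below is one way to try both in **ggplot2**. It is not the code behind the original figures: the `mpg` dataset (bundled with **ggplot2**) and the specific `alpha`, `width`, `height`, and `seed` values are assumptions chosen for illustration.

```r
library(ggplot2)

# mpg ships with ggplot2; its values are rounded, so many points coincide
p_base <- ggplot(mpg, aes(displ, cty))

# partial transparency: stacked points appear darker than isolated ones
p_alpha <- p_base +
  geom_point(alpha = 0.3) +
  theme_bw(14)

# small jitter plus transparency: coinciding points are nudged apart;
# a fixed seed keeps the random displacement reproducible
p_jitter <- p_base +
  geom_point(
    alpha = 0.3,
    position = position_jitter(width = 0.05, height = 0.2, seed = 1234)
  ) +
  theme_bw(14)

p_alpha
p_jitter
```

Keeping `width` and `height` small relative to the spacing between distinct data values helps avoid the "too much jitter" problem shown earlier.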