class: center, middle, title-slide .title[ # Redundant coding, text annotations ] .author[ ### Claus O. Wilke ] .date[ ### last updated: 2024-04-08 ] --- ## Color encoding alone is not always sufficient .center[ <img src="redundant-coding_files/figure-html/iris-scatter-one-shape-1.svg" width="60%" /> ] Iris virginica and Iris versicolor are difficult to tell apart ??? Figure redrawn from [Claus O. Wilke. Fundamentals of Data Visualization. O'Reilly, 2019.](https://clauswilke.com/dataviz) --- .center[ <img src="redundant-coding_files/figure-html/iris-scatter-one-shape-cvd-1.svg" width="80%" /> ] .absolute-bottom-left[ CVD simulation of preceding figure ] ??? Figure redrawn from [Claus O. Wilke. Fundamentals of Data Visualization. O'Reilly, 2019.](https://clauswilke.com/dataviz) --- ## Using both colors and shapes helps .center[ <img src="redundant-coding_files/figure-html/iris-scatter-three-shapes-1.svg" width="60%" /> ] Notice also that color assignment has been switched ??? Figure redrawn from [Claus O. Wilke. Fundamentals of Data Visualization. O'Reilly, 2019.](https://clauswilke.com/dataviz) --- .center[ <img src="redundant-coding_files/figure-html/iris-scatter-three-shapes-cvd-1.svg" width="80%" /> ] .absolute-bottom-left[ CVD simulation of preceding figure ] ??? Figure redrawn from [Claus O. Wilke. Fundamentals of Data Visualization. O'Reilly, 2019.](https://clauswilke.com/dataviz) --- ## Pay attention to legend order .center[ <img src="redundant-coding_files/figure-html/tech-stocks-bad-legend-1.svg" width="60%" /> ] Legend order **does not** match visual order of the lines ??? Figure redrawn from [Claus O. Wilke. Fundamentals of Data Visualization. O'Reilly, 2019.](https://clauswilke.com/dataviz) --- ## Pay attention to legend order .center[ <img src="redundant-coding_files/figure-html/tech-stocks-good-legend-1.svg" width="60%" /> ] Legend order **does** match visual order of the lines ??? Figure redrawn from [Claus O. Wilke. Fundamentals of Data Visualization. O'Reilly, 2019.](https://clauswilke.com/dataviz) --- .center[ <img src="redundant-coding_files/figure-html/tech-stocks-good-legend-cvd-1.svg" width="80%" /> ] .absolute-bottom-left[ CVD simulation of preceding figure ] ??? Figure redrawn from [Claus O. Wilke. Fundamentals of Data Visualization. O'Reilly, 2019.](https://clauswilke.com/dataviz) --- ## Even better: Don't use a legend at all .center[ <img src="redundant-coding_files/figure-html/tech-stocks-good-no-legend-1.svg" width="60%" /> ] This technique is called "direct labeling" ??? Figure redrawn from [Claus O. Wilke. Fundamentals of Data Visualization. O'Reilly, 2019.](https://clauswilke.com/dataviz) --- ## Direct labeling the Iris scatter plot .center[ <img src="redundant-coding_files/figure-html/iris-scatter-with-ellipses-1.svg" width="45%" /> ] The ellipses help guide the eye ??? Figure redrawn from [Claus O. Wilke. Fundamentals of Data Visualization. O'Reilly, 2019.](https://clauswilke.com/dataviz) --- ## Label only a subset if there are too many items to label .center[ <img src="redundant-coding_files/figure-html/texas-counties-log-1.svg" width="70%" /> ] ??? Figure redrawn from [Claus O. Wilke. Fundamentals of Data Visualization. O'Reilly, 2019.](https://clauswilke.com/dataviz) [//]: # "segment ends here" --- class: center middle ## Direct labeling in R --- ## 1. Secondary axis trick Data preparation: .tiny-font[ ```r tech_stocks <- read_csv("https://wilkelab.org/SDS375/datasets/tech_stocks.csv") %>% mutate(date = ymd(date)) %>% select(company, date, price_indexed) ``` ] --- ## 1. Secondary axis trick .pull-left[ The raw input data: .tiny-font[ ```r tech_stocks ``` ``` # A tibble: 9,569 × 3 company date price_indexed <chr> <date> <dbl> 1 Alphabet 2017-06-02 342. 2 Alphabet 2017-06-01 339. 3 Alphabet 2017-05-31 338. 4 Alphabet 2017-05-30 342. 5 Alphabet 2017-05-26 341. 6 Alphabet 2017-05-25 340. 7 Alphabet 2017-05-24 335. 8 Alphabet 2017-05-23 333. 9 Alphabet 2017-05-22 330. 10 Alphabet 2017-05-19 327. # ℹ 9,559 more rows ``` ]] -- .pull-right[ Most recent values only: .tiny-font[ ```r tech_stocks_last <- tech_stocks %>% filter(date == max(date)) tech_stocks_last ``` ``` # A tibble: 4 × 3 company date price_indexed <chr> <date> <dbl> 1 Alphabet 2017-06-02 342. 2 Apple 2017-06-02 194. 3 Facebook 2017-06-02 554. 4 Microsoft 2017-06-02 252. ``` ]] --- ## 1. Secondary axis trick .tiny-font.pull-left.width-50[ ```r ggplot(tech_stocks) + aes(x = date, y = price_indexed) + geom_line(aes(color = company), na.rm = TRUE) + scale_x_date( limits = c( ymd("2012-06-01"), ymd("2017-05-31") ), expand = c(0, 0) ) + scale_y_continuous( limits = c(0, 560), expand = c(0, 0) ) ``` ] .pull-right[ <img src="redundant-coding_files/figure-html/tech-stock-sec-axis-out-1.svg" width="100%" /> ] Side remark: The default legend order doesn't match the data --- ## 1. Secondary axis trick .tiny-font.pull-left.width-50[ ```r ggplot(tech_stocks) + aes(x = date, y = price_indexed) + geom_line(aes(color = company), na.rm = TRUE) + scale_x_date( limits = c( ymd("2012-06-01"), ymd("2017-05-31") ), expand = c(0, 0) ) + scale_y_continuous( limits = c(0, 560), expand = c(0, 0), * sec.axis = dup_axis( * breaks = tech_stocks_last$price_indexed, * labels = tech_stocks_last$company, * name = NULL * ) ) + guides(color = "none") ``` ] .pull-right[ <img src="redundant-coding_files/figure-html/tech-stock-sec-axis2-out-1.svg" width="100%" /> ] --- ## 2. Manual labeling with `geom_text()` The raw input data: .tiny-font[ ```r iris ``` ``` Sepal.Length Sepal.Width Petal.Length Petal.Width Species 1 5.1 3.5 1.4 0.2 setosa 2 4.9 3.0 1.4 0.2 setosa 3 4.7 3.2 1.3 0.2 setosa 4 4.6 3.1 1.5 0.2 setosa 5 5.0 3.6 1.4 0.2 setosa 6 5.4 3.9 1.7 0.4 setosa 7 4.6 3.4 1.4 0.3 setosa 8 5.0 3.4 1.5 0.2 setosa 9 4.4 2.9 1.4 0.2 setosa 10 4.9 3.1 1.5 0.1 setosa 11 5.4 3.7 1.5 0.2 setosa 12 4.8 3.4 1.6 0.2 setosa 13 4.8 3.0 1.4 0.1 setosa 14 4.3 3.0 1.1 0.1 setosa 15 5.8 4.0 1.2 0.2 setosa 16 5.7 4.4 1.5 0.4 setosa 17 5.4 3.9 1.3 0.4 setosa 18 5.1 3.5 1.4 0.3 setosa 19 5.7 3.8 1.7 0.3 setosa 20 5.1 3.8 1.5 0.3 setosa 21 5.4 3.4 1.7 0.2 setosa 22 5.1 3.7 1.5 0.4 setosa 23 4.6 3.6 1.0 0.2 setosa 24 5.1 3.3 1.7 0.5 setosa 25 4.8 3.4 1.9 0.2 setosa 26 5.0 3.0 1.6 0.2 setosa 27 5.0 3.4 1.6 0.4 setosa 28 5.2 3.5 1.5 0.2 setosa 29 5.2 3.4 1.4 0.2 setosa 30 4.7 3.2 1.6 0.2 setosa 31 4.8 3.1 1.6 0.2 setosa 32 5.4 3.4 1.5 0.4 setosa 33 5.2 4.1 1.5 0.1 setosa 34 5.5 4.2 1.4 0.2 setosa 35 4.9 3.1 1.5 0.2 setosa 36 5.0 3.2 1.2 0.2 setosa 37 5.5 3.5 1.3 0.2 setosa 38 4.9 3.6 1.4 0.1 setosa 39 4.4 3.0 1.3 0.2 setosa 40 5.1 3.4 1.5 0.2 setosa 41 5.0 3.5 1.3 0.3 setosa 42 4.5 2.3 1.3 0.3 setosa 43 4.4 3.2 1.3 0.2 setosa 44 5.0 3.5 1.6 0.6 setosa 45 5.1 3.8 1.9 0.4 setosa 46 4.8 3.0 1.4 0.3 setosa 47 5.1 3.8 1.6 0.2 setosa 48 4.6 3.2 1.4 0.2 setosa 49 5.3 3.7 1.5 0.2 setosa 50 5.0 3.3 1.4 0.2 setosa 51 7.0 3.2 4.7 1.4 versicolor 52 6.4 3.2 4.5 1.5 versicolor 53 6.9 3.1 4.9 1.5 versicolor 54 5.5 2.3 4.0 1.3 versicolor 55 6.5 2.8 4.6 1.5 versicolor 56 5.7 2.8 4.5 1.3 versicolor 57 6.3 3.3 4.7 1.6 versicolor 58 4.9 2.4 3.3 1.0 versicolor 59 6.6 2.9 4.6 1.3 versicolor 60 5.2 2.7 3.9 1.4 versicolor 61 5.0 2.0 3.5 1.0 versicolor 62 5.9 3.0 4.2 1.5 versicolor 63 6.0 2.2 4.0 1.0 versicolor 64 6.1 2.9 4.7 1.4 versicolor 65 5.6 2.9 3.6 1.3 versicolor 66 6.7 3.1 4.4 1.4 versicolor 67 5.6 3.0 4.5 1.5 versicolor 68 5.8 2.7 4.1 1.0 versicolor 69 6.2 2.2 4.5 1.5 versicolor 70 5.6 2.5 3.9 1.1 versicolor 71 5.9 3.2 4.8 1.8 versicolor 72 6.1 2.8 4.0 1.3 versicolor 73 6.3 2.5 4.9 1.5 versicolor 74 6.1 2.8 4.7 1.2 versicolor 75 6.4 2.9 4.3 1.3 versicolor 76 6.6 3.0 4.4 1.4 versicolor 77 6.8 2.8 4.8 1.4 versicolor 78 6.7 3.0 5.0 1.7 versicolor 79 6.0 2.9 4.5 1.5 versicolor 80 5.7 2.6 3.5 1.0 versicolor 81 5.5 2.4 3.8 1.1 versicolor 82 5.5 2.4 3.7 1.0 versicolor 83 5.8 2.7 3.9 1.2 versicolor 84 6.0 2.7 5.1 1.6 versicolor 85 5.4 3.0 4.5 1.5 versicolor 86 6.0 3.4 4.5 1.6 versicolor 87 6.7 3.1 4.7 1.5 versicolor 88 6.3 2.3 4.4 1.3 versicolor 89 5.6 3.0 4.1 1.3 versicolor 90 5.5 2.5 4.0 1.3 versicolor 91 5.5 2.6 4.4 1.2 versicolor 92 6.1 3.0 4.6 1.4 versicolor 93 5.8 2.6 4.0 1.2 versicolor 94 5.0 2.3 3.3 1.0 versicolor 95 5.6 2.7 4.2 1.3 versicolor 96 5.7 3.0 4.2 1.2 versicolor 97 5.7 2.9 4.2 1.3 versicolor 98 6.2 2.9 4.3 1.3 versicolor 99 5.1 2.5 3.0 1.1 versicolor 100 5.7 2.8 4.1 1.3 versicolor 101 6.3 3.3 6.0 2.5 virginica 102 5.8 2.7 5.1 1.9 virginica 103 7.1 3.0 5.9 2.1 virginica 104 6.3 2.9 5.6 1.8 virginica 105 6.5 3.0 5.8 2.2 virginica 106 7.6 3.0 6.6 2.1 virginica 107 4.9 2.5 4.5 1.7 virginica 108 7.3 2.9 6.3 1.8 virginica 109 6.7 2.5 5.8 1.8 virginica 110 7.2 3.6 6.1 2.5 virginica 111 6.5 3.2 5.1 2.0 virginica 112 6.4 2.7 5.3 1.9 virginica 113 6.8 3.0 5.5 2.1 virginica 114 5.7 2.5 5.0 2.0 virginica 115 5.8 2.8 5.1 2.4 virginica 116 6.4 3.2 5.3 2.3 virginica 117 6.5 3.0 5.5 1.8 virginica 118 7.7 3.8 6.7 2.2 virginica 119 7.7 2.6 6.9 2.3 virginica 120 6.0 2.2 5.0 1.5 virginica 121 6.9 3.2 5.7 2.3 virginica 122 5.6 2.8 4.9 2.0 virginica 123 7.7 2.8 6.7 2.0 virginica 124 6.3 2.7 4.9 1.8 virginica 125 6.7 3.3 5.7 2.1 virginica 126 7.2 3.2 6.0 1.8 virginica 127 6.2 2.8 4.8 1.8 virginica 128 6.1 3.0 4.9 1.8 virginica 129 6.4 2.8 5.6 2.1 virginica 130 7.2 3.0 5.8 1.6 virginica 131 7.4 2.8 6.1 1.9 virginica 132 7.9 3.8 6.4 2.0 virginica 133 6.4 2.8 5.6 2.2 virginica 134 6.3 2.8 5.1 1.5 virginica 135 6.1 2.6 5.6 1.4 virginica 136 7.7 3.0 6.1 2.3 virginica 137 6.3 3.4 5.6 2.4 virginica 138 6.4 3.1 5.5 1.8 virginica 139 6.0 3.0 4.8 1.8 virginica 140 6.9 3.1 5.4 2.1 virginica 141 6.7 3.1 5.6 2.4 virginica 142 6.9 3.1 5.1 2.3 virginica 143 5.8 2.7 5.1 1.9 virginica 144 6.8 3.2 5.9 2.3 virginica 145 6.7 3.3 5.7 2.5 virginica 146 6.7 3.0 5.2 2.3 virginica 147 6.3 2.5 5.0 1.9 virginica 148 6.5 3.0 5.2 2.0 virginica 149 6.2 3.4 5.4 2.3 virginica 150 5.9 3.0 5.1 1.8 virginica ``` ] --- ## 2. Manual labeling with `geom_text()` Manually create table with label positions: .tiny-font[ ```r iris_labels <- tibble( Species = c("setosa", "virginica", "versicolor"), Sepal.Width = c(4.2, 3.76, 2.08), Sepal.Length = c(5.7, 7, 5.1), label = c("Iris setosa", "Iris virginica", "Iris versicolor"), hjust = c(0, 0.5, 0), vjust = c(0, 0.5, 1) ) iris_labels ``` ``` # A tibble: 3 × 6 Species Sepal.Width Sepal.Length label hjust vjust <chr> <dbl> <dbl> <chr> <dbl> <dbl> 1 setosa 4.2 5.7 Iris setosa 0 0 2 virginica 3.76 7 Iris virginica 0.5 0.5 3 versicolor 2.08 5.1 Iris versicolor 0 1 ``` ] --- ## 2. Manual labeling with `geom_text()` .xtiny-font.pull-left.width-50[ ```r ggplot(iris) + aes(Sepal.Length, Sepal.Width, color = Species) + geom_point(aes(shape = Species)) ``` ] .pull-right[ <img src="redundant-coding_files/figure-html/iris-manual-out-1.svg" width="100%" /> ] --- ## 2. Manual labeling with `geom_text()` .xtiny-font.pull-left.width-50[ ```r ggplot(iris) + aes(Sepal.Length, Sepal.Width, color = Species) + geom_point(aes(shape = Species)) + * geom_text( * data = iris_labels, * aes( * label = label, * hjust = hjust, vjust = vjust * ), * size = 14/.pt # 14pt font * ) + guides(color = "none", shape = "none") ``` ] .pull-right[ <img src="redundant-coding_files/figure-html/iris-manual2-out-1.svg" width="100%" /> ] --- ## 2. Manual labeling with `geom_text()` .xtiny-font.pull-left.width-50[ ```r ggplot(iris) + aes(Sepal.Length, Sepal.Width, color = Species) + geom_point(aes(shape = Species)) + geom_text( data = iris_labels, aes( label = label, hjust = hjust, vjust = vjust ), size = 14/.pt # 14pt font ) + * stat_ellipse(size = 0.5) + guides(color = "none", shape = "none") ``` ] .pull-right[ <img src="redundant-coding_files/figure-html/iris-manual3-out-1.svg" width="100%" /> ] --- ## 3. Automatic labeling with `geom_text_repel()` .tiny-font.pull-left[ ```r mtcars_named <- mtcars %>% rownames_to_column("car") %>% select(car, weight = wt, mpg) mtcars_named ``` ``` car weight mpg 1 Mazda RX4 2.620 21.0 2 Mazda RX4 Wag 2.875 21.0 3 Datsun 710 2.320 22.8 4 Hornet 4 Drive 3.215 21.4 5 Hornet Sportabout 3.440 18.7 6 Valiant 3.460 18.1 7 Duster 360 3.570 14.3 8 Merc 240D 3.190 24.4 9 Merc 230 3.150 22.8 10 Merc 280 3.440 19.2 11 Merc 280C 3.440 17.8 12 Merc 450SE 4.070 16.4 13 Merc 450SL 3.730 17.3 14 Merc 450SLC 3.780 15.2 15 Cadillac Fleetwood 5.250 10.4 16 Lincoln Continental 5.424 10.4 17 Chrysler Imperial 5.345 14.7 18 Fiat 128 2.200 32.4 19 Honda Civic 1.615 30.4 20 Toyota Corolla 1.835 33.9 21 Toyota Corona 2.465 21.5 22 Dodge Challenger 3.520 15.5 23 AMC Javelin 3.435 15.2 24 Camaro Z28 3.840 13.3 25 Pontiac Firebird 3.845 19.2 26 Fiat X1-9 1.935 27.3 27 Porsche 914-2 2.140 26.0 28 Lotus Europa 1.513 30.4 29 Ford Pantera L 3.170 15.8 30 Ferrari Dino 2.770 19.7 31 Maserati Bora 3.570 15.0 32 Volvo 142E 2.780 21.4 ``` ] --- ## 3. Automatic labeling with `geom_text_repel()` .tiny-font.pull-left[ ```r ggplot(mtcars_named, aes(weight, mpg)) + geom_point() ``` ] .pull-right.width-50[ <img src="redundant-coding_files/figure-html/mtcars-plot1-out-1.svg" width="100%" /> ] --- ## 3. Automatic labeling with `geom_text_repel()` .tiny-font.pull-left[ ```r ggplot(mtcars_named, aes(weight, mpg)) + geom_point() + geom_text(aes(label = car)) ``` ] .pull-right.width-50[ <img src="redundant-coding_files/figure-html/mtcars-plot2-out-1.svg" width="100%" /> ] --- ## 3. Automatic labeling with `geom_text_repel()` .tiny-font.pull-left[ ```r ggplot(mtcars_named, aes(weight, mpg)) + geom_point() + geom_text(aes(label = car), hjust = 0) ``` ] .pull-right.width-50[ <img src="redundant-coding_files/figure-html/mtcars-plot3-out-1.svg" width="100%" /> ] --- ## 3. Automatic labeling with `geom_text_repel()` .tiny-font.pull-left[ ```r *library(ggrepel) ggplot(mtcars_named, aes(weight, mpg)) + geom_point() + * geom_text_repel( * aes(label = car), * max.overlaps = Inf * ) ``` ] .pull-right.width-50[ <img src="redundant-coding_files/figure-html/mtcars-plot4-out-1.svg" width="100%" /> ] --- ## 3. Automatic labeling with `geom_text_repel()` .tiny-font.pull-left[ ```r *set.seed(6643) mtcars_named %>% mutate( * # randomly exclude 50% of the labels * car = ifelse(runif(n()) < 0.5, "", car) ) %>% ggplot(aes(weight, mpg)) + geom_point() + geom_text_repel( aes(label = car), max.overlaps = Inf ) ``` ] .pull-right.width-50[ <img src="redundant-coding_files/figure-html/mtcars-plot5-out-1.svg" width="100%" /> ] --- ## 3. Automatic labeling with `geom_text_repel()` .tiny-font.pull-left[ ```r *set.seed(6644) mtcars_named %>% mutate( * # randomly exclude 50% of the labels * car = ifelse(runif(n()) < 0.5, "", car) ) %>% ggplot(aes(weight, mpg)) + geom_point() + geom_text_repel( aes(label = car), max.overlaps = Inf ) ``` ] .pull-right.width-50[ <img src="redundant-coding_files/figure-html/mtcars-plot6-out-1.svg" width="100%" /> ] --- ## 3. Automatic labeling with `geom_text_repel()` .tiny-font.pull-left[ ```r set.seed(6644) mtcars_named %>% mutate( # randomly exclude 50% of the labels car = ifelse(runif(n()) < 0.5, "", car) ) %>% ggplot(aes(weight, mpg)) + geom_point() + geom_text_repel( aes(label = car), max.overlaps = Inf, * box.padding = 0.7 ) ``` ] .pull-right.width-50[ <img src="redundant-coding_files/figure-html/mtcars-plot7-out-1.svg" width="100%" /> ] `box.padding` controls how far labels are placed from data points [//]: # "segment ends here" --- ## Further reading - Fundamentals of Data Visualization: [Chapter 20: Redundant coding](https://clauswilke.com/dataviz/redundant-coding.html) - **ggplot2** reference documentation: [`geom_text()`](https://ggplot2.tidyverse.org/reference/geom_text.html) - **ggrepel** documentation: [Examples](https://ggrepel.slowkow.com/articles/examples.html#examples-1) - **ggrepel** reference documentation: [`geom_text_repel()`](https://ggrepel.slowkow.com/reference/geom_text_repel.html)