class: center, middle, title-slide

# Functions and functional programming

### Claus O. Wilke

### last updated: 2021-09-23

---

## We often have to run similar code multiple times

--

.tiny-font.pull-left.width-50[
```r
penguins %>%
  filter(species == "Adelie") %>%
  ggplot() +
  aes(bill_length_mm, body_mass_g) +
  geom_point() +
  ggtitle("Species: Adelie") +
  xlab("bill length (mm)") +
  ylab("body mass (g)") +
  theme_minimal_grid() +
  theme(plot.title.position = "plot")
```
]

--

.pull-right.width-45.move-up-1em[
![](functional-programming_files/figure-html/penguins-plot-Adelie-out-1.svg)<!-- -->
]

---

## We often have to run similar code multiple times

.tiny-font.pull-left.width-50[
```r
penguins %>%
  filter(species == "Chinstrap") %>%
  ggplot() +
  aes(bill_length_mm, body_mass_g) +
  geom_point() +
  ggtitle("Species: Chinstrap") +
  xlab("bill length (mm)") +
  ylab("body mass (g)") +
  theme_minimal_grid() +
  theme(plot.title.position = "plot")
```
]

.pull-right.width-45.move-up-1em[
![](functional-programming_files/figure-html/penguins-plot-Chinstrap-out-1.svg)<!-- -->
]

---

## We often have to run similar code multiple times

.tiny-font.pull-left.width-50[
```r
penguins %>%
  filter(species == "Gentoo") %>%
  ggplot() +
  aes(bill_length_mm, body_mass_g) +
  geom_point() +
  ggtitle("Species: Gentoo") +
  xlab("bill length (mm)") +
  ylab("body mass (g)") +
  theme_minimal_grid() +
  theme(plot.title.position = "plot")
```
]

.pull-right.width-45.move-up-1em[
![](functional-programming_files/figure-html/penguins-plot-Gentoo-out-1.svg)<!-- -->
]

--

How can we make our life simpler and avoid massive code duplication?
---

## Step 1: Avoid hard-coding specific values

--

.tiny-font.pull-left.width-50[
```r
*species <- "Adelie"

penguins %>%
*  filter(.data$species == .env$species) %>%
  ggplot() +
  aes(bill_length_mm, body_mass_g) +
  geom_point() +
*  ggtitle(glue("Species: {species}")) +
  xlab("bill length (mm)") +
  ylab("body mass (g)") +
  theme_minimal_grid() +
  theme(plot.title.position = "plot")
```
]

.pull-right.width-45.move-up-1em[
![](functional-programming_files/figure-html/penguins-plot-Adelie2-out-1.svg)<!-- -->
]

---

## A quick aside: the pronouns `.data` and `.env`

We can use pronouns to distinguish data columns from variables:

```r
species <- "Adelie"

penguins %>%
  filter(.data$species == .env$species)
```

--

`.data$species` is a column in the data frame

--

`.env$species` is a variable in the local environment

---

## A quick aside: the pronouns `.data` and `.env`

Alternatively we would have to make sure the names don't clash:

```r
species_choice <- "Adelie"

penguins %>%
  filter(species == species_choice)
```

---

## Step 1: Avoid hard-coding specific values

.tiny-font.pull-left.width-50[
```r
*species <- "Adelie"

penguins %>%
  filter(.data$species == .env$species) %>%
  ggplot() +
  aes(bill_length_mm, body_mass_g) +
  geom_point() +
  ggtitle(glue("Species: {species}")) +
  xlab("bill length (mm)") +
  ylab("body mass (g)") +
  theme_minimal_grid() +
  theme(plot.title.position = "plot")
```
]

.pull-right.width-45.move-up-1em[
![](functional-programming_files/figure-html/penguins-plot-Adelie2-out-1.svg)
]

---

## Step 1: Avoid hard-coding specific values

.tiny-font.pull-left.width-50[
```r
*species <- "Chinstrap"

penguins %>%
  filter(.data$species == .env$species) %>%
  ggplot() +
  aes(bill_length_mm, body_mass_g) +
  geom_point() +
  ggtitle(glue("Species: {species}")) +
  xlab("bill length (mm)") +
  ylab("body mass (g)") +
  theme_minimal_grid() +
  theme(plot.title.position = "plot")
```
]

.pull-right.width-45.move-up-1em[
![](functional-programming_files/figure-html/penguins-plot-Chinstrap2-out-1.svg)<!--
-->
]

---

## Step 1: Avoid hard-coding specific values

.tiny-font.pull-left.width-50[
```r
*species <- "Gentoo"

penguins %>%
  filter(.data$species == .env$species) %>%
  ggplot() +
  aes(bill_length_mm, body_mass_g) +
  geom_point() +
  ggtitle(glue("Species: {species}")) +
  xlab("bill length (mm)") +
  ylab("body mass (g)") +
  theme_minimal_grid() +
  theme(plot.title.position = "plot")
```
]

.pull-right.width-45.move-up-1em[
![](functional-programming_files/figure-html/penguins-plot-Gentoo2-out-1.svg)<!-- -->
]

--

This concept is also called: .highlight[Avoiding magic numbers]

---

## Step 2: Define a function

.tiny-font.pull-left.width-50[
```r
*make_plot <- function(species) {
  penguins %>%
    filter(.data$species == .env$species) %>%
    ggplot() +
    aes(bill_length_mm, body_mass_g) +
    geom_point() +
    ggtitle(glue("Species: {species}")) +
    xlab("bill length (mm)") +
    ylab("body mass (g)") +
    theme_minimal_grid() +
    theme(plot.title.position = "plot")
}
```
]

---

## Step 2: Define a function

.tiny-font.pull-left.width-50[
```r
make_plot <- function(species) {
  penguins %>%
    filter(.data$species == .env$species) %>%
    ggplot() +
    aes(bill_length_mm, body_mass_g) +
    geom_point() +
    ggtitle(glue("Species: {species}")) +
    xlab("bill length (mm)") +
    ylab("body mass (g)") +
    theme_minimal_grid() +
    theme(plot.title.position = "plot")
}

*make_plot("Adelie")
```
]

.pull-right.width-45.move-up-1em[
![](functional-programming_files/figure-html/penguins-plot-Adelie3-out-1.svg)<!-- -->
]

---

## Step 2: Define a function

.small-font[
```r
make_plot("Adelie")
make_plot("Chinstrap")
make_plot("Gentoo")
```
]

<img src="functional-programming_files/figure-html/penguins-plot-all-out-1.svg" width="32%" /><img src="functional-programming_files/figure-html/penguins-plot-all-out-2.svg" width="32%" /><img src="functional-programming_files/figure-html/penguins-plot-all-out-3.svg" width="32%" />

---

## Rules of thumb about functions

--

- You can never write too many functions

--

- When you find yourself writing the same code
2-3 times, put it into a function

--

- A function should be no longer than 20-40 lines

--

- If a function is getting too long, break it into smaller functions

---

## Step 3: Automate calling the function

.small-font[
```r
species <- c("Adelie", "Chinstrap", "Gentoo")
plots <- map(species, make_plot)
```
]

--

`map` takes each element of the vector `species` and uses it as input for `make_plot()`

--

It returns a list of created plots:

.small-font[
```r
plots[[1]]
```

<img src="functional-programming_files/figure-html/penguins-plot-map-return-1.svg" width="32%" />
]

---

## Step 3: Automate calling the function

.small-font[
```r
species <- c("Adelie", "Chinstrap", "Gentoo")
plots <- map(species, make_plot)
```
]

`map` takes each element of the vector `species` and uses it as input for `make_plot()`

It returns a list of created plots:

.small-font[
```r
plots[[2]]
```

<img src="functional-programming_files/figure-html/penguins-plot-map-return2-1.svg" width="32%" />
]

---

## Step 3: Automate calling the function

.small-font[
```r
species <- c("Adelie", "Chinstrap", "Gentoo")
plots <- map(species, make_plot)
```
]

`map` takes each element of the vector `species` and uses it as input for `make_plot()`

It returns a list of created plots:

.small-font[
```r
plots[[3]]
```

<img src="functional-programming_files/figure-html/penguins-plot-map-return3-1.svg" width="32%" />
]

---

## Step 3: Automate calling the function

.small-font[
```r
species <- c("Adelie", "Chinstrap", "Gentoo")
plots <- map(species, make_plot)
plots[[1]]
plots[[2]]
plots[[3]]
```
]

<img src="functional-programming_files/figure-html/penguins-plot-map-out-1.svg" width="32%" /><img src="functional-programming_files/figure-html/penguins-plot-map-out-2.svg" width="32%" /><img src="functional-programming_files/figure-html/penguins-plot-map-out-3.svg" width="32%" />

---

## Step 3: Automate calling the function

.small-font[
```r
species <- c("Adelie", "Chinstrap", "Gentoo")
plots <- map(species, make_plot)

# `walk()`
# is like `map()` but discards the results (it returns its input invisibly);
# we use it only for side effects (such as printing)
walk(plots, print)
```
]

<img src="functional-programming_files/figure-html/penguins-plot-walk-out-1.svg" width="32%" /><img src="functional-programming_files/figure-html/penguins-plot-walk-out-2.svg" width="32%" /><img src="functional-programming_files/figure-html/penguins-plot-walk-out-3.svg" width="32%" />

---

## Step 4: Write a more general function

--

.tiny-font.pull-left.width-50[
```r
make_plot <- function(species) {
*  penguins %>% # hard-coded dataset!
    filter(.data$species == .env$species) %>%
    ggplot() +
    aes(bill_length_mm, body_mass_g) +
    geom_point() +
    ggtitle(glue("Species: {species}")) +
    xlab("bill length (mm)") +
    ylab("body mass (g)") +
    theme_minimal_grid() +
    theme(plot.title.position = "plot")
}
```
]

---

## Step 4: Write a more general function

.tiny-font.pull-left.width-50[
```r
*make_plot2 <- function(data, species) {
*  data %>%
    # filter no longer needed
    ggplot() +
    aes(bill_length_mm, body_mass_g) +
    geom_point() +
    ggtitle(glue("Species: {species}")) +
    xlab("bill length (mm)") +
    ylab("body mass (g)") +
    theme_minimal_grid() +
    theme(plot.title.position = "plot")
}
```
]

---

## Step 4: Write a more general function

.tiny-font.pull-left.width-50[
```r
make_plot2 <- function(data, species) {
  data %>%
    # filter no longer needed
    ggplot() +
    aes(bill_length_mm, body_mass_g) +
    geom_point() +
    ggtitle(glue("Species: {species}")) +
    xlab("bill length (mm)") +
    ylab("body mass (g)") +
    theme_minimal_grid() +
    theme(plot.title.position = "plot")
}

data_adelie <- penguins %>%
  filter(species == "Adelie")

make_plot2(data_adelie, species = "Adelie")
```
]

.pull-right.width-45[
![](functional-programming_files/figure-html/penguins-plot-generic-out-1.svg)<!-- -->
]

---

## Step 5: Use these concepts in a tidy pipeline

.tiny-font[
```r
penguins %>%
  nest(data = -species) %>%
  mutate(plots = map(species, make_plot))
```

```
# A tibble: 3 × 3
  species   data               plots
  <fct>     <list>             <list>
1 Adelie    <tibble [152 × 7]> <gg>
2 Gentoo    <tibble [124 × 7]> <gg>
3 Chinstrap <tibble [68 × 7]>  <gg>
```
]

---

## Step 5: Use these concepts in a tidy pipeline

.tiny-font[
```r
penguins %>%
  nest(data = -species) %>%
  mutate(plots = map(species, make_plot)) %>%
  pull(plots) %>%
  walk(print)
```
]

<img src="functional-programming_files/figure-html/penguins-plot-walk-tidy-out-1.svg" width="32%" /><img src="functional-programming_files/figure-html/penguins-plot-walk-tidy-out-2.svg" width="32%" /><img src="functional-programming_files/figure-html/penguins-plot-walk-tidy-out-3.svg" width="32%" />

---

## Step 5: Use these concepts in a tidy pipeline

.tiny-font[
```r
penguins %>%
  nest(data = -species) %>%
  mutate(plots = map2(data, species, make_plot2)) %>%
  pull(plots) %>%
  walk(print)
```
]

<img src="functional-programming_files/figure-html/penguins-plot-walk-tidy2-out-1.svg" width="32%" /><img src="functional-programming_files/figure-html/penguins-plot-walk-tidy2-out-2.svg" width="32%" /><img src="functional-programming_files/figure-html/penguins-plot-walk-tidy2-out-3.svg" width="32%" />

--

.small-font[
`map2()` is like `map()` but iterates over two inputs in parallel, for functions with 2 arguments
]

---

## Step 5: Use these concepts in a tidy pipeline

.tiny-font[
```r
penguins %>%
  nest(data = -species) %>%
  mutate(plots = map2(data, species, make_plot2)) %>%
  pull(plots) %>%
  walk(print)
```
]

<img src="functional-programming_files/figure-html/penguins-plot-walk-tidy3-out-1.svg" width="32%" /><img src="functional-programming_files/figure-html/penguins-plot-walk-tidy3-out-2.svg" width="32%" /><img src="functional-programming_files/figure-html/penguins-plot-walk-tidy3-out-3.svg" width="32%" />

.small-font[
Note: This pipeline automatically processes all species in the dataset, whatever they are called
]

---

## Why no `for` loops?
--

- They often require us to think about data logistics (indexing)

--

- They encourage writing long, monolithic blocks of code

--

- They encourage iterative thinking over conceptual thinking

--

- They cannot easily be parallelized or otherwise optimized

--

- Many modern programming languages are moving away from index-based `for` loops<br>
  (examples: Python, Rust, JavaScript)

[//]: # "segment ends here"

---

## Further reading

- R for Data Science: [Chapter 19: Functions](https://r4ds.had.co.nz/functions.html)
- R for Data Science: [Chapter 21.5: The map functions](https://r4ds.had.co.nz/iteration.html#the-map-functions)
- **purrr** reference documentation: [Apply a function to each element of a list or atomic vector](https://purrr.tidyverse.org/reference/map.html)
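---

## Aside: a `for` loop next to `map_dbl()`

A minimal sketch, not part of the original slides, of the "data logistics" point made under "Why no `for` loops?". It assumes only that **purrr** is installed. The helper `mean_mpg()` and the vector `cyl_values` are hypothetical names chosen for illustration, and the built-in `mtcars` dataset stands in for `penguins` to keep the example free of extra dependencies.

.small-font[
```r
library(purrr)

# hypothetical helper (illustration only): mean mpg for one cylinder count
mean_mpg <- function(cyl) {
  mean(mtcars$mpg[mtcars$cyl == cyl])
}

# hypothetical input vector (illustration only)
cyl_values <- c(4, 6, 8)

# `for` loop: we allocate the result vector and manage the index ourselves
result_loop <- numeric(length(cyl_values))
for (i in seq_along(cyl_values)) {
  result_loop[i] <- mean_mpg(cyl_values[i])
}

# `map_dbl()`: same computation, no pre-allocation and no index bookkeeping
result_map <- map_dbl(cyl_values, mean_mpg)

# compare the two results
all.equal(result_loop, result_map)
```
]

Both versions call `mean_mpg()` once per element of `cyl_values`. The `map_dbl()` version states only what is applied to what, while the loop also has to spell out how the results are stored.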