Functions and functional programming

Claus O. Wilke

2025-03-11

We often have to run similar code multiple times

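# packages used throughout these slides
library(tidyverse)      # dplyr, ggplot2, purrr, tidyr
library(palmerpenguins) # penguins dataset
library(cowplot)        # theme_minimal_grid()
library(glue)           # glue()

penguins |>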
  filter(species == "Adelie") |>
  ggplot() +
  aes(bill_length_mm, body_mass_g) +
  geom_point() +
  ggtitle("Species: Adelie") +
  xlab("bill length (mm)") +
  ylab("body mass (g)") +
  theme_minimal_grid() +
  theme(plot.title.position = "plot")

 

penguins |>
  filter(species == "Chinstrap") |>
  ggplot() +
  aes(bill_length_mm, body_mass_g) +
  geom_point() +
  ggtitle("Species: Chinstrap") +
  xlab("bill length (mm)") +
  ylab("body mass (g)") +
  theme_minimal_grid() +
  theme(plot.title.position = "plot")

 

penguins |>
  filter(species == "Gentoo") |>
  ggplot() +
  aes(bill_length_mm, body_mass_g) +
  geom_point() +
  ggtitle("Species: Gentoo") +
  xlab("bill length (mm)") +
  ylab("body mass (g)") +
  theme_minimal_grid() +
  theme(plot.title.position = "plot")

 

How can we make our life simpler and avoid massive code duplication?

Step 1: Avoid hard-coding specific values

species <- "Adelie"

penguins |>
  filter(.data$species == .env$species) |>
  ggplot() +
  aes(bill_length_mm, body_mass_g) +
  geom_point() +
  ggtitle(glue("Species: {species}")) +
  xlab("bill length (mm)") +
  ylab("body mass (g)") +
  theme_minimal_grid() +
  theme(plot.title.position = "plot")
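
The title above is assembled with glue(), which substitutes the value of an R expression written inside curly braces. A minimal sketch, assuming only that the glue package is installed:

```r
library(glue)

species <- "Gentoo"

# {species} is replaced by the current value of the variable
title <- glue("Species: {species}")
title

# any R expression can appear inside the braces
shouted <- glue("Species: {toupper(species)}")
shouted
```

Changing the value of species is then enough to change the title, with no edits to the plotting code itself.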

 

A quick aside: the pronouns .data and .env

We can use pronouns to distinguish data columns from variables:

species <- "Adelie"

penguins |>
  filter(.data$species == .env$species)

.data$species is a column in the data frame

.env$species is a variable in the local environment

Alternatively we would have to make sure the names don’t clash:

species_choice <- "Adelie"

penguins |>
  filter(species == species_choice)
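
To see why the pronouns matter, it can help to compare the filter with and without them. The sketch below uses a small made-up data frame called birds (a hypothetical stand-in for penguins, with invented values) so that it runs with dplyr alone:

```r
library(dplyr)

# toy data standing in for penguins; the values are made up
birds <- tibble(
  species = c("Adelie", "Gentoo", "Adelie", "Chinstrap"),
  mass    = c(3700, 5000, 3800, 3600)
)

species <- "Adelie"

# without pronouns, both sides refer to the column `species`,
# so the condition is TRUE for every row and nothing is removed
no_pronouns <- birds |>
  filter(species == species)

# with pronouns, the column is compared to the local variable
with_pronouns <- birds |>
  filter(.data$species == .env$species)

nrow(no_pronouns)
nrow(with_pronouns)
```

The first filter silently keeps all four rows, which is the kind of bug the pronouns protect against.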

Step 1: Avoid hard-coding specific values

species <- "Chinstrap"

penguins |>
  filter(.data$species == .env$species) |>
  ggplot() +
  aes(bill_length_mm, body_mass_g) +
  geom_point() +
  ggtitle(glue("Species: {species}")) +
  xlab("bill length (mm)") +
  ylab("body mass (g)") +
  theme_minimal_grid() +
  theme(plot.title.position = "plot")

 

species <- "Gentoo"

penguins |>
  filter(.data$species == .env$species) |>
  ggplot() +
  aes(bill_length_mm, body_mass_g) +
  geom_point() +
  ggtitle(glue("Species: {species}")) +
  xlab("bill length (mm)") +
  ylab("body mass (g)") +
  theme_minimal_grid() +
  theme(plot.title.position = "plot")

 

This practice is also known as avoiding magic numbers: literal values buried in the code are replaced by named variables

Step 2: Define a function

make_plot <- function(species) {
  penguins |>
    filter(.data$species == .env$species) |>
    ggplot() +
    aes(bill_length_mm, body_mass_g) +
    geom_point() +
    ggtitle(glue("Species: {species}")) +
    xlab("bill length (mm)") +
    ylab("body mass (g)") +
    theme_minimal_grid() +
    theme(plot.title.position = "plot")
}

make_plot("Adelie")

 

make_plot("Chinstrap")

 

make_plot("Gentoo")

 

Rules of thumb about functions

  • You can never write too many functions
  • When you find yourself writing the same code 2-3 times, put it into a function
  • A function should be no longer than 20-40 lines
  • If a function is getting too long, break it into smaller functions
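
As an illustration of the last rule, the plotting function from Step 2 could be split into smaller pieces. This is only a sketch: the helper names filter_species(), scatter_bill_mass(), add_labels(), and make_plot_split() are hypothetical, and the theme calls are left out to keep it short.

```r
library(dplyr)
library(ggplot2)
library(glue)
library(palmerpenguins)

# hypothetical helper 1: keep only the rows for one species
filter_species <- function(data, species) {
  data |>
    filter(.data$species == .env$species)
}

# hypothetical helper 2: scatter plot without any labels
scatter_bill_mass <- function(data) {
  ggplot(data) +
    aes(bill_length_mm, body_mass_g) +
    geom_point()
}

# hypothetical helper 3: add title and axis labels to a plot
add_labels <- function(plot, species) {
  plot +
    ggtitle(glue("Species: {species}")) +
    xlab("bill length (mm)") +
    ylab("body mass (g)")
}

# the top-level function now reads like a list of steps
make_plot_split <- function(species) {
  penguins |>
    filter_species(species) |>
    scatter_bill_mass() |>
    add_labels(species)
}

p <- make_plot_split("Adelie")
```

Each helper can be tested and reused on its own, which is the main benefit of keeping functions short.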

Step 3: Automate calling the function

We need a brief detour to talk about lists and the map() pattern

Lists

In R, lists are a data structure that can store multiple elements of various types

A list of words:

list("apple", "orange", "banana")
[[1]]
[1] "apple"

[[2]]
[1] "orange"

[[3]]
[1] "banana"

A list of numbers:

list(5, 7, 12)
[[1]]
[1] 5

[[2]]
[1] 7

[[3]]
[1] 12

A list of mixed data types:

list(5, "apple", TRUE)
[[1]]
[1] 5

[[2]]
[1] "apple"

[[3]]
[1] TRUE

For comparison, c() creates an atomic vector, and all of its elements are coerced to a common type (here, character):

c(5, "apple", TRUE)
[1] "5"     "apple" "TRUE" 

A list of vectors:

list(1:5, c("apple", "orange", "banana"), c(TRUE, FALSE))
[[1]]
[1] 1 2 3 4 5

[[2]]
[1] "apple"  "orange" "banana"

[[3]]
[1]  TRUE FALSE

A list of lists:

list(list(1, 2), list(3, 4))
[[1]]
[[1]][[1]]
[1] 1

[[1]][[2]]
[1] 2


[[2]]
[[2]][[1]]
[1] 3

[[2]][[2]]
[1] 4

Lists

You can access individual elements of a list with the double brackets operator:

fruit <- list("apple", "orange", "banana")
fruit
[[1]]
[1] "apple"

[[2]]
[1] "orange"

[[3]]
[1] "banana"

fruit[[1]]
[1] "apple"

fruit[[3]]
[1] "banana"
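
Double brackets return the element itself, whereas single brackets return a shorter list that still wraps the element. A short base R sketch of the difference:

```r
fruit <- list("apple", "orange", "banana")

# double brackets: the element itself (a character vector)
second <- fruit[[2]]
class(second)

# single brackets: a list of length 1 containing the element
second_as_list <- fruit[2]
class(second_as_list)
length(second_as_list)
```

This distinction matters when a function expects the stored value rather than a list around it.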

The map() pattern

The map() function from the purrr package applies a function to each element of a vector or list and returns the results as a list

This pattern can be used instead of loops

Example: Calculate the squares of the numbers 3, 4, 5:

# define function that calculates square
square <- function(x) x^2

# apply function to the numbers 3, 4, 5
map(3:5, square) 
[[1]]
[1] 9

[[2]]
[1] 16

[[3]]
[1] 25
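
For comparison, here is a sketch of the same calculation written as an explicit for loop next to the map() version. The variable names inputs, squares_loop, and squares_map are chosen for illustration only.

```r
library(purrr)

square <- function(x) x^2
inputs <- 3:5

# loop version: allocate the output, then fill it by index
squares_loop <- vector("list", length(inputs))
for (i in seq_along(inputs)) {
  squares_loop[[i]] <- square(inputs[[i]])
}

# map version: no allocation or indexing to manage
squares_map <- map(inputs, square)

# both approaches produce the same list
identical(squares_loop, squares_map)
```

The loop needs three pieces of bookkeeping (output allocation, index sequence, indexed assignment) that map() handles internally.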

We can define the function to be applied on the fly:

map(3:5, function(x) x^2) 
[[1]]
[1] 9

[[2]]
[1] 16

[[3]]
[1] 25

Even simpler:

map(3:5, \(x) x^2) 
[[1]]
[1] 9

[[2]]
[1] 16

[[3]]
[1] 25

Also:

map(3:5, ~.x^2) 
[[1]]
[1] 9

[[2]]
[1] 16

[[3]]
[1] 25

Note: Interpreting a formula such as ~.x^2 as a function is a purrr convention, not part of the base R language; it only works in tidyverse functions that support it
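
The following sketch shows what the note means in practice: a formula on its own is not a function, and purrr converts it into one behind the scenes. The explicit call to as_mapper() is shown only to make that conversion visible; map() normally does it for you.

```r
library(purrr)

f <- ~ .x^2

# a formula is not a function, so base R cannot call it
is.function(f)

# purrr turns the formula into a function with argument .x
square <- as_mapper(f)
is.function(square)
square(4)

# the anonymous-function shorthand \(x) is base R (version 4.1 or later)
# and therefore also works with base functions such as lapply()
lapply(3:5, \(x) x^2)
```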

Sometimes it’s more convenient to get a vector as the return value:

map_dbl(3:5, ~.x^2) 
[1]  9 16 25

Similarly:

  • map_chr() returns a vector of strings
  • map_int() returns a vector of integers
  • map_lgl() returns a vector of logicals
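
A brief sketch of the three variants listed above, applied to a made-up vector of words:

```r
library(purrr)

words <- c("apple", "orange", "banana")

# integer vector: number of characters in each word
n_chars <- map_int(words, nchar)
n_chars

# character vector: each word in upper case
upper <- map_chr(words, toupper)
upper

# logical vector: is the word longer than five characters?
is_long <- map_lgl(words, \(w) nchar(w) > 5)
is_long
```

Each variant raises an error if the function returns something that does not fit the requested type, which makes mistakes visible early.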

Now let’s go back to making plots

species <- c("Adelie", "Chinstrap", "Gentoo")
plots <- map(species, make_plot)

map() takes each element of the vector species and passes it as input to make_plot()

It returns a list of created plots:

plots[[1]]
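
Because map() returns a plain list, its elements can also be named, so that a result is retrieved by species rather than by position. A minimal sketch, using a hypothetical stand-in function describe_species() in place of make_plot() so that it runs without any plotting packages:

```r
library(purrr)

# hypothetical stand-in for make_plot(): returns a label, not a plot
describe_species <- function(species) {
  paste("Plot for", species)
}

species <- c("Adelie", "Chinstrap", "Gentoo")

# set_names() names each element after its own value,
# and map() carries those names over to the output list
results <- species |>
  set_names() |>
  map(describe_species)

names(results)
results$Gentoo
```

The same approach should work with make_plot(), giving access such as plots$Gentoo instead of plots[[3]].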

 

plots[[2]]

 

plots[[3]]

 

# put all plots side-by-side with patchwork
patchwork::wrap_plots(plots)

 

Step 4: Write a more general function

make_plot <- function(species) {
  penguins |> # hard-coded dataset!
    filter(.data$species == .env$species) |>
    ggplot() +
    aes(bill_length_mm, body_mass_g) +
    geom_point() +
    ggtitle(glue("Species: {species}")) +
    xlab("bill length (mm)") +
    ylab("body mass (g)") +
    theme_minimal_grid() +
    theme(plot.title.position = "plot")
}

Step 4: Write a more general function

make_plot2 <- function(data, species) {
  data |>
    # no filter step: the caller passes in data that is already filtered
    ggplot() +
    aes(bill_length_mm, body_mass_g) +
    geom_point() +
    ggtitle(glue("Species: {species}")) +
    xlab("bill length (mm)") +
    ylab("body mass (g)") +
    theme_minimal_grid() +
    theme(plot.title.position = "plot")
}

data_adelie <- penguins |>
  filter(species == "Adelie")

make_plot2(data_adelie, species = "Adelie")
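
The same idea can be taken one step further by also passing the column names as arguments. The sketch below is an assumption about how one might do this, not part of the original slides: make_scatter() is a hypothetical function, and it uses the built-in mtcars dataset so that only ggplot2 is required.

```r
library(ggplot2)

# hypothetical generalization: dataset, columns, and title are all arguments
make_scatter <- function(data, x, y, title) {
  ggplot(data) +
    # .data[[x]] looks up the column whose name is stored in x
    aes(.data[[x]], .data[[y]]) +
    geom_point() +
    ggtitle(title) +
    xlab(x) +
    ylab(y)
}

p <- make_scatter(mtcars, "wt", "mpg", "Fuel economy vs. weight")
```

The more arguments a function takes, the more reusable it becomes, at the cost of more typing at each call. Where to stop is a design choice.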

 

Step 5: Use these concepts in a tidy pipeline

penguins |>
  nest(data = -species)
# A tibble: 3 × 2
  species   data              
  <fct>     <list>            
1 Adelie    <tibble [152 × 7]>
2 Gentoo    <tibble [124 × 7]>
3 Chinstrap <tibble [68 × 7]> 
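
The data column created by nest() is a list in which each element is a complete tibble. A small sketch on made-up data (the birds data frame and its values are hypothetical) shows how to inspect it:

```r
library(dplyr)
library(tidyr)

# toy data standing in for penguins; the values are made up
birds <- tibble(
  species = c("Adelie", "Adelie", "Gentoo"),
  mass    = c(3700, 3800, 5000)
)

# one row per species; all other columns move into `data`
nested <- birds |>
  nest(data = -species)

nested

# each element of the list column is itself a tibble
nested$data[[1]]
```

Since the data column is a list, it can be processed with map() like any other list.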

penguins |>
  nest(data = -species) |>
  mutate(plots = map(species, make_plot))
# A tibble: 3 × 3
  species   data               plots 
  <fct>     <list>             <list>
1 Adelie    <tibble [152 × 7]> <gg>  
2 Gentoo    <tibble [124 × 7]> <gg>  
3 Chinstrap <tibble [68 × 7]>  <gg>  

penguins |>
  nest(data = -species) |>
  mutate(plots = map(species, make_plot)) |>
  pull(plots) |>
  patchwork::wrap_plots()

 

penguins |>
  nest(data = -species) |>
  mutate(plots = map2(data, species, make_plot2)) |>
  pull(plots) |>
  patchwork::wrap_plots()

 

map2() is like map() but iterates over two inputs in parallel and calls a function that takes two arguments

Note: This pipeline automatically processes all species in the dataset, whatever they are called
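
To make the behavior of map2() concrete, here is a minimal sketch without any plotting. The input values are made up for illustration.

```r
library(purrr)

species    <- c("Adelie", "Gentoo", "Chinstrap")
lengths_mm <- c(39.1, 46.5, 50)

# map2_chr() pairs the first elements, then the second elements, and so on
labels <- map2_chr(
  species, lengths_mm,
  \(s, len) paste0(s, ": ", len, " mm")
)
labels

# the typed variants work the same way as for map()
products <- map2_dbl(1:3, 4:6, \(x, y) x * y)
products
```

In the pipeline above, the two inputs are the list column data and the column species, so make_plot2() receives one nested tibble together with its matching species name on each call.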

Why no for loops?

  • They often require us to think about data logistics (indexing)
  • They encourage writing long, monolithic blocks of code
  • They encourage iterative thinking over conceptual thinking
  • They cannot easily be parallelized or otherwise optimized
  • Many modern programming languages offer alternatives to explicit for loops, such as iterators and map-style functions
    (examples: Python, Rust, JavaScript)

Further reading