19  Functional Programming

Class Objectives
  1. Understand the concept of mapping functions
  2. Learn the basic map() functions and their variants
  3. Learn how to use anonymous functions

When you work with data in R, you often need to apply a function repeatedly—whether to every row in a tibble or every element in a vector. You might recall using rowwise() in past tutorials to apply functions to each row of nested data. In this session, we introduce map(), a function from the purrr package (part of the tidyverse) that makes these operations more concise and, in many cases, more powerful.

Think of map() as saying, “Take this function and apply it to each element in this list or vector.” While the idea might seem a little abstract at first, with some practice you’ll appreciate its flexibility and efficiency.

And, just for fun, purrr has the best logo among all tidyverse packages:

[Image: the purrr logo. From https://purrr.tidyverse.org/]

Let’s dive in by loading our “best buddy” packages:

library(tidyverse)
library(tidylog)
library(palmerpenguins)

19.1 Basics of Mapping

19.1.1 Example: Calculate the number of characters in several words

Imagine you have a few words and you want to count the number of characters in each word. Previously, you might have used rowwise() as follows:

tibble(words = c("mango", "lychee", "blueberry")) |>
    rowwise() |>
    mutate(length = nchar(words)) |>
    ungroup()

Here, we had to explicitly mark the tibble as rowwise and then ungroup the results. This works fine, but it can be a bit verbose.

We can use map() instead:

tibble(words = c("mango", "lychee", "blueberry")) |>
    mutate(length = map(words, nchar))

What’s happening here?

  • map(words, nchar) tells R: “For every element in the words vector, calculate the number of characters using nchar().”
  • The result is a list column because map() always returns a list.

19.1.2 How map() Works

The basic syntax is very simple:

map(.x, .f)
  • .x: The list or vector you want to iterate over.
  • .f: The function to apply to each element.

map() takes each element from .x and sends it through the function .f, collecting the results in a new list. It is not that scary, right?
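
To make this concrete, here is a minimal sketch that calls map() on a plain character vector and compares the result with a hand-written for loop. The names lengths_map and lengths_loop are made up for this illustration, and the loop is only a mental model of what map() does, not a description of how purrr is implemented.

```r
library(purrr)

words <- c("mango", "lychee", "blueberry")

# map() applies nchar() to each element and collects the results in a list
lengths_map <- map(words, nchar)

# A hand-written loop that builds the same list step by step
lengths_loop <- vector("list", length(words))
for (i in seq_along(words)) {
  lengths_loop[[i]] <- nchar(words[[i]])
}

# Both approaches should give the same list
identical(lengths_map, lengths_loop)
```

The loop needs a pre-allocated list and an index variable, while map() handles both for you.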

19.1.3 Getting a Vector Instead of a List

Since map() always returns a list, you might sometimes want to convert the results into a vector format. There are two common ways to do this:

After applying map(), you can “unnest” the list column to turn it into a regular vector:

tibble(words = c("mango", "lychee", "blueberry")) |>
    mutate(length = map(words, nchar)) |>
    unnest(length)
mutate: new variable 'length' (list) with 3 unique values and 0% NA

The function map_vec() directly returns a vector, so you don’t have to unnest later:

tibble(words = c("mango", "lychee", "blueberry")) |>
    mutate(length = map_vec(words, nchar))

Surprisingly, map_vec() is a recent addition to the tidyverse: it arrived in purrr 1.0.0, released at the end of 2022. Before that, you had to use a type-specific variant such as map_dbl() or map_lgl(), which meant specifying the type of vector you wanted back. That was annoying! Luckily, this is not something you need to worry about anymore.
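
You will still meet the type-specific variants in older code, so here is a small sketch of how they compare with map_vec() on the same words. The variable names are only for illustration.

```r
library(purrr)

words <- c("mango", "lychee", "blueberry")

# map_int() promises an integer vector (nchar() returns integers)
len_int <- map_int(words, nchar)

# map_chr() promises a character vector
upper <- map_chr(words, toupper)

# map_lgl() promises a logical vector
is_long <- map_lgl(words, \(w) nchar(w) > 5)

# map_vec() works out the output type on its own
len_vec <- map_vec(words, nchar)
```

The typed variants throw an error if the function returns something of the wrong type, which can be a useful safety check even though map_vec() is more convenient.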

19.2 Anonymous Functions with map()

So far, it might not be clear why map() is better than rowwise(), except that it is shorter to write. Below I will show you one of the big advantages of map(): it pairs naturally with anonymous functions (a.k.a. lambda functions). These are functions you define “on the fly” without formally naming them, which can make your code more concise when you only need a function once.

19.2.1 Using rowwise()

Let’s say you want to generate a series of random samples from a uniform distribution. Using rowwise(), you’d first have to define a separate function:

# Define a function to generate samples
sample_distribution <- function(n) {
  runif(n, min = -1, max = 1)
}

# Apply the function rowwise
tibble(n = c(10, 20, 30)) |>
  rowwise() |>
  mutate(sample = list(sample_distribution(n))) |>
  ungroup()

# Equivalent map() approach
tibble(n = c(10, 20, 30)) |>
  mutate(sample = map(n, sample_distribution))

Here, sample_distribution() generates n random samples from a uniform distribution from -1 to 1.

19.2.2 Using Anonymous Functions

If the function is only needed once, you can define it right in the call to map():

tibble(
    num_samples = c(10, 20, 30)
) |>
    mutate(
        sample = map(num_samples, \(x) runif(x, min = -1, max = 1))
    )
mutate: new variable 'sample' (list) with 3 unique values and 0% NA

Explanation:

  • \(x) defines an anonymous function with x as its argument.
  • The body of the function (runif(x, min = -1, max = 1)) uses x directly.
  • This means “for each value in num_samples, generate x random numbers using runif().”
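
The \(x) syntax is shorthand for function(x) and has been available since R 4.1.0. As a quick check, the sketch below runs both spellings with the same random seed. The names short_form and long_form are made up for this example.

```r
library(purrr)

# Shorthand anonymous function
set.seed(42)
short_form <- map(c(2, 3), \(x) runif(x, min = -1, max = 1))

# The same function written out in full
set.seed(42)
long_form <- map(c(2, 3), function(x) runif(x, min = -1, max = 1))

# With the same seed, both spellings should produce the same samples
identical(short_form, long_form)
```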

Older code might use a formula-style notation for anonymous functions. Although less common now, it’s still seen in many codebases, so it’s good to know about it. The syntax is as follows:

tibble(
    num_samples = c(10, 20, 30)
) |>
    mutate(
        sample = map(num_samples, ~ runif(., min = -1, max = 1))
    )

In this version, ~ denotes an anonymous function and . is a placeholder for the argument.
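
In formula-style code you will also see .x used as the placeholder instead of a bare dot. Both refer to the first argument. The sketch below, with illustrative variable names, puts the three spellings side by side under the same seed.

```r
library(purrr)

sizes <- c(2, 3)

# Formula notation with . as the placeholder
set.seed(42)
with_dot <- map(sizes, ~ runif(., min = -1, max = 1))

# Formula notation with .x as the placeholder
set.seed(42)
with_dot_x <- map(sizes, ~ runif(.x, min = -1, max = 1))

# Backslash notation, for comparison
set.seed(42)
with_backslash <- map(sizes, \(x) runif(x, min = -1, max = 1))

# All three should produce the same samples
identical(with_dot, with_dot_x) && identical(with_dot, with_backslash)
```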

19.3 Mapping with Multiple Inputs

Sometimes, you need to work with more than one column simultaneously. purrr provides two handy functions: map2() and pmap().

19.3.1 map2(): Working with Two Inputs

map2() applies a function to two vectors element-wise. For example:

tibble(
    num_samples = c(10, 20, 30),
    sample_min = c(-1, -2, -3)
) |>
    mutate(
        sample = map2(
            num_samples, sample_min,
            \(n, min) runif(n, min = min, max = 1)
        )
    )
mutate: new variable 'sample' (list) with 3 unique values and 0% NA

How It Works:

  • map2() takes the first element from num_samples and the first element from sample_min.
  • It passes them as n and min to the anonymous function \(n, min) runif(n, min = min, max = 1).
  • This is repeated for all corresponding pairs.
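
Because runif() is random, the pairing can be hard to see in the output. Here is a minimal sketch with plain addition instead, where the inputs first and second are made up for illustration.

```r
library(purrr)

first <- c(1, 2, 3)
second <- c(10, 20, 30)

# map2() walks through both vectors in parallel: (1, 10), (2, 20), (3, 30)
sums <- map2(first, second, \(a, b) a + b)

# map2_dbl() does the same but returns a numeric vector instead of a list
sums_vec <- map2_dbl(first, second, \(a, b) a + b)
```

Here sums is a list holding 11, 22 and 33, and sums_vec holds the same values as an ordinary numeric vector.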

19.4 pmap(): Working with Multiple Inputs

If you have more than two vectors, use pmap():

tibble(
    num_samples = c(10, 20, 30),
    sample_min = c(-1, -2, -3),
    sample_max = c(2, 3, 4)
) |>
    mutate(
        sample = pmap(
            list(num_samples, sample_min, sample_max),
            \(n, min, max) runif(n, min = min, max = max)
        )
    )
mutate: new variable 'sample' (list) with 3 unique values and 0% NA

How It Works:

  • pmap() takes a list of columns, so you need to wrap your columns in list().
  • The anonymous function takes three arguments, one for each column you passed to pmap(), in the same order.
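
If the list you pass to pmap() is named, the elements are matched to the function's arguments by name rather than by position. The sketch below relies on that: the list params (a name made up here) uses n, min and max as names so that they line up with the arguments of runif(), and no anonymous function is needed.

```r
library(purrr)

# Names match the arguments of runif(): n, min, max
params <- list(
  n   = c(10, 20, 30),
  min = c(-1, -2, -3),
  max = c(2, 3, 4)
)

set.seed(1)
# Each call is runif(n = ..., min = ..., max = ...)
samples <- pmap(params, runif)

# One sample per row of parameters, with 10, 20 and 30 draws
lengths(samples)
```

Matching by name also means the order of the list elements stops mattering, which helps once you have many arguments.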

19.5 When to use map() vs rowwise()

Both map() and rowwise() help you loop over your data, but they have different vibes:

  • map() is your go-to for functional programming. It’s built to apply a function to every element in a vector or list and is extra handy when you’re juggling multiple inputs with helpers like map2() or pmap(). If you dig writing concise code with anonymous functions, map() is where it’s at.

  • rowwise() is more about thinking in rows. It’s great if you want to work row by row in a tibble, and it feels pretty intuitive—like taking it one row at a time. Just keep in mind that once you’re done, you’ll usually want to call ungroup() to tidy up your tibble.

Usually, map() leads to cleaner and more streamlined code, especially if you’re comfortable with a functional programming approach. That said, if you’re just dipping your toes into these concepts, rowwise() can be a nice stepping stone—a sort of “gateway drug” into the world of functional programming.

In most tutorials, rowwise() is simply not taught. I agree that map() is more powerful, and you do not need to use rowwise() if you do not want to. Still, I personally find rowwise() much easier to write and read, which makes it a natural first step toward functional programming.