library(tidyverse)
library(tidylog)
library(palmerpenguins)
19 Functional Programming
- Understand the concept of mapping functions
- Learn the basic map() functions and their variants
- Learn how to use anonymous functions
When you work with data in R, you often need to apply a function repeatedly, whether to every row in a tibble or every element in a vector. You might recall using rowwise() in past tutorials to apply functions to each row of nested data. In this session, we introduce map(), a function from the purrr package (part of the tidyverse) that makes these operations more concise and, in many cases, more powerful.
Think of map() as saying, “Take this function and apply it to each element in this list or vector.” While the idea might seem a little abstract at first, with some practice you’ll appreciate its flexibility and efficiency.
And, just for fun, purrr has the best logo among all tidyverse packages.
Let’s dive in by loading our “best buddy” packages: tidyverse, tidylog, and palmerpenguins (the three library() calls at the top of this chapter).
19.1 Basics of Mapping
19.1.1 Example: Calculate the number of characters in several words
Imagine you have a few words and you want to count the number of characters in each word. Previously, you might have used rowwise() as follows:
tibble(words = c("mango", "lychee", "blueberry")) |>
  rowwise() |>
  mutate(length = nchar(words)) |>
  ungroup()
Here, we had to explicitly mark the tibble as rowwise and then ungroup the results. This works fine, but it can be a bit verbose.
We can use map() instead:
tibble(words = c("mango", "lychee", "blueberry")) |>
  mutate(length = map(words, nchar))
What’s happening here?
- map(words, nchar) tells R: “For every element in the words vector, calculate the number of characters using nchar().”
- The result is a list column because, by default, map() returns a list.
19.1.2 How map() Works
The basic syntax is very simple:
map(.x, .f)
- .x: The list or vector you want to iterate over.
- .f: The function to apply to each element.
map() takes each element from .x and sends it through the function .f, collecting the results in a new list. It is not that scary, right?
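To see the mechanics outside of a tibble, here is a minimal sketch that calls map() on a plain character vector. It assumes purrr is installed; the variable names words and lengths_list are made up for illustration.

```r
library(purrr)

# A plain character vector plays the role of .x
words <- c("mango", "lychee", "blueberry")

# nchar plays the role of .f; map() calls it once per element
lengths_list <- map(words, nchar)

# The result is a list with one element per input element
str(lengths_list)
```

Because the output is a list, each result sits in its own slot, which is why the tibble version above ends up with a list column.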
19.1.3 Getting a Vector Instead of a List
Since map() always returns a list, you might sometimes want to convert the results into a vector format. There are two common ways to do this:
After applying map(), you can “unnest” the list column to turn it into a regular vector:
tibble(words = c("mango", "lychee", "blueberry")) |>
  mutate(length = map(words, nchar)) |>
  unnest(length)
mutate: new variable 'length' (list) with 3 unique values and 0% NA
The function map_vec() directly returns a vector, so you don’t have to unnest later:
tibble(words = c("mango", "lychee", "blueberry")) |>
  mutate(length = map_vec(words, nchar))
map_*()
Surprisingly, map_vec() is a recent addition to the tidyverse: it arrived with purrr 1.0.0 at the end of 2022. Before that, you could use typed variants such as map_dbl() or map_lgl() to get a vector, but you had to specify the type of vector you wanted to return, which is annoying! Luckily, this is not something you need to worry about anymore.
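As a rough illustration of the difference, the sketch below runs the same word-length task with a typed variant and with map_vec(). It assumes purrr 1.0.0 or later (needed for map_vec()); the variable names are hypothetical.

```r
library(purrr)

words <- c("mango", "lychee", "blueberry")

# map_int() promises an integer vector; nchar() returns integers, so this works
lengths_int <- map_int(words, nchar)

# map_vec() works out the output type from the results themselves
lengths_vec <- map_vec(words, nchar)

# map_chr() promises a character vector
shouted <- map_chr(words, toupper)

# Compare the typed and the untyped result
identical(lengths_int, lengths_vec)
```

The typed variants are still useful when you want an error as soon as a result has an unexpected type.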
19.2 Anonymous Functions with map()
So far, it might not be clear why map() is better than rowwise(), except that it is shorter to write. Below I will show you one of the huge advantages of map(): you can use anonymous functions (a.k.a. lambda functions). These are functions you define “on the fly” without formally naming them, which can make your code more concise when you only need a function once.
19.2.1 Using rowwise()
Let’s say you want to generate a series of random samples from a uniform distribution. Using rowwise(), you’d first have to define a separate function:
# Define a function to generate samples
sample_distribution <- function(n) {
  runif(n, min = -1, max = 1)
}
# Apply the function rowwise
tibble(n = c(10, 20, 30)) |>
  rowwise() |>
  mutate(sample = list(sample_distribution(n))) |>
  ungroup()
## Equivalent `map()` approach
tibble(n = c(10, 20, 30)) |>
  mutate(sample = map(n, sample_distribution))
Here, sample_distribution() generates n random samples from a uniform distribution from -1 to 1.
19.2.2 Using Anonymous Functions
If the function is only needed once, you can define it right in the call to map():
tibble(
  num_samples = c(10, 20, 30)
) |>
  mutate(
    sample = map(num_samples, \(x) runif(x, min = -1, max = 1))
  )
mutate: new variable 'sample' (list) with 3 unique values and 0% NA
Explanation:
- \(x) defines an anonymous function with x as its argument.
- The body of the function (runif(x, min = -1, max = 1)) uses x directly.
- This means “for each value in num_samples, generate x random numbers using runif().”
Older code might use a formula-style notation for anonymous functions. Although less common now, it’s still seen in many codebases, so it’s good to know about it. The syntax is as follows:
tibble(
  num_samples = c(10, 20, 30)
) |>
  mutate(
    sample = map(num_samples, ~ runif(., min = -1, max = 1))
  )
In this version, ~ denotes an anonymous function and . is a placeholder for the argument.
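The notations can be compared side by side. The sketch below is only an illustration with made-up names (sizes, with_shorthand, and so on), and it uses a deterministic function instead of runif() so the results can be compared directly. It assumes R 4.1 or later for the backslash shorthand.

```r
library(purrr)

sizes <- c(1, 2, 3)

# 1. Backslash shorthand (needs R 4.1 or later)
with_shorthand <- map(sizes, \(x) rep("a", x))

# 2. The long form that the shorthand abbreviates
with_function <- map(sizes, function(x) rep("a", x))

# 3. purrr's formula notation; .x refers to the current element
with_formula <- map(sizes, ~ rep("a", .x))

# All three should produce the same list
identical(with_shorthand, with_function)
identical(with_function, with_formula)
```

The first two forms are plain R and work anywhere a function is expected, while the formula form is only understood by purrr and a few other tidyverse functions.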
19.3 Mapping with Multiple Inputs
Sometimes, you need to work with more than one column simultaneously. The purrr package provides two handy functions: map2() and pmap().
19.3.1 map2(): Working with Two Inputs
map2() applies a function to two vectors element-wise. For example:
tibble(
  num_samples = c(10, 20, 30),
  sample_min = c(-1, -2, -3)
) |>
  mutate(
    sample = map2(
      num_samples, sample_min,
      \(n, min) runif(n, min = min, max = 1)
    )
  )
mutate: new variable 'sample' (list) with 3 unique values and 0% NA
How It Works:
- map2() takes the first element from num_samples and the first element from sample_min.
- We pass them as n and min to the anonymous function \(n, min) runif(n, min = min, max = 1).
- This is repeated for all corresponding pairs.
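To check that the pairing really happens element by element, here is a small sketch on plain vectors. The set.seed() call and the helper names (samples, s, lo) are assumptions added for illustration; they are not part of the example above.

```r
library(purrr)

set.seed(42)  # only to make the random draws repeatable

num_samples <- c(10, 20, 30)
sample_min <- c(-1, -2, -3)

samples <- map2(
  num_samples, sample_min,
  \(n, min) runif(n, min = min, max = 1)
)

# Each element has the length given by the matching num_samples entry
lengths(samples)

# Each element respects its own lower bound
map2_lgl(samples, sample_min, \(s, lo) all(s >= lo))
```

The second check reuses map2_lgl(), the logical-vector variant of map2(), to pair each sample with the minimum it was drawn with.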
19.4 pmap(): Working with Multiple Inputs
If you have more than two vectors, use pmap():
tibble(
  num_samples = c(10, 20, 30),
  sample_min = c(-1, -2, -3),
  sample_max = c(2, 3, 4)
) |>
  mutate(
    sample = pmap(
      list(num_samples, sample_min, sample_max),
      \(n, min, max) runif(n, min = min, max = max)
    )
  )
- The pmap() function takes a list of columns, so you need to wrap your columns in list().
- The anonymous function takes three arguments, one for each column you pass to pmap().
mutate: new variable 'sample' (list) with 3 unique values and 0% NA
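A possible variation, not the approach used above: if the list passed to pmap() is named, purrr matches the names to the function's arguments. The sketch below assumes a hypothetical list called params whose names n, min, and max line up with the arguments of runif().

```r
library(purrr)

set.seed(42)  # only to make the random draws repeatable

params <- list(
  n   = c(10, 20, 30),
  min = c(-1, -2, -3),
  max = c(2, 3, 4)
)

# The names n, min, and max match runif()'s arguments,
# so no anonymous function is needed here
samples <- pmap(params, runif)

# One sample per position, with the requested sizes
lengths(samples)
```

Naming the inputs can make the call harder to get wrong, because the order of the list no longer matters.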
19.5 When to use map() vs rowwise()
Both map() and rowwise() help you loop over your data, but they have different vibes:
- map() is your go-to for functional programming. It’s built to apply a function to every element in a vector or list and is extra handy when you’re juggling multiple inputs with helpers like map2() or pmap(). If you dig writing concise code with anonymous functions, map() is where it’s at.
- rowwise() is more about thinking in rows. It’s great if you want to work row by row in a tibble, and it feels pretty intuitive, like taking it one row at a time. Just keep in mind that once you’re done, you’ll usually want to call ungroup() to tidy up your tibble.
Usually, map() leads to cleaner and more streamlined code, especially if you’re comfortable with a functional programming approach. That said, if you’re just dipping your toes into these concepts, rowwise() can be a nice stepping stone, a sort of “gateway drug” into the world of functional programming.
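To make the comparison concrete, the sketch below runs the word-length task from the start of the chapter both ways and compares the resulting columns. The object names fruit, via_rowwise, and via_map are made up for illustration, and map_int() is used so that both columns are plain integer vectors.

```r
library(dplyr)
library(purrr)
library(tibble)

fruit <- tibble(words = c("mango", "lychee", "blueberry"))

# Row-by-row: mark the tibble as rowwise, compute, then ungroup
via_rowwise <- fruit |>
  rowwise() |>
  mutate(length = nchar(words)) |>
  ungroup()

# Functional: apply nchar() to each element of the column
via_map <- fruit |>
  mutate(length = map_int(words, nchar))

# Compare the two length columns
identical(via_rowwise$length, via_map$length)
```

Which version reads better is largely a matter of taste; the results should not differ.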
In most tutorials, rowwise() is simply not taught. While I agree that map() is more powerful, and you do not need to use rowwise() if you do not want to, I personally find that rowwise() is much easier to write and read. It is a natural stepping stone (gateway drug) to functional programming!