Functional Programming

EE BIOL C177/C234

Chuliang Song

Why Mapping?

Mapping vs. Rowwise 📂

Imagine we want to count the characters in several words. Let’s compare two approaches side by side:

✅ Rowwise Approach

library(tidyverse)  # attaches tibble, dplyr, purrr, and tidyr (used throughout)

tibble(words = c("mango", "lychee", "blueberry")) |>
    rowwise() |>
    mutate(length = nchar(words)) |>
    ungroup()

# A tibble: 3 × 2
  words     length
  <chr>      <int>
1 mango          5
2 lychee         6
3 blueberry      9

⚡ Mapping Approach

tibble(words = c("mango", "lychee", "blueberry")) |>
    mutate(length = map(words, nchar))

# A tibble: 3 × 2
  words     length   
  <chr>     <list>   
1 mango     <int [1]>
2 lychee    <int [1]>
3 blueberry <int [1]>

Note

map() skips the rowwise() / ungroup() bookkeeping. Notice, though, that its length column is a list-column (<list>) rather than an integer column.

Anatomy of `map()` 🔎

The syntax is exceptionally clean:

map(.x, .f)

.x: The vector or list you want to loop over.
.f: The function to apply to each element.
R takes each element of .x, runs it through .f, and returns the results as a list with one element per input.
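A minimal sketch of that call on a plain vector, outside any tibble. The input 1:3 and the squaring function are illustrative choices, not taken from the slides.

```r
library(purrr)

# .x is the vector 1:3; .f squares each element (illustrative function)
squares <- map(1:3, \(x) x^2)

# map() returns a list with one element per input
str(squares)
```

The result is a list of three numbers (1, 4, 9), even though each piece is a single value.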

List Column Outputs ⚙️

Because map() returns a list by default, mutate(length = map(words, nchar)) creates a list-column:

tibble(words = c("mango", "lychee", "blueberry")) |>
    mutate(length = map(words, nchar))

# A tibble: 3 × 2
  words     length   
  <chr>     <list>   
1 mango     <int [1]>
2 lychee    <int [1]>
3 blueberry <int [1]>

Note that the length column has type <list>, and each entry prints as <int [1]>: a list element holding a single integer. To get a flat vector, we need to handle it.
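To see why the list-column needs handling, here is a small sketch that inspects it. The name word_lengths is introduced only for this illustration.

```r
library(dplyr)
library(purrr)

word_lengths <- tibble(words = c("mango", "lychee", "blueberry")) |>
    mutate(length = map(words, nchar))

# The column is a list; [[ ]] extracts the integer stored in one entry
is.list(word_lengths$length)   # TRUE
word_lengths$length[[1]]       # 5

# Arithmetic on the list itself fails, which is why we want a flat vector
arithmetic_fails <- inherits(
    try(word_lengths$length + 1, silent = TRUE),
    "try-error"
)
arithmetic_fails               # TRUE
```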

Vectorized Outputs: Two Roads 🛣️

Road 1: unnest() the list-column afterward.
Road 2: Use map_vec() to return a vector directly.

🛣️ Road 1: Unnest

tibble(words = c("mango", "lychee", "blueberry")) |>
    mutate(length = map(words, nchar)) |>
    unnest(length)

# A tibble: 3 × 2
  words     length
  <chr>      <int>
1 mango          5
2 lychee         6
3 blueberry      9
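One thing to keep in mind with unnest(): when a list element holds more than one value, the row is repeated once per value. A short sketch with made-up data (the columns n and value are illustrative):

```r
library(dplyr)
library(purrr)
library(tidyr)

# Each list element holds n values, not just one
expanded <- tibble(n = c(2, 3)) |>
    mutate(value = map(n, seq_len)) |>
    unnest(value)

nrow(expanded)  # 5 rows: two for n = 2, then three for n = 3
```

In the word-length example every element has exactly one value, so the row count stays at 3.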

🛣️ Road 2: map_vec

tibble(words = c("mango", "lychee", "blueberry")) |>
    mutate(length = map_vec(words, nchar))

# A tibble: 3 × 2
  words     length
  <chr>      <int>
1 mango          5
2 lychee         6
3 blueberry      9
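map_vec() is not limited to numbers. The sketch below, using an arbitrary start date chosen for illustration, suggests how it keeps a richer type such as Date, and how it complains when a result is not a single value.

```r
library(purrr)

start <- as.Date("2024-01-01")  # illustrative date, not from the slides

# map_vec() combines the results into a Date vector
dates <- map_vec(1:3, \(i) start + i)
class(dates)  # "Date"

# Every result must have length 1; otherwise map_vec() raises an error
bad <- try(map_vec(1:3, \(i) seq_len(i)), silent = TRUE)
inherits(bad, "try-error")  # TRUE
```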

💡 Callout: The History of `map_*()`

Before map_vec()

map_vec() was introduced in late 2022 (purrr 1.0.0). Before that, getting a vector back meant picking the type-specific variant that matched the output:
- map_int() (for integers, e.g. map_int(words, nchar))
- map_dbl() (for doubles)
- map_chr() (for character strings)
- map_lgl() (for logicals)

In older repositories, you will see map_dbl() or map_chr() frequently!
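A quick sketch of the typed variants on the same words. The functions paired with map_chr() and map_lgl() here (toupper and a length check) are illustrative choices, picked so each variant receives the output type it expects.

```r
library(purrr)

words <- c("mango", "lychee", "blueberry")

n_int  <- map_int(words, nchar)               # integer: 5 6 9
n_dbl  <- map_dbl(words, nchar)               # double:  5 6 9
upper  <- map_chr(words, toupper)             # "MANGO" "LYCHEE" "BLUEBERRY"
is_big <- map_lgl(words, \(w) nchar(w) > 5)   # FALSE TRUE TRUE
```

Each variant checks the type for you: if .f returns something that does not fit, the call fails instead of silently producing a list.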

Anonymous Functions

What is an Anonymous Function? 👥

An anonymous function (or lambda function) is defined “on the fly” without a formal name.
Extremely useful when you need a custom function once inside a map() call.

Suppose we want to generate random uniform samples.
- With rowwise(): We had to write a named helper function beforehand.
- With map(): We can define it right inside the pipeline!

Comparing Approaches: Uniform Samples 🎲

📦 Named Helper

# 1. Define named function
sample_distribution <- function(n) {
    runif(n, min = -1, max = 1)
}
# 2. Map it
tibble(n = c(3, 4, 5)) |>
    mutate(sample = map(n, sample_distribution))

# A tibble: 3 × 2
      n sample   
  <dbl> <list>   
1     3 <dbl [3]>
2     4 <dbl [4]>
3     5 <dbl [5]>

👥 Anonymous Inline

# Defined inline using \(x)
tibble(n = c(3, 4, 5)) |>
    mutate(sample = map(n, \(x) runif(x, min = -1, max = 1)))

# A tibble: 3 × 2
      n sample   
  <dbl> <list>   
1     3 <dbl [3]>
2     4 <dbl [4]>
3     5 <dbl [5]>

Inside the `\(x)` Syntax 🔍

Let’s break down \(x) runif(x, min = -1, max = 1):

\(x): Represents “this is a function taking one argument called x”.
runif(x, ...): The body of the function. For each element of .x in map(), R binds the element’s value to x and evaluates this expression.

Note

\(x) is base R shorthand for function(x), available since R 4.1.0. The two forms create the same function; \(x) is just shorter to type.
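One way to convince yourself that the two spellings behave the same is to run both with the same random seed. The seed value 42 is arbitrary and only there to make the draws repeatable.

```r
library(purrr)

set.seed(42)  # arbitrary seed so both runs draw the same numbers
with_lambda <- map(c(3, 4, 5), \(x) runif(x, min = -1, max = 1))

set.seed(42)
with_function <- map(c(3, 4, 5), function(x) runif(x, min = -1, max = 1))

identical(with_lambda, with_function)  # TRUE
```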

Older Syntax: Formula Style 🕰️

In older codebases, you will see a formula-style syntax using ~ and . instead:

tibble(num_samples = c(10, 20, 30)) |>
    mutate(
        sample = map(num_samples, ~ runif(., min = -1, max = 1))
    )

~: Marks the beginning of the formula-based anonymous function.
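.: Represents the value being passed (the placeholder); purrr also accepts .x for the same thing.
The purrr documentation now recommends the \(x) lambda syntax, which is plain base R and less ambiguous; the formula style is mainly for code that must run on R older than 4.1.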
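For reading older code, it can help to see the spellings next to each other. This sketch uses a simple doubling function (an illustrative choice) so the results are easy to check by eye.

```r
library(purrr)

doubled_formula <- map_dbl(1:3, ~ .x * 2)     # formula style with .x
doubled_dot     <- map_dbl(1:3, ~ . * 2)      # formula style with bare .
doubled_lambda  <- map_dbl(1:3, \(x) x * 2)   # lambda style

# All three give 2 4 6
identical(doubled_formula, doubled_lambda)    # TRUE
```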

Multi-Input Mapping

`map2()`: Working with Two Inputs 👥

When you need to iterate over two columns simultaneously element-by-element, use map2():

tibble(
    num_samples = c(3, 4, 5),
    sample_min = c(-1, -2, -3)
) |>
    mutate(
        sample = map2(num_samples, sample_min,
            \(n, min) runif(n, min = min, max = 1))
    )

# A tibble: 3 × 3
  num_samples sample_min sample   
        <dbl>      <dbl> <list>   
1           3         -1 <dbl [3]>
2           4         -2 <dbl [4]>
3           5         -3 <dbl [5]>
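Because runif() output is random, the pairing is easier to see with a deterministic sketch. The vectors fruits and times are made up for illustration; map2() walks through them in lockstep, first with first, second with second.

```r
library(purrr)

fruits <- c("mango", "lychee")   # illustrative inputs
times  <- c(2, 3)

# Pair 1: ("mango", 2); pair 2: ("lychee", 3)
repeated <- map2(fruits, times, \(fruit, n) rep(fruit, n))

# Typed variants exist as well, e.g. map2_chr() for one string per pair
labels <- map2_chr(fruits, times, \(fruit, n) paste(fruit, "x", n))
labels  # "mango x 2" "lychee x 3"
```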

`pmap()`: Three or More Inputs 👥👥

If you have three or more vectors/columns, use pmap(). You must wrap the columns in a list():

tibble(
    num_samples = c(3, 4, 5),
    sample_min = c(-1, -2, -3),
    sample_max = c(2, 3, 4)
) |>
    mutate(
        sample = pmap(list(num_samples, sample_min, sample_max),
            \(n, min, max) runif(n, min = min, max = max))
    )

# A tibble: 3 × 4
  num_samples sample_min sample_max sample   
        <dbl>      <dbl>      <dbl> <list>   
1           3         -1          2 <dbl [3]>
2           4         -2          3 <dbl [4]>
3           5         -3          4 <dbl [5]>
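An unnamed list is matched to the function's arguments by position, so the order matters. A possible alternative, sketched here, is to name the list elements so they are matched by name instead. The helper score is hypothetical; in the second part the names n, min, and max are chosen to match runif()'s own arguments, which lets us pass runif directly.

```r
library(dplyr)
library(purrr)

# Named list: element names are matched to the function's argument names
score <- \(a, b, c) a + b * c   # hypothetical helper for illustration
in_order <- pmap_dbl(list(a = 1:3, b = 4:6, c = 7:9), score)
shuffled <- pmap_dbl(list(c = 7:9, a = 1:3, b = 4:6), score)
identical(in_order, shuffled)   # TRUE: both are 29 42 57

# Names n, min, max match runif()'s arguments, so no wrapper is needed
samples <- tibble(
    num_samples = c(3, 4, 5),
    sample_min = c(-1, -2, -3),
    sample_max = c(2, 3, 4)
) |>
    mutate(sample = pmap(
        list(n = num_samples, min = sample_min, max = sample_max),
        runif
    ))

lengths(samples$sample)  # 3 4 5
```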

Choosing Your Tool

`map()` vs `rowwise()` ⚔️

Both solve the same basic problem, but they have distinct strengths:

map() (Functional Programming):
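- Concise, and usually faster than rowwise() on larger data because it skips per-row grouping.
- Robust: typed variants such as map_int() and map_vec() fail loudly when a result has the wrong type or length.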
- Excellent for complex structures and multiple inputs (map2(), pmap()).
rowwise() (Row-oriented thinking):
- Exceptionally intuitive and easy to read.
- Feels like taking a dataset “one row at a time.”
- Requires an explicit ungroup() to tidy up afterward.
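The two styles can be compared directly on a small list-column. This is a sketch with made-up data (the tibble measurements is illustrative); it suggests that both routes reach the same totals, with map_int() needing no grouping step.

```r
library(dplyr)
library(purrr)

measurements <- tibble(values = list(1:3, 4:6))  # illustrative list-column

# Row-oriented: inside rowwise(), values refers to one row's vector
by_row <- measurements |>
    rowwise() |>
    mutate(total = sum(values)) |>
    ungroup()

# Functional: map_int() applies sum() to each list element
by_map <- measurements |>
    mutate(total = map_int(values, sum))

identical(by_row$total, by_map$total)  # TRUE: both are 6 15
```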

Summary Checklist 📋

map(.x, .f) applies function .f element-wise to .x.
map_vec() directly returns a flat vector instead of a list.
Use \(x) to declare anonymous functions inline for clean, one-off logic.
Iterate over two vectors with map2(), and three or more with pmap(list(...)).
Let rowwise() be your gateway drug, but master map() for production-grade functional pipelines!