EE BIOL C177/C234
rowwise()
for and while loops in Week 1 or 2.
for loop!
Suppose we want to add two numeric vectors element-by-element:
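A minimal sketch of the two approaches follows. The values of x and y are assumptions chosen for illustration, and the loop deliberately grows z one element at a time.

```r
# Assumed example inputs (chosen for illustration)
x <- c(1, 2, 3)
y <- c(4, 5, 6)

# Loop approach: start with an empty vector and grow it one element at a time
z <- c()
for (i in seq_along(x)) {
  z[i] <- x[i] + y[i]
}
z       # 5 7 9

# Vectorized approach: one expression, no counter or manual allocation
x + y   # 5 7 9
```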
What is the CPU doing in our loop?
Creates an empty vector z.
Sets i = 1.
Retrieves x[1], retrieves y[1], adds them.
Grows z to size 1, stores result.
Sets i = 2, repeats lookup, grows z again, etc.
What is R doing with the vectorized approach?
R takes care of the loop counter (i), allocating empty vectors, and index offsets internally.
Many built-in operations are already vectorized (x + y, log(x)).
“I want to specify at a conceptual level how the data should be analyzed… I don’t want to have to think about the logistics of how the computation is performed.”
— Claus Wilke
Conceptual Definition
Say we have a function f() and we pass a vector x to it. If f() is vectorized, it automatically applies to each element of x and returns a vector of results:
c(f(x[1]), f(x[2]), f(x[3]), ...)
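To make the definition concrete, here is a small sketch that uses sqrt() as the vectorized f(). The input vector is an assumed example.

```r
x <- c(1, 4, 9)  # assumed example input

# Calling the vectorized function once on the whole vector...
whole <- sqrt(x)

# ...gives the same result as calling it on each element and combining
piecewise <- c(sqrt(x[1]), sqrt(x[2]), sqrt(x[3]))

identical(whole, piecewise)  # TRUE
whole                        # 1 2 3
```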
What happens when we add a vector and a single number?
R recycles the 2 to match the length of x.
The result is the same as c(1, 2, 3) + c(2, 2, 2).
In newer languages like Julia, this implicit recycling is considered a design flaw and is prohibited!
#| eval: false
# Julia code
x = [1, 2, 3]
x + 2 # ❌ ERROR: MethodError (no method matching +(::Vector{Int64}, ::Int64))
x .+ 2 # ✅ Works! The dot (.) explicitly requests element-wise vectorization.
R is extremely permissive, which means the programmer must be extra careful!
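A short sketch of how permissive R's recycling is, using made-up vectors. The first two results match. The third shows recycling happening silently when the longer length is a multiple of the shorter one, and the last shows that R only warns (and still returns a result) when it is not.

```r
x <- c(1, 2, 3)

# A single number is recycled to the length of x
x + 2                        # 3 4 5
x + c(2, 2, 2)               # 3 4 5, the same thing written out

# Lengths 4 and 2: the shorter vector is recycled twice, with no warning
silent <- c(1, 2, 3, 4) + c(10, 20)
silent                       # 11 22 13 24

# Lengths 3 and 2: R still returns a result but emits a warning
# (suppressed here so the script runs quietly)
partial <- suppressWarnings(c(1, 2, 3) + c(10, 20))
partial                      # 11 22 13
```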
Suppose we have a tibble of sites and sample depths, and we want to calculate the average depth per row (mean of x and y):
# A tibble: 3 × 3
      x     y row_mean
  <dbl> <dbl>    <dbl>
1     1     4      3.5
2     2     5      3.5
3     3     6      3.5
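For reference, a pipeline along these lines produces the table above. This is a sketch that assumes dplyr is installed (R 4.1 or later for the native pipe) and uses the mutate() call discussed in this section. The name `naive` is introduced only for this example.

```r
library(dplyr)

naive <- tibble(
  x = c(1, 2, 3),
  y = c(4, 5, 6)
) |>
  # mean() sees both whole columns at once, not one row at a time
  mutate(row_mean = mean(c(x, y)))

naive
```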
❌ Wait… why is the row mean 3.5 for every single row?
Let’s look at how R evaluates mutate(row_mean = mean(c(x, y))):
R evaluates c(x, y) first by combining the entire x column and the entire y column: c(1, 2, 3, 4, 5, 6).
mean() is not vectorized! It takes that combined 6-element vector and calculates its single average value: 3.5.
Because mutate() expects one result per row, R silently recycles 3.5 to fill all 3 rows.
rowwise() 🎯
rowwise() is a special grouping function in dplyr that forces operations inside mutate() (and other verbs) to be applied individually, row by row:
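library(dplyr)

tibble(
  x = c(1, 2, 3),
  y = c(4, 5, 6)
) |>
  rowwise() |>                          # treat each row as its own group
  mutate(row_mean = mean(c(x, y))) |>   # mean() now sees one x and one y
  ungroup()                             # drop the row-wise grouping again
# A tibble: 3 × 3
      x     y row_mean
  <dbl> <dbl>    <dbl>
1     1     4      2.5
2     2     5      3.5
3     3     6      4.5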
Row 1: c(1, 4) → mean is 2.5
Row 2: c(2, 5) → mean is 3.5
Row 3: c(3, 6) → mean is 4.5
By using vectorized functions and dplyr verbs, your code reads like a list of conceptual steps:
✅ Vectorized (Declarative)
❌ Loop (Imperative)
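As an illustration of this contrast, here is a sketch with a hypothetical tibble of depths. The data frame `samples`, its columns, and the cutoff of 2 are invented for this example, and dplyr is assumed to be installed. Both versions compute the same result.

```r
library(dplyr)

# Hypothetical data, made up for illustration
samples <- tibble(
  site  = c("A", "B", "C", "D"),
  depth = c(1.5, 3.0, 2.5, 0.5)
)

# Vectorized (declarative): each line is one conceptual step
deep_vectorized <- samples |>
  filter(depth > 2) |>
  mutate(depth_cm = depth * 100)

# Loop (imperative): same result, but with manual indexing and bookkeeping
keep <- c()
depth_cm <- c()
for (i in seq_len(nrow(samples))) {
  if (samples$depth[i] > 2) {
    keep <- c(keep, i)
    depth_cm <- c(depth_cm, samples$depth[i] * 100)
  }
}
deep_loop <- samples[keep, ]
deep_loop$depth_cm <- depth_cm

# Compare the two results column by column
identical(deep_vectorized$site, deep_loop$site)
identical(deep_vectorized$depth_cm, deep_loop$depth_cm)
```

The loop needs two accumulator vectors and an index variable just to express "keep the deep samples and convert them"; the pipeline states those two steps directly.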
Tip
Vectorized code describes what you want to achieve, rather than details of how to traverse the computer’s memory.
When R runs a for loop, it has to evaluate types, check variables, and manage memory under the hood.
for loop to vectorize!
Managing manual index offsets and tracking variables leads to complicated control flow:
“Complicated control flows confuse programmers. Messy code often hides bugs.”
— Bjarne Stroustrup (Creator of C++)
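A small sketch of how manual index bookkeeping invites bugs, using assumed example vectors. The loop header 1:length(x) is a common R pitfall: on an empty vector it counts 1, 0 instead of running zero times.

```r
# Common pitfall: 1:length(x) when x is empty
empty <- numeric(0)
1:length(empty)        # 1 0, so a loop over it would run twice
seq_along(empty)       # integer(0), so a loop over it would not run at all

# Successive differences with manual offsets: the bounds must be tracked by hand
x <- c(2, 5, 9, 14)    # assumed example input
d_loop <- numeric(length(x) - 1)
for (i in seq_len(length(x) - 1)) {
  d_loop[i] <- x[i + 1] - x[i]
}

# Vectorized equivalent: no offsets to get wrong
d_vec <- diff(x)

identical(d_loop, d_vec)   # TRUE
d_vec                      # 3 4 5
```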
Core Rule
Let the language handle the logistics of traversal, so you can focus on writing correct scientific logic.
dplyr verbs (mutate(), filter()) are designed to be vectorized.
Summary functions like mean() and sum() average or sum their entire input.
Use rowwise() |> ... |> ungroup() to force row-by-row evaluation in tibbles!
Basics of Vectorization