Remember our discussion about vectorization and nested data? We’ve been using functions all along, such as mean() and sum(). But sometimes, you need a function that doesn’t exist yet. That’s when you write your own!
Here are three signs you need a function:
Writing functions also helps you avoid mistakes. For example, if you need to change a number in multiple places, you may forget to change one of them (speaking from experience :-( ).
Let’s start with a simple function:
calculate_bmi <- function(weight, height) {
  bmi <- weight / (height^2)
  return(bmi)
}
calculate_bmi(weight = 70, height = 1.75)
[1] 22.85714
Let’s break down the components:
calculate_bmi: The function name (make it descriptive!)
function(weight, height): The arguments your function needs
{ and }: What your function does
return(bmi): What your function gives back
Importantly, a function returns the value of the last expression it evaluates. So you can also write:
calculate_bmi <- function(weight, height) {
  weight / (height^2)
}
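It is recommended not to use return() at the end of your function. The tidyverse style guide suggests relying on the implicit return and reserving return() for early exits.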
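One case where an explicit return() is still useful is an early exit. The sketch below assumes a hypothetical helper, bmi_or_na(), that is not part of the lesson and only handles single values:

```r
# Hypothetical helper: returns NA early when an input is missing.
# Works on single values only, because || expects length-one conditions.
bmi_or_na <- function(weight, height) {
  if (is.na(weight) || is.na(height)) {
    return(NA_real_) # early exit: nothing to compute
  }
  weight / (height^2) # implicit return of the last expression
}

bmi_or_na(70, 1.75)
bmi_or_na(NA, 1.75)
```

The explicit return() marks the unusual path, while the normal result is simply the last expression.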
Write a function to convert Fahrenheit to Celsius.
Remember how we talked about the importance of vectorization? When writing your own functions, you want them to be vectorized too! The good news is that R makes this easy: if you use vectorized operations inside your function, your function will be vectorized automatically. For example, our function calculate_bmi() is vectorized automatically:
library(tidyverse)
tibble(
  weight = c(70, 80, 90),
  height = c(1.75, 1.80, 1.85)
) |>
  mutate(bmi = calculate_bmi(weight, height))
Sometimes it is not obvious whether a function is vectorized. The simplest check is to call it on a vector or use it inside mutate() on a tibble. If it returns one result per element, it’s (almost certainly) vectorized!
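To make that test concrete, here is a small sketch comparing two hypothetical labelling functions (bmi_label_scalar() and bmi_label() are made up for illustration). The first uses if (), which expects a single TRUE or FALSE; the second uses the vectorized ifelse():

```r
# Not vectorized: if () only accepts a single TRUE/FALSE condition
bmi_label_scalar <- function(bmi) {
  if (bmi >= 25) "above 25" else "below 25"
}

# Vectorized: ifelse() evaluates the condition element by element
bmi_label <- function(bmi) {
  ifelse(bmi >= 25, "above 25", "below 25")
}

bmi_values <- c(22.9, 24.7, 26.3)

# One label per element
bmi_label(bmi_values)

# Depending on the R version this errors (4.2 and later) or warns,
# so we catch either outcome instead of letting the script stop
scalar_result <- tryCatch(
  bmi_label_scalar(bmi_values),
  error = function(e) "error",
  warning = function(w) "warning"
)
scalar_result
```

If the vector test fails or warns like this, the function is not vectorized and needs either a rewrite or rowwise().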
Write a function to calculate bill area (length × depth) that works with the penguins dataset.
Some functions work with entire datasets rather than element by element. For these cases, we need to use rowwise() when working with nested data:
# Function that works with a dataset
calculate_r2 <- function(data) {
  model <- lm(bill_length_mm ~ bill_depth_mm, data = data)
  return(summary(model)$r.squared)
}
library(palmerpenguins)
# Need rowwise() because calculate_r2 works on entire datasets
penguins |>
  group_by(species) |>
  nest() |>
  rowwise() |> # Important! Because calculate_r2 is not vectorized
  mutate(
    r2 = calculate_r2(data)
  ) |>
  ungroup()
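If it helps to see what "once per dataset" means without any packages, the same idea can be sketched in base R. This is only an illustration: it assumes the built-in iris data instead of penguins, and r2_for() is a hypothetical stand-in for calculate_r2():

```r
# Hypothetical stand-in for calculate_r2(), using columns from iris
r2_for <- function(data) {
  model <- lm(Sepal.Length ~ Sepal.Width, data = data)
  summary(model)$r.squared
}

# split() gives one data frame per species, much like nest()
by_species <- split(iris, iris$Species)

# vapply() calls r2_for() once per data frame, much like rowwise() + mutate()
r2_values <- vapply(by_species, r2_for, numeric(1))
r2_values
```

The function never sees a vector of datasets. It is called separately for each group, which is exactly what rowwise() arranges in the tidyverse version.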
dplyr: Tidy Evaluation
We now touch on a topic that used to be scary: tidy evaluation. A great feature of the tidyverse is that we do not need to quote a variable name (as in "variable") to refer to it. Instead, we can use the bare name. However, this feature gets tricky when we write functions that take variables as arguments.
I know it sounds abstract. Let’s imagine you’re analyzing the penguins dataset and want to calculate the mean bill_length_mm for different groupings. Your code looks like this:
# Group by species
penguins |>
  group_by(species) |>
  summarise(mean = mean(bill_length_mm, na.rm = TRUE))
# Group by sex
penguins |>
  group_by(sex) |>
  summarise(mean = mean(bill_length_mm, na.rm = TRUE))
# ... and so on for island, year, etc. 😩
It gets a bit tedious to write the same code over and over again. Let’s write a function to automate this.
With what we know so far, you might try this:
grouped_mean <- function(data, variable) {
  data |>
    group_by(variable) |>
    summarise(mean = mean(bill_length_mm, na.rm = TRUE))
}
However, this function does not work:
grouped_mean(penguins, "species")
Error in `group_by()`:
! Must group by variables found in `.data`.
✖ Column `variable` is not found.
R thinks you’re literally grouping by a column named “variable” in the dataset (which doesn’t exist).
Luckily, it is super easy to fix this. To tell R, “Hey, use the variable I’m giving you, not the word ‘variable’!” wrap your argument in {{ }}:
grouped_mean_corrected <- function(data, variable) {
  data |>
    group_by({{ variable }}) |>
    summarise(mean = mean(bill_length_mm, na.rm = TRUE))
}
grouped_mean_corrected(penguins, species)
In other words, {{ }} tells tidyverse: “Treat variable as the column I pass into the function, not as a literal name.”
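The same trick works for more than one argument. As a sketch, assuming dplyr is installed and using the built-in mtcars data so it runs without penguins, a hypothetical grouped_mean_of() can take both the grouping column and the column to average:

```r
library(dplyr)

# Hypothetical generalisation: both columns are passed unquoted
grouped_mean_of <- function(data, group, value) {
  data |>
    group_by({{ group }}) |>
    summarise(mean = mean({{ value }}, na.rm = TRUE))
}

# Mean miles per gallon for each cylinder count
result <- grouped_mean_of(mtcars, cyl, mpg)
result
```

Each argument that stands for a column gets its own {{ }}; ordinary arguments such as data do not need it.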
When writing functions, follow these guidelines:
calculate_bmi(), not bmi()
# Good function design
calculate_bmi <- function(weight, height, units = "metric") {
  if (units == "imperial") {
    # Convert pounds to kg and inches to meters
    weight <- weight * 0.453592
    height <- height * 0.0254
  }
  return(weight / (height^2))
}
# Now works with both metric and imperial units
calculate_bmi(70, 1.75) # metric (default)
[1] 22.85714
calculate_bmi(154, 69, units = "imperial") # imperial
[1] 22.74157
It is a good idea to validate the inputs to your function. You can do this with stop(), which halts the function and raises an error with your message.
calculate_bmi <- function(weight, height, units = "metric") {
  # Input validation
  if (weight <= 0 || height <= 0) {
    stop("Weight and height must be positive numbers")
  }
  if (units == "imperial") {
    weight <- weight * 0.453592
    height <- height * 0.0254
  } else if (units != "metric") {
    stop('Units must be either "metric" or "imperial"')
  }
  return(weight / (height^2))
}
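One caveat if you want the validated function to stay vectorized: || expects single values, so the check above can fail when weight and height are vectors (recent R versions raise an error). A possible variant, sketched here under the hypothetical name calculate_bmi_vec(), uses any() with the element-wise | instead, and lets match.arg() validate units:

```r
# Hypothetical vector-friendly variant of the validated function
calculate_bmi_vec <- function(weight, height, units = c("metric", "imperial")) {
  # match.arg() picks the first choice by default and errors on anything else
  units <- match.arg(units)

  # any() collapses the element-wise comparison into a single TRUE/FALSE
  if (any(weight <= 0 | height <= 0, na.rm = TRUE)) {
    stop("Weight and height must be positive numbers")
  }

  if (units == "imperial") {
    # Convert pounds to kg and inches to meters
    weight <- weight * 0.453592
    height <- height * 0.0254
  }
  weight / (height^2)
}

# Works on vectors
calculate_bmi_vec(c(70, 80), c(1.75, 1.80))

# Invalid input: capture the error message instead of stopping the script
bad_input <- tryCatch(
  calculate_bmi_vec(-70, 1.75),
  error = function(e) conditionMessage(e)
)
bad_input
```

Whether to accept vectors at all is a design choice. If the function is only ever called with single values, the original check is fine.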
Good functions need good documentation, especially if you want others (or even you in the future) to understand what the function does.
In R, we use roxygen2 style comments to document our functions:
#' Calculate Body Mass Index (BMI)
#'
#' @param weight Weight in kilograms
#' @param height Height in meters
#' @return BMI value (kg/m^2)
#' @examples
#' calculate_bmi(70, 1.75)
calculate_bmi <- function(weight, height) {
  bmi <- weight / (height^2)
  return(bmi)
}
The documentation includes:
A description of each argument (@param)
What the function returns (@return)
Usage examples (@examples)
Writing functions is a crucial skill that builds on our understanding of vectorization and nested data:
Remember: A well-written function is like a good tool - it should do one thing, do it well, and be easy to use! 🛠️