12  Visualizing Trends

Class Objectives:
  1. Trend of a single variable
  2. Smoothed trends
  3. Trend of multiple groups

Trend plots are another very common type of plot. They show how a variable changes over time (or along any other variable of interest).

We load the same packages as before.

library(tidyverse)
library(tidylog)
library(palmerpenguins)
theme_set(hrbrthemes::theme_ipsum())  # Set the theme for the plots.

12.1 The Simplest Case: A Basic Trend Plot

Let’s start with a simple example using the economics dataset from the tidyverse. To simplify the example, we’ll filter the data to include only observations after the year 2000 and further restrict it to January data.

economics |>
    filter(year(date) > 2000) |>  # Keep only observations after the year 2000.
    filter(month(date) == 1) |>   # Keep only the first month of each year.
    ggplot(aes(x = date, y = unemploy)) +
    geom_point(color = "white", fill = "#0072B2", shape = 21, size = 2)
filter: removed 402 rows (70%), 172 rows remaining
filter: removed 157 rows (91%), 15 rows remaining

The figure above is not very informative. It is difficult to see the trend. Below are some tweaks that can help.

Connect the dots with a line to reveal the trend. This is easy to do with geom_line().

economics |>
    filter(year(date) > 2000) |>
    filter(month(date) == 1) |>
    ggplot(aes(x = date, y = unemploy)) +
    geom_line(linewidth = 1, color = "#0072B2") +
    geom_point(color = "white", fill = "#0072B2", shape = 21, size = 4)
filter: removed 402 rows (70%), 172 rows remaining
filter: removed 157 rows (91%), 15 rows remaining

A useful tip is to color the border of the points white, so they stand out better against the line.

Sometimes a smooth curve is easier on the eyes. We can use geom_xspline() from the ggalt package to draw a spline through the points.

economics |>
    filter(year(date) > 2000) |>    
    filter(month(date) == 1) |>
    ggplot(aes(x = date, y = unemploy)) +
    ggalt::geom_xspline(color = "#0072B2") +
    geom_point(color = "white", fill = "#0072B2", shape = 21, size = 4)

An area plot is a great way to show the trend of a variable over time. It is a line plot with a shaded area under the line. We can use geom_area() to create an area plot.

economics |>
    filter(year(date) > 2000) |>
    filter(month(date) == 1) |>
    ggplot(aes(x = date, y = unemploy)) +
    geom_line(linewidth = 1, color = "#0072B2") +
    geom_point(color = "white", fill = "#0072B2", shape = 21, size = 4) +
    geom_area(fill = "#0072B2", alpha = 0.2)
filter: removed 402 rows (70%), 172 rows remaining
filter: removed 157 rows (91%), 15 rows remaining

12.2 Dealing with Many Points

When there are many data points, plotting each one might make your graph look like a plate of spaghetti—delicious, but messy!

economics |>
    filter(year(date) > 2000) |>    
    ggplot(aes(x = date, y = unemploy)) +
    geom_line(color = "#0072B2") +
    geom_point(color = "white", fill = "#0072B2", shape = 21, size = 3)

In these cases, smoothing the data can help you see the overall trend more clearly. Below are some alternatives to consider.

12.2.1 Moving average

A moving average smooths out short-term fluctuations, giving you a clearer view of the long-term trend. It’s like averaging your favorite TV show ratings over several episodes rather than judging based on one controversial finale.

We can use the zoo package to calculate the moving average.

economics |>
    filter(year(date) > 2000) |>
    # 12-month moving average (zoo::rollmean() centers the window by default).
    mutate(unemploy_avg_year = zoo::rollmean(unemploy, k = 12, fill = NA)) |>
    # 24-month moving average (also centered).
    mutate(unemploy_avg_2years = zoo::rollmean(unemploy, k = 24, fill = NA)) |>
    ggplot(aes(x = date, y = unemploy)) +
    geom_line(aes(color = 'line')) +
    # Add the 12-month moving average to the plot.
    geom_line(aes(y = unemploy_avg_year, color = 'year'), linewidth = 1) +
    # Add the 24-month moving average to the plot.
    geom_line(aes(y = unemploy_avg_2years, color = '2 years'), linewidth = 1) +
    # Customize the color legend.
    scale_color_manual(
      values = c(`line` = "grey60", `year` = "#d55e00", `2 years` = "#009E73"),
      breaks = c("year", "2 years", "line"),
      labels = c("Yearly average", "2-year average", "Line"),
      name = NULL
    ) +
    theme(legend.position = 'top')
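
To make the windowing concrete, here is a minimal sketch of a moving average on a toy vector. It uses base R's stats::filter() instead of zoo so that it runs without extra packages; the vector x and the window size k = 3 are made up for illustration. A centered window averages each point with its neighbors on both sides, while a trailing window averages each point with the points before it.

```r
# Toy data, invented for illustration only.
x <- c(1, 2, 3, 4, 5, 6)
k <- 3                      # window size (odd, so the centered window is symmetric)
weights <- rep(1 / k, k)    # equal weights turn the filter into a plain average

# sides = 2: window centered on each point.
centered <- as.numeric(stats::filter(x, weights, sides = 2))
# sides = 1: window covers the current point and the k - 1 points before it.
trailing <- as.numeric(stats::filter(x, weights, sides = 1))

print(centered)  # NA at both ends, where the window does not fit
print(trailing)  # NA for the first k - 1 positions
```

Positions where the window does not fit are NA. The same thing happens with fill = NA in zoo::rollmean(), which is why the smoothed lines in the plot above are shorter than the raw series.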

12.2.2 Regression smoother

Alternatively, you can add a regression smoother to highlight the trend. We have already used the linear regression smoother with geom_smooth(method = "lm") in Chapter 6. It is just as easy to use nonlinear methods.

economics |>
    filter(year(date) > 2000) |>
    ggplot(aes(x = date, y = unemploy)) +
    geom_point(color = "white", fill = "grey60", shape = 21, size = 3) +
    # Use a generalized additive model (GAM) with a cubic regression spline
    # (bs = 'cr') and a basis dimension of k = 20.
    geom_smooth(method = "gam", formula = y ~ s(x, k = 20, bs = 'cr'))

It is important to note that the smooth line depends heavily on the method used! By default, geom_smooth() picks a method based on the size of the data, so it is a good idea to specify the method explicitly whenever possible, as we did above.
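
The following sketch illustrates how much the choice of method matters by fitting three smoothers to the same series outside of ggplot2. It is not the chapter's data: to stay free of extra packages it uses the built-in Nile series, and the span values are arbitrary choices for illustration.

```r
# Built-in annual series (Nile river flow, 1871-1970), used only as example data.
nile <- data.frame(
  year = as.numeric(time(datasets::Nile)),
  flow = as.numeric(datasets::Nile)
)

fit_lm     <- lm(flow ~ year, data = nile)                  # straight line
fit_wide   <- loess(flow ~ year, data = nile, span = 0.75)  # wide window: slow-moving curve
fit_narrow <- loess(flow ~ year, data = nile, span = 0.2)   # narrow window: follows local bumps

# Root mean squared residual: how closely each curve hugs the data.
rmse <- function(fit) sqrt(mean(residuals(fit)^2))
fit_error <- c(
  lm           = rmse(fit_lm),
  loess_wide   = rmse(fit_wide),
  loess_narrow = rmse(fit_narrow)
)
print(round(fit_error, 1))
```

A smaller error does not mean a better trend line. The narrow loess simply tracks more of the year-to-year noise, so the three curves tell different stories about the same data.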

Additionally, when handling many data points, consider preprocessing or smoothing your data before plotting, with packages like tidymodels. This can be much faster than forcing ggplot2 to do all the heavy lifting.
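
As one illustration of preprocessing before plotting, the sketch below collapses a monthly series into yearly means, so the plot only has to draw one point per year. It swaps in base R's aggregate() for a modeling package and uses the built-in UKDriverDeaths series rather than the chapter's data; both choices are assumptions made to keep the example self-contained.

```r
# Built-in monthly series (UK driver deaths, 1969-1984), used only as example data.
monthly <- datasets::UKDriverDeaths

# Collapse each block of 12 monthly values into one yearly mean.
yearly <- aggregate(monthly, nfrequency = 1, FUN = mean)

# Convert the result to a plain data frame that ggplot2 can use directly.
yearly_df <- data.frame(
  year   = as.numeric(time(yearly)),
  deaths = as.numeric(yearly)
)
print(head(yearly_df, 3))
```

The resulting data frame has 16 rows instead of 192 and can go straight into ggplot(aes(x = year, y = deaths)) + geom_line().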