PSet 1
Hello there! Welcome to your first PSet. This time, we’re going to use your R plotting skills to make your data visualizations pop.
- Your Mission: Produce 2 stylish, publication-ready figures using ggplot2 and its many add-on packages.
- You’ll submit:
- Deadline: Apr 26th, 2026
| Criterion | Weight | What we’re looking for |
|---|---|---|
| Required elements | 50% | Does each plot include every bullet-pointed requirement listed below? |
| Readability | 20% | Are axis labels, titles, and annotations large enough to read comfortably? (Recall the Claus Wilke quote from Chapter 6!) |
| Aesthetics & creativity | 20% | Did you go beyond the minimum? Custom theme, smart color palette, clean layout, etc. |
| Reproducibility | 10% | Does your script run top-to-bottom and produce the submitted figures via ggsave() with explicit width and height? |
So, get your creativity (and code) fired up. Let’s begin!
Why Correlation is NOT Causation
We’ve all heard it a thousand times: correlation does not imply causation. A great example is that nations that consume more chocolate per capita tend to have more Nobel Prize winners per 10 million citizens. I highly doubt that any doctor would tell you that eating more chocolate is your ticket to Stockholm.
The data for chocolate consumption (Chocolate) and the number of Nobel laureates (Nobel) in various countries is here:
data_nobel <- read.csv("http://clsong.com/assets/class_data/data_nobel.csv")
We could do a quick scatter plot to see the relationship between the two variables:
library(tidyverse)

ggplot(
  data = data_nobel,
  aes(x = Chocolate, y = Nobel)
) +
  geom_point() +
  theme_bw()
This quick plot is, well, meh. It’s not something you’d proudly show in your paper or presentation. For example, in the original paper that published this result, the figure looked much better:

Let’s transform the quick plot into something more publication-ready. Specifically, your plot must include:
- A descriptive title that clearly communicates the message of the plot (use labs(title = ...)).
- Country flags instead of points: replace geom_point() with a flag geom.
- A correlation coefficient displayed on the plot.
- Readable text sizes: make sure axis labels, titles, etc., are big enough to read clearly. Remember: the default sizes are almost always too small (Chapter 6).
- A non-default theme: pick one you like from jtools, hrbrthemes, ggthemr, etc. (Chapter 6).
- Export with ggsave(): specify explicit width and height (Chapter 6).
Of course, you can do so much more than that. Explore and be creative!
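The text-size and export requirements above can be sketched as follows. This is only a minimal illustration: the data frame `toy` is made up, `theme_minimal()` is a built-in stand-in for whichever add-on theme you choose, and the output path is hypothetical.

```r
library(ggplot2)

# Toy data for illustration only (NOT the real data_nobel values)
toy <- data.frame(x = c(1, 2, 3, 4), y = c(2, 3, 5, 4))

p <- ggplot(toy, aes(x = x, y = y)) +
  geom_point(size = 3) +
  labs(title = "A descriptive title goes here", x = "x label", y = "y label") +
  theme_minimal(base_size = 18) +              # base_size scales all text at once
  theme(plot.title = element_text(face = "bold"))

# Hypothetical output path; width and height are in inches by default,
# and the file extension selects the graphics device
out_file <- file.path(tempdir(), "toy_figure.pdf")
ggsave(out_file, plot = p, width = 8, height = 6)
```

Setting `base_size` in the theme is usually less work than resizing each text element separately, and fixing `width` and `height` in `ggsave()` keeps the text-to-figure proportions the same on every run.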
We did not cover everything in class, but it is easy to learn new ggplot2 tricks. Below is a step-by-step guide:
Step 1: Flags instead of points. Install the ggflags package and swap out geom_point():
# install once:
pak::pak("jimjam-slam/ggflags")
# then in your plot, use:
ggflags::geom_flag(aes(country = Flag))
The Flag column in the data already contains the two-letter country codes that ggflags needs.
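If flags fail to show up with your own data, a quick sanity check on the codes can help. The sketch below assumes that ggflags expects lowercase two-letter codes; the vector `codes` is a hypothetical example, not the contents of data_nobel.

```r
# Hypothetical example codes; the real ones live in data_nobel$Flag
codes <- c("ch", "SE", "us", "de")

# Assumption: lowercase two-letter codes are what the flag geom matches on,
# so tolower() normalizes any stray capitals
codes_clean <- tolower(codes)

# TRUE for every entry that is exactly two lowercase letters
is_valid <- grepl("^[a-z]{2}$", codes_clean)
print(data.frame(codes, codes_clean, is_valid))
```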
Step 2: Correlation coefficient. The ggpubr package can stamp the Pearson r and p-value right on the plot.
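One way to do this, assuming the intended function is ggpubr's `stat_cor()`, is sketched below. The numbers in `toy` are made up for illustration and are not the real chocolate or Nobel values.

```r
library(ggplot2)

# Toy data for illustration only (NOT the real data_nobel values)
toy <- data.frame(
  Chocolate = c(1.8, 2.5, 4.5, 6.3, 8.8, 11.9),
  Nobel     = c(0.7, 5.5, 8.6, 12.7, 25.3, 31.5)
)

p <- ggplot(toy, aes(x = Chocolate, y = Nobel)) +
  geom_point() +
  # stat_cor() computes the correlation from the mapped x and y
  # and draws it as a text label; size controls the label text size
  ggpubr::stat_cor(method = "pearson", size = 6)

# Cross-check the coefficient by hand with base R
r <- cor(toy$Chocolate, toy$Nobel, method = "pearson")
print(r)
```

Comparing the printed value with the label on the plot is a cheap way to confirm that the annotation reflects the data you think it does.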
Step 3: Credit the data source. Use the caption argument inside labs():
labs(
  ...,
  caption = "Data: Messerli (2012), NEJM"
)
Step 4 (Optional): Country name labels. The ggrepel package draws non-overlapping text labels:
# install once:
pak::pak("ggrepel")
# then in your plot, use:
ggrepel::geom_text_repel(aes(label = Country))
As one example, below is a figure that I created using the hrbrthemes package. It is only a reference to help you navigate. I am looking forward to losing to you.

Keeping up with the New York Times
The New York Times often publishes elegant data visualizations. In this problem, we’ll mimic one of their COVID-19 vaccination rate vs. GDP charts. Their graph is below:

We’ll use a smaller dataset that approximates what they used. Even if it’s not the exact same dataset, the design principles remain the same. To load the data into your R environment, run the following code:
data_nyt <- read.csv("http://clsong.com/assets/class_data/data_nyt.csv")
In this problem, please build an NYT-style plot. You can add more features to make your plot more informative and visually appealing, but it must meet at least the following requirements:
- Log-scaled x-axis: GDP spans several orders of magnitude, so use a log scale for the x-axis.
- Color by continent: distinguish points by continent (instead of the income group in the original figure).
- Size by population: we want bigger bubbles for more populous countries.
- Custom color palette: don’t just rely on default color scales. Pick a palette you like (see Chapter 6 for options like MoMA, MetBrewer, PNWColors, etc.).
- Currency-formatted x-axis: the x-axis labels should display dollar signs (e.g., $1,000, $10,000, $100,000) instead of plain numbers.
- A non-default theme (Chapter 6).
- Export with ggsave(): specify explicit width and height.
Step 1: Log scale with currency labels. The x-axis is an aesthetic too, so it is controlled by a scale_x_*() function. To use a log-10 x-axis with dollar-sign labels:
scale_x_log10(
  labels = scales::label_currency()
)
The scales package is installed automatically with ggplot2, so no extra installation is needed.
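To preview what the labeller produces before putting it on a plot, you can call it directly on a few values. The sketch below assumes a recent version of scales that provides `label_currency()`; the break values are illustrative choices, not part of the assignment.

```r
library(ggplot2)
library(scales)

# label_currency() returns a function; call it on a few round values
dollar_labels <- label_currency()(c(1000, 10000, 100000))
print(dollar_labels)

# The same labeller plugged into a log-10 x scale; the breaks are illustrative
x_scale <- scale_x_log10(
  breaks = c(1000, 10000, 100000),
  labels = label_currency()
)
```

Adding `x_scale` to a plot with `+` has the same effect as writing the `scale_x_log10()` call inline.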
Step 2: Bubble size by population. Map Population to the size aesthetic inside aes(), then control the bubble range with scale_size():
geom_point(
  aes(size = Population, fill = Continent),
  shape = 21,  # filled circle (Chapter 6)
  alpha = 0.5
) +
scale_size(
  range = c(1, 12),  # min and max point size; scale_size() makes area proportional to the value
  guide = "none"     # hide the size legend
)
Step 3: Custom fill palette. Use scale_fill_manual() with a palette you like:
scale_fill_manual(
  values = MoMAColors::moma.colors("Klein", n = 6, type = "discrete")
)
Step 4: Informative title and subtitle. A good title tells the reader the main takeaway. A subtitle can explain what the bubble sizes represent:
labs(
  title = "Wealthier countries have administered more COVID-19 vaccines",
  subtitle = "Circles are sized by country population",
  x = "G.D.P. per capita",
  y = "Doses administered per 100 people"
)
Again, this is just for reference. I am looking forward to seeing your more creative and informative plots.
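For the custom palette step, it can help to name each color after the level it should fill, so the mapping does not depend on factor order. The sketch below is only an illustration: the continent names are hypothetical placeholders for the values in data_nyt$Continent, and base R's `hcl.colors()` stands in for a packaged palette such as the MoMAColors one shown earlier.

```r
library(ggplot2)

# Hypothetical continent names; use the actual values in data_nyt$Continent
continents <- c("Africa", "Americas", "Asia", "Europe", "Oceania")

# hcl.colors() ships with base R (grDevices); one color per continent
pal <- setNames(hcl.colors(length(continents), palette = "Dark 3"), continents)
print(pal)

# A named vector lets scale_fill_manual() match colors to levels by name
fill_scale <- scale_fill_manual(values = pal)
```

Whatever palette you pick, it needs at least as many colors as there are continents in the data.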
