PSet 1
Hello there! Welcome to your first PSet. This time, we’re going to use your R plotting skills to make your data visualizations pop.
- Your Mission: Produce 2 stylish, publication-ready figures using ggplot2 and its many add-on packages.
- You’ll submit:
  - Your R scripts.
  - The exported images of your plots (both PDF and PNG).
- Deadline: Feb 2nd, 2025.
So, get your creativity (and code) fired up. Let’s begin!
Why Correlation is NOT Causation
We’ve all heard it a thousand times: Correlation does not imply causation. A great example is that nations that consume more chocolate per capita tend to have more Nobel Prize winners per 10 million citizens. I highly doubt that any doctor would advise you that eating more chocolate is your ticket to Stockholm.
The data for chocolate consumption (`Chocolate`) and the number of Nobel laureates (`Nobel`) in various countries can be loaded with:
```r
data_nobel <- read.csv("http://clsong.com/assets/class_data/data_nobel.csv")
```
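To put a number on the pattern before plotting, base R's `cor.test()` computes Pearson's r together with a p-value. A minimal sketch, assuming the `Chocolate` and `Nobel` columns are numeric:

```r
# Load the class data (same URL as above).
data_nobel <- read.csv("http://clsong.com/assets/class_data/data_nobel.csv")

# Pearson correlation between chocolate consumption and Nobel laureates.
# cor.test() silently drops rows where either value is missing.
ct <- cor.test(data_nobel$Chocolate, data_nobel$Nobel, method = "pearson")

ct$estimate  # Pearson's r
ct$p.value   # p-value for the null hypothesis r = 0
```

This is the same kind of statistic that `ggpubr::stat_cor()` draws on a plot (Pearson by default). A large r, however small its p-value, still says nothing about whether chocolate causes Nobel Prizes.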
We could do a quick scatter plot to see the relationship between the two variables:
```r
library(tidyverse)
ggplot(
  data = data_nobel,
  aes(x = Chocolate, y = Nobel)
) +
  geom_point() +
  theme_bw()
```
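Since the submission calls for both PDF and PNG files, here is a minimal sketch of exporting with `ggplot2::ggsave()`, which picks the output format from the file extension. The file names, the 7 by 5 inch size, and `dpi = 300` are placeholder choices, not PSet requirements.

```r
library(tidyverse)

data_nobel <- read.csv("http://clsong.com/assets/class_data/data_nobel.csv")

# Store the plot in an object so it can be saved afterwards.
p_quick <- ggplot(data_nobel, aes(x = Chocolate, y = Nobel)) +
  geom_point() +
  theme_bw()

# Vector output (PDF); width and height are in inches by default.
ggsave("nobel_quick.pdf", plot = p_quick, width = 7, height = 5)

# Raster output (PNG); dpi sets the resolution.
ggsave("nobel_quick.png", plot = p_quick, width = 7, height = 5, dpi = 300)
```

Saving the same plot object twice keeps the two files identical in layout. The PDF stays sharp at any zoom, while the PNG's sharpness is fixed by `dpi`.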
This quick plot is, well, meh. It’s not something you’d proudly show in your paper or presentation. By comparison, the figure in the original paper that published this result looked much better:
Let’s transform the quick plot into something more publication-ready. Specifically, your plot must include:
- A title that clearly states the message of the plot.
- Country flags instead of points.
- A correlation coefficient on the plot.
- Readable text sizes. Make sure axis labels, titles, etc., are big enough to see clearly.
Of course, you can do so much more than that. Explore and be creative!
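For the title and text-size requirements, one possible approach is to state the message in `labs()`, enlarge all text at once through the theme's `base_size`, and then fine-tune individual elements with `theme()`. The title wording and the sizes below are placeholders; choose your own.

```r
library(tidyverse)

data_nobel <- read.csv("http://clsong.com/assets/class_data/data_nobel.csv")

p_readable <- ggplot(data_nobel, aes(x = Chocolate, y = Nobel)) +
  geom_point(size = 3) +
  labs(
    # Placeholder wording: the title should carry the take-home message.
    title = "More chocolate, more Nobel laureates?",
    subtitle = "A strong correlation is not evidence of causation",
    x = "Chocolate consumption per capita",
    y = "Nobel laureates per 10 million people"
  ) +
  # base_size scales every text element of the theme together.
  theme_bw(base_size = 16) +
  # Override single elements on top of base_size.
  theme(
    plot.title = element_text(face = "bold", size = 20),
    axis.title = element_text(size = 16)
  )
```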
We did not cover everything in class, but it is easy to pick up new ggplot2 tricks. Below are some potentially useful hints:
- How to add country flags? You can use the geom `ggflags::geom_flag()` to do this. The `Flag` column in the data contains the country codes, so it would be something like `ggflags::geom_flag(aes(country = Flag))`.
- How to add a correlation coefficient to the plot? You can use the `ggpubr::stat_cor()` function from the ggpubr package to do this.
- How to indicate the data source? To give due credit to the original author of the data, you can use the `caption` argument of the `labs()` function.
- (Optional) Add the country names to the plot. You can use the `ggrepel::geom_text_repel()` function to do this.
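Putting these hints together, a rough sketch might look like the following. It assumes that `Flag` holds two-letter ISO country codes, which `geom_flag()` expects in lowercase (`tolower()` is harmless if they already are). The caption string is a placeholder for the actual citation, and the `Country` column in the commented-out line is hypothetical; check `names(data_nobel)` for the real name.

```r
library(tidyverse)
library(ggflags)  # if needed: remotes::install_github("jimjam-slam/ggflags")
library(ggpubr)

data_nobel <- read.csv("http://clsong.com/assets/class_data/data_nobel.csv")

p_flags <- ggplot(data_nobel, aes(x = Chocolate, y = Nobel)) +
  # Draw a flag at each point; assumes Flag holds ISO two-letter codes.
  geom_flag(aes(country = tolower(Flag)), size = 8) +
  # Adds "R = ..., p = ..." (top-left by default; move it with label.x / label.y).
  stat_cor(method = "pearson", size = 6) +
  labs(
    x = "Chocolate consumption per capita",
    y = "Nobel laureates per 10 million people",
    caption = "Data: <original source here>"  # placeholder citation
  ) +
  theme_bw(base_size = 16)

# Optional country labels, assuming a hypothetical `Country` column:
# p_flags + ggrepel::geom_text_repel(aes(label = Country))
```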
As one example, below is a figure that I created using the `hrbrthemes` package. It is just a reference to help you get your bearings. I am looking forward to losing to you.
Keeping up with the New York Times
The New York Times often publishes elegant data visualizations. In this problem, we’ll mimic one of their COVID-19 vaccination rate vs. GDP charts. Their graph is below:
We’ll use a smaller dataset that approximates what they used. Even if it’s not the exact same dataset, the design principles remain the same. To load the data into your R environment, run the following code:
```r
data_nyt <- read.csv("http://clsong.com/assets/class_data/data_nyt.csv")
```
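Before mapping anything, it helps to check which columns the file actually contains, because every `aes()` call depends on the exact names. A quick look using base R and dplyr's `glimpse()` (loaded with the tidyverse):

```r
library(tidyverse)

data_nyt <- read.csv("http://clsong.com/assets/class_data/data_nyt.csv")

names(data_nyt)    # column names to use inside aes()
glimpse(data_nyt)  # type and first values of each column
summary(data_nyt)  # ranges, e.g. how many orders of magnitude GDP spans
```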
In this problem, please build an NYT-style version of this chart. You can add more features to make your plot more informative and visually appealing, but it must meet at least the following requirements:
- Because GDP spans several orders of magnitude, the x-axis should be on a log scale.
- Distinguish points by continent (instead of the income group in the original figure).
- We want bigger bubbles for more populous countries.
- Customized color/fill palette. Don’t just rely on default color scales. Find a palette you like.
- Add the `$` sign to the x-axis labels.
  - As we mentioned, the x-axis is also an aesthetic, so it is also controlled by a `scale_*()` function. Google or ask ChatGPT how to do this.
  - You can use the `scales::label_currency()` function to format the axis labels.
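The scale-related requirements can be sketched on a tiny made-up data frame so that the code runs on its own. The values and column names below (`gdp_per_capita`, `vaccinated_share`, `population`, `continent`) are hypothetical placeholders, not the class data; swap in `data_nyt` and its real column names. The palette colors are likewise just one example.

```r
library(tidyverse)

# Made-up illustrative values (NOT the class data) so the sketch is self-contained.
toy <- tibble(
  gdp_per_capita   = c(800, 2500, 9000, 30000, 65000),
  vaccinated_share = c(0.05, 0.20, 0.55, 0.75, 0.80),  # proportions in [0, 1]
  population       = c(2e7, 1.5e8, 5e7, 8e7, 3e8),
  continent        = c("Africa", "Asia", "Americas", "Europe", "Americas")
)

p_nyt <- ggplot(toy, aes(x = gdp_per_capita, y = vaccinated_share)) +
  # shape 21 has a separate outline (color) and interior (fill).
  geom_point(aes(size = population, fill = continent),
             shape = 21, color = "white", alpha = 0.85) +
  # Log-scaled x-axis with tick labels formatted as currency ("$" prefix).
  scale_x_log10(labels = scales::label_currency()) +
  # Assumes the y values are proportions, shown as percentages.
  scale_y_continuous(labels = scales::label_percent()) +
  # Bubble area (not radius) proportional to population.
  scale_size_area(max_size = 15, guide = "none") +
  # Hand-picked fill palette instead of the default hue scale.
  scale_fill_manual(values = c(
    Africa = "#E69F00", Americas = "#56B4E9",
    Asia = "#009E73", Europe = "#CC79A7"
  )) +
  labs(x = "GDP per capita (log scale)", y = "Share vaccinated", fill = NULL) +
  theme_minimal(base_size = 14)
```

`scale_size_area()` makes bubble area, rather than radius, proportional to population, which avoids visually exaggerating the largest countries. The white outline from `shape = 21` helps separate overlapping bubbles.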
Again, this is just for reference. I am looking forward to seeing your more creative and informative plots.