PSet 2

Welcome to your second problem set! In this assignment, you will import and tidy real data to perform exploratory analyses. We will replicate parts of the analysis from a recent paper at the interface between ecology and evolution. You can read the paper here.

What to Turn In

  • Submission Package: A folder (or zip file) that contains:
    • Your RStudio project.
    • Your code for data wrangling and analysis.
    • The two final publication-ready figures.
  • Deadline: February 23rd, 2025

Assignment Instructions

  1. Project Setup

    • Create a New Project: Start a new RStudio project.
    • Download Data:
      • Download the full data archive from here.
      • This archive contains two CSV files:
        • Snail Data: PSet2_snail.csv
        • Vegetation Zone Totals: PSet2_vegzonetotals.csv
    • Organize Your Files: Place the downloaded CSV files in a subfolder named data within your project directory.
  2. Data Import

    • Load the Data: Import both CSV files into R as data_snail and data_veg, the names used in the steps below.
    • Hint: Use the RStudio import shortcut, then copy and paste the generated read_csv() code chunk into your script.
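To illustrate what a read_csv() call looks like, here is a minimal sketch on a toy file. The toy file, its columns, and the object names are made up for illustration; in your project the path would instead point into the data subfolder (for example "data/PSet2_snail.csv"), relative to the project root.

```r
library(readr)

# Toy stand-in for a CSV in the data/ subfolder (hypothetical contents).
toy_path <- file.path(tempdir(), "toy_snail.csv")
write_csv(data.frame(island = c("AA", "BB"), spdiv = c(3, 5)), toy_path)

# read_csv() returns a tibble; show_col_types = FALSE silences the
# column-type message that is otherwise printed on import.
toy_snail <- read_csv(toy_path, show_col_types = FALSE)
toy_snail
```

Because an RStudio project sets the working directory to the project folder, relative paths like this keep the script portable across machines.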
  3. Data Cleaning: Filter Out Specific Islands

    • In the data_snail tibble, remove all rows where the island column is “CH”, “ED”, or “GA”.
    • Hint: Think about whether filter() or select() is the right verb here.
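As a reminder of how the two verbs differ, here is a small sketch on a toy tibble. The island codes and values are invented and are not the ones in the assignment data.

```r
library(dplyr)

# Toy data (hypothetical island codes and values).
toy <- tibble(
  island = c("AA", "BB", "CC", "DD"),
  spdiv  = c(4, 7, 2, 9)
)

# filter() acts on rows: keep rows whose island is NOT in the drop list.
kept_rows <- toy %>% filter(!island %in% c("BB", "DD"))

# select() acts on columns: keep only the named columns, for all rows.
kept_cols <- toy %>% select(island)
```

Negating %in% is one common way to drop several values at once; chaining several != conditions would be an alternative.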
  4. Join Data

    • Join data_snail with data_veg using the common variable island.
    • Hint: Consider whether a left_join(), full_join(), or another type of join is most appropriate.
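To see how the join types differ, the sketch below joins two toy tables by island. The tables and their values are invented; only the key name island and the column name AridTotal come from the assignment.

```r
library(dplyr)

# Toy observations: several rows per island (hypothetical values).
toy_obs <- tibble(island = c("AA", "AA", "BB"), spdiv = c(4, 6, 3))

# Toy per-island totals, including an island ("CC") with no observations.
toy_totals <- tibble(island = c("AA", "BB", "CC"), AridTotal = c(10, 20, 30))

# left_join(): every row of toy_obs is kept and totals are attached.
left <- left_join(toy_obs, toy_totals, by = "island")

# inner_join(): only islands present in both tables are kept.
inner <- inner_join(toy_obs, toy_totals, by = "island")

# full_join(): all islands from both tables; gaps are filled with NA.
full <- full_join(toy_obs, toy_totals, by = "island")

c(left = nrow(left), inner = nrow(inner), full = nrow(full))
```

Comparing row counts before and after a join is a quick check that no rows were dropped or duplicated unexpectedly.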
  5. Data Transformation: Normalize Species Diversity

    • The variable spdiv (species diversity) needs to be normalized based on habitat type:

      • If habitat is “Arid”, compute: normalized_spdiv = spdiv / AridTotal
      • If habitat is “Humid”, compute: normalized_spdiv = spdiv / HumidTotal
    • Hint: Use mutate() together with if_else() or case_when() to create a new normalized species diversity variable.
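The conditional rule above can be sketched with mutate() and case_when() on a toy table. All values below are invented; only the column names and the habitat labels come from the assignment text.

```r
library(dplyr)

# Toy joined table (hypothetical values).
toy_joined <- tibble(
  island     = c("AA", "AA", "BB"),
  habitat    = c("Arid", "Humid", "Arid"),
  spdiv      = c(5, 8, 3),
  AridTotal  = c(10, 10, 6),
  HumidTotal = c(16, 16, 12)
)

# Each row is divided by the total that matches its habitat.
# Rows matching neither condition would get NA.
toy_norm <- toy_joined %>%
  mutate(
    normalized_spdiv = case_when(
      habitat == "Arid"  ~ spdiv / AridTotal,
      habitat == "Humid" ~ spdiv / HumidTotal
    )
  )
```

With only two habitat types, if_else(habitat == "Arid", spdiv / AridTotal, spdiv / HumidTotal) would be equivalent, but it would silently treat any unexpected habitat label as "Humid".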

  6. Data Visualization: Replicate Figure 5 from the Paper

    • Create a scatter plot using ggplot2 that shows the relationship between the normalized species diversity (normalized_spdiv) and functional diversity (funcdiv).
    • Instead of using geom_point(), use geom_text() so that each data point is labeled with the island.
    • Aim to reproduce or even improve upon the appearance of Figure 5 from the paper.
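As a starting point, the sketch below draws a labeled scatter plot from toy values. The data are invented, and the choice of which variable goes on which axis, the axis labels, and the theme are assumptions for illustration rather than the layout of Figure 5.

```r
library(ggplot2)

# Toy plotting data (hypothetical values).
toy_plot_data <- data.frame(
  island           = c("AA", "BB", "CC"),
  normalized_spdiv = c(0.2, 0.5, 0.8),
  funcdiv          = c(1.1, 2.3, 3.0)
)

# geom_text() draws the label aesthetic at each (x, y) position,
# so the island code replaces the usual point marker.
p <- ggplot(toy_plot_data,
            aes(x = normalized_spdiv, y = funcdiv, label = island)) +
  geom_text() +
  labs(x = "Normalized species diversity", y = "Functional diversity") +
  theme_classic()

p
```

From here, text size, color, and the theme can be adjusted to match or improve on the published figure.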
  7. Create One More Publication-Quality Figure

    • Explore another interesting pattern in the dataset (for example, an amount or a distribution).
    • Use what you have learned in class to make the figure informative and aesthetically pleasing.