1  Course Goal

1.1 Why Bother with This Course in the Age of AI?

Alright, let’s cut to the chase. You signed up to study Ecology and Evolutionary Biology, not Computer Science. You probably envision yourself out in the field, knee-deep in mud, observing the subtle interactions of ecosystems—not stuck behind a screen, wrestling with semicolons and syntax errors. And now, with these fancy new AI tools that seem to do everything but your taxes, you might be wondering: why bother learning this computational stuff at all? Can’t a chatbot handle it all—generate analyses, whip up some plots, and call it a day?

It’s a fair question. Technology is reshaping everything, and science is no exception. But here’s the thing: just as biologists in the past had to master the microscope, today’s biologists need a strong foundation in computing. This course isn’t about turning you into a programmer; it’s about giving you the keys to harness technology effectively—so you’re steering the ship, not just along for the ride.

Before any anxiety sets in, let me assure you that this course is not going to be a sink-or-swim experience. We’ll embark on a guided journey into computing, building your skills step by step. Think of it as learning a new language—with much simpler grammar, thankfully. We’ll use concrete examples with annotated code and hands-on exercises to support you along the way.

And about those AI tools? Yes, they can be incredibly useful, no doubt. But relying on them blindly, without understanding the underlying principles, is like cooking from a recipe in a language you barely know. Sure, you might finish the dish, but there’s a good chance you’ll miss something critical—maybe misinterpret a step, add the wrong ingredient, or be puzzled by the end result. Computational literacy helps you catch when the AI is leading you astray, saving you from headaches (and potentially flawed results) down the line.

I get the temptation of the “easy button”; we all do. But investing in these skills now will pay off many times over. You’ll be a more efficient, more confident researcher—and, frankly, it will boost your chances of publishing in those coveted high-profile journals.

1.2 What You’ll Learn

We’ll cover the following topics, which are, in my view, the most essential computing skills for ecologists today. These are the things I wish I had known when I started grad school – it would have saved me so much time and frustration!

1.2.1 Fluency with R

What is our weapon of choice in this course? The R programming language. It is not the most polished language. It is not the fastest language. It is not the most user-friendly language. So, why R? Well, it is the lingua franca of our community. The computational tools in ecology are largely R-based, so if you want to tap into this wealth, you’ll need a working knowledge of R.

Not sold on R? You could try Julia, a newer language that’s faster, more elegant, and gaining popularity. The syntax is similar enough to R to make it a relatively easy transition, and it can even load R packages, so you can have the best of both worlds.

R has also evolved dramatically in the past decade with the rise of the tidyverse suite. When I started grad school, I absolutely hated using R. But the tidyverse transformed that experience. It introduced a coherent way of handling and visualizing data—one that feels logical and intuitive. I firmly believe it’s the right approach to data manipulation and plotting.

The tidyverse revolution is largely thanks to Hadley Wickham, whose contributions earned him the COPSS Presidents’ Award, the top honor in statistics.

But it’s not a one-person show—the tidyverse thrives because of a vibrant community united by a shared vision (link).

1.2.2 Crafting Publication-Ready Figures

With ggplot2, you’ll learn the Grammar of Graphics, one of the most powerful plotting systems out there. Beyond just producing high-quality figures, it will teach you a structured approach to data visualization.
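To give you a feel for the grammar itself, here is a minimal sketch using the mpg dataset that ships with ggplot2; the particular mappings and theme are illustrative choices, not course requirements:

library(ggplot2)

# A plot is assembled from layers: data, aesthetic mappings, and geometries
ggplot(mpg, aes(x = displ, y = hwy, color = class)) +
  geom_point() +                           # layer 1: raw observations
  geom_smooth(method = "lm", se = FALSE) + # layer 2: a linear trend per class
  labs(
    x = "Engine displacement (L)",
    y = "Highway mileage (mpg)",
    color = "Vehicle class"
  ) +
  theme_minimal()                          # a cleaner theme than the default

Notice how each piece of the plot is declared separately and then added together; that compositional structure is the “grammar” in Grammar of Graphics.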

While ggplot2 is powerful, its defaults aren’t always perfect. We’ll cover design principles that enhance both the appeal and informativeness of your figures. This skill is quite useful—after all, figures often make the first impression on your readers. By the end of the course, you’ll be capable of creating figures like this:

[Figure: an example of a publication-ready data visualization, by Cédric Scherer]

1.2.3 The Art of Data Wrangling

In our daily research, data is rarely neat. Before you can start analyzing, you’ll likely spend hours organizing it into a usable format. Fortunately, dplyr and tidyr simplify this process, turning data wrangling into a clear, logical workflow.

To get a quick taste, let us imagine describing your morning routine. In base R, it might look like a convoluted nest of functions:

eat(get_dressed(shower(brush(wakeup(you)))))

Or, a series of perplexing intermediate variables:

you_w <- wakeup(you)
you_bw <- brush(you_w)
you_sbw <- shower(you_bw)
...

With dplyr, however, it transforms into a straightforward, logical flow:

you |> 
  wakeup() |> 
  brush() |> 
  shower() |> 
  get_dressed() |> 
  eat() 

In the tidyverse ecosystem, each step of data manipulation is easy to write and, more importantly, easy to follow when reading other people’s code. We’ll introduce you to a core set of verbs—fewer than ten—that will handle over 90% of everyday tasks. You see, coding languages are indeed simpler than human languages!
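Here is what a few of those verbs look like on real data: a minimal sketch using the starwars dataset bundled with dplyr, where the particular summary (body mass index by species) is arbitrary and purely illustrative:

library(dplyr)

starwars |>
  filter(!is.na(height), !is.na(mass)) |>      # keep complete records
  mutate(bmi = mass / (height / 100)^2) |>     # create a new variable
  group_by(species) |>
  summarize(mean_bmi = mean(bmi), n = n()) |>  # aggregate by group
  arrange(desc(mean_bmi))                      # sort the result

Read top to bottom, it is almost a sentence: take the characters, keep the complete records, compute BMI, summarize by species, and sort.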

1.2.4 Scientific Typesetting

Yes, Microsoft Word is great. But when it comes to updating figures, managing references, or maintaining consistent layouts, it can quickly turn into a labyrinthine headache. The core issue is that Word is a WYSIWYG editor—What You See Is What You Get (a weird acronym, I know).

The problem with WYSIWYG editors is that they often require manual tweaks—nudging an image here, adjusting a margin there—which not only consumes valuable time but also opens the door to errors. This is why separating content from layout is a strategy worth adopting. Instead of wrestling with formatting, you focus on the substance of your work—the data, the analysis, the insights—and let the tools handle the presentation.

While there are several tools for this purpose (including the notorious LaTeX), we’ll focus on Quarto, a modern document preparation system tailored for researchers. Quarto allows you to seamlessly integrate code, references, and figures, producing professional outputs in submission-ready formats like Word or LaTeX with minimal effort. By the end of this course, you’ll be able to generate documents like this with a single click in R:

[Figure: an example typeset document, by LaPreprint]
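To make this concrete, here is a minimal sketch of what a Quarto source file looks like; the title, bibliography file, and citation key below are made-up placeholders:

---
title: "My Reproducible Manuscript"
format: pdf
bibliography: references.bib
---

Seed dispersal shapes community assembly [@example2024].

```{r}
#| label: fig-speed
#| fig-cap: "A figure regenerated from code every time the document renders."
plot(cars)
```

The prose, the citation, and the figure-generating code all live in one plain-text file, and Quarto assembles the formatted output for you.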

While Quarto is powerful, sometimes you need specialized tools for optimal layout. LaTeX is one option, but it can feel outdated and cumbersome. We’ll briefly explore Typst, a modern alternative that offers the sophistication of LaTeX without the steep learning curve. It is also much faster and provides a user-friendly interface for debugging. I firmly believe Typst will replace LaTeX in the near future.

1.2.5 Reproducible Research

As a data editor for a journal, I’m often struck by how many papers are challenging to reproduce—which I see as a huge problem for science. In this course, you’ll learn to make your research fully reproducible, which benefits not only the scientific community but also you—it makes it easier to revisit and build upon your work.

[Comic: PhD Comics, by Jorge Cham]

We’ll get into version control using Git and GitHub, tools that help you track changes, back up your work, and collaborate effectively. Additionally, we’ll cover specific techniques in R that support reproducible workflows, ensuring your research stands on a solid foundation.
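As a preview of the R-side tooling, here is a hedged sketch of the kinds of commands involved; usethis and renv are the actual packages we will use, but treat the exact workflow below as illustrative:

# One-time setup, run interactively inside an RStudio project
usethis::use_git()     # put the project under Git version control
usethis::use_github()  # create a matching GitHub repository and link it

# Dependency management with renv
renv::init()       # give the project its own package library
renv::snapshot()   # record exact package versions in renv.lock
renv::restore()    # reinstall those versions later, or on another machine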

1.3 Course Plan (Tentative)

1.3.1 Introduction to R and RStudio

  • Installing R and RStudio
  • Navigating the RStudio interface
  • Basic R syntax and operations
  • Introduction to R packages and the tidyverse
  • Basic data types: tibbles and vectors

1.3.2 Introduction to ggplot2

  • The Grammar of Graphics concept
  • Creating basic plots: scatter plots, line graphs, histograms, boxplots
  • Customizing plots with aesthetics, themes, and labels
  • Saving and exporting plots

1.3.3 Core dplyr Verbs for Data Manipulation

  • Using the pipe operator (|>) for streamlined code
  • filter(): subsetting rows based on conditions
  • select(): choosing specific columns
  • mutate(): creating and transforming variables
  • group_by() and summarize(): aggregating and summarizing data

1.3.4 Advanced ggplot2 Techniques

  • Design principles based on Fundamentals of Data Visualization by Claus O. Wilke
  • Annotation with ggrepel and gghighlight for labeling and highlighting key data
  • Enhanced text formatting with ggtext
  • Multi-panel plots with patchwork
  • Interactive plots with ggiraph
  • Animated plots with gganimate to visualize changes over time
  • Handling overplotting with ggpointdensity and ggdensity for dense scatter plots
  • Uncertainty visualization with ggdist and ggridges
  • Applying beautiful themes
  • How to adjust figures for presentations

1.3.5 Importing and Cleaning Data

  • Importing data with readr
  • Cleaning data with janitor
  • Handling missing data
  • Handling dates with lubridate
  • Joining data with left_join()

1.3.6 Making R Run Faster

  • Faster data wrangling with dtplyr
  • Functional programming with purrr
  • Parallel computing with furrr to utilize multiple cores for concurrent execution

1.3.7 Introduction to Quarto

  • Setting up Quarto and integrating it with RStudio
  • Document structure and YAML metadata
  • Blocks and chunk options: controlling code execution and output
  • Reference management with BibTeX
  • Generating outputs: rendering documents to HTML, PDF, and Word formats

1.3.8 Best Practices for Reproducible Research

  • Version control with Git and GitHub using RStudio
  • Using Quarto for reproducible reports: integrating code, analysis, and narrative text
  • Creating reproducible environments and managing dependencies with renv

1.3.9 How to Use AI in Coding

  • Introduction to GitHub Copilot

1.4 What We Won’t Cover

There’s a vast universe of computing out there, and we have to draw the line somewhere (before we all get lost in a black hole of code). There’s a reason why programmers in Silicon Valley are paid so well—computing is hard! Here are some intriguing areas we won’t cover in this course, but I encourage you to explore them in the future:

  • Advanced Statistics and Machine Learning: These areas are extensive and deserve dedicated courses. To dive deeper into the world of algorithms and data models, I’d recommend starting with Tidymodels, which shares the same design philosophy as the tidyverse and should make for a smooth transition.

  • Writing R Packages: Creating your own R packages is super useful, but might be overkill for our current adventure. If you’re interested in trying this, check out Hadley Wickham’s book R Packages as a starting point.

  • High-Performance Computing and Big Data Analytics: Working with massive datasets and supercomputers is exciting (who doesn’t want to feel like they’re in a NASA control room?), but it often requires specialized knowledge and environment-specific setups. Plus, ecological datasets aren’t usually large enough to make your laptop break a sweat. We’ll stick to tools that run on a standard machine. If you’re eager to unleash the Kraken of computing power, UCLA’s Hoffman2 Cluster has you covered with detailed guidelines.

  • Building Personal Websites: Maintaining a professional personal website is important for academics—it’s like your digital business card. Quarto makes it quite easy to set one up, but we won’t cover the details in this course. If you’re interested, I’d recommend exploring the Quarto documentation on making websites.

  • Other Programming Languages: Python, Julia, and friends are like the enticing dessert menu after a big meal—tempting but perhaps best saved for another time. As mentioned earlier, transitioning from R to Julia is relatively straightforward, so you can always explore other languages when you’re ready to expand your coding palate.