27  Agentic AI and Autonomous Workflows

Class Objectives
  1. Understand the Agentic AI paradigm and how iterative reasoning loops differ from standard chatbot queries.
  2. Master the planning and execution loop using structured artifacts (implementation_plan.md, task.md, walkthrough.md).
  3. Learn to coordinate subagent delegation and leverage modular agent skills.
  4. Implement automated verification via unit tests (using the testthat package in R) to ensure agent stability and correctness.
  5. Practice writing test assertions to create a self-correcting development loop.

27.1 The Agentic AI Paradigm

In the early days of generative AI, interactions were mostly zero-shot or chat-based. You asked a question, and the model returned an answer. If the code was buggy or incomplete, you had to copy the error, paste it back, and ask for a fix. This is a manual, human-driven loop.

Agentic AI represents a paradigm shift. Instead of acting as a passive responder, an AI agent is given a goal, a set of tools, and the autonomy to act, observe, and refine its approach iteratively.

27.1.1 Reasoning Loops vs. Simple Chat

To understand the difference, let’s compare a standard chatbot query with an agentic workflow:

| Feature | Simple Chatbot (e.g., standard ChatGPT) | Agentic AI (e.g., Gemini Agent) |
|---|---|---|
| Interaction | Single-turn or conversational Q&A | Iterative, self-directed loop |
| Action Capability | Outputs text/code (requires user to run it) | Interacts with the filesystem, runs commands, edits files |
| Error Handling | Relies on the user to report errors | Runs code, reads logs, and fixes its own bugs |
| Context Scope | Limited to the immediate chat history | Grounded in the entire project workspace |
| Workflow | Linear (Ask \(\rightarrow\) Answer) | Loop (Plan \(\rightarrow\) Tool Use \(\rightarrow\) Observe \(\rightarrow\) Reflect) |

Here is how the agentic reasoning loop operates:

graph TD
    Goal[Define Goal] --> Plan[Formulate Plan]
    Plan --> Execute[Execute Actions: Edit Files, Run Commands]
    Execute --> Observe[Observe Results: Outputs, Tests, Logs]
    Observe --> Reflect{Did it succeed?}
    Reflect -- No, errors found --> Fix[Diagnose & Refine Plan]
    Fix --> Execute
    Reflect -- Yes, verified --> Finish[Complete Task & Summarize]

This self-correcting cycle makes agents highly effective at complex programming tasks, such as refactoring large codebases or building scientific analysis pipelines.


27.2 The Planning & Execution Loop: Structured Artifacts

When executing complex tasks, AI agents use structured documents called artifacts to maintain state, organize thoughts, and coordinate with the user. Using artifacts ensures that the agent’s work is transparent, reproducible, and verifiable.

There are three primary artifacts used during an agentic session:

stateDiagram-v2
    [*] --> implementation_plan.md : Initial Planning
    implementation_plan.md --> task.md : Tracking Tasks & State
    task.md --> walkthrough.md : Documenting Results & Diffs
    walkthrough.md --> [*]

27.2.1 1. implementation_plan.md

Before writing code, the agent creates an Implementation Plan. This plan outlines:

* User Requirements: What needs to be done.
* Proposed Changes: Which files will be created, modified, or deleted.
* Verification Plan: How the changes will be tested (e.g., specific test commands, manual verification steps).

This plan is reviewed by the user (or caller agent) before any execution begins, ensuring alignment.

27.2.2 2. task.md

As the agent executes the plan, it uses a Task Log to track progress. It includes:

* A checklist of subtasks.
* Current status of each task (Pending, In Progress, Completed).
* Notes on any roadblocks or plan adjustments.
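To make the Task Log concrete, here is a minimal sketch in R that renders a checklist file from a table of subtasks. The column names, the checkbox layout, and the `render_task_log()` helper are all assumptions made for illustration; the notes above do not prescribe an exact format for task.md.

```r
# Hypothetical task table: the columns and status values are assumptions.
tasks <- data.frame(
  subtask = c("Write calculate_shannon()", "Add unit tests", "Run test suite"),
  status  = c("Completed", "In Progress", "Pending"),
  note    = c("", "Edge cases still missing", ""),
  stringsAsFactors = FALSE
)

# Turn the table into the lines of a checklist (hypothetical helper).
render_task_log <- function(tasks) {
  # Only completed subtasks get a ticked box.
  box <- ifelse(tasks$status == "Completed", "[x]", "[ ]")
  lines <- sprintf("- %s %s (%s)", box, tasks$subtask, tasks$status)

  # Append roadblock notes as an indented sub-item where one exists.
  has_note <- nzchar(tasks$note)
  lines[has_note] <- paste0(lines[has_note], "\n  - Note: ", tasks$note[has_note])

  c("# Task Log", "", lines)
}

# Write to a temporary location so the sketch does not touch a real project.
task_file <- file.path(tempdir(), "task.md")
writeLines(render_task_log(tasks), task_file)
cat(readLines(task_file), sep = "\n")
```

Keeping the log as data and rendering it on each update means the file always reflects the current state, rather than being edited by hand.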

27.2.3 3. walkthrough.md

Once the tasks are complete, the agent generates a Walkthrough. This file acts as a pull request summary:

* Changes Made: A clean summary of edits, including code diffs.
* Verification Logs: Output of tests or build commands proving the solution works.
* Usage Examples: Quick snippets showing how to use the new features.


27.3 Subagents and Agent Skills

To tackle large-scale projects, AI systems don’t rely on a single monolithic model. Instead, they employ a modular architecture of subagents and skills.

27.3.1 Subagent Delegation

A coordinator agent (the “main agent”) receives the high-level user request. If the request involves multiple distinct roles (e.g., fetching protein data, performing molecular rendering, and editing course notes), the main agent delegates these subtasks to specialized subagents.

For instance:

* Course Developer Subagent: Specialized in writing Quarto books and slides.
* Bioinformatics Subagent: Specialized in running sequence alignment and database lookups.

This delegation keeps the prompt context clean and ensures the most specialized prompts and tools are used for each subtask.
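The routing idea can be sketched in a few lines of R. This is a toy model under stated assumptions: the subagents are plain functions, and the role names, the `delegate()` helper, and the returned strings are all hypothetical. A real coordinator would invoke separate model sessions with their own prompts and tools.

```r
# Hypothetical subagents, modeled as plain functions keyed by role.
subagents <- list(
  course_developer = function(task) paste("Drafted Quarto section for:", task),
  bioinformatics   = function(task) paste("Ran database lookup for:", task)
)

# The coordinator looks up the specialist for a role and hands over the task.
delegate <- function(role, task) {
  handler <- subagents[[role]]
  if (is.null(handler)) {
    stop("No subagent registered for role: ", role)
  }
  handler(task)
}

# A high-level request split into role-tagged subtasks (illustrative values).
requests <- list(
  list(role = "bioinformatics",   task = "fetch protein metadata"),
  list(role = "course_developer", task = "edit course notes")
)

results <- vapply(requests, function(r) delegate(r$role, r$task), character(1))
print(results)
```

Each handler sees only its own task string, which mirrors how delegation keeps unrelated context out of a subagent's prompt.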

27.3.2 Agent Skills

Skills are modular, self-contained directories of instructions, scripts, and resources that extend the agent’s capabilities. In our environment, skills are structured with a SKILL.md file (which details instructions and requirements) and helper scripts. Examples of available skills in our environment include:

* uniprot-database: For fetching protein metadata.
* pymol: For molecular structure visualization.
* literature-search-europepmc: For searching and downloading open-access papers.
* nature-figure: For generating publication-ready plots.

When an agent identifies a task that matches a skill, it reads the skill’s markdown guide and uses the designated tools to carry out the task as the guide prescribes, rather than improvising an approach.
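As a rough illustration of skill discovery, the sketch below builds a throwaway skills directory and lists every subdirectory that contains a SKILL.md file. The one-directory-per-skill layout, the `skills_demo` folder, and the `list_skills()` and `read_skill_guide()` helpers are assumptions for the example, not a description of how any particular agent locates its skills.

```r
# Build a throwaway skills directory so the sketch is self-contained.
skills_root <- file.path(tempdir(), "skills_demo")
unlink(skills_root, recursive = TRUE)
for (skill in c("uniprot-database", "pymol")) {
  dir.create(file.path(skills_root, skill), recursive = TRUE)
  writeLines(
    c(paste("#", skill), "Instructions go here."),
    file.path(skills_root, skill, "SKILL.md")
  )
}
# A directory without SKILL.md should not be reported as a skill.
dir.create(file.path(skills_root, "scratch"))

# List the names of all directories that carry a SKILL.md guide.
list_skills <- function(root) {
  guides <- list.files(root, pattern = "^SKILL\\.md$",
                       recursive = TRUE, full.names = TRUE)
  sort(basename(dirname(guides)))
}

# Read the guide for one skill, as an agent would before using it.
read_skill_guide <- function(root, name) {
  readLines(file.path(root, name, "SKILL.md"))
}

print(list_skills(skills_root))
print(read_skill_guide(skills_root, "pymol"))
```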


27.4 Automated Verification via Unit Tests

In agentic workflows, automated verification is the safety net. Because agents write and execute code autonomously, it is easy for them to inadvertently introduce regressions or break edge cases.

By writing robust unit tests, we establish an automated feedback loop:

  1. The agent writes or modifies code.
  2. The agent runs the test suite.
  3. If a test fails, the agent reads the test error and corrects the code.
  4. The cycle repeats until all tests pass.
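The four steps above can be sketched in base R. In this toy version the "agent" is simulated by a list of successive attempts at the same function, and the "test suite" is a pair of hand-written checks. The names `attempts`, `run_checks()`, and `feedback_loop()` are hypothetical, and a real agent would generate each revision from the failure message rather than pick it from a list.

```r
# Hypothetical stand-ins for successive revisions of one function.
# Attempt 1 keeps zeros, so 0 * log(0) yields NaN; attempt 2 drops them.
attempts <- list(
  function(x) { p <- x / sum(x); -sum(p * log(p)) },
  function(x) { x <- x[x > 0]; p <- x / sum(x); -sum(p * log(p)) }
)

# Run the checks against one candidate. Returns the failure message,
# or NULL when every check passes.
run_checks <- function(f) {
  tryCatch({
    if (!isTRUE(all.equal(f(c(10, 10)), log(2)))) {
      stop("uniform case should equal log(2)")
    }
    if (!isTRUE(all.equal(f(c(10, 10, 0)), log(2)))) {
      stop("zeros should be ignored")
    }
    NULL
  }, error = function(e) conditionMessage(e))
}

# Try each revision in turn until the checks pass or attempts run out.
feedback_loop <- function(attempts) {
  failures <- character(0)
  for (i in seq_along(attempts)) {
    failure <- run_checks(attempts[[i]])   # steps 1-2: new code, run checks
    if (is.null(failure)) {
      return(list(passed = TRUE, iterations = i, failures = failures))
    }
    failures <- c(failures, failure)       # step 3: record the error
  }                                        # step 4: repeat with next revision
  list(passed = FALSE, iterations = length(attempts), failures = failures)
}

result <- feedback_loop(attempts)
str(result)
```

Bounding the loop by the number of attempts matters in practice: without a cap, an agent that cannot satisfy a test would cycle indefinitely.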

27.4.1 Unit Testing in R with testthat

In R, the standard framework for writing tests is the testthat package. It uses a human-readable syntax to declare expectations.

Let’s look at an example. Imagine we are writing a function to calculate the Shannon Diversity Index (\(H'\)) of an ecological community:

\[H' = -\sum_{i=1}^S p_i \ln p_i\]

where \(p_i\) is the proportion of individuals belonging to species \(i\).
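As a worked instance of the formula (the abundances are made up for illustration), take a community with counts 6, 3, and 1, so that \(N = 10\) and the proportions are \(0.6\), \(0.3\), and \(0.1\):

\[H' = -(0.6 \ln 0.6 + 0.3 \ln 0.3 + 0.1 \ln 0.1) \approx 0.3065 + 0.3612 + 0.2303 \approx 0.898\]

This is below the maximum for three species, \(\ln 3 \approx 1.099\), which is reached only when all three are equally abundant.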

Here is our function code:

# File: R/diversity.R

calculate_shannon <- function(abundances) {
  # Input validation
  if (!is.numeric(abundances)) {
    stop("Abundances must be numeric")
  }
  if (any(abundances < 0, na.rm = TRUE)) {
    stop("Abundances cannot be negative")
  }
  
  # Remove zeros and NAs (zeros don't contribute to diversity)
  abundances <- abundances[!is.na(abundances) & abundances > 0]
  
  if (length(abundances) == 0) {
    return(0)
  }
  
  # Calculate proportions
  p <- abundances / sum(abundances)
  
  # Calculate Shannon Index
  return(-sum(p * log(p)))
}

Now, let’s write the corresponding unit tests using testthat:

# File: tests/testthat/test-diversity.R
library(testthat)

test_that("calculate_shannon computes diversity correctly", {
  # 1. Test uniform distribution (Shannon index should be log(S))
  expect_equal(calculate_shannon(c(10, 10)), log(2))
  expect_equal(calculate_shannon(c(5, 5, 5)), log(3))
  
  # 2. Test single species (Shannon index should be 0)
  expect_equal(calculate_shannon(c(100)), 0)
  
  # 3. Test handling of zeros (zeros should be ignored)
  expect_equal(calculate_shannon(c(10, 10, 0)), log(2))
})

test_that("calculate_shannon handles edge cases and errors", {
  # 1. Test negative input (should throw the "cannot be negative" error)
  expect_error(calculate_shannon(c(10, -5)), "cannot be negative")
  
  # 2. Test non-numeric input (should throw the "must be numeric" error)
  expect_error(calculate_shannon(c("ten", "twenty")), "must be numeric")
  
  # 3. Test empty or all-zero vectors
  expect_equal(calculate_shannon(c(0, 0)), 0)
  expect_equal(calculate_shannon(numeric(0)), 0)
})

27.4.2 Running the Tests

To run these tests locally or inside an agentic workflow:

* In RStudio: Press Cmd + Shift + T (macOS) or Ctrl + Shift + T (Windows).
* Via Console: Run devtools::test() to run the whole test suite, or testthat::test_file("tests/testthat/test-diversity.R") for a specific file.


27.5 Interactive Exercise

Now, let’s practice writing functions and verification tests in R!

We want to write a function calculate_simpson to compute Simpson’s Diversity Index (\(D\)), which represents the probability that two individuals randomly selected from a sample belong to the same species:

\[D = \sum_{i=1}^S p_i^2\]

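where \(p_i = \frac{n_i}{N}\), with \(n_i\) the number of individuals of species \(i\) and \(N\) the total number of individuals. For a community of \(S\) species, \(D\) ranges from \(1/S\) (all species equally abundant) to 1 (a single species, no diversity); it approaches 0 only as the number of equally abundant species grows without bound. Note that higher values of \(D\) mean lower diversity.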

Exercise: Simpson’s Diversity & Testing

Write the body of calculate_simpson to calculate Simpson’s Index. Ensure that:

  1. It validates that the input is numeric and non-negative.
  2. It removes zeros and NAs.
  3. It returns 0 if the vector is empty.

Then, complete the test_that block to verify your function’s behavior.

Here is one way to implement the function and the test cases:
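A minimal sketch of one possible solution follows, mirroring the structure of calculate_shannon from earlier in the chapter. It assumes the testthat package is installed. The specific test values are illustrative choices rather than part of the exercise statement.

```r
library(testthat)

calculate_simpson <- function(abundances) {
  # 1. Input validation
  if (!is.numeric(abundances)) {
    stop("Abundances must be numeric")
  }
  if (any(abundances < 0, na.rm = TRUE)) {
    stop("Abundances cannot be negative")
  }

  # 2. Remove zeros and NAs
  abundances <- abundances[!is.na(abundances) & abundances > 0]

  # 3. Empty input returns 0, as the exercise specifies
  if (length(abundances) == 0) {
    return(0)
  }

  # Simpson's Index: sum of squared proportions
  p <- abundances / sum(abundances)
  sum(p^2)
}

test_that("calculate_simpson computes the index correctly", {
  # Even communities give 1/S
  expect_equal(calculate_simpson(c(10, 10)), 0.5)
  expect_equal(calculate_simpson(c(5, 5, 5, 5)), 0.25)

  # A single species gives 1 (no diversity)
  expect_equal(calculate_simpson(c(100)), 1)

  # Zeros and NAs are ignored
  expect_equal(calculate_simpson(c(10, 10, 0, NA)), 0.5)
})

test_that("calculate_simpson handles edge cases and errors", {
  expect_error(calculate_simpson(c(10, -5)), "cannot be negative")
  expect_error(calculate_simpson(c("ten", "twenty")), "must be numeric")
  expect_equal(calculate_simpson(numeric(0)), 0)
  expect_equal(calculate_simpson(c(0, 0)), 0)
})
```

Returning 0 for an empty vector follows the exercise requirements. Since 0 otherwise signals very high diversity for this index, a stricter variant might return NA or raise an error instead.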


27.6 Summary

  1. Agentic AI moves beyond chat interfaces to execute self-correcting loops of planning, action, observation, and reflection.
  2. Structured Artifacts (implementation_plan.md, task.md, walkthrough.md) document the reasoning path and ensure transparency.
  3. Subagents and Skills allow modular delegation of specialized tasks to custom-configured agents and toolkits.
  4. Automated Verification (like testthat in R) is essential to provide direct feedback to the agent, ensuring that changes do not break existing code.