Task Examples

The examples/ directory contains ready-to-run task configurations spanning optimization, ML, and math.

Circle Packing

Goal: Pack 26 circles into a unit square to maximize the sum of radii.

task:
  name: "Circle Packing"
  description: |
    Pack N = 26 circles into a unit square (side length 1)
    to maximize the sum of radii.

grader:
  timeout: 600
  direction: maximize

agents:
  count: 2
  model: claude-sonnet-4-6

What agents work on: Implementing optimization algorithms (simulated annealing, basin-hopping, gradient descent) to find the densest packing.

Score: sum_radii / best_known_result — 1.0 means matching the best known configuration.

coral start -c examples/circle_packing/task.yaml

Erdos

Goal: Explore mathematical conjectures related to Erdos problems.

What agents work on: Generating counterexamples or proofs for combinatorial conjectures.

This example uses the new entrypoint-based grader layout — the grader lives at examples/erdos/grader/ as a standalone Python package and task.yaml points at it via:

grader:
  entrypoint: "erdos_grader.grader:Grader"
  setup:
    - "uv pip install -e ./grader"

coral start -c examples/erdos/task.yaml

coral start will create .coral/private/grader_venv/, install the grader package into it, and spawn worker subprocesses from there during evaluation.

Kernel Builder

Goal: Optimize VLIW SIMD kernels for minimum cycle count.

What agents work on: Assembly-level optimization, instruction scheduling, register allocation.

coral start -c examples/kernel_builder/task.yaml

Kernel Engineering

Goal: Optimize GPU kernel implementations for maximum throughput.

Score: 1000 / runtime_us — lower kernel runtime = higher score.

coral start -c examples/kernel_engineering/trimul/task.yaml

MNIST

Goal: Build an ML classifier for handwritten digits.

What agents work on: Neural network architecture, training procedures, data augmentation.

coral start -c examples/mnist/task.yaml

Spaceship Titanic

Goal: Kaggle-style classification competition.

Score: Classification accuracy (0.50 = naive, 0.80+ = strong).

coral start -c examples/spaceship_titanic/task.yaml

Stanford COVID Vaccine

Goal: Predict mRNA degradation rates.

What agents work on: Feature engineering, model selection, sequence analysis.

coral start -c examples/stanford_covid_vaccine/task.yaml

Three examples use LLM rubric judges instead of programmatic graders. They share the same architecture — a grader package spawns a Claude Code judge agent that scores the worker's output against a weighted list of PASS/FAIL criteria. See Rubric Judges for the full pattern.

Japan Elderly Market Analysis 2050 (RACE)

Goal: Write a market-size analysis report for Japan's elderly demographic, covering population projections, consumption patterns, and consumer willingness through 2050.

Grader: race_japan_grader — static 25-criterion rubric, Claude Code judge, fact-checks against a reference article bundled inside the grader package.

Four yamls exercise different experimental conditions:

Config	Rubric visible to agent?	Feedback level
`task.yaml`	Yes (baked into `task.description`)	`full` — per-criterion verdict + rationale
`task_baseline.yaml`	No — hidden under `grader.args.rubrics`	`score_only`
`task_aggregate_only.yaml`	Yes	`aggregate_only` — score + N/25 passed
`task_agent_judge.yaml`	No — dynamic rubric via `apex_judge`	`full`

coral start -c examples/race-japan-elderly/task.yaml

Eggshell Skull Legal Memorandum (APEX)

Goal: Draft a legal memo analyzing whether the eggshell skull rule applies to a wrongful-death complaint. Source materials ship as .docx / .pdf and are read via Archipelago MCP servers.

Grader: apex_judge — dynamic rubric. The judge auto-generates its own PASS/FAIL criteria on the first eval and evolves them when scores plateau.

export ARCHIPELAGO_PATH=/path/to/archipelago
python examples/apex-eggshell-skull/download_apex_data.py
coral start -c examples/apex-eggshell-skull/task.yaml

Frontier BU Forecast (APEX)

Goal: Quantitative forecasting — estimate the size of Frontier's business units in 2035, assuming constant growth rates and market shares.

Grader: reuses the same apex_judge package from eggshell-skull via grader.setup: ["uv pip install -e ../apex-eggshell-skull/grader"], so two tasks share one judge implementation.

coral start -c examples/apex-frontier-bu/task.yaml

Creating your own

To create a new task from scratch:

coral init my-task

This scaffolds the directory structure. See Quick Start and Writing a Custom Grader for the full walkthrough.