Benchmarks

Running CORAL on standard coding benchmarks.

CORAL can run agents on standardized benchmarks. This guide covers the bundled example tasks, how to launch a benchmark run, and how benchmark graders work.

Available benchmark examples

The examples/ directory includes task configurations for several benchmarks:

Task                    Description
circle_packing          Geometric optimization: pack circles in a unit square
erdos                   Math conjecture exploration
kernel_builder          VLIW SIMD kernel optimization
kernel_engineering      GPU kernel optimization
mnist                   ML classification
spaceship_titanic       Kaggle competition (classification)
stanford_covid_vaccine  mRNA degradation prediction

Running a benchmark

# Example: circle packing optimization
coral start -c examples/circle_packing/task.yaml

# With more agents
coral start -c examples/circle_packing/task.yaml --agents 4

# With web dashboard
coral start -c examples/circle_packing/task.yaml --ui
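
To launch several of the example tasks from a script, the commands above can be assembled programmatically. The following is a minimal sketch, not part of CORAL: the helper name build_coral_command is hypothetical, it assumes every example follows the same examples/<task>/task.yaml layout shown for circle_packing, and it assumes --agents and --ui can be combined in one invocation.

```python
from typing import List, Optional


def build_coral_command(
    task: str, agents: Optional[int] = None, ui: bool = False
) -> List[str]:
    """Build the argument list for a `coral start` invocation.

    Hypothetical helper. It only uses the flags shown in this guide
    (-c, --agents, --ui) and assumes the examples/<task>/task.yaml layout.
    """
    cmd = ["coral", "start", "-c", f"examples/{task}/task.yaml"]
    if agents is not None:
        cmd += ["--agents", str(agents)]
    if ui:
        cmd.append("--ui")
    return cmd


if __name__ == "__main__":
    # Print the commands instead of running them, so the sketch is safe to try.
    for task in ["circle_packing", "mnist"]:
        print(" ".join(build_coral_command(task, agents=4)))
```

To actually start a run, pass the returned list to subprocess.run(cmd, check=True).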

Writing benchmark graders

Benchmark graders follow the same TaskGrader pattern as custom graders. For example, the circle packing grader:

  1. Runs the agent's initial_program.py
  2. Verifies constraints (circles within bounds, no overlaps)
  3. Computes sum_radii / best_known_result as the score
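
Steps 2 and 3 above can be sketched in plain Python. This is an illustration under stated assumptions, not the grader shipped in examples/: it does not use the TaskGrader API, the function names are hypothetical, circles are assumed to be (x, y, r) tuples, and returning 0.0 for an invalid packing is an assumed convention. Step 1 (running the agent's initial_program.py) is left out.

```python
import math
from typing import List, Tuple

Circle = Tuple[float, float, float]  # (x, y, r), assumed representation


def is_valid_packing(circles: List[Circle], tol: float = 1e-9) -> bool:
    """Check that all circles lie in the unit square and none overlap."""
    for i, (x, y, r) in enumerate(circles):
        if r <= 0:
            return False
        # The circle must stay within [0, 1] x [0, 1].
        if x - r < -tol or x + r > 1 + tol or y - r < -tol or y + r > 1 + tol:
            return False
        # Centers must be at least r + r2 apart (touching is allowed).
        for x2, y2, r2 in circles[i + 1:]:
            if math.hypot(x - x2, y - y2) < r + r2 - tol:
                return False
    return True


def score_packing(circles: List[Circle], best_known_result: float) -> float:
    """Return sum_radii / best_known_result, or 0.0 for an invalid packing.

    The 0.0 fallback is an assumption made for this sketch.
    """
    if not is_valid_packing(circles):
        return 0.0
    sum_radii = sum(r for _, _, r in circles)
    return sum_radii / best_known_result
```

With this scoring, a result of 1.0 means the packing matches the best known sum of radii, and values above 1.0 would indicate an improvement over it.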

See Writing a Custom Grader for the full guide.