Grader API
GraderInterface protocol, BaseGrader, TaskGrader, and SubprocessGrader.
CORAL's grading system has three layers: a protocol, an abstract base class,
and the recommended task grader. A separate SubprocessGrader is the
runtime CORAL spawns when your task uses grader.entrypoint.
GraderInterface
Module: coral.grader.protocol
The protocol that all graders must satisfy:
from typing import Protocol, runtime_checkable
from coral.types import ScoreBundle, Task
@runtime_checkable
class GraderInterface(Protocol):
async def grade(
self,
codebase_path: str,
tasks: list[Task],
**kwargs,
) -> ScoreBundle: ...Any object with a matching grade() method satisfies the protocol.
BaseGrader
Module: coral.grader.base
Abstract base class with helper methods. Use this if you need full control over scoring.
from coral.grader.base import BaseGrader
from coral.types import Task, ScoreBundle
class MyGrader(BaseGrader):
async def grade(self, codebase_path: str, tasks: list[Task]) -> ScoreBundle:
score = self._make_score(0.85, explanation="Good result")
return self._make_bundle(score, aggregated=0.85)Constructor
BaseGrader(name: str, description: str = "", is_public: bool = True, **kwargs)Methods
| Method | Description |
|---|---|
grade(codebase_path, tasks) | Abstract — implement this |
grade_sync(codebase_path, tasks) | Synchronous wrapper for grade() |
_make_score(value, explanation, metadata) | Create a Score with this grader's name |
_make_bundle(score, aggregated) | Create a ScoreBundle with this grader's settings |
TaskGrader (recommended)
Module: coral.grader.task_grader
The standard way to write graders. Simpler API with built-in helpers.
from coral.grader import TaskGrader
from coral.types import ScoreBundle
class Grader(TaskGrader):
def evaluate(self) -> float | ScoreBundle:
result = self.run_program("solution.py")
return float(result.stdout.strip())Methods
| Method | Description |
|---|---|
evaluate() | Abstract — implement this. Return float or ScoreBundle |
run_program(filename, *args, timeout=300) | Run a file from the agent's codebase |
read_eval(relative_path) | Read a file from eval/ (legacy private directory) |
read_eval_path(relative_path) | Get absolute path to an eval file (legacy) |
score(value, explanation) | Return a single-score bundle |
fail(explanation) | Return a failed evaluation (null score) |
bundle(value, explanation) | Create a ScoreBundle directly |
Attributes
| Attribute | Type | Description |
|---|---|---|
codebase_path | str | Absolute path to agent's worktree (set by framework) |
private_dir | str | Absolute path to .coral/private/ (set by framework) |
args | dict | Extra arguments from grader.args in config |
Return types
evaluate() can return:
float— automatically wrapped in a ScoreBundleScoreBundle— returned as-is (useself.score()orself.fail())
GraderConfig fields
Configured under grader: in task.yaml.
| Field | Type | Default | Description |
|---|---|---|---|
entrypoint | str | "" | Required. Setuptools-style module.path:ClassName resolved inside the grader venv. Empty falls back to the deprecated eval/grader.py discovery. |
setup | list[str] | [] | Shell commands run once in the grader venv at coral start / coral validate time. Typically a single uv pip install -e ./grader. |
timeout | int | 300 | Eval timeout in seconds (0 = no limit). |
args | dict | {} | Free-form kwargs accessible inside the grader as self.args. |
private | list[str] | [] | Extra files / directories copied to .coral/private/ (hidden from agents). |
direction | str | "maximize" | "maximize" or "minimize" — which direction makes a score better. |
The legacy grader.type and grader.module fields have been removed. See
the migration section
of the Writing a Custom Grader guide.
SubprocessGrader (runtime)
Module: coral.grader.subprocess_grader
You don't usually construct this directly — coral.grader.loader.load_grader
returns one whenever grader.entrypoint is set. It implements
GraderInterface by spawning a worker subprocess in
.coral/private/grader_venv/bin/python and exchanging JSON over
stdin/stdout.
Errors raised inside the worker are returned as
{"error": ..., "traceback": ...} and re-raised in the parent process so
you see the original Python traceback.
