Grader API

CORAL's grading system has three layers: a protocol, an abstract base class, and the recommended task grader. A separate SubprocessGrader is the runtime CORAL spawns when your task uses grader.entrypoint.

GraderInterface

Module: coral.grader.protocol

The protocol that all graders must satisfy:

from typing import Protocol, runtime_checkable
from coral.types import ScoreBundle, Task

@runtime_checkable
class GraderInterface(Protocol):
    async def grade(
        self,
        codebase_path: str,
        tasks: list[Task],
        **kwargs,
    ) -> ScoreBundle: ...

Any object with a matching grade() method satisfies the protocol.

BaseGrader

Module: coral.grader.base

Abstract base class with helper methods. Use this if you need full control over scoring.

from coral.grader.base import BaseGrader
from coral.types import Task, ScoreBundle

class MyGrader(BaseGrader):
    async def grade(self, codebase_path: str, tasks: list[Task]) -> ScoreBundle:
        score = self._make_score(0.85, explanation="Good result")
        return self._make_bundle(score, aggregated=0.85)

Constructor

BaseGrader(name: str, description: str = "", is_public: bool = True, **kwargs)

Methods

Method	Description
`grade(codebase_path, tasks)`	Abstract — implement this
`grade_sync(codebase_path, tasks)`	Synchronous wrapper for `grade()`
`_make_score(value, explanation, metadata)`	Create a `Score` with this grader's name
`_make_bundle(score, aggregated)`	Create a `ScoreBundle` with this grader's settings

TaskGrader (recommended)

Module: coral.grader.task_grader

The standard way to write graders. Simpler API with built-in helpers.

from coral.grader import TaskGrader
from coral.types import ScoreBundle

class Grader(TaskGrader):
    def evaluate(self) -> float | ScoreBundle:
        result = self.run_program("solution.py")
        return float(result.stdout.strip())

Methods

Method	Description
`evaluate()`	Abstract — implement this. Return `float` or `ScoreBundle`
`run_program(filename, *args, timeout=300)`	Run a file from the agent's codebase
`read_eval(relative_path)`	Read a file from `eval/` (legacy private directory)
`read_eval_path(relative_path)`	Get absolute path to an eval file (legacy)
`score(value, explanation)`	Return a single-score bundle
`fail(explanation)`	Return a failed evaluation (null score)
`bundle(value, explanation)`	Create a ScoreBundle directly

Attributes

Attribute	Type	Description
`codebase_path`	`str`	Absolute path to agent's worktree (set by framework)
`private_dir`	`str`	Absolute path to `.coral/private/` (set by framework)
`args`	`dict`	Extra arguments from `grader.args` in config

Return types

evaluate() can return:

float — automatically wrapped in a ScoreBundle
ScoreBundle — returned as-is (use self.score() or self.fail())

GraderConfig fields

Configured under grader: in task.yaml.

Field	Type	Default	Description
`entrypoint`	`str`	`""`	Required. Setuptools-style `module.path:ClassName` resolved inside the grader venv. Empty falls back to the deprecated `eval/grader.py` discovery.
`setup`	`list[str]`	`[]`	Shell commands run once in the grader venv at `coral start` / `coral validate` time. Typically a single `uv pip install -e ./grader`.
`timeout`	`int`	`300`	Eval timeout in seconds (`0` = no limit).
`args`	`dict`	`{}`	Free-form kwargs accessible inside the grader as `self.args`.
`private`	`list[str]`	`[]`	Extra files / directories copied to `.coral/private/` (hidden from agents).
`direction`	`str`	`"maximize"`	`"maximize"` or `"minimize"` — which direction makes a score better.

The legacy grader.type and grader.module fields have been removed. See the migration section of the Writing a Custom Grader guide.

SubprocessGrader (runtime)

Module: coral.grader.subprocess_grader

You don't usually construct this directly — coral.grader.loader.load_grader returns one whenever grader.entrypoint is set. It implements GraderInterface by spawning a worker subprocess in .coral/private/grader_venv/bin/python and exchanging JSON over stdin/stdout.

Errors raised inside the worker are returned as {"error": ..., "traceback": ...} and re-raised in the parent process so you see the original Python traceback.

On this page