CORALCORAL
API Reference

Grader API

GraderInterface protocol, BaseGrader, TaskGrader, and SubprocessGrader.

CORAL's grading system has three layers: a protocol, an abstract base class, and the recommended task grader. A separate SubprocessGrader is the runtime CORAL spawns when your task uses grader.entrypoint.

GraderInterface

Module: coral.grader.protocol

The protocol that all graders must satisfy:

from typing import Protocol, runtime_checkable
from coral.types import ScoreBundle, Task

@runtime_checkable
class GraderInterface(Protocol):
    async def grade(
        self,
        codebase_path: str,
        tasks: list[Task],
        **kwargs,
    ) -> ScoreBundle: ...

Any object with a matching grade() method satisfies the protocol.

BaseGrader

Module: coral.grader.base

Abstract base class with helper methods. Use this if you need full control over scoring.

from coral.grader.base import BaseGrader
from coral.types import Task, ScoreBundle

class MyGrader(BaseGrader):
    async def grade(self, codebase_path: str, tasks: list[Task]) -> ScoreBundle:
        score = self._make_score(0.85, explanation="Good result")
        return self._make_bundle(score, aggregated=0.85)

Constructor

BaseGrader(name: str, description: str = "", is_public: bool = True, **kwargs)

Methods

MethodDescription
grade(codebase_path, tasks)Abstract — implement this
grade_sync(codebase_path, tasks)Synchronous wrapper for grade()
_make_score(value, explanation, metadata)Create a Score with this grader's name
_make_bundle(score, aggregated)Create a ScoreBundle with this grader's settings

Module: coral.grader.task_grader

The standard way to write graders. Simpler API with built-in helpers.

from coral.grader import TaskGrader
from coral.types import ScoreBundle

class Grader(TaskGrader):
    def evaluate(self) -> float | ScoreBundle:
        result = self.run_program("solution.py")
        return float(result.stdout.strip())

Methods

MethodDescription
evaluate()Abstract — implement this. Return float or ScoreBundle
run_program(filename, *args, timeout=300)Run a file from the agent's codebase
read_eval(relative_path)Read a file from eval/ (legacy private directory)
read_eval_path(relative_path)Get absolute path to an eval file (legacy)
score(value, explanation)Return a single-score bundle
fail(explanation)Return a failed evaluation (null score)
bundle(value, explanation)Create a ScoreBundle directly

Attributes

AttributeTypeDescription
codebase_pathstrAbsolute path to agent's worktree (set by framework)
private_dirstrAbsolute path to .coral/private/ (set by framework)
argsdictExtra arguments from grader.args in config

Return types

evaluate() can return:

  • float — automatically wrapped in a ScoreBundle
  • ScoreBundle — returned as-is (use self.score() or self.fail())

GraderConfig fields

Configured under grader: in task.yaml.

FieldTypeDefaultDescription
entrypointstr""Required. Setuptools-style module.path:ClassName resolved inside the grader venv. Empty falls back to the deprecated eval/grader.py discovery.
setuplist[str][]Shell commands run once in the grader venv at coral start / coral validate time. Typically a single uv pip install -e ./grader.
timeoutint300Eval timeout in seconds (0 = no limit).
argsdict{}Free-form kwargs accessible inside the grader as self.args.
privatelist[str][]Extra files / directories copied to .coral/private/ (hidden from agents).
directionstr"maximize""maximize" or "minimize" — which direction makes a score better.

The legacy grader.type and grader.module fields have been removed. See the migration section of the Writing a Custom Grader guide.

SubprocessGrader (runtime)

Module: coral.grader.subprocess_grader

You don't usually construct this directly — coral.grader.loader.load_grader returns one whenever grader.entrypoint is set. It implements GraderInterface by spawning a worker subprocess in .coral/private/grader_venv/bin/python and exchanging JSON over stdin/stdout.

Errors raised inside the worker are returned as {"error": ..., "traceback": ...} and re-raised in the parent process so you see the original Python traceback.