task
Frozen task
The question, source set, baseline, and scoring rule are fixed before the run is interpreted.
attempt
The agent or reviewer, capability, system, input material, declared output material, environment, and failures stay attached to the result.
evaluation
The evaluation record pins the target, outcome, score, evidence refs, evaluator, and timestamp.
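As a rough illustration of what "pinning" those six fields could look like, here is a minimal Python sketch. The class name, field names, types, and sample values are all assumptions made for this example; the source does not specify the actual record schema or storage format.

```python
from dataclasses import dataclass, FrozenInstanceError
from datetime import datetime, timezone
from typing import Tuple


# Hypothetical shape for an evaluation record. Field names mirror the list
# above (target, outcome, score, evidence refs, evaluator, timestamp); the
# types are assumptions, not the real schema.
@dataclass(frozen=True)
class EvaluationRecord:
    target: str                      # what was evaluated
    outcome: str                     # e.g. "pass" / "fail" (assumed vocabulary)
    score: float                     # result of the fixed scoring rule
    evidence_refs: Tuple[str, ...]   # pointers to supporting evidence
    evaluator: str                   # who or what produced the score
    timestamp: datetime              # when the evaluation was recorded

    def __post_init__(self) -> None:
        # Require an explicit timezone so the timestamp is unambiguous.
        if self.timestamp.tzinfo is None:
            raise ValueError("timestamp must be timezone-aware")


# Illustrative values only; none of these identifiers come from the source.
record = EvaluationRecord(
    target="example-task",
    outcome="pass",
    score=0.5,
    evidence_refs=("evidence-1", "evidence-2"),
    evaluator="example-evaluator",
    timestamp=datetime(2024, 1, 1, tzinfo=timezone.utc),
)

# frozen=True means a recorded field cannot be reassigned afterwards.
try:
    record.score = 1.0
except FrozenInstanceError:
    pass
```

Making the dataclass frozen is one way to express that a record is fixed once written, so later review interprets the score rather than changing it.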
review
Review decides what the result means for the frontier. The score is evidence, not the event.
No benchmark runs yet
This frontier has no frozen-task evaluations on record. Benchmarks serve as continuous integration (CI) for the frontier: a fixed task set that is scored against a baseline, signed when possible, and replayable.
Evaluation records (ver_*) live under the frontier's .vela/evaluations/ directory.
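To check whether a frontier has any records yet, something like the following Python sketch could be used. It assumes only what is stated above: records are entries whose names start with ver_ inside .vela/evaluations/. The function name is hypothetical, and the file extension and contents of a record are not specified in the source, so the sketch does not try to parse them.

```python
from pathlib import Path
from typing import List


def list_evaluation_records(frontier_root: Path) -> List[Path]:
    """Return ver_* entries under <frontier_root>/.vela/evaluations/, sorted by name.

    Hypothetical helper: only the directory name and the ver_ prefix are
    taken from the text above; everything else is an assumption.
    """
    eval_dir = frontier_root / ".vela" / "evaluations"
    if not eval_dir.is_dir():
        # No evaluations directory at all: treat as "no benchmark runs yet".
        return []
    return sorted(eval_dir.glob("ver_*"))
```

An empty list corresponds to the "No benchmark runs yet" state: either the directory is missing or it holds no ver_* entries.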