source boundary
frontier-owned · declared
A source record is provenance. It supports a finding only through evidence atoms, extraction spans, and reviewed finding bundles.
Source record
finding bindings
record context
Findings bound to this source through source ids, evidence atoms, provenance, or reviewed source-record slots.
evidence atoms
materialized
Evidence atoms pin exact spans, measurements, selectors, or curation assertions to the source.
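The relationship described above, where a source supports a finding only through atoms pinned to it, can be sketched as a small data model. This is a minimal illustration and not the tool's actual schema: `AtomKind`, `EvidenceAtom`, `SourceRecord`, `pin`, `supports`, and all field names and ids are hypothetical, chosen only to mirror the wording on this page.

```python
from dataclasses import dataclass, field
from enum import Enum


class AtomKind(Enum):
    # The four things this page says an evidence atom can pin (names are hypothetical).
    SPAN = "span"
    MEASUREMENT = "measurement"
    SELECTOR = "selector"
    CURATION_ASSERTION = "curation_assertion"


@dataclass(frozen=True)
class EvidenceAtom:
    # Hypothetical fields: an atom ties one finding to one source.
    atom_id: str
    source_id: str
    finding_id: str
    kind: AtomKind
    payload: str


@dataclass
class SourceRecord:
    # Hypothetical shape: the record itself is provenance plus its pinned atoms.
    source_id: str
    title: str
    atoms: list[EvidenceAtom] = field(default_factory=list)

    def pin(self, atom: EvidenceAtom) -> None:
        """Attach an atom, rejecting atoms that name a different source."""
        if atom.source_id != self.source_id:
            raise ValueError("atom is pinned to a different source")
        self.atoms.append(atom)

    def supports(self, finding_id: str) -> bool:
        """A finding is supported only if at least one pinned atom references it."""
        return any(atom.finding_id == finding_id for atom in self.atoms)


record = SourceRecord(source_id="src_example", title="Example source")
before = record.supports("finding_example")
record.pin(
    EvidenceAtom(
        atom_id="atom_example",
        source_id="src_example",
        finding_id="finding_example",
        kind=AtomKind.CURATION_ASSERTION,
        payload="manually curated assertion",
    )
)
after = record.supports("finding_example")
```

In this sketch the record alone never supports a finding: `before` is false until an atom referencing the finding has been pinned, which matches the "provenance only" framing of the source boundary.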
review context
inspectable
1 reviewable change and 0 evaluations are attached through this source or its findings.
Locator and citation
external source
locator
title: Open Problems in RLHF (Casper et al., 2023); Reward Hacking empirical study (2024)
imported
2026-05-29T02:53:36.923652+00:00
extraction mode
manual_curation
authors
reviewer:will-blair
Caveats
No source-specific caveats are recorded.
Reward hacking in reinforcement learning from human feedback (RLHF) systems shows that models optimize formal reward specifications rather than intended values, especially under misspecified objectives.
events
vev_589eaf4db7683caa · finding.asserted · Manual finding added to frontier state
reviewer:will-blair · 2026-05-29
reviewable changes
vpr_8290e375a8ed6612 · finding.add · Manual finding added to frontier state
applied · reviewer:will-blair · 2026-05-29
evaluations
No evaluation rows are attached.