evidence boundary
unknown · theoretical
An evidence atom is an inspectable support unit. It is not a finding by itself; it supports or challenges a finding through review.
Evidence atom
finding binding
bound · Reward hacking in reinforcement learning from human feedback (RLHF) systems shows that models optimize formal reward specifications rather than intended values, especially under misspecified objectives.
source binding
source-bound · vs_8f73b3eac7b38303
review context
unverified · 1 reviewable change and 0 evaluation records target this atom or its bound objects.
Evidence statement
Reward hacking in reinforcement learning from human feedback (RLHF) systems shows that models optimize formal reward specifications rather than intended values, especially under misspecified objectives.
extraction method
manual_curation
support relation
unknown
condition refs
vcnd_dd45db1f3eedd911
Caveats
events
vev_589eaf4db7683caa · finding.asserted · Manual finding added to frontier state
reviewer:will-blair · 2026-05-29
reviewable changes
vpr_8290e375a8ed6612 · finding.add · Manual finding added to frontier state
applied · reviewer:will-blair · 2026-05-29
evaluations
No evaluation rows are attached.
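As a reading aid, the fields on this page can be sketched as a single record. This is a minimal illustration only: the class name, the field names, the `review_status` helper, and the "evaluated" label are assumptions, not the system's actual schema.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class EvidenceAtom:
    """Hypothetical record mirroring the fields shown on this page."""

    statement: str
    finding_binding: str                 # e.g. "bound"
    source_id: Optional[str]             # e.g. "vs_8f73b3eac7b38303"
    extraction_method: str               # e.g. "manual_curation"
    support_relation: str = "unknown"    # this page shows "unknown"
    condition_refs: List[str] = field(default_factory=list)
    evaluations: List[str] = field(default_factory=list)

    def review_status(self) -> str:
        # Assumption: an atom with no evaluation records is "unverified";
        # "evaluated" is a placeholder label for the other case.
        return "unverified" if not self.evaluations else "evaluated"


atom = EvidenceAtom(
    statement=(
        "Reward hacking in reinforcement learning from human feedback (RLHF) "
        "systems shows that models optimize formal reward specifications "
        "rather than intended values, especially under misspecified objectives."
    ),
    finding_binding="bound",
    source_id="vs_8f73b3eac7b38303",
    extraction_method="manual_curation",
    condition_refs=["vcnd_dd45db1f3eedd911"],
)

print(atom.review_status())
```

The sketch keeps the atom separate from the finding it is bound to, matching the statement that an atom is not a finding by itself.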