Media Summary: We discuss our new paper, "Natural emergent misalignment from This talk was recorded at NDC Security in Oslo, Norway. ... When AI Games the System: The Truth About Reward Hacking
When Ai Chooses Praise Over Truth Learned Reward Model Hacking Ai Red Teaming - Detailed Analysis & Overview
We discuss our new paper, "Natural emergent misalignment from This talk was recorded at NDC Security in Oslo, Norway. ... When AI Games the System: The Truth About Reward Hacking Welcome. In these hour-long streams, I teach you what I've