The Architecture Beneath the Algorithm: Why RL Systems Need Structural Constraints, Not Just Better Rewards
Reward hacking is not an algorithm failure. It is an architecture failure. The system did exactly what I specified; it found a path I did not anticipate because I had not defined the state space, the action boundaries, and the verification layer precisely enough. The fix was not a new reward function. It was a better architecture.
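To make the distinction concrete, here is a minimal sketch of structural constraints wrapped around an unchanged reward function. It is an illustration under stated assumptions, not the system described above: the one-dimensional state, the naive "distance moved" reward, and the names `ActionBounds`, `GuardedStep`, and `in_valid_state_space` are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Sequence, Tuple


@dataclass(frozen=True)
class ActionBounds:
    """Hypothetical explicit limits on what the agent may do."""
    low: float
    high: float

    def allows(self, action: float) -> bool:
        return self.low <= action <= self.high


@dataclass
class GuardedStep:
    """Wraps a raw reward function with action bounds and outcome verifiers."""
    reward_fn: Callable[[float, float], float]
    bounds: ActionBounds
    verifiers: Sequence[Callable[[float, float], bool]]

    def step(self, state: float, action: float) -> Tuple[float, bool]:
        # Action boundary: out-of-bounds actions are rejected before execution.
        if not self.bounds.allows(action):
            return 0.0, False
        # Toy transition (an assumption): the action is added to the state.
        next_state = state + action
        # Verification layer: reward is paid only if every check passes.
        if not all(check(state, next_state) for check in self.verifiers):
            return 0.0, False
        return self.reward_fn(state, next_state), True


def in_valid_state_space(_state: float, next_state: float) -> bool:
    # State space made explicit: only positions in [0, 10] are legitimate.
    return 0.0 <= next_state <= 10.0


# The naive reward pays for distance moved, so one huge jump would "win".
guard = GuardedStep(
    reward_fn=lambda s, s2: s2 - s,
    bounds=ActionBounds(low=-1.0, high=1.0),
    verifiers=[in_valid_state_space],
)

print(guard.step(0.0, 0.5))    # legitimate move: rewarded
print(guard.step(0.0, 100.0))  # exploit attempt: rejected by the action bounds
print(guard.step(9.5, 1.0))    # leaves the valid state space: rejected by the verifier
```

The reward function is the same in all three calls. What changes the outcome is the structure around it: the bounds and verifiers decide which transitions are eligible for reward at all.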