Safety Layers Are Architecture: Separating Learned Behaviour from Hard Constraints

The RL policy suggests an action. The safety layer decides whether that action is permitted. The boundary between them is an architecture decision that must be specified before training begins. How you draw that line determines whether the system is deployable or just impressive in simulation.


The hardest part of deploying a learned system into a physical environment is not accuracy. It is the failure modes. A conventional embedded system fails in ways determined by the code and the hardware. You can enumerate those failures, test against them, and know what the system will do in each case.

A learned system fails in ways determined by the training distribution and the learned policy. You cannot enumerate all of them. Deploying a learned system without a safety layer that operates independently of the policy is not an engineering decision. It is a gamble.

The Boundary Between Policy and Safety

The policy and the safety layer are separate modules with a well-defined interface between them. The policy outputs a proposed action. The safety layer evaluates that action against hard constraints and either permits it, modifies it, or rejects it. The policy does not know which happened. It receives the next observation and continues.

This separation matters for two reasons. First, safety properties do not depend on the correctness of the learned policy. They depend on a much smaller, much more auditable piece of logic. Second, the policy can be retrained or updated without touching the safety layer. The safety layer is stable. The policy evolves.
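The permit/modify/reject interface can be sketched in a few lines of C. Everything here is illustrative: the action fields, the limit values, and the `verdict_t` names are assumptions for the sketch, not the actual firmware described later.

```c
/* Hypothetical action vector -- field names and limits are illustrative. */
typedef struct {
    double joint_velocity[2];   /* rad/s */
    double grip_force;          /* N */
} action_t;

typedef enum { PERMITTED, MODIFIED, REJECTED } verdict_t;

/* Hard limits: fixed constants, independent of the learned policy. */
static const double MAX_VELOCITY = 1.5;   /* rad/s, assumed envelope */
static const double MAX_FORCE    = 20.0;  /* N, assumed envelope */

/* Evaluate a proposed action against the hard constraints.
 * Permit it unchanged, clip it into the envelope, or reject it. */
verdict_t safety_check(action_t *a)
{
    if (a->grip_force < 0.0)
        return REJECTED;                  /* malformed: no valid clip exists */

    verdict_t v = PERMITTED;
    for (int i = 0; i < 2; i++) {
        if (a->joint_velocity[i] > MAX_VELOCITY) {
            a->joint_velocity[i] = MAX_VELOCITY;
            v = MODIFIED;
        } else if (a->joint_velocity[i] < -MAX_VELOCITY) {
            a->joint_velocity[i] = -MAX_VELOCITY;
            v = MODIFIED;
        }
    }
    if (a->grip_force > MAX_FORCE) {
        a->grip_force = MAX_FORCE;
        v = MODIFIED;
    }
    return v;
}
```

The verdict stays inside the safety layer. The policy never sees it; it only receives the next observation, exactly as the separation above requires.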

// the structural principle

The safety layer should be simple enough to verify exhaustively and stable enough to remain unchanged across policy updates. If your safety layer requires retraining when your policy changes, it is not a safety layer. It is part of the policy.

What Belongs in the Safety Layer

Hard limits belong in the safety layer. Physical boundaries. Force envelopes. Velocity limits. Position constraints that represent the edge of the safe operating envelope. These are properties the system must always satisfy regardless of what the policy has decided.

Soft constraints do not belong in the safety layer. Efficiency preferences. Smoothness objectives. Secondary goals the system should pursue but that are not safety-critical. These belong in the reward function. Mixing hard and soft constraints in the same layer makes the safety layer harder to reason about and harder to verify.
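One way to keep that split visible in code is to give hard limits and soft preferences entirely separate homes: the first lives in the safety module as fixed data, the second only ever appears in the reward function. A minimal sketch, with hypothetical field names and weights:

```c
/* Hard constraints: owned by the safety layer, verified exhaustively,
 * unchanged across policy updates. (Illustrative fields.) */
typedef struct {
    double max_joint_velocity;   /* rad/s */
    double max_grip_force;       /* N */
    double workspace_min[3];     /* m */
    double workspace_max[3];     /* m */
} hard_limits_t;

/* Soft preferences: expressed only as reward terms during training,
 * never enforced at runtime. Weights are hypothetical and tunable. */
double shaped_reward(double task_reward, double jerk, double energy)
{
    return task_reward - 0.01 * jerk - 0.001 * energy;
}
```

Changing a weight in `shaped_reward` means retraining the policy; changing a field in `hard_limits_t` means re-verifying the safety layer. Keeping those two workflows separate is the point.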

Implementation on Edge Hardware

On the systems I have deployed, the safety layer runs as a separate firmware module with a well-defined interface to the inference engine. The inference engine produces an action vector. The safety module checks that action vector against its constraint definitions before any output reaches the actuator.

The safety module is written in deterministic code with no dynamic memory allocation. It has a fixed execution time budget. It can be unit tested exhaustively. It does not change when the model is updated. A safety layer that shares a codebase with the inference engine is not separate in any meaningful way.

// safety layer architecture: gripper system
Interface in: action vector from policy (joint velocities, grip force target)
Checks applied: force envelope, velocity limits, workspace boundaries, collision primitives
Interface out: permitted action (possibly clipped), or null action with a flag
Execution: deterministic, fixed latency, no dynamic allocation

When Safety Layers Are Specified

Safety layers specified before training starts constrain the action space in ways the policy learns to work within. The policy discovers that certain action regions are never permitted and stops exploring them. The constraint becomes part of the learned behaviour.

Safety layers added after training, when the system is already misbehaving, are patches. They work. But they introduce a gap between what the policy expects and what the system actually does. That gap is a source of unexpected behaviour at the edges of the constraint space. Specify the safety layer before the first training run. The earlier the boundary is defined, the more everything downstream can be built to respect it.

If your system is breaking in ways that are hard to trace, it is almost always an architecture problem. I can tell you exactly where.

Book the free review