LLM Guardrails (Layer 4)
Overview
Layer 4 of the risk architecture ensures LLM agents are advisory-only and cannot access execution interfaces. It is implemented across two files:
- `src/agents/guardrails.py`: runtime output validation and self-test
- `src/risk/llm_guardrails.py`: risk-layer engine used by `AutonomyValidator`
What Guardrails Prevent
Agents must never:
- Import or reference `BrokerAdapter` at module level
- Produce output containing broker method names (`place_order`, `submit_order`, `cancel_order`, `execute_trade`, etc.)
- Return executable instructions rather than advisory data
Runtime Enforcement (`src/agents/guardrails.py`)
The `@guardrail` Decorator
Every `BaseAgent.run()` method is decorated with `@guardrail`:

```python
@guardrail
async def run(self, context: AgentContext) -> AgentResult:
    ...
```
After `run()` returns, `validate_output(result)` scans `str(result.output)` for any of the restricted patterns. If a pattern is found, `GuardrailViolationError` is raised immediately and the result is never returned to the caller.
Restricted Patterns
```python
_BROKER_ACCESS_PATTERNS = [
    "BrokerAdapter", "place_order", "submit_order",
    "cancel_order", "get_portfolio", "execute_trade",
]
```
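Taken together, the decorator and `validate_output()` can be sketched as follows. This is an illustrative reconstruction under the behavior described above, not the exact implementation; the wrapper body and the error message are assumptions, and the pattern list is repeated here so the sketch is self-contained.

```python
import functools

class GuardrailViolationError(Exception):
    """Raised when agent output matches a restricted broker pattern."""

_BROKER_ACCESS_PATTERNS = [
    "BrokerAdapter", "place_order", "submit_order",
    "cancel_order", "get_portfolio", "execute_trade",
]

def validate_output(result) -> None:
    # Scan the stringified output for any restricted pattern.
    text = str(result.output)
    for pattern in _BROKER_ACCESS_PATTERNS:
        if pattern in text:
            raise GuardrailViolationError(
                f"restricted pattern in agent output: {pattern}"
            )

def guardrail(fn):
    # Wrap an async run() so its result is validated before the caller sees it.
    @functools.wraps(fn)
    async def wrapper(self, context):
        result = await fn(self, context)
        validate_output(result)  # raises on violation; result never escapes
        return result
    return wrapper
```

The key property is ordering: validation happens after the agent produces its result but before the caller receives it, so a violating result can never reach downstream code.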
`LLMGuardrailsChecker.verify()`
Runs a self-test:
- Passes a result containing `"BrokerAdapter.place_order()"`, which must raise `GuardrailViolationError`
- Passes a clean result, which must pass through unchanged
- Stores the current timestamp in Redis under `risk:guardrails:last_verified`

Returns `True` if both checks pass.
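A minimal sketch of that self-test, with the Redis client and the agents-module helpers injected; the constructor signature and internal result type here are assumptions for illustration.

```python
import time

class LLMGuardrailsChecker:
    """Self-test for the runtime guardrail (sketch; dependencies injected)."""

    def __init__(self, redis_client, validate_output, violation_error):
        self._redis = redis_client
        self._validate = validate_output
        self._error = violation_error

    def verify(self) -> bool:
        class _Result:
            def __init__(self, output):
                self.output = output

        # 1. A poisoned result must raise the violation error.
        try:
            self._validate(_Result("BrokerAdapter.place_order()"))
            return False  # guardrail failed to trip
        except self._error:
            pass

        # 2. A clean result must pass through without raising.
        try:
            self._validate(_Result("hold position, confidence 0.7"))
        except self._error:
            return False

        # 3. Record the verification time for the risk layer to read.
        self._redis.set("risk:guardrails:last_verified", str(time.time()))
        return True
```

Note the timestamp is written only after both checks succeed, so a stale or missing key always means the guardrail has not been proven to work recently.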
Risk-Layer Engine (`src/risk/llm_guardrails.py`)
`LLMGuardrailsEngine` is used by the risk module (not the agents module) to verify guardrail integrity as a precondition for FULL_AUTONOMOUS mode.
```python
engine = LLMGuardrailsEngine()
engine.verify_guardrails()     # synchronous; checks module imports + stores timestamp
engine.is_recently_verified()  # True if verified within the last 24h
engine.get_status()            # dict for the risk dashboard
```
`verify_guardrails()` checks that no loaded `src.agents.*` module exposes a `BrokerAdapter` instance or class in its namespace (a module-level import check).
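The import check can be approximated like this; the real method may differ, but the idea is to walk `sys.modules` and confirm that no loaded agent module binds a `BrokerAdapter` name. The function name here is illustrative.

```python
import sys

def no_agent_module_exposes_broker() -> bool:
    """Return True when no loaded src.agents.* module binds a BrokerAdapter name."""
    for name, module in list(sys.modules.items()):
        if module is None or not name.startswith("src.agents."):
            continue
        if "BrokerAdapter" in vars(module):
            return False  # a broker reference leaked into an agent module
    return True
```

Because it only inspects already-loaded modules, this check is cheap and side-effect free, but it says nothing about modules that have not been imported yet; that gap is what the fail-safe behavior below addresses.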
Integration with Autonomy Validator
`AutonomyValidator._validate_bounded_to_full()` calls `_guardrails_recently_verified()`, which reads the Redis key `risk:guardrails:last_verified`. If the key is missing or older than 24 hours, the FULL_AUTONOMOUS transition is blocked with:

```
LLM guardrails have not been verified recently — run guardrail self-test first
```
To clear this blocker, call `POST /api/risk/guardrails/verify`, which triggers `LLMGuardrailsChecker.verify()`.
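The freshness gate can be sketched as follows, assuming the timestamp is stored as a stringified Unix time; the standalone function shape and the `redis_get` callable are simplifications of the validator method.

```python
import time

VERIFY_TTL_SECONDS = 24 * 3600  # verification expires after 24 hours

def guardrails_recently_verified(redis_get) -> bool:
    """redis_get: callable returning the stored value for a key, or None."""
    raw = redis_get("risk:guardrails:last_verified")
    if raw is None:
        return False  # never verified: block the transition
    try:
        verified_at = float(raw)
    except (TypeError, ValueError):
        return False  # unreadable timestamp: treat as unverified
    return (time.time() - verified_at) < VERIFY_TTL_SECONDS
```

Every failure mode (missing key, corrupt value, stale timestamp) resolves to `False`, so the transition to FULL_AUTONOMOUS is blocked unless there is positive, recent evidence the guardrails work.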
Design Decisions
- Why two separate files? `src/agents/guardrails.py` owns runtime enforcement (decorators, output validation); `src/risk/llm_guardrails.py` owns the risk-layer view (module import checks, dashboard status). The risk module has no hard dependency on the agents package at import time.
- Why not a sandbox? The codebase is a monolith, so full OS-level sandboxing is disproportionate. The guardrail approach is proportionate: enforce the invariant that agents import nothing from `src/execution/` at module level, and validate outputs at runtime.
- Fail-safe direction: if the module-level import check errors (e.g., a module is not loaded yet), it returns `True` (safe, since an unloaded module exposes nothing). The output pattern check has no such fallback; a violation always raises.
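The asymmetry in the last point can be made concrete with a small sketch (the helper name is illustrative): the import check degrades to a safe `True` on error, while output validation never swallows its exception.

```python
def import_check_fail_safe(check) -> bool:
    """Run a module-import check, treating any error as 'safe'."""
    try:
        return check()
    except Exception:
        # A module that failed to load cannot expose BrokerAdapter,
        # so an erroring check reports the safe answer.
        return True
```

Contrast this with `validate_output()`: it has no `except` branch of its own, so a detected violation always propagates to the caller as `GuardrailViolationError`.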