Voice agents running in production face a problem that system prompts alone cannot solve. Conversations drift. Users find creative ways around defined instructions. And when an agent says the wrong thing at scale, the compliance and brand consequences are real. ElevenLabs addressed this directly with Guardrails 2.0, a redesigned control layer for ElevenAgents that went into alpha on March 24, 2026.
The update is squarely aimed at enterprise teams trying to move from pilots to production without introducing unacceptable compliance or brand risk.
Three-layer protection architecture
Guardrails 2.0 works across three distinct layers that reinforce each other rather than operating independently.
The first layer is system prompt hardening via the Focus Guardrail, which keeps agents directed and consistent with their defined goals across long conversations where drift is most likely.
The second is user input validation through Manipulation Guardrails. These analyze incoming messages for prompt injection and instruction override patterns, and can terminate the conversation outright when a security threat is detected. The framing here is important: rather than just filtering agent responses, the system catches manipulation attempts before the agent processes them.
The third layer is agent response validation – every reply is evaluated against configured policies in real time before delivery. If a response violates a rule, it is blocked.
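To make the ordering of the three layers concrete, here is a minimal sketch of one conversation turn. It is an illustration only, not the ElevenLabs API: every name in it (harden_prompt, validate_user_input, validate_response, handle_turn, ConversationTerminated) is hypothetical, and simple substring checks stand in for the real classifiers.

```python
from typing import Callable, Optional

# Hypothetical names throughout; this does not reflect the ElevenLabs API.


class ConversationTerminated(Exception):
    """Raised when the input layer decides the conversation must end."""


# Naive stand-in for a manipulation classifier (assumption for illustration).
INJECTION_MARKERS = (
    "ignore previous instructions",
    "disregard your system prompt",
)


def harden_prompt(system_prompt: str) -> str:
    # Layer 1: stand-in for focus hardening; appends a stay-on-task reminder.
    return system_prompt + "\nStay within the goals above for the whole conversation."


def validate_user_input(message: str) -> None:
    # Layer 2: reject manipulation attempts before the agent sees the message.
    lowered = message.lower()
    if any(marker in lowered for marker in INJECTION_MARKERS):
        raise ConversationTerminated("manipulation attempt detected")


def validate_response(reply: str, banned_phrases: tuple) -> Optional[str]:
    # Layer 3: return the reply if it passes every policy, None if it is blocked.
    lowered = reply.lower()
    if any(phrase in lowered for phrase in banned_phrases):
        return None
    return reply


def handle_turn(
    system_prompt: str,
    message: str,
    generate: Callable[[str, str], str],
    banned_phrases: tuple,
) -> Optional[str]:
    validate_user_input(message)  # input check runs before generation
    reply = generate(harden_prompt(system_prompt), message)
    return validate_response(reply, banned_phrases)  # output check runs before delivery
```

The point of the sketch is the order of operations: the input check raises before generate is ever called, and the output check sits between generation and delivery.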
Custom guardrails and how they run
Pre-built guardrails cover common risk categories – focus, manipulation, and content sensitivity – each with tunable thresholds. The more significant addition for complex enterprise deployments is Custom Guardrails, which let teams define domain-specific policies in plain language and enforce them automatically across every call.
A lightweight model evaluates each agent response against those rules independently and in parallel with response generation, returning a block-or-allow decision. The parallel execution matters for voice: even a short delay in response delivery noticeably disrupts conversation flow.
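As a rough illustration of independent, concurrent rule evaluation, the sketch below checks one reply against several rules at once and reduces the results to a single decision. The Rule type, the keyword match standing in for a model call, and the function names are all assumptions made for this example; the source does not describe how ElevenLabs implements the evaluator.

```python
import asyncio
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    # Hypothetical rule shape: a name plus a keyword standing in for a
    # plain-language policy that a real evaluator model would interpret.
    name: str
    forbidden: str


async def evaluate_rule(rule: Rule, reply: str) -> bool:
    """Return True when the reply is allowed under this rule."""
    await asyncio.sleep(0)  # yield point standing in for a model call
    return rule.forbidden.lower() not in reply.lower()


async def evaluate_reply(rules: list, reply: str) -> tuple:
    # Each rule is evaluated independently; gather runs the checks concurrently
    # and returns results in the same order as the rules.
    results = await asyncio.gather(*(evaluate_rule(r, reply) for r in rules))
    violated = [r.name for r, ok in zip(rules, results) if not ok]
    return ("block" if violated else "allow", violated)
```

For example, asyncio.run(evaluate_reply(rules, reply)) yields a ("block", [...]) or ("allow", []) pair, so a caller learns both the decision and which rules fired.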
On that point, ElevenLabs built two execution modes. The first runs guardrails alongside the response for near-zero latency, accepting that a fraction of a second of audio may play before interception. The second holds responses until fully cleared – slightly slower, but nothing reaches the user unchecked. Teams can choose based on their risk tolerance and use case.
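The trade-off between the two modes can be sketched with a small deterministic simulation. The mode names "streaming" and "blocking", the deliver function, and the verdict_after parameter are hypothetical labels chosen for this example, not ElevenLabs settings; verdict_after models how many audio chunks have already played by the time the guardrail verdict arrives.

```python
from typing import Callable


def deliver(
    chunks: list,
    is_allowed: Callable[[str], bool],
    mode: str,
    verdict_after: int = 1,
) -> list:
    """Return the chunks that would reach the user under the given mode.

    mode="blocking":  hold everything until the check clears the full reply.
    mode="streaming": play chunks while the check runs; if the reply is
                      blocked, only the first `verdict_after` chunks leak.
    """
    allowed = is_allowed("".join(chunks))
    if mode == "blocking":
        return list(chunks) if allowed else []
    if mode == "streaming":
        return list(chunks) if allowed else list(chunks[:verdict_after])
    raise ValueError(f"unknown mode: {mode!r}")
```

With a reply that violates a rule, blocking mode delivers nothing, while streaming mode delivers only the chunks that played before interception. An allowed reply is delivered in full either way; the difference is only in when the first chunk can start playing.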
Exit strategies and visibility
When a guardrail fires, the behavior is configurable. Teams can end the conversation, transfer to a different agent, escalate to a human, or retry with corrective instructions. Every trigger is logged in conversation analytics with details on which guardrail fired and what action was taken – giving teams the data to refine both system prompts and guardrail configs over time.
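A configurable exit strategy amounts to a mapping from guardrail to action plus a log of every trigger. The sketch below shows one possible shape; ExitAction, TriggerLog, on_trigger, and the guardrail names are all hypothetical and are not taken from the ElevenLabs configuration schema.

```python
from dataclasses import dataclass, field
from enum import Enum


class ExitAction(Enum):
    # The four behaviors described above, under hypothetical identifiers.
    END_CONVERSATION = "end_conversation"
    TRANSFER_AGENT = "transfer_agent"
    ESCALATE_HUMAN = "escalate_human"
    RETRY = "retry"


@dataclass
class TriggerLog:
    """In-memory stand-in for conversation analytics."""
    entries: list = field(default_factory=list)

    def record(self, guardrail: str, action: ExitAction) -> None:
        self.entries.append({"guardrail": guardrail, "action": action.value})


def on_trigger(
    guardrail: str,
    policy: dict,
    log: TriggerLog,
    default: ExitAction = ExitAction.END_CONVERSATION,
) -> ExitAction:
    # Look up the configured action, fall back to the default, and log
    # which guardrail fired and what action was taken.
    action = policy.get(guardrail, default)
    log.record(guardrail, action)
    return action
```

Keeping the log entry next to the dispatch means every trigger is recorded with its guardrail and action, which is the data a team would review when tuning prompts and thresholds.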
Conversation history redaction
Alongside the guardrail controls, the update ships conversation history redaction. After a call ends, sensitive information can be automatically stripped from transcripts, recordings, and webhook payloads. Detected entities are replaced with placeholders in text and bleeps in audio, with granular control down to individual entity types. This sits alongside the existing Zero Retention Mode for deployments with stricter compliance requirements. Both features are available to enterprise clients.
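Placeholder-based transcript redaction with per-entity control can be sketched as follows. The regular expressions are deliberately simple stand-ins for a real entity detector, and the entity names, PATTERNS table, and redact function are assumptions for illustration, not the ElevenLabs implementation. Audio bleeping is out of scope here.

```python
import re

# Illustrative patterns only; a production detector would be far more robust.
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "PHONE": re.compile(r"\+?\d[\d\s().-]{7,}\d"),
}


def redact(text: str, enabled: set) -> str:
    """Replace each enabled entity type with a bracketed placeholder."""
    # Iterate in PATTERNS order so the result does not depend on set ordering.
    for entity, pattern in PATTERNS.items():
        if entity in enabled:
            text = pattern.sub(f"[{entity}]", text)
    return text


transcript = "Reach me at jane@example.com or 555-123-4567."
print(redact(transcript, {"EMAIL", "PHONE"}))
# Reach me at [EMAIL] or [PHONE].
```

Passing a smaller set, such as {"EMAIL"}, leaves phone numbers intact, which mirrors the per-entity-type control described above.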
Who this update is actually for
Guardrails 2.0 does not change what ElevenAgents can do – it changes whether enterprise teams can deploy it without significant legal and compliance exposure. The combination of pre-built protections, custom policy enforcement, configurable exit strategies, and conversation redaction covers the core requirements that have kept cautious buyers in pilot mode. The alpha availability now lets teams begin testing those configurations against real workloads before general release.




