Safety Guardrails are an optional content-moderation layer you can turn on for an AloAi Voice Agent.
When enabled, they classify and intercept risky content in real time, both on the caller's side (what the caller says) and the agent's side (what the agent is about to say).
Guardrails are off until you set them up, and you choose exactly which categories to apply, so you can enable only the ones that fit your use case.
During each call, the system evaluates the conversation in real time before generating a response or triggering any action.
If a safety violation is detected, the interaction is immediately stopped and the call is ended. This prevents any restricted or unsafe content from being delivered to the user.
Turning on Safety Guardrails
Guardrails are not on by default. Each category is an independent toggle, and nothing is active until you open the panel and turn specific categories on.
Open the agent's Security & Fallback Settings.
Find the Safety Guardrails card and click Set Up.
Turn on the categories you want to enforce. You can enable any combination of input and output categories.
Changes sync automatically and apply to new calls without restarting the agent. Calls already in progress are not affected.
How content is evaluated
The safety system continuously reviews both what the user says and how the agent responds. This ensures that conversations stay within defined safety and compliance boundaries.
To do this, it applies checks across two areas: (1) what the agent is allowed to say and (2) what inputs it can respond to.
Restricted output categories
Output guardrails check what the agent is about to say before it is converted to speech (text-to-speech, or TTS). Each is an independent toggle:
Harassment or abusive language
Self-harm or suicide-related content
Sexual exploitation or explicit content
Violence or threats of harm
Defense or national security-related topics
Illegal or harmful activities
Gambling-related content
Regulated professional advice (e.g., medical or legal guidance)
Child safety or exploitation-related content
Restricted input categories
Input guardrails check what the caller says before the message reaches the LLM (the AI model that generates the agent's responses). There is one input category:
Platform Integrity / Jailbreaking: detects attempts to bypass the agent's instructions, override its behavior, or get around its safety rules through prompt injection or similar manipulation.
When an input is flagged, it is blocked before the model sees it and is excluded from the call's conversation history, so it cannot influence later responses.
What happens when a violation is detected
When a guardrail catches content in an enabled category, the agent does not say the flagged message.
Instead, it replaces that message with a safe placeholder:
"I'm not able to help with that. Is there something else I can assist you with?"
The call continues normally. Guardrails never end the call or transfer it. The agent stays on the line and keeps handling the conversation.
Where triggers are recorded
Each time a guardrail fires, the system logs the event with the category that was triggered, the direction (input or output), and a timestamp.
You can review these entries in the call's history.
Important clarification
These protections are designed to intervene only when necessary. Standard, on-topic conversations are not affected.
Under normal conditions, the agent continues to operate as configured, following its prompts, using assigned tools, and handling conversations without interruption.





