Version: v3.1

Guardrails

Guardrails are regulatory entities that prevent undesirable responses from AI services from being returned to the user. They can be attached to agents, knowledge bases, and chats to filter out such responses.

With Guardrails, you can:

  • Validate model responses against security rules
  • Reject responses that contain sensitive or harmful content
  • Mask sensitive content such as credentials in model responses (see the sketch after this list)
  • Control the tone and content of AI interactions
  • Enforce thresholds on metrics such as grounding and relevance to tailor model responses to your requirements
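
For instance, masking sensitive content behaves conceptually like a regex-based redaction pass over the model's output. The snippet below is a minimal, hypothetical sketch of that idea in Python; the pattern names and the mask_sensitive helper are illustrative assumptions, not the Amorphic implementation.

```python
import re

# Hypothetical patterns for credential-like strings (illustrative only).
SENSITIVE_PATTERNS = {
    "aws_access_key": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "mac_address": re.compile(r"\b(?:[0-9A-Fa-f]{2}:){5}[0-9A-Fa-f]{2}\b"),
}

def mask_sensitive(text: str) -> str:
    """Replace credential-like substrings with a masked placeholder."""
    for name, pattern in SENSITIVE_PATTERNS.items():
        text = pattern.sub(f"[{name.upper()}]", text)
    return text

print(mask_sensitive("Key AKIAABCDEFGHIJKLMNOP found at aa:bb:cc:dd:ee:ff"))
# -> Key [AWS_ACCESS_KEY] found at [MAC_ADDRESS]
```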

Guardrail Intro

Guardrail Operations

Guardrails are located under the AI Space in Amorphic. On the guardrails page, users can view both system-defined and user-created guardrails.

Furthermore, Amorphic Guardrails support the following operations:

Guardrail Navigation

Create Guardrail

  1. Go to Explore > AI Space > Guardrails
  2. Click Add New Guardrail
  3. Fill in the following information under the "Define Your Guardrail" section:
     • Guardrail Name: Choose a descriptive name (e.g., "Politics-Guardrail").
     • Description: A brief explanation of what the guardrail does (e.g., "Guardrail to prevent expression of political opinions").
     • Scope: Select either Global (accessible across all users in your organization) or Private (for personal use only).
     • Message for blocked prompts: Enter a message to display when the guardrail blocks a user prompt.
     • Enable cross-region inference for your guardrail: Toggle this option on or off.
     • Content Filter Tier: Select a tier, such as Classic or Standard, which provides different levels of accuracy and language coverage.
  4. Configure additional filters from the left-hand navigation:
     • Content Filters: Toggle specific harmful categories (e.g., Hate, Sexual, Violence, Insults, Misconduct, Prompt_attack) and set a strength level (None, Low, Medium, High). Higher strength is stricter.
     • Denied Topics: Add specific topics that the agent should avoid engaging with. For each topic, provide a name, a definition, and optional sample phrases.
     • Word Filters: Enable a profanity filter and add a custom list of blocked words.
     • Sensitive Information Filters: Add filters to block specific types of PII (e.g., MAC Address, AWS Access Key) and choose an action (e.g., Block), or add custom Regex Patterns with a name, pattern, and action (e.g., Block, Anonymize).
     • Contextual Grounding Checks: Configure grounding and relevance thresholds to ensure the model's response is factually grounded and relevant to the user's query.
  5. Click Create Guardrail
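
Conceptually, the values collected in the steps above make up a single guardrail definition. The structure below is a hypothetical sketch of such a definition; the key names are illustrative assumptions, not the Amorphic schema.

```python
# Hypothetical shape of a guardrail definition assembled from the form above
# (key names are illustrative assumptions, not the Amorphic schema).
guardrail_definition = {
    "name": "Example-Guardrail",
    "description": "Blocks harmful content and masks PII",
    "scope": "Private",                              # or "Global"
    "blocked_prompt_message": "This request was blocked by a guardrail.",
    "cross_region_inference": False,
    "content_filter_tier": "Classic",                # or "Standard"
    "content_filters": {"Hate": "High", "Violence": "Medium", "Prompt_attack": "High"},
    "denied_topics": [],                             # each topic: name, definition, sample phrases
    "word_filters": {"profanity_filter": True, "blocked_words": []},
    "sensitive_information_filters": [
        {"type": "AWS_ACCESS_KEY", "action": "Block"},
        {"name": "employee-id", "regex": r"EMP-\d{6}", "action": "Anonymize"},
    ],
    "contextual_grounding": {"grounding_threshold": 0.75, "relevance_threshold": 0.75},
}
```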

Guardrail Creation

info
  • Guardrail creation is instantaneous.
  • The filtering criteria can be defined using regular expressions, keywords, or predefined system filters to govern content.
  • Guardrails can be attached to Agents, Chats, and Knowledge Bases.
  • If no guardrail is specified for one of these supported AI resources, the SYSTEM-Regulated guardrail is applied by default.
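
A minimal sketch of that fallback behaviour, using a hypothetical helper (not the Amorphic implementation):

```python
SYSTEM_DEFAULT_GUARDRAIL = "SYSTEM-Regulated"

def resolve_guardrail(resource: dict) -> str:
    """Return the guardrail attached to an agent, chat, or knowledge base,
    falling back to the system default when none is specified."""
    return resource.get("guardrail") or SYSTEM_DEFAULT_GUARDRAIL

print(resolve_guardrail({"name": "sales-agent"}))                                # SYSTEM-Regulated
print(resolve_guardrail({"name": "hr-bot", "guardrail": "Politics-Guardrail"}))  # Politics-Guardrail
```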

View Guardrail

To view a guardrail's details, click on its name from the guardrails listing page. The Guardrail Details page provides a comprehensive look at all its configurations.

This page is divided into two main tabs: Overview and Resources Attached.

Overview: This tab displays all the core settings of the guardrail. You can see:

  • Version: The current version of the guardrail (e.g., DRAFT).
  • Tier: The content filter tier selected during creation (e.g., CLASSIC).
  • Scope: Whether the guardrail is Global or Private.
  • Blocked Message: The custom message that will be displayed to the user when their prompt is blocked.
  • Content Filters: A list of harmful categories (Hate, Sexual, Violence, Insults, Misconduct, Prompt_attack) and their configured strength levels (e.g., LOW, MEDIUM, NONE).
  • Denied Topics: Lists specific topics that are disallowed, along with their definitions and sample phrases.
  • Blocked Words: Shows the custom list of words that are blocked (e.g., "Communism", "Marxism").
  • Sensitive Information Filters: Displays any configured PII types (e.g., MAC_ADDRESS) and custom regex patterns.

Guardrail View

Resources Attached: This tab shows which AI resources (Agents, Knowledge Bases) are currently using this guardrail.

On the top right of the details page, you can also use the Test Guardrail button to test the guardrail's functionality with a sample input. Additionally, the Activity Logs panel on the right provides a history of recent actions, such as when the guardrail was created or modified.

Test Guardrail

On the guardrail's details page, the Test Guardrail feature lets you verify whether a guardrail blocks undesirable responses as intended. You can use it to enter sample text and confirm whether the guardrail filters or blocks the content according to its configuration. This is a crucial step for validating a guardrail before putting it into active use.
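
As a rough mental model, the sketch below applies a guardrail's blocked words and denied-topic sample phrases to a sample input. It is a simplified, hypothetical illustration only; the real evaluation is performed by the service and also covers content filters, sensitive information, and contextual grounding.

```python
def test_guardrail(guardrail: dict, sample_text: str) -> dict:
    """Very simplified check of a sample input against word and topic filters."""
    text = sample_text.lower()

    word_hits = [w for w in guardrail["word_filters"]["blocked_words"] if w.lower() in text]
    topic_hits = [
        t["name"]
        for t in guardrail["denied_topics"]
        if any(p.lower() in text for p in t.get("sample_phrases", []))
    ]

    blocked = bool(word_hits or topic_hits)
    return {
        "action": "BLOCKED" if blocked else "ALLOWED",
        "matched_words": word_hits,
        "matched_topics": topic_hits,
        "message": guardrail["blocked_prompt_message"] if blocked else None,
    }

# Hypothetical guardrail configuration used only for this demonstration.
demo_guardrail = {
    "blocked_prompt_message": "This request was blocked by a guardrail.",
    "word_filters": {"blocked_words": ["classified"]},
    "denied_topics": [{"name": "Internal finances",
                       "sample_phrases": ["what is our unreleased revenue"]}],
}

print(test_guardrail(demo_guardrail, "Summarise the classified roadmap"))
# -> action 'BLOCKED', matched_words ['classified']
```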

Guardrail Test

Edit Guardrail

Users can edit a guardrail to update its metadata and filtering criteria. To do so, navigate to the Guardrail Details page and choose Edit from the three-dot menu in the top corner.

Here's what you can edit:

  • Description: Change the brief explanation of what the guardrail does.
  • Message for blocked prompts: Update the custom message that is displayed when the guardrail blocks a user prompt.
  • Cross-Region Inference: Toggle the option to enable or disable cross-region inference for the guardrail.
  • Content Filters: Modify the strength levels (e.g., Low, Medium, High) for harmful categories like Hate, Sexual, and Violence.
  • Denied Topics: Add new topics, edit existing definitions and sample phrases, or remove topics.
  • Word Filters: Add or remove words from the custom list of blocked words.
  • Sensitive Information Filters: Add new filters for PII types or custom regex patterns, and edit or remove existing ones.
  • Contextual Grounding Checks: Adjust the grounding and relevance thresholds (see the sketch after this list).
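
As a rough illustration of how those thresholds behave, the sketch below blocks a response whenever its grounding or relevance score falls below the configured minimum. The scores and helper are hypothetical; the actual scoring is performed by the underlying service.

```python
def passes_contextual_grounding(scores: dict, thresholds: dict) -> bool:
    """Allow the response only if both scores meet their configured thresholds."""
    return (scores["grounding"] >= thresholds["grounding_threshold"]
            and scores["relevance"] >= thresholds["relevance_threshold"])

thresholds = {"grounding_threshold": 0.75, "relevance_threshold": 0.60}

# Hypothetical scores for two model responses.
print(passes_contextual_grounding({"grounding": 0.82, "relevance": 0.91}, thresholds))  # True
print(passes_contextual_grounding({"grounding": 0.40, "relevance": 0.91}, thresholds))  # False -> blocked
```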

What you cannot edit:

  • Guardrail Name: The name is permanent once the guardrail is created.
  • Scope: You cannot change a guardrail from Global to Private or vice-versa.
  • Content Filter Tier: The tier (Classic or Standard) cannot be changed. If you need a different tier, you must create a new guardrail.

System Guardrails

Amorphic provides a set of predefined guardrails that are managed by the system. These guardrails are read-only and cannot be modified or deleted by users. They are designed to provide a baseline for security and governance and are available to all users by default.

Currently, there are three System Guardrails that are readily available out of the box, in descending order of strictness:

  1. SYSTEM-Regulated: Strict enterprise security guardrail with maximum protection for sensitive data and content, blocking all flagged information.
  2. SYSTEM-Standard: Balanced enterprise security guardrail with strong protection and reasonable utility, blocking all flagged information.
  3. SYSTEM-Exploratory: Adaptive enterprise security guardrail with essential protection, allowing more flexibility, blocking all flagged information.
Important

System Guardrails are system-defined and cannot be modified or deleted.

Example Use Cases

Politics Guardrail

Create a guardrail to prevent an AI application from expressing political opinions:

  • Guardrail Name: Politics-Guardrail
  • Description: Guardrail to prevent expression of political opinions
  • Denied Topics:
    • Topic Name: Politics
    • Topic Definition: Any political discussion not about objective facts
    • Sample Phrases: "Express your views on the communist party of China"
  • Blocked Words: "Communism", "Marxism"
  • Blocked Message: "Sorry, the model cannot express political opinions"
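
Expressed in the same hypothetical definition structure sketched under Create Guardrail (key names remain illustrative assumptions, not the Amorphic schema), this use case might look like:

```python
politics_guardrail = {
    "name": "Politics-Guardrail",
    "description": "Guardrail to prevent expression of political opinions",
    "blocked_prompt_message": "Sorry, the model cannot express political opinions",
    "denied_topics": [
        {
            "name": "Politics",
            "definition": "Any political discussion not about objective facts",
            "sample_phrases": ["Express your views on the communist party of China"],
        }
    ],
    "word_filters": {"blocked_words": ["Communism", "Marxism"]},
}
```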