WhyLabs LLM Policy Builder
Guardrail rulesets and metrics
The YAML Policy Builder is an editor that lets you interactively build a YAML policy for your WhyLabs Secured LLM. A policy can combine rulesets and individual metrics that define the guardrails for your LLM, along with validators and actions.
Rulesets are collections of metrics used to detect and prevent threats in common categories such as bad actors or misuse. Each ruleset generates a normalized score that triggers actions defined by the policy, such as flagging or blocking requests. Rulesets can be enabled with a single click from the Policy dashboard in the Secure section of the WhyLabs Platform, and the sensitivity and actions for each ruleset can be customized with a few clicks in the UI.
For documentation on the five WhyLabs Secure rulesets, check this page.
For more complex policies that require greater customization, another option is to compose the policy in YAML and upload it to the Advanced Settings tab in the Policy dashboard. YAML lets you build a custom policy from any combination of rulesets, metrics, validators, and actions. The YAML Policy Builder provides an interactive interface for creating and customizing the policy, with autocompletion and error validation.
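For instance, a single ruleset entry in a YAML policy can be as short as the minimal sketch below. The values mirror the full example further down this page; the alternative behavior and sensitivity values noted in the comments are assumptions, so confirm them against the rulesets documentation.

rulesets:
  - ruleset: score.bad_actors     # ruleset producing a normalized bad-actors score
    options:
      behavior: observe           # assumed alternatives: flag, block
      sensitivity: medium         # assumed alternatives: low, high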
Interactive YAML Policy Builder
Get started with our example YAML configs on GitHub, or check out the example below.
Learn more about Guardrail metrics here.
Example YAML policy configuration
An example config with a ruleset, metrics, and validators might look like this:
id: my_id
policy_version: 1
schema_version: 0.0.1
whylabs_dataset_id: test_bad_actor_rulesets_with_metrics1
rulesets:
  - ruleset: score.bad_actors
    options:
      behavior: observe
      sensitivity: medium
metrics:
  - metric: prompt.stats.token_count
  - metric: prompt.stats.char_count
  - metric: response.stats.token_count
  - metric: response.sentiment.sentiment_score
validators:
  - validator: constraint
    options:
      target_metric: response.stats.token_count
      upper_threshold: 10
  - validator: constraint
    options:
      target_metric: response.sentiment.sentiment_score
      upper_threshold: 0.5
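In this example, the score.bad_actors ruleset runs in observe mode with medium sensitivity, the stats and sentiment metrics are computed on prompts and responses, and each constraint validator watches a single target metric. The intent is that a response exceeding 10 tokens, or one with a sentiment score above 0.5, fails its validator so the policy can surface it through the configured actions; the exact triggering semantics of the constraint validator should be confirmed in the Guardrail metrics documentation linked above.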