WhyLabs Secure Policy

Overview

WhyLabs Secure Policy is the central place where you define the rules and actions that apply to your LLM applications. These constraints are stored in a policy document in YAML or JSON format. The policy is centrally managed and versioned in WhyLabs, and is used in the WhyLabs Guardrails deployment to enforce the rules and actions.

Guardrail Score

A Guardrail score is a normalized score from 0 to 100 that indicates how likely it is that an LLM interaction (typically a prompt/response pair) is risky with respect to a certain type of behavior.

For example, a score of 1 means WhyLabs is quite certain the interaction is safe, while a score of 100 means WhyLabs is quite certain the interaction is high risk.

0: this score is assigned when the underlying metrics weren't computed.
1-100: this is the range of scores that customers can use to control the sensitivity for a given behavior.

Scores are available to be used in rule expressions in the policy document to customize the actions.
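
The score semantics above can be sketched in a few lines of Python. This is a minimal illustration and not the WhyLabs implementation: the function name `exceeds_threshold` and the constant `NOT_COMPUTED` are hypothetical, and the sketch assumes that "exceeding" a threshold means being strictly greater than it.

```python
from typing import Optional

# Per the text above, 0 means the underlying metric was not computed.
NOT_COMPUTED = 0


def exceeds_threshold(score: int, threshold: int) -> Optional[bool]:
    """Return None if the score was not computed, else whether it exceeds the threshold.

    Hypothetical helper for illustration only.
    """
    if not 0 <= score <= 100:
        raise ValueError(f"Guardrail scores range from 0 to 100, got {score}")
    if score == NOT_COMPUTED:
        return None  # nothing to compare: the metric was never calculated
    return score > threshold
```

Returning `None` for a score of 0 keeps "not computed" distinct from "computed and safe", which matters if a caller would otherwise read 0 as the safest possible value.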

Rule

A rule identifies a LangKit metric from which a normalized Guardrail score is calculated, a threshold that is applied to the Guardrail score, and one or more actions (callbacks) to apply if the threshold is exceeded.

Customers can create rules based on LangKit metrics and use them in the policy document to customize the actions and callbacks.
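
As a rough sketch of the three parts of a rule (metric, threshold, actions), the Python below models a rule as a small data class and evaluates it against a set of scores. The class, the field names, and the metric name `prompt.example_metric` are all hypothetical; they do not reflect the actual policy document schema.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Rule:
    metric: str          # hypothetical LangKit metric name
    threshold: int       # compared against the normalized Guardrail score (1-100)
    actions: List[str] = field(default_factory=list)  # e.g. ["flag"]


def triggered_actions(rule: Rule, scores: Dict[str, int]) -> List[str]:
    """Return the rule's actions if its score exceeds the threshold, else an empty list."""
    score = scores.get(rule.metric, 0)  # treat a missing metric as 0 (not computed)
    if score == 0:
        return []
    return list(rule.actions) if score > rule.threshold else []


# Example usage with made-up values.
rule = Rule(metric="prompt.example_metric", threshold=70, actions=["flag"])
result = triggered_actions(rule, {"prompt.example_metric": 85})
```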

Actions

WhyLabs provides the following actions:

Observe: WhyLabs will capture the metrics and the traces even if the interaction is not risky.
Flag: WhyLabs will capture the metrics and the traces and flag the interaction as risky.
Block: WhyLabs will block the interaction and also capture the metrics and the traces.
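
One way an application might react to these three actions is sketched below. The `Action` enum, the `Outcome` class, and `apply_action` are hypothetical names chosen for illustration. The only behavior taken from the list above is that every action captures metrics and traces, that Flag marks the interaction as risky, and that Block stops the interaction.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Action(Enum):
    OBSERVE = "observe"
    FLAG = "flag"
    BLOCK = "block"


@dataclass
class Outcome:
    response: Optional[str]  # None when the interaction is blocked
    flagged: bool            # True only for the Flag action in this sketch
    traced: bool             # all three actions capture metrics and traces


def apply_action(action: Action, response: str) -> Outcome:
    """Map an action to what the caller delivers (hypothetical helper)."""
    if action is Action.BLOCK:
        return Outcome(response=None, flagged=False, traced=True)
    if action is Action.FLAG:
        return Outcome(response=response, flagged=True, traced=True)
    return Outcome(response=response, flagged=False, traced=True)
```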

Callbacks

Callbacks are custom actions that are triggered when a rule's threshold is exceeded. They can be used to notify or drive your application, for example by sending an alert or calling a webhook.

WhyLabs provides the following built-in callbacks:

Webhook: a JSON message is sent to a webhook URL.
Amazon SQS: a message is sent to an Amazon SQS queue.
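
To make the webhook callback more concrete, the sketch below builds a JSON POST request with the Python standard library. The payload fields (`rule`, `score`, `action`) and the function name are assumptions for illustration. The actual message format sent by WhyLabs is not described here.

```python
import json
import urllib.request


def build_webhook_request(url: str, rule_name: str, score: int, action: str) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request for a webhook URL."""
    # Hypothetical payload shape; the real fields may differ.
    payload = {"rule": rule_name, "score": score, "action": action}
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Example usage with a placeholder URL; nothing is sent over the network here.
req = build_webhook_request("https://example.com/hook", "example_rule", 90, "flag")
```

Passing the request to `urllib.request.urlopen(req)` would send it. The construction is kept separate so the payload can be inspected or tested without network access.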

Rulesets

Rulesets are built-in sets of rules managed by WhyLabs. Each ruleset defines a fixed set of LangKit metrics that are used to compute a set of normalized Guardrail scores. Two overall ruleset scores, one for the prompt and one for the response, are then computed as the maximum of the corresponding normalized scores.

A ruleset has a sensitivity (low, medium or high) which maps to a suitable threshold value determined by WhyLabs for the overall ruleset score.
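
The ruleset computation described above (take the maximum of the normalized scores, then compare it with a sensitivity-dependent threshold) might look roughly like this. The threshold values are placeholders, since the real ones are determined by WhyLabs, and the sketch assumes that a higher sensitivity corresponds to a lower threshold. The metric names are also made up.

```python
from typing import Dict

# Placeholder values only: the actual thresholds are determined by WhyLabs.
# Assumption: higher sensitivity means a lower threshold.
HYPOTHETICAL_THRESHOLDS: Dict[str, int] = {"low": 90, "medium": 70, "high": 50}


def ruleset_score(normalized_scores: Dict[str, int]) -> int:
    """Overall ruleset score: the maximum of the normalized Guardrail scores."""
    return max(normalized_scores.values(), default=0)  # 0 if nothing was computed


def ruleset_triggered(normalized_scores: Dict[str, int], sensitivity: str) -> bool:
    """Check whether the overall score exceeds the threshold for the given sensitivity."""
    return ruleset_score(normalized_scores) > HYPOTHETICAL_THRESHOLDS[sensitivity]


# Example usage with made-up metric names and scores for a prompt.
prompt_scores = {"metric_a": 20, "metric_b": 75}
overall = ruleset_score(prompt_scores)
```

Because the overall score is a maximum, a single high-scoring metric is enough to trigger the ruleset, regardless of how low the other metrics score.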

More information on each ruleset can be found on its respective page of this documentation.
