Bad Actor Ruleset

The Bad Actor ruleset is designed to keep bad actors from misusing your LLM applications by detecting jailbreak and prompt injection attacks.

Add the following YAML to your policy to enable the Bad Actor ruleset.

  - ruleset: score.bad_actors
    options:
      behavior: observe
      sensitivity: medium

This ruleset adds the equivalent of the following metrics section to your YAML policy and uses those metrics to compute an overall score.

metrics:
  - metric: prompt.similarity.jailbreak
  - metric: prompt.similarity.injection
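
To make the relationship between the ruleset and its metrics concrete, here is a minimal Python sketch. It is illustrative only: `RULESET_METRICS`, `expand_ruleset`, and `overall_score` are hypothetical names, and taking the maximum of the two similarity values is an assumption made for the example. This page does not state how the overall score is actually computed or how `sensitivity` affects it.

```python
from typing import Dict, List

# Hypothetical mapping from a ruleset name to the metrics it is equivalent to.
# The metric names come from the policy snippet above; the table is illustrative.
RULESET_METRICS: Dict[str, List[str]] = {
    "score.bad_actors": [
        "prompt.similarity.jailbreak",
        "prompt.similarity.injection",
    ],
}


def expand_ruleset(ruleset: str) -> List[str]:
    """Return a copy of the metric names that a ruleset stands for."""
    return list(RULESET_METRICS[ruleset])


def overall_score(ruleset: str, metric_values: Dict[str, float]) -> float:
    """Combine per-metric values into one score.

    ASSUMPTION: the combination rule here is a plain maximum. The real
    aggregation used by the ruleset is not documented on this page.
    """
    names = expand_ruleset(ruleset)
    missing = [name for name in names if name not in metric_values]
    if missing:
        raise KeyError(f"missing metric values: {missing}")
    return max(metric_values[name] for name in names)


if __name__ == "__main__":
    # Made-up similarity values for a single prompt.
    values = {
        "prompt.similarity.jailbreak": 0.2,
        "prompt.similarity.injection": 0.7,
    }
    print(expand_ruleset("score.bad_actors"))
    print(overall_score("score.bad_actors", values))
```

Under this assumed rule, a prompt that closely resembles either a known jailbreak or a known injection pattern produces a high overall score, even if the other metric is low.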