Bad Actor Ruleset
The Bad Actor ruleset is designed to keep bad actors from interacting with your LLM applications by detecting jailbreak and prompt injection attacks.
Add the following YAML to your policy to enable the Bad Actor ruleset:
- ruleset: score.bad_actors
  options:
    behavior: observe
    sensitivity: medium
This ruleset adds the equivalent of the following metrics section to your YAML policy and uses those metrics to compute an overall score.
metrics:
  - metric: prompt.similarity.jailbreak
  - metric: prompt.similarity.injection
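
To make the relationship between the ruleset and its metrics concrete, here is a minimal Python sketch. It is an illustration only, not the product's implementation: the function names `expand_ruleset` and `overall_score` are hypothetical, and because the text above does not say how the overall score is computed, taking the maximum of the two metric values is purely an assumption.

```python
from typing import Dict, List

# Metric names taken from the metrics section shown above.
BAD_ACTOR_METRICS: List[str] = [
    "prompt.similarity.jailbreak",
    "prompt.similarity.injection",
]


def expand_ruleset(ruleset: str) -> List[str]:
    """Return the metrics a ruleset stands for (hypothetical helper).

    Only score.bad_actors is modeled in this sketch.
    """
    if ruleset != "score.bad_actors":
        raise ValueError(f"unknown ruleset: {ruleset}")
    return list(BAD_ACTOR_METRICS)


def overall_score(metric_values: Dict[str, float]) -> float:
    """Combine the two metric values into one score.

    ASSUMPTION: the real aggregation is not documented here; this
    sketch simply takes the larger of the two values.
    """
    return max(metric_values[name] for name in BAD_ACTOR_METRICS)


if __name__ == "__main__":
    print(expand_ruleset("score.bad_actors"))
    # Example values are made up for illustration.
    example = {
        "prompt.similarity.jailbreak": 0.2,
        "prompt.similarity.injection": 0.7,
    }
    print(overall_score(example))
```

Under this assumed aggregation, a prompt that closely resembles either a known jailbreak or a known injection would be enough to raise the overall score.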