Benchmarks: WhyLabs Secure
This page presents benchmark results for the WhyLabs Secure solution across several datasets.
Last Updated: May 30, 2024
Accuracy Benchmark Results
Metric | Accuracy | F1 | Precision | Recall | Dataset |
---|---|---|---|---|---|
Injections | 0.87 | 0.87 | | | tensor_trust (positive-only) |
Injections | 0.95 | 0.95 | | | JailBreakV-28k (positive-only) |
Injections | 0.73 | | | | PurpleLlama - FRR (negative-only) |
Refusals | 0.95 | 0.82 | 0.90 | 0.75 | chatgpt_refusals |
Sentiment | 0.70 | 0.74 | 0.65 | 0.86 | imdb_sentiment |
Toxicity (default model) | 0.77 | 0.76 | 0.78 | 0.74 | hsol |
Toxicity (detoxify) | 0.82 | 0.82 | 0.83 | 0.82 | hsol |
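For reference, the metrics in the table relate to standard confusion-matrix counts. A minimal sketch (the counts below are illustrative, not taken from the benchmark runs):

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute accuracy, precision, recall, and F1 from confusion counts."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return accuracy, precision, recall, f1

# Illustrative counts only. On a positive-only dataset (e.g. tensor_trust)
# there are no negatives, so accuracy reduces to recall; on a negative-only
# dataset (PurpleLlama - FRR) it reduces to the true-negative rate, which is
# why those rows report fewer columns.
acc, prec, rec, f1 = classification_metrics(tp=90, fp=10, tn=80, fn=20)
print(f"accuracy={acc:.2f} precision={prec:.2f} recall={rec:.2f} f1={f1:.2f}")
```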
Datasets Information
tensor_trust
- Size: 361 samples
- Source: https://github.com/HumanCompatibleAI/tensor-trust-data (positive samples)
JailBreakV-28k
- Size: 1232 samples
- Source: https://huggingface.co/datasets/JailbreakV-28K/JailBreakV-28k (positive samples)
PurpleLlama - FRR
- Size: 750 samples
- Source: https://github.com/meta-llama/PurpleLlama/tree/main/CybersecurityBenchmarks
- Out-of-distribution: No part of this dataset was used for training purposes.
chatgpt_refusals
- Size: 2346 samples (346 positives, 2000 negatives)
- Source:
- Positive samples: https://github.com/maxwellreuter/chatgpt-refusals
- Negative samples: https://huggingface.co/datasets/alespalla/chatbot_instruction_prompts (train split)
imdb_sentiment
- Size: 5000 samples (2506 positive sentiment, 2494 negative sentiment)
- Source: https://huggingface.co/datasets/imdb
hsol
- Size: 5000 samples (2500 positives, 2500 negatives)
- Source: https://paperswithcode.com/dataset/hate-speech-and-offensive-language (train split)
Latency Benchmark Results
Notes:
- Latency was measured on an AWS c5.xlarge instance. The Average column is reported in milliseconds; the P90, P95, and P99 columns are in seconds.
- The P90, P95, and P99 columns represent the 90th, 95th, and 99th percentiles of the latency distribution.
- The latency is measured for a single request.
- The latency may vary depending on the load of the system and the network conditions.
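The percentile columns can be derived from raw per-request latency samples. A hedged sketch using the nearest-rank method (the sample values are synthetic, not benchmark data):

```python
import random


def percentile(samples, p):
    """Nearest-rank percentile of a list of latency samples."""
    ordered = sorted(samples)
    rank = max(0, min(len(ordered) - 1, round(p / 100 * len(ordered)) - 1))
    return ordered[rank]


# Synthetic latencies in seconds, for illustration only.
random.seed(0)
latencies_s = [abs(random.gauss(0.10, 0.02)) for _ in range(1000)]
for p in (90, 95, 99):
    print(f"P{p}: {percentile(latencies_s, p):.3f} s")
```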
The table below shows the average latency of each metric.
Some metrics (e.g., regex-based ones) report a latency of 0 because they are not computationally intensive and their cost falls below the measurement resolution.
Metric | Average (milliseconds) | P90 (seconds) | P95 (seconds) | P99 (seconds) |
---|---|---|---|---|
prompt.toxicity.toxicity_score | 102.158 | 0.136 | 0.143 | 0.15601 |
prompt.similarity.jailbreak | 64.494 | 0.125 | 0.144 | 0.161 |
response.toxicity.toxicity_score | 53.009 | 0.069 | 0.07 | 0.07601 |
prompt.topics.legal|prompt.topics.fishing|promp... | 38.675 | 0.047 | 0.048 | 0.054 |
prompt.topics.misuse1|prompt.topics.misuse2|promp... | 38.675 | 0.047 | 0.048 | 0.054 |
prompt.similarity.injection | 25.978 | 0.033 | 0.036 | 0.042 |
response.pii.phone_number|response.pii.email_ad... | 25.687 | 0.0281 | 0.044 | 0.064 |
prompt.pii.phone_number|prompt.pii.email_addres... | 23.449 | 0.024 | 0.024 | 0.02601 |
prompt.sentiment.sentiment_score | 2.648 | 0.003 | 0.003 | 0.04205 |
response.sentiment.sentiment_score | 1.032 | 0.001 | 0.001 | 0.002 |
response.regex.ssn | 0.04 | 0 | 0 | 0 |
response.stats.token_count | 0.032 | 0 | 0 | 0 |
prompt.stats.token_count | 0.002 | 0 | 0 | 0 |
response.similarity.refusal | 0.001 | 0 | 0 | 0 |
prompt.stats.flesch_reading_ease | 0.001 | 0 | 0 | 0 |
response.stats.syllable_count | 0 | 0 | 0 | 0 |
prompt.regex.credit_card_number | 0 | 0 | 0 | 0 |
prompt.regex.phone_number | 0 | 0 | 0 | 0 |
prompt.regex.ssn | 0 | 0 | 0 | 0 |
prompt.stats.difficult_words | 0 | 0 | 0 | 0 |
prompt.stats.letter_count | 0 | 0 | 0 | 0 |
prompt.stats.lexicon_count | 0 | 0 | 0 | 0 |
prompt.stats.char_count | 0 | 0 | 0 | 0 |
response.stats.char_count | 0 | 0 | 0 | 0 |
response.stats.flesch_reading_ease | 0 | 0 | 0 | 0 |
response.stats.sentence_count | 0 | 0 | 0 | 0 |
response.stats.flesch_kincaid_grade | 0 | 0 | 0 | 0 |
response.regex.mailing_address | 0 | 0 | 0 | 0 |
prompt.regex.email_address | 0 | 0 | 0 | 0 |
response.stats.letter_count | 0 | 0 | 0 | 0 |
response.stats.difficult_words | 0 | 0 | 0 | 0 |
prompt.stats.syllable_count | 0 | 0 | 0 | 0 |
response.regex.phone_number | 0 | 0 | 0 | 0 |
response.regex.email_address | 0 | 0 | 0 | 0 |
response.regex.credit_card_number | 0 | 0 | 0 | 0 |
prompt.stats.flesch_kincaid_grade | 0 | 0 | 0 | 0 |
response.similarity.prompt | 0 | 0 | 0 | 0 |
prompt.stats.sentence_count | 0 | 0 | 0 | 0 |
prompt.regex.mailing_address | 0 | 0 | 0 | 0 |
response.stats.lexicon_count | 0 | 0 | 0 | 0 |
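To illustrate why regex-based metrics such as `response.regex.ssn` report ~0 latency, a precompiled pattern scan over a short text takes only microseconds. The SSN pattern below is illustrative, not the one used by WhyLabs Secure:

```python
import re
import timeit

# Illustrative SSN pattern; not the production pattern.
SSN_RE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")

text = "Please do not share values like 123-45-6789 in responses."

# Average one search over many iterations to get a stable per-call time.
per_call_s = timeit.timeit(lambda: SSN_RE.search(text), number=10_000) / 10_000
print(f"match found: {bool(SSN_RE.search(text))}, ~{per_call_s * 1e6:.1f} µs/call")
```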