Benchmarks: WhyLabs Secure

Benchmarks

This page reports accuracy and latency benchmark results for the WhyLabs Secure solution on several datasets.

Last Updated: May 30, 2024

Accuracy Benchmark Results

Metric                    Accuracy  F1    Precision  Recall  Dataset
Injections                0.86      0.72  0.77       0.67    tensor_trust
Injections                0.88      0.88  0.88       0.89    jailbreak_bench
Injections                0.87      0.80  0.825      0.78    Avg (tensor_trust + jb_bench)
Refusals                  0.95      0.82  0.90       0.75    chatgpt_refusals
Sentiment                 0.7       0.74  0.65       0.86    imdb_sentiment
Toxicity (default model)  0.77      0.76  0.78       0.74    hsol
Toxicity (detoxify)       0.82      0.82  0.83       0.82    hsol
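
As a reading aid for the table above, the following minimal sketch recomputes F1 from the reported precision and recall. It assumes the standard binary definition (F1 as the harmonic mean of precision and recall); the page does not state which definition or averaging mode was used, and the helper name `f1_from` is hypothetical.

```python
def f1_from(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (assumed standard binary F1)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1) copied from the accuracy table.
rows = {
    "tensor_trust": (0.77, 0.67, 0.72),
    "jailbreak_bench": (0.88, 0.89, 0.88),
    "chatgpt_refusals": (0.90, 0.75, 0.82),
    "imdb_sentiment": (0.65, 0.86, 0.74),
    "hsol (default model)": (0.78, 0.74, 0.76),
    "hsol (detoxify)": (0.83, 0.82, 0.82),
}

for name, (precision, recall, reported) in rows.items():
    computed = f1_from(precision, recall)
    print(f"{name}: computed F1 = {computed:.2f}, reported F1 = {reported:.2f}")
```

Under that assumption, each recomputed value matches the reported F1 to two decimal places. The "Avg" row is the arithmetic mean of the tensor_trust and jailbreak_bench rows, column by column.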

Datasets Information

tensor_trust

jailbreak_bench

chatgpt_refusals

imdb_sentiment

hsol

Latency Benchmark Results


Notes:

  • Latency is measured on an AWS c5.xlarge instance. The Average column is reported in milliseconds; judging by their magnitudes, the percentile columns are in seconds.
  • The P90, P95, and P99 columns represent the 90th, 95th, and 99th percentiles of the latency distribution.
  • The latency is measured for a single request.
  • The latency may vary depending on the load of the system and the network conditions.
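
The statistics described in these notes can be derived from per-request timings roughly as follows. This is only a sketch, not the harness used for this benchmark: the function names are hypothetical, the nearest-rank percentile method is an assumption (the page does not say which method was used), and the units (average in milliseconds, percentiles in seconds) follow the magnitudes seen in the table below.

```python
import math
import time
from typing import Callable, Sequence


def percentile(samples: Sequence[float], q: float) -> float:
    """Nearest-rank percentile: the smallest sample with at least q% of samples <= it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, math.ceil(q * len(ordered) / 100))
    return ordered[rank - 1]


def time_calls(fn: Callable[[str], object], inputs: Sequence[str]) -> list[float]:
    """Time one call per input (a single request each); returns latencies in seconds."""
    latencies = []
    for text in inputs:
        start = time.perf_counter()
        fn(text)
        latencies.append(time.perf_counter() - start)
    return latencies


def summarize(latencies_s: Sequence[float]) -> dict[str, float]:
    """Average in milliseconds, percentiles in seconds."""
    return {
        "average_ms": 1000 * sum(latencies_s) / len(latencies_s),
        "p90_s": percentile(latencies_s, 90),
        "p95_s": percentile(latencies_s, 95),
        "p99_s": percentile(latencies_s, 99),
    }


# `len` is a stand-in for a real metric function, used only to make the sketch runnable.
stats = summarize(time_calls(len, ["example prompt"] * 1000))
print(stats)
```

Because the percentiles are taken over the sorted samples, a few slow requests can raise P99 well above the average, which is why both are reported.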

This table shows the average latency of different metrics.

Certain metrics (e.g. regexes) show a latency of 0 because they are not computationally intensive and their measured latency rounds to zero at the reported precision.

Metric                                                Average (milliseconds)  P90     P95    P99
prompt.toxicity.toxicity_score                        102.158                 0.136   0.143  0.15601
prompt.similarity.jailbreak                           64.494                  0.125   0.144  0.161
response.toxicity.toxicity_score                      53.009                  0.069   0.07   0.07601
prompt.topics.legal|prompt.topics.fishing|promp...    38.675                  0.047   0.048  0.054
prompt.topics.misuse1|prompt.topics.misuse2|promp...  38.675                  0.047   0.048  0.054
prompt.similarity.injection                           25.978                  0.033   0.036  0.042
response.pii.phone_number|response.pii.email_ad...    25.687                  0.0281  0.044  0.064
prompt.pii.phone_number|prompt.pii.email_addres...    23.449                  0.024   0.024  0.02601
prompt.sentiment.sentiment_score                      2.648                   0.003   0.003  0.04205
response.sentiment.sentiment_score                    1.032                   0.001   0.001  0.002
response.regex.ssn                                    0.04                    0       0      0
response.stats.token_count                            0.032                   0       0      0
prompt.stats.token_count                              0.002                   0       0      0
response.similarity.refusal                           0.001                   0       0      0
prompt.stats.flesch_reading_ease                      0.001                   0       0      0
response.stats.syllable_count                         0                       0       0      0
prompt.regex.credit_card_number                       0                       0       0      0
prompt.regex.phone_number                             0                       0       0      0
prompt.regex.ssn                                      0                       0       0      0
prompt.stats.difficult_words                          0                       0       0      0
prompt.stats.letter_count                             0                       0       0      0
prompt.stats.lexicon_count                            0                       0       0      0
prompt.stats.char_count                               0                       0       0      0
response.stats.char_count                             0                       0       0      0
response.stats.flesch_reading_ease                    0                       0       0      0
response.stats.sentence_count                         0                       0       0      0
response.stats.flesch_kincaid_grade                   0                       0       0      0
response.regex.mailing_address                        0                       0       0      0
prompt.regex.email_address                            0                       0       0      0
response.stats.letter_count                           0                       0       0      0
response.stats.difficult_words                        0                       0       0      0
prompt.stats.syllable_count                           0                       0       0      0
response.regex.phone_number                           0                       0       0      0
response.regex.email_address                          0                       0       0      0
response.regex.credit_card_number                     0                       0       0      0
prompt.stats.flesch_kincaid_grade                     0                       0       0      0
response.similarity.prompt                            0                       0       0      0
prompt.stats.sentence_count                           0                       0       0      0
prompt.regex.mailing_address                          0                       0       0      0
response.stats.lexicon_count                          0                       0       0      0