WhyLabs Overview

Monitoring your ML models with WhyLabs

WhyLabs is an AI observability platform that helps prevent model performance degradation by letting you monitor your machine learning models in production. If you deploy an ML model without visibility into its performance, the model can stop working without your knowledge and damage your business. With WhyLabs, you can guard against this degradation by monitoring your model on a platform that’s easy to use, privacy preserving, and cost efficient.

Model performance degradation

Model performance degradation is caused by two problems: the changing data problem and the bad data problem. The changing data problem occurs when the data that a model is trained on isn’t representative of the data that the model sees at inference time. The bad data problem, by contrast, happens not because there’s a true difference between the training data and the inference data, but simply because the inference data is corrupted.

The changing data problem affects model performance in two main ways. The first, which is often discussed, is “training-serving skew”, where the training data differs from the inference data from the moment the model is deployed. This happens when the training data is enriched or transformed in ways that aren’t possible in production, or when circumstances have changed between when the data was collected and when the model is deployed. The second is drift, in its various forms. Even if the training and inference data initially match, changes in the real-world processes generating the data can render a model irrelevant over time.

The other cause of model performance degradation is the bad data problem. Bad data is the data equivalent of typos, and these typos are often introduced in the data engineering process. Examples of bad data include missing values, incorrectly represented data types, outliers, and duplicates.
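To make these categories concrete, here is a minimal sketch in plain Python that counts each kind of bad data in a small batch of records. The function name `bad_data_report`, the `amount` field, and the median-based outlier rule with a cutoff of 3.5 are all assumptions chosen for illustration; they are not part of WhyLabs or whylogs.

```python
import statistics
from collections import Counter


def bad_data_report(rows, field, cutoff=3.5):
    """Count missing values, wrong types, duplicate rows, and outliers.

    Hypothetical helper for illustration only, not a WhyLabs or whylogs API.
    """
    missing = 0
    wrong_type = 0
    values = []
    for row in rows:
        value = row.get(field)
        if value is None:
            missing += 1
        elif isinstance(value, bool) or not isinstance(value, (int, float)):
            # For example, a number that arrived as a string.
            wrong_type += 1
        else:
            values.append(float(value))

    # Two rows are duplicates when every field has the same representation.
    seen = Counter(
        tuple(sorted((key, repr(val)) for key, val in row.items())) for row in rows
    )
    duplicates = sum(n - 1 for n in seen.values())

    # Outliers via a median-based score, which a single extreme value
    # cannot distort the way it distorts a mean and standard deviation.
    outliers = 0
    if values:
        median = statistics.median(values)
        mad = statistics.median([abs(v - median) for v in values])
        if mad > 0:
            outliers = sum(
                1 for v in values if 0.6745 * abs(v - median) / mad > cutoff
            )

    return {
        "missing": missing,
        "wrong_type": wrong_type,
        "duplicates": duplicates,
        "outliers": outliers,
    }


rows = [
    {"id": 1, "amount": 10},
    {"id": 2, "amount": 11},
    {"id": 3, "amount": None},   # missing value
    {"id": 4, "amount": "12"},   # wrong data type
    {"id": 5, "amount": 9},
    {"id": 5, "amount": 9},      # duplicate row
    {"id": 6, "amount": 10},
    {"id": 7, "amount": 12},
    {"id": 8, "amount": 500},    # outlier
]
report = bad_data_report(rows, "amount")
print(report)
```

Each of the four counters corresponds to one of the examples named above, so a nonzero count points at the kind of "typo" that slipped into the batch.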

How WhyLabs helps

WhyLabs addresses both the changing data problem and the bad data problem by allowing you to monitor the data being fed to your model and the predictions it makes. WhyLabs takes a novel approach within the ML monitoring space by separating the logging component of monitoring from the analysis and alerting component. This separation is what allows WhyLabs to be easy to use, privacy preserving, and highly scalable.

The WhyLabs Platform relies on profiles generated by the open-source whylogs library. These profiles are statistical summaries of datasets that capture the key information about the distributions of data within those datasets. whylogs profiles are descriptive, lightweight, and mergeable, which makes them well suited to serve as logs for monitoring data health and model health.
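The idea of a mergeable statistical summary can be illustrated with a small sketch. The `ColumnSummary` class below is hypothetical and is not the whylogs API; real whylogs profiles capture richer statistics. The sketch only shows why a summary can stay small regardless of dataset size and why summaries built on separate batches can be combined without revisiting the raw data.

```python
import math
from dataclasses import dataclass


@dataclass
class ColumnSummary:
    """Hypothetical summary of one numeric column (not the whylogs API)."""

    count: int = 0
    missing: int = 0
    total: float = 0.0
    total_sq: float = 0.0
    minimum: float = math.inf
    maximum: float = -math.inf

    def track(self, value):
        """Fold one value into the summary; the raw value is not stored."""
        if value is None:
            self.missing += 1
            return
        self.count += 1
        self.total += value
        self.total_sq += value * value
        self.minimum = min(self.minimum, value)
        self.maximum = max(self.maximum, value)

    def merge(self, other):
        """Combine two summaries into the summary of the combined data."""
        return ColumnSummary(
            count=self.count + other.count,
            missing=self.missing + other.missing,
            total=self.total + other.total,
            total_sq=self.total_sq + other.total_sq,
            minimum=min(self.minimum, other.minimum),
            maximum=max(self.maximum, other.maximum),
        )

    @property
    def mean(self):
        return self.total / self.count if self.count else math.nan


def summarize(values):
    summary = ColumnSummary()
    for value in values:
        summary.track(value)
    return summary


# Two batches summarized independently, for example on different machines.
batch_a = summarize([1.0, 2.0, None, 4.0])
batch_b = summarize([3.0, 5.0])
merged = batch_a.merge(batch_b)

# Merging gives the same summary as profiling all of the data at once.
whole = summarize([1.0, 2.0, None, 4.0, 3.0, 5.0])
print(merged == whole, merged.mean)
```

Because only aggregates leave the `track` call, a summary like this carries distribution information without carrying the underlying records.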

After you generate whylogs profiles, you can send them to the WhyLabs Platform, which provides the analytics and alerting capabilities that are key to monitoring your models in production. With the WhyLabs Platform, you can monitor as many models as you need, with no configuration tuning necessary. The platform addresses both the changing data problem and the bad data problem by automatically comparing newly generated profiles to their historical baselines and alerting you if new data diverges from old data. This way, you can be confident that your model is delivering the results that you expect and run AI with certainty.
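As a rough illustration of comparing new data to a historical baseline, the sketch below flags a new batch whose mean falls far outside the range of earlier batch means. The function `check_against_baseline`, the sample values, and the threshold of three standard deviations are assumptions made for this example; they do not describe how the WhyLabs Platform performs its analysis.

```python
import statistics


def check_against_baseline(baseline_means, new_mean, z_threshold=3.0):
    """Compare a new batch mean to historical batch means.

    Hypothetical check for illustration only. Needs at least two
    baseline values so that a standard deviation can be computed.
    """
    center = statistics.mean(baseline_means)
    spread = statistics.stdev(baseline_means)
    if spread == 0:
        # A perfectly constant baseline: any change counts as divergence.
        z_score = 0.0 if new_mean == center else float("inf")
    else:
        z_score = abs(new_mean - center) / spread
    return {"z_score": z_score, "alert": z_score > z_threshold}


# Daily means of one feature taken from a week of earlier summaries.
baseline = [50.1, 49.8, 50.3, 50.0, 49.9, 50.2, 49.7]

normal_day = check_against_baseline(baseline, 50.1)
shifted_day = check_against_baseline(baseline, 57.0)
print(normal_day["alert"], shifted_day["alert"])
```

The check consumes only summary statistics, which mirrors the separation between logging and analysis: the raw data never has to reach the component that raises the alert.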
