FAQ

Q: I have backdated profiles, but I’m only seeing profiles from the last 7 days.

Profiles with dataset timestamps from the last 7 days will appear in the platform almost immediately. Profiles with timestamps older than this can take up to 24 hours to appear in the platform.



Q: My most recent profile looks like it has some anomalies, but no alerts were generated yet.

Alerts are generated at a regular cadence. It can take up to 24 hours for alerts to be generated for recently uploaded profiles. Users can utilize the Ad-Hoc Monitor feature to preview alerts before the monitor runs.



Q: I’m seeing more rows in WhyLabs than I had in my dataset.

This is usually a result of logging the same dataset multiple times. WhyLabs is designed to merge any profiles uploaded for the same day (or hour for models with hourly batch frequency). An uploaded profile will never overwrite the previous one.

If you encounter this issue, note that it will not significantly affect the alerts generated for that day, since the shape of the distribution, the descriptive statistics, the ratios of missing values, and so on remain unchanged. If you wish to remove duplicate profiles, create a new model and backfill it with the correct profiles.
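A conceptual sketch (not the whylogs API; real profiles use compact sketch data structures, not raw counters) of why duplicate uploads inflate row counts but leave the alert-relevant statistics unchanged: merging is additive, so logging the same batch twice doubles every count while preserving ratios and the shape of the distribution:

```python
from collections import Counter

def profile(batch):
    """Toy stand-in for a statistical profile of one uploaded batch."""
    values = [v for v in batch if v is not None]
    return {
        "rows": len(batch),
        "missing": batch.count(None),
        "freq": Counter(values),
    }

def merge(p1, p2):
    # Merging is additive: counts sum, nothing is overwritten.
    return {
        "rows": p1["rows"] + p2["rows"],
        "missing": p1["missing"] + p2["missing"],
        "freq": p1["freq"] + p2["freq"],
    }

batch = ["a", "b", "b", None]
once = profile(batch)
twice = merge(profile(batch), profile(batch))  # same batch logged twice

# Row count doubles, but the missing-value ratio and the shape of the
# frequency distribution are unchanged -- which is why alerts are
# largely unaffected.
assert twice["rows"] == 2 * once["rows"]
assert twice["missing"] / twice["rows"] == once["missing"] / once["rows"]
```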



Q: What data does WhyLabs collect for monitoring purposes?

WhyLabs does not collect raw data from customers. We utilize an open-source library called whylogs to generate statistical profiles of the raw data. All of the raw data processing happens on the customer's side. The generated statistical profiles for each monitored feature (for example, min, max, and distribution summaries) are then sent to WhyLabs, and we monitor those statistics for drift.
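To make "statistical profile" concrete, here is a minimal, illustrative sketch of the kind of per-feature summary that leaves the customer's environment instead of the raw data (the actual whylogs profiles are built from compact sketch data structures, not these exact fields):

```python
from collections import Counter
import statistics

def summarize(values, top_k=3):
    """Compute per-feature summary statistics; raw values stay local."""
    present = [v for v in values if v is not None]
    return {
        "count": len(values),
        "missing": values.count(None),
        "min": min(present),
        "max": max(present),
        "mean": statistics.fmean(present),
        # Frequent items (Top-K) is one of the few statistics that can
        # surface raw values from a sensitive feature.
        "top_k": Counter(present).most_common(top_k),
    }

feature = [3, 1, 4, 1, 5, None, 1]
print(summarize(feature))
```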

These profiles generally do not contain personally identifiable or proprietary information in any meaningful form. However, some of the statistics we collect (such as Top-K Frequent Items) may capture raw values from sensitive features. For example, collecting Top-K on a user_email feature would lead us to store the most frequent email addresses appearing in that feature. Customers are free to disable collection of Top-K or any other statistic on sensitive features, or to exclude such features from monitoring entirely. Alternatively, some customers apply methods such as one-way hashing or encryption to sensitive data, which preserves the ability to monitor these features for drift.

A similar concern may arise with regard to the names of the features within a model. If any feature names are considered sensitive, this can be addressed in the same ways as above.
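A minimal sketch of the one-way hashing approach mentioned above, applied to a hypothetical user_email column before profiling. Equal inputs map to equal tokens, so frequency-based statistics (including Top-K) remain meaningful for drift detection while the raw values are never stored:

```python
import hashlib

def hash_value(value: str) -> str:
    # One-way hash: deterministic, so distribution and Top-K statistics
    # over the hashed tokens still reflect the underlying frequencies.
    # Truncated to 16 hex characters purely for readability. In practice,
    # a keyed hash (e.g. HMAC with a secret key) is preferable, since an
    # unsalted hash of a guessable value can be reversed by dictionary
    # attack.
    return hashlib.sha256(value.encode("utf-8")).hexdigest()[:16]

emails = ["alice@example.com", "bob@example.com", "alice@example.com"]
hashed = [hash_value(e) for e in emails]

# Equal inputs hash to equal tokens; distinct inputs to distinct tokens.
assert hashed[0] == hashed[2]
assert hashed[0] != hashed[1]
```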



Q: I keep getting blocked when attempting to upload profiles to WhyLabs.

In some cases, the following endpoint may need to be whitelisted:

songbird-20201223060057342600000001.s3.us-west-2.amazonaws.com



Q: I’ve uploaded performance metrics, but I’m not seeing them in the performance tab.

Performance metrics can only be tracked if the model type is set to “Regression” or “Classification”. If your model type is currently set to “Unknown”, this can be updated from the model management tile in the settings section.



Q: My alerts are too noisy. How can I get a better signal-to-noise ratio?

There are several ways to fine-tune your monitor settings. If you are using a static profile as your model baseline, be sure that this profile is actually representative of the data you expect to come into your model.

Users can also upload profiles for several different slices of their dataset, with varying degrees of noise, under the same dataset timestamp. These profiles will be merged, and users can then point to the merged profile as their static reference profile.

Some monitors are based on the number of standard deviations a metric falls from its mean. Within the monitor settings, users can raise this number of standard deviations up to 2.0 for less sensitive alerts. Users can also override monitor settings with manual thresholds at the feature level when inspecting a particular feature within the inputs/outputs view. When using a trailing window as a baseline, users are encouraged to experiment with different window sizes to find the optimal setting.
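A rough sketch of the standard-deviation check described above, assuming a trailing window of historical values for some metric (illustrative only, not the platform's internal implementation). Raising the threshold widens the acceptable band and therefore produces fewer alerts:

```python
import statistics

def is_anomalous(history, current, num_stddevs=2.0):
    """Flag `current` if it falls more than `num_stddevs` standard
    deviations from the mean of the trailing window `history`."""
    mean = statistics.fmean(history)
    stddev = statistics.stdev(history)
    return abs(current - mean) > num_stddevs * stddev

# Trailing daily values for a metric (mean ~100, stddev ~2.2).
window = [100, 102, 98, 101, 99, 103, 97]

assert not is_anomalous(window, 101)  # within the band: no alert
assert is_anomalous(window, 130)      # far outside the band: would alert
```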

Each monitor type (distribution, missing values, etc.) can be toggled on and off. Users can disable monitors which they do not wish to track.



Q: How does WhyLabs measure drift?

WhyLabs uses Hellinger distance as the default metric for distributional distance because it is symmetric, handles missing values, and is easy to interpret. KL divergence, a popular alternative, is less suitable for distance calculations because it is asymmetric and becomes difficult to calculate when values are missing. Submit a request if you would like to use an alternative metric for drift measurement.
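For two discrete distributions p and q over the same bins, the Hellinger distance is bounded in [0, 1] and symmetric, which is what makes it easy to interpret. A minimal sketch:

```python
import math

def hellinger(p, q):
    """Hellinger distance between two discrete distributions.
    0.0 = identical; 1.0 = disjoint support. Symmetric in p and q."""
    return math.sqrt(0.5 * sum((math.sqrt(a) - math.sqrt(b)) ** 2
                               for a, b in zip(p, q)))

p = [0.5, 0.3, 0.2]
q = [0.4, 0.4, 0.2]

assert hellinger(p, p) == 0.0                    # identical distributions
assert hellinger(p, q) == hellinger(q, p)        # symmetric, unlike KL
assert hellinger([1.0, 0.0], [0.0, 1.0]) == 1.0  # no overlap at all
```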
