Q: When inspecting anomalies, I can see the "Analysis is stale" message in the hover. What should I do?
This message means that more data was uploaded to WhyLabs after the monitor run. The analysis is executed on a schedule defined by the monitor configuration; by default, it starts as soon as the time window defined by the batch frequency is over (e.g. at midnight UTC for a daily model). In this case, some profiles arrived too late to be included in the analysis, so the results are considered stale.
- To refresh the results, you can call the DeleteAnalyzerResults API to clear the outdated results, which should get backfilled during the next monitor run. Please note that the backfill will reach only as far into the past as the backfillGracePeriodDuration parameter allows.
- To fix this issue in your future analysis runs, make sure to add/update the dataReadinessDuration parameter to account for the delay introduced by your data processing pipelines.
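The two parameters above live in the monitor (analyzer) configuration as ISO 8601 durations. The fragment below is an illustrative sketch only: the analyzer `id` and schedule values are placeholders, and you should consult the monitor configuration reference for the exact schema used by your account.

```json
{
  "analyzers": [
    {
      "id": "example-drift-analyzer",
      "schedule": { "type": "fixed", "cadence": "daily" },
      "dataReadinessDuration": "PT16H",
      "backfillGracePeriodDuration": "P7D"
    }
  ]
}
```

Here `PT16H` tells the analyzer to wait 16 hours after the batch window closes before running, and `P7D` lets the next run backfill up to seven days of late-arriving data.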
In case of any questions, please file a support ticket and we'll be happy to assist you.
Q: My most recent profile looks like it has some anomalies, but no alerts were generated yet.
Alerts are generated at a regular cadence. It can take up to 24 hours for alerts to be generated for recently uploaded profiles. Users can utilize the Ad-Hoc Monitor feature to preview alerts before the monitor runs.
Q: I’m seeing more rows in WhyLabs than I had in my dataset.
This is usually a result of logging the same dataset multiple times. WhyLabs is designed to merge any profiles uploaded for the same day (or hour for models with hourly batch frequency). An uploaded profile will never overwrite the previous one.
If you encounter this issue, note that it will not have a significant impact on the alerts generated for that day, since the shape of the distribution, descriptive statistics, ratios of missing values, etc. remain unchanged. If you wish to remove the duplicate profiles, you can create a new model and backfill it accordingly.
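The merge behavior can be illustrated with a toy sketch of mergeable count statistics. This is illustrative Python only, not the whylogs implementation: counts add up when the same batch is logged twice, but ratio-based statistics and extremes are unchanged, which is why alerts are largely unaffected.

```python
from dataclasses import dataclass

@dataclass
class ColumnStats:
    """Toy mergeable statistics for one feature (illustrative only)."""
    count: int
    missing: int
    min_val: float
    max_val: float

    def merge(self, other: "ColumnStats") -> "ColumnStats":
        # Merging is additive for counts and takes min/max for extremes,
        # so logging the same batch twice doubles the row count while
        # leaving the missing-value *ratio* and value range unchanged.
        return ColumnStats(
            count=self.count + other.count,
            missing=self.missing + other.missing,
            min_val=min(self.min_val, other.min_val),
            max_val=max(self.max_val, other.max_val),
        )

batch = ColumnStats(count=1000, missing=50, min_val=0.1, max_val=9.9)
merged = batch.merge(batch)           # same batch logged twice
print(merged.count)                   # row count doubles
print(merged.missing / merged.count)  # missing-value ratio is unchanged
```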
Q: What data does WhyLabs collect for monitoring purposes?
WhyLabs does not collect raw data from customers. We utilize an open source library called whylogs to generate statistical profiles of the raw data. All of the raw data processing happens on the customer's side. The generated statistical profiles for each monitored feature (for example, min, max, distribution, etc) are then sent to WhyLabs and we monitor those statistics for drift.
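To make the distinction concrete, here is a stdlib-only sketch of what a statistical profile of a single column might contain. This is an illustration of the concept, not the whylogs implementation: the summary retains only aggregate statistics, and the raw values never leave the customer's environment.

```python
import statistics

def profile_column(values):
    """Summarize a column into statistics only; the raw values
    themselves are not part of the resulting profile
    (illustrative sketch, not the whylogs implementation)."""
    present = [v for v in values if v is not None]
    return {
        "count": len(values),
        "missing": len(values) - len(present),
        "min": min(present),
        "max": max(present),
        "mean": statistics.fmean(present),
        "distinct_estimate": len(set(present)),
    }

ages = [34, 41, None, 29, 41, 57]
profile = profile_column(ages)
print(profile)  # aggregates only; no individual ages are stored
```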
These profiles generally do not contain personally identifiable or proprietary information in any meaningful form. However, some of the statistics we collect (such as Top-K Frequent Items) may capture data from sensitive features. For example, collecting Top-K on a user_email feature would lead us to store the most frequent email addresses appearing within that feature. Customers are free to disable collection of Top-K or any other statistic on sensitive features, or to exclude such features from monitoring entirely. Some customers may instead opt to protect sensitive data with methods such as one-way hashing or encryption, which preserves the ability to monitor these features for drift.
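As a sketch of the one-way hashing approach, the snippet below pseudonymizes email values before they are profiled. The function name and salt handling are assumptions for illustration; any keyed one-way hash works, as long as it is applied consistently so that equal raw values map to equal tokens and frequency-based drift remains detectable.

```python
import hashlib

def pseudonymize(value: str, salt: str) -> str:
    """One-way hash a sensitive value before profiling, so statistics
    like Top-K Frequent Items never store the raw value.
    (Illustrative sketch; the salt name/handling is an assumption.)"""
    return hashlib.sha256((salt + value).encode("utf-8")).hexdigest()

emails = ["a@example.com", "b@example.com", "a@example.com"]
hashed = [pseudonymize(e, salt="org-secret") for e in emails]

# Equal raw values still collide to the same token, so Top-K
# frequency counts are preserved without exposing the address.
print(hashed[0] == hashed[2])  # True: same address, same token
print(hashed[0] == hashed[1])  # False: different addresses differ
```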
A similar concern may arise with regard to the names of the features within a model. If any feature names are considered sensitive, this can be addressed in the same ways as above.
Q: I keep getting blocked when attempting to upload profiles to WhyLabs
In some cases, the following endpoint may need to be whitelisted:
Q: I’ve uploaded performance metrics, but I’m not seeing them in the performance tab.
Performance metrics can only be tracked if the model type is set to “Regression” or “Classification”. If your model type is currently set to “Unknown”, this can be updated from the model management tile in the settings section.
Q: My alerts are too noisy. How can I get a better signal-to-noise ratio?
There are several ways to fine-tune your monitor settings. If you are using a static profile as your model baseline, make sure that this profile is actually representative of the data you expect to flow into your model.
Users can also upload profiles for several different slices of their dataset, each with varying degrees of noise, under the same dataset timestamp. These profiles will be merged, and users can then point to the merged profile as their static reference profile.
Some monitors are based on the number of standard deviations a metric is from its mean. Within the monitor settings, users can set this number of standard deviations up to 2.0 for less sensitive alerts. Users can also override monitor settings with manual thresholds at the feature level when inspecting a particular feature within the inputs/outputs view. In the case of using trailing windows as a baseline, users are encouraged to experiment with different windows to find the optimal setting.
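A standard-deviation monitor of this kind can be sketched in a few lines. This is an illustrative implementation of the general technique, not WhyLabs' internals; the metric, window, and threshold are placeholders you would tune per feature.

```python
import statistics

def is_anomalous(trailing, latest, max_stddevs=2.0):
    """Flag the latest batch metric if it falls more than `max_stddevs`
    standard deviations from the trailing-window mean (illustrative
    sketch of a stddev-based monitor, not WhyLabs' implementation)."""
    mean = statistics.fmean(trailing)
    stdev = statistics.stdev(trailing)
    return abs(latest - mean) > max_stddevs * stdev

# e.g. the daily missing-value ratio over a 7-day trailing window
window = [0.10, 0.12, 0.11, 0.13, 0.12, 0.11, 0.12]
print(is_anomalous(window, 0.30))  # far outside the band -> True
print(is_anomalous(window, 0.12))  # within the band -> False
```

Raising `max_stddevs` widens the tolerance band and reduces alert noise; lengthening the trailing window smooths the baseline against day-to-day fluctuations.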
Each monitor type (distribution, missing values, etc.) can be toggled on and off. Users can disable monitors which they do not wish to track.
Q: How does WhyLabs measure drift?
WhyLabs uses Hellinger distance as the default metric for distributional distance because it is symmetric, handles missing values, and is easy to interpret. KL divergence, while popular, is less suitable for distance calculations because it is not symmetric and becomes difficult to calculate when values are missing. Submit a request if you would like to use an alternative metric for drift measurement.
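The contrast between the two metrics is easy to see on discrete distributions. The snippet below is a textbook implementation for illustration, not WhyLabs' internal code: Hellinger distance is symmetric, bounded in [0, 1], and well defined even when a bin has zero probability, while KL divergence blows up on zero bins and changes value when its arguments are swapped.

```python
import math

def hellinger(p, q):
    """Hellinger distance between two discrete distributions.
    Symmetric, bounded in [0, 1], and well defined even when a
    bin has zero probability (illustrative implementation)."""
    return math.sqrt(0.5 * sum((math.sqrt(a) - math.sqrt(b)) ** 2
                               for a, b in zip(p, q)))

def kl(p, q):
    """KL divergence: asymmetric, and undefined when q has a zero
    bin where p does not."""
    return sum(a * math.log(a / b) for a, b in zip(p, q) if a > 0)

p = [0.5, 0.5, 0.0]
q = [0.4, 0.4, 0.2]
print(hellinger(p, q) == hellinger(q, p))  # symmetric: True
print(kl(p, q))  # finite in this direction
# kl(q, p) would divide by the zero bin in p -- undefined,
# which is one reason KL is awkward as a drift *distance*.
```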
Q: I've uploaded some profiles based on incorrect data - how can I overwrite them?
In such a case you will have to delete those profiles, run the profiling again on the corrected data, and upload the new profile with the same timestamp as the initial upload. This documentation section describes how to do that.
Q: Can I delete some profiles?
Yes, please refer to this documentation section for details on how to do it.