Platform Overview
The following is a high-level overview of the WhyLabs platform's capabilities. Please visit other articles in this section for a more detailed walkthrough of each of the platform’s features.
Project Dashboard
The Project Dashboard is the jumping-off point for many of the features available in the WhyLabs platform. The Project Dashboard home screen serves as a centralized location offering observability into all of your models and datasets within a custom date range.
This view contains:
- Global and resource-specific anomaly summary by day
- Resource type (Model or Dataset)
- Resource subtype (For example: Regression, Classification or Large language models; Source, Stream or Transform datasets)
- Global distribution of anomalies by type (data quality, drift, performance, etc.)
- Profile lineage for each resource
Overall Summary
The Overall Summary provides an at-a-glance, aggregated view of the health of your project resources. It has separate tabs for your models and your datasets.
The datasets summary tab contains:
- Total dataset count with a breakdown by subtype
- Monitoring coverage across all datasets
- Total anomaly count across all datasets within the time range, with a breakdown by category
- Total record count across all datasets within the time range
- A table summary of all datasets, including their anomaly distribution, volume, lineage and freshness
The models summary tab contains:
- Total model count with a breakdown by subtype
- Monitoring coverage across all models
- Total anomaly count across all models within the time range, with a breakdown by category including performance
- Total inference count across all models within the time range
- A table summary of all models, including their anomaly distribution, inference volume, and lineage
Resource Summary
When a user clicks on a resource from the Project Dashboard, the "Summary" tab shows various metrics specific to that resource for profiles within the selected date range.
For a Dataset, the summary cards include:
- Profile count and date range
- Monitoring coverage - which categories of monitoring are covered or not
- Integration health - whether profiles have been uploaded recently
- Columns Health - summary of changes in data volume and anomaly volume
- Segments - count of segments and changes in their anomaly volume
For a Model, the summary cards include:
- Profile count and date range
- Monitoring coverage - which categories of monitoring are covered or not
- Integration health - whether profiles have been uploaded recently
- Input & Output Health - summary of changes in data volume and anomaly volume
- Model Performance - summary of model performance metrics
- Segments - count of segments and changes in their anomaly volume
- Explainability information
Profiles
The "Profiles" tab allows users to compare individual profiles which belong to a specific resource. Users can:
- Compare multiple uploaded profiles directly
- Compare distributions from multiple profiles directly for any column
- Compare descriptive statistics for specific profiles
- Compare most frequent items from two profiles
By clicking on the "Insights" button, users can see a list of observations about the selected profiles which may help uncover unexpected conditions in the data.
See Profiles for more guidance on working with profiles.
Inputs, Outputs and Columns
The "Inputs" and "Outputs" tabs for Models and the "Columns" tab for Datasets provide a view of the anomalies by column through the selected time range for various monitored metrics. These metrics include:
- Inferred data type
- Total count
- Null fraction
- Distribution distance
- Estimated unique values
- Discreteness
- Data type count
From this view, users can click on an individual feature or column for a fine-grained view of monitored metrics for that feature. From here, users can view:
- Distribution distance and drift
- Individual statistics for continuous features (mean, median, min, max)
- Distribution of most frequent values for discrete features
- Missing value count and ratio
- Estimated unique values count and ratio
- Inferred data type
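To make the missing-value and unique-value figures above concrete, here is a minimal sketch in plain Python that computes them for a small column. The function name `column_summary` is hypothetical, and the sketch counts unique values exactly, whereas the platform reports estimated unique values derived from profiles.

```python
from typing import Any, Dict, Sequence


def column_summary(values: Sequence[Any]) -> Dict[str, float]:
    """Exact missing/unique counts and ratios for one column (illustrative only)."""
    total = len(values)
    missing = sum(1 for v in values if v is None)
    # Unique values are counted over non-missing entries only (an assumption).
    unique = len({v for v in values if v is not None})
    return {
        "total_count": total,
        "missing_count": missing,
        "missing_ratio": missing / total if total else 0.0,
        "unique_count": unique,
        "unique_ratio": unique / total if total else 0.0,
    }


# One missing entry out of four, and two distinct non-missing values.
summary = column_summary(["a", "b", None, "a"])
```

Both ratios use the total row count as the denominator. This is a choice made for the sketch, not a statement about how the platform defines them.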
The "Inputs" and "Outputs" tabs are displayed for all models. Inputs represent the data provided to the model or other columns useful in profiling (e.g., sensitive attributes for monitoring fairness, derived statistics about the input data such as toxicity or sentiment in large language models). Outputs represent inferences or other columns generated by the model, as well as statistics derived from the outputs.
Outputs are initially set to be any column with 'output' in the column name. This default can be changed using the WhyLabs entity schema API.
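The default naming rule can be pictured with a short sketch. This is plain Python and not the WhyLabs entity schema API. The helper `split_inputs_outputs` is hypothetical, and treating the match as case-insensitive is an assumption.

```python
from typing import Iterable, List, Tuple


def split_inputs_outputs(columns: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Apply the default rule: a column is an output if its name contains 'output'."""
    inputs: List[str] = []
    outputs: List[str] = []
    for name in columns:
        # Case-insensitive matching is an assumption of this sketch.
        if "output" in name.lower():
            outputs.append(name)
        else:
            inputs.append(name)
    return inputs, outputs


inputs, outputs = split_inputs_outputs(["age", "income", "output_score", "pred_output"])
```

With this rule, a prediction column named, for example, `prediction` would be listed as an input until its classification is changed.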
A single "Columns" tab is displayed for datasets other than data transforms. "Inputs" and "Outputs" tabs are shown for data transforms, with inputs representing the data to be transformed and outputs representing the transformed data.
Drift Comparison
Drift is an important early indicator of possible model performance problems. On the "Inputs" tab, users can visualize the drift of the model inputs and compare it with the specific baseline or baselines against which it is being monitored.
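As an illustration of what a drift value measures, the sketch below compares a baseline histogram with a target histogram using the Hellinger distance. This overview does not state which distance the platform uses, so the choice of Hellinger distance is an assumption made for illustration, and the function name is hypothetical.

```python
import math
from typing import Sequence


def hellinger_distance(baseline: Sequence[float], target: Sequence[float]) -> float:
    """Hellinger distance between two histograms over the same bins.

    Returns 0.0 for identical distributions and 1.0 for non-overlapping ones.
    """
    if len(baseline) != len(target):
        raise ValueError("histograms must use the same bins")
    b_total, t_total = sum(baseline), sum(target)
    if b_total <= 0 or t_total <= 0:
        raise ValueError("histograms must contain at least one observation")
    # Normalize raw counts to probabilities before comparing bin by bin.
    squared = sum(
        (math.sqrt(b / b_total) - math.sqrt(t / t_total)) ** 2
        for b, t in zip(baseline, target)
    )
    return math.sqrt(squared / 2)


baseline_counts = [50, 30, 20]
no_drift = hellinger_distance(baseline_counts, [50, 30, 20])
some_drift = hellinger_distance(baseline_counts, [20, 30, 50])
full_drift = hellinger_distance([10, 0], [0, 10])
```

A monitor would typically compare such a value against a threshold. The threshold itself is a configuration choice and is not shown here.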
Segments
When uploading profiles with whylogs, users can define segments they wish to slice their data on. This is reflected in the segments section.
The "Segments" tab contains all of the individual segments defined by users when uploading profiles.
Users can click on one of these segments to view the details tabs (e.g. "Inputs", "Performance") filtered to data within the segment.
See Segmenting Data for more details about segments.
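Conceptually, segmenting means partitioning records by the value of one or more columns so that each partition can be profiled and monitored separately. The sketch below shows that idea in plain Python. It is not the whylogs segmentation API, and `split_by_segment` and the sample column names are hypothetical.

```python
from collections import defaultdict
from typing import Any, Dict, List


def split_by_segment(
    rows: List[Dict[str, Any]], segment_column: str
) -> Dict[Any, List[Dict[str, Any]]]:
    """Group rows by the value of one column, one group per segment."""
    segments: Dict[Any, List[Dict[str, Any]]] = defaultdict(list)
    for row in rows:
        segments[row[segment_column]].append(row)
    return dict(segments)


rows = [
    {"region": "EU", "amount": 10.0},
    {"region": "US", "amount": 25.0},
    {"region": "EU", "amount": 7.5},
]
segments = split_by_segment(rows, "region")
```

Here the `region` column yields two segments, one with two rows and one with a single row.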
Monitor Manager
The Monitor Manager tab allows users to customize their monitors for a particular resource. This includes:
- Choosing a monitor type
- Targeting specific features/segments
- Setting analysis type & thresholds
- Setting a baseline
- Configuring actions
Users can also choose a monitor from a variety of presets.
See the Monitor Manager Overview for more details.
Performance
The "Performance" tab contains a summary of performance metrics. Note that this view is only available for Model resources. To visualize the performance data, users must set a model subtype of "Regression" or "Classification" and upload performance metrics via whylogs.
Classification
- Total output and input count
- Accuracy
- ROC
- Precision-Recall chart
- Confusion Matrix
- Recall
- FPR (false positive rate)
- Precision
- F1
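For reference, the scalar classification metrics in this list follow from the four cells of a binary confusion matrix. The sketch below uses the standard textbook definitions. The function name is hypothetical, and returning 0.0 for an undefined ratio is a convention chosen for this example.

```python
from typing import Dict


def classification_metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Derive common binary classification metrics from confusion-matrix counts."""

    def ratio(numerator: float, denominator: float) -> float:
        # Return 0.0 when the ratio is undefined (a convention of this sketch).
        return numerator / denominator if denominator else 0.0

    precision = ratio(tp, tp + fp)
    recall = ratio(tp, tp + fn)
    return {
        "accuracy": ratio(tp + tn, tp + fp + tn + fn),
        "precision": precision,
        "recall": recall,
        "fpr": ratio(fp, fp + tn),  # false positive rate
        "f1": ratio(2 * precision * recall, precision + recall),
    }


clf_metrics = classification_metrics(tp=40, fp=10, tn=45, fn=5)
```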
Regression
- Total output and input count
- Mean Squared Error
- Mean Absolute Error
- Root Mean Squared Error
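The three regression error metrics are closely related, which a short worked example makes clear. The sketch below uses the standard definitions, and the function name is hypothetical.

```python
import math
from typing import Dict, Sequence


def regression_metrics(
    actual: Sequence[float], predicted: Sequence[float]
) -> Dict[str, float]:
    """Compute MSE, MAE and RMSE for paired actual and predicted values."""
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    errors = [p - a for a, p in zip(actual, predicted)]
    mse = sum(e * e for e in errors) / len(errors)
    mae = sum(abs(e) for e in errors) / len(errors)
    # RMSE is the square root of MSE, so it is in the same units as the target.
    return {"mse": mse, "mae": mae, "rmse": math.sqrt(mse)}


# Errors are -1, 0, 2 and 1, giving MSE 1.5 and MAE 1.0.
reg_metrics = regression_metrics([3.0, 5.0, 2.0, 7.0], [2.0, 5.0, 4.0, 8.0])
```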
See the Performance section for more information about working with performance metrics.
Tracing
The "Tracing" tab is available for Model resources with segmented data. Tracing lets users discover which segments within the data contribute negatively or positively towards model performance.
See the Performance Tracing section for more details.
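One simple way to picture a segment's contribution is to compare its accuracy with the overall accuracy. The sketch below does exactly that in plain Python. It is a simplified stand-in and not a description of how the platform computes tracing results. The record layout and the function name are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Each record is (segment name, actual label, predicted label).
Record = Tuple[str, int, int]


def accuracy_gap_by_segment(records: List[Record]) -> Dict[str, float]:
    """Return each segment's accuracy minus the overall accuracy."""
    if not records:
        raise ValueError("at least one record is required")
    correct: Dict[str, int] = defaultdict(int)
    total: Dict[str, int] = defaultdict(int)
    for segment, actual, predicted in records:
        total[segment] += 1
        correct[segment] += int(actual == predicted)
    overall = sum(correct.values()) / len(records)
    # Negative gaps mark segments that pull overall accuracy down.
    return {seg: correct[seg] / total[seg] - overall for seg in total}


records = [
    ("mobile", 1, 1), ("mobile", 0, 1), ("mobile", 1, 0), ("mobile", 0, 0),
    ("desktop", 1, 1), ("desktop", 0, 0), ("desktop", 1, 1), ("desktop", 0, 0),
]
gaps = accuracy_gap_by_segment(records)
```

In this sample, the `mobile` segment sits below the overall accuracy and the `desktop` segment sits above it.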
Explainability
The "Explainability" tab is available for Model resources. It lets users view feature importance for a model's inputs and compare it with that of other models.
See the Explainability section for more details.
Explainability data can be uploaded using the Feature Weights API.
Anomalies Feed
The Anomalies Feed allows users to see a centralized feed of all anomalies for a given resource. It is located on the Monitor Manager tab. This view includes the anomaly timestamp, anomaly type, column, and anomaly description.
For more details, see the Anomalies section.
Organizations
An Organization is the highest-level entity within the WhyLabs platform. An organization houses any number of WhyLabs models and contains any number of users. A model can only belong to one organization, but users can potentially be added to multiple organizations.
Upon creating a free account in WhyLabs, an organization will be created and your user will be added to that organization. Users belonging to multiple organizations can switch between organizations using the organization dropdown in the Model Dashboard.
Settings
The settings section can be accessed from the hamburger button in the top left corner. From here, you can manage API tokens, models, notifications, and users. The settings section also contains a tool to assist with the process of setting up a new integration.
Access Token Management
Access to the WhyLabs API is controlled via Access Tokens. Uploading data and interacting with our platform via direct API calls requires a valid token. These tokens are managed by each organization's administrator.
Admins can create tokens and optionally set an expiration date for these tokens. Admins also have the ability to revoke existing tokens.
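The token lifecycle described above (optional expiration plus revocation) can be sketched as a small validity check. The `AccessToken` class and `is_usable` function are hypothetical and do not represent the WhyLabs data model or API.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass
class AccessToken:
    """Hypothetical token record used only for this illustration."""

    token_id: str
    expires_at: Optional[datetime] = None  # None means no expiration was set
    revoked: bool = False


def is_usable(token: AccessToken, now: datetime) -> bool:
    """A token is usable if it has not been revoked and has not expired."""
    if token.revoked:
        return False
    return token.expires_at is None or now < token.expires_at


now = datetime(2024, 1, 15, tzinfo=timezone.utc)
no_expiry = AccessToken("tok-a")
expired = AccessToken("tok-b", expires_at=now - timedelta(days=1))
revoked = AccessToken("tok-c", expires_at=now + timedelta(days=30), revoked=True)
```

Under these assumptions, only `no_expiry` would pass the check: `expired` fails on its date and `revoked` fails regardless of its remaining lifetime.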
Resource Management
Whether you have one dataset in need of monitoring or a few hundred, WhyLabs makes it easy to add and begin monitoring new resources with just a few clicks.
Users are also able to rename resources from here or (in the case of Models) change the model type (regression, classification, unknown) by clicking "Edit Settings".
Model vs Dataset Types
In WhyLabs, you can choose either a model or a dataset type. There are a few primary differences between the two:
- Models refer to columns as input or output features; datasets refer to them all as columns.
- Models include performance metrics, performance tracing, and explainability. Datasets do not include these tabs.
- Model monitoring includes presets for performance monitoring. Datasets do not have these.
Notifications
The WhyLabs Platform can send regular updates about the state of your data via one of the supported messaging integrations (e.g., Slack or email). These notifications include a summary of the data quality anomalies and let you keep tabs on your data health metrics without having to manually check in on them in the Platform.
See Notifications and actions for more detail on managing notifications.
User Management
You define who gets access to your organization's data on WhyLabs. The platform makes it easy to add and remove users, enabling you to have full control over which team members can observe and monitor your data and ML model health metrics.
Role-Based Access Controls (RBAC)
From the User Management page, Enterprise customers can attach permission-based roles to users added to their organization. The platform supports the following user roles:
- Admin: can manage all aspects of the platform's functionality, including creation of API tokens and user management
- Member: read-only access with the ability to create and manage monitors
- Viewer: read-only access
See Role-based Access Control (RBAC) for more details.
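The three roles can be read as nested permission sets. The sketch below encodes the descriptions above in plain Python. The permission names (`read`, `manage_monitors`, `manage_tokens`, `manage_users`) are hypothetical labels invented for this example and are not identifiers used by the platform.

```python
from enum import Enum
from typing import Dict, FrozenSet


class Role(Enum):
    ADMIN = "admin"
    MEMBER = "member"
    VIEWER = "viewer"


# Hypothetical permission labels that mirror the role descriptions above.
PERMISSIONS: Dict[Role, FrozenSet[str]] = {
    Role.ADMIN: frozenset({"read", "manage_monitors", "manage_tokens", "manage_users"}),
    Role.MEMBER: frozenset({"read", "manage_monitors"}),
    Role.VIEWER: frozenset({"read"}),
}


def can(role: Role, permission: str) -> bool:
    """Return True if the role includes the given permission."""
    return permission in PERMISSIONS[role]


member_can_edit_monitors = can(Role.MEMBER, "manage_monitors")
viewer_can_edit_monitors = can(Role.VIEWER, "manage_monitors")
```

Each role's set contains the one below it, which matches the progression from Viewer to Member to Admin.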
Integration examples
From the main menu, users can access Integration Examples. This page contains tools to instantly generate code for several example integrations specific to models and datasets in your organization.
Other
Send feedback / Support Center
Users can submit support requests from directly within the WhyLabs Platform.
Privacy policy
Users can access the WhyLabs Privacy Policy from directly within the WhyLabs Platform.
Documentation
Users can access the documentation you’re reading now directly from the WhyLabs Platform 🙂