Introducing whylogs v1

whylogs v1 Launch Overview

We are excited to announce a huge step forward for our open source whylogs library: the release of whylogs v1. Since the first release of whylogs v0, we’ve been diligently collecting feedback from our users and are proud to respond with a new and improved whylogs version with the usability and usefulness our users have asked for. No actions are required for existing WhyLabs users. However, users can opt in to migrate to v1 to take advantage of the performance improvements and simplified API in the new whylogs version.

whylogs v1 is the default whylogs version as of May 31st, 2022. Users can install whylogs v1 using the following:

pip install whylogs

Alternatively, users can start experimenting with whylogs v1 right away by visiting this Google Colab Notebook.

While this release represents an exciting milestone for the whylogs library, there are some important implications for current users of whylogs v0. Most notably, whylogs v1 comes with a simplified and more intuitive API, which means that upgrading to v1 requires code changes. Furthermore, the changes described in this document apply only to the Python implementation of whylogs (Python and PySpark). They will be reflected in a later version of the Java implementation.

Users can visit this migration guide to assist with migrating their code to whylogs v1.

Note that WhyLabs will continue to support profiles uploaded via whylogs v0 following the release of whylogs v1. Users are not required to upgrade immediately. Users with automatic library upgrades in place should disable them until the necessary code changes are made, so that the transition to whylogs v1 goes smoothly.

Please stay tuned for further updates regarding the whylogs v1 release as well as for resources to assist with transitioning to whylogs v1 and taking full advantage of its powerful improvements. Please feel free to reach out with any questions in the meantime, and be sure to check out a summary of the whylogs v1 improvements below.

What’s New With whylogs v1

Based on user feedback, whylogs v1 focuses on the following five areas of improvement.

Performance Improvements

Users will see substantial improvements to performance, allowing larger datasets to be profiled in much less time.
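As a rough illustration of why column-level processing can be cheaper than row-level processing, the sketch below computes the same summary statistics both ways in plain Python. It is not whylogs code and says nothing about whylogs internals: the function names and the sample data are hypothetical, and the summary (count, min, max) is a stand-in for a real profile.

```python
# Hypothetical sample data: the same table in row form and in column form.
rows = [
    {"credit_score": 512, "balance": 120.5},
    {"credit_score": 700, "balance": 80.0},
    {"credit_score": 849, "balance": 310.25},
]
columns = {
    "credit_score": [512, 700, 849],
    "balance": [120.5, 80.0, 310.25],
}


def summarize_rowwise(rows):
    """Update per-column statistics one value at a time, row by row."""
    stats = {}
    for row in rows:
        for name, value in row.items():
            s = stats.setdefault(name, {"count": 0, "min": value, "max": value})
            s["count"] += 1
            s["min"] = min(s["min"], value)
            s["max"] = max(s["max"], value)
    return stats


def summarize_columnar(columns):
    """Compute per-column statistics with one bulk operation per column."""
    return {
        name: {"count": len(values), "min": min(values), "max": max(values)}
        for name, values in columns.items()
    }


rowwise = summarize_rowwise(rows)
columnar = summarize_columnar(columns)
print(rowwise == columnar)
```

Both functions produce the same result. The columnar version does its work in a few bulk calls per column instead of several small updates per cell, which is the general idea behind the speedup described above.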

API Simplification

Previously, profiling a dataset involved initializing a session, creating a logger within that session, and calling a logging function. With whylogs v1, this process is simplified and made more intuitive.

Profile Constraints

With whylogs v1, users can generate custom constraints on their dataset. For example, users can define a constraint requiring credit scores to be between 300 and 850. As a more advanced example, users can define a constraint which requires a particular feature to be JSON parseable.
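The two constraints mentioned above can be sketched conceptually in plain Python. This is not the whylogs constraints API: the helper names and sample values are hypothetical, and the checks run on raw values for simplicity, whereas whylogs constraints are defined on profiles.

```python
import json


def in_range(lower, upper):
    """Return a check that passes when every value lies in [lower, upper]."""
    def check(values):
        return all(lower <= v <= upper for v in values)
    return check


def is_json_parseable(values):
    """Pass when every value is a string that json.loads accepts."""
    for v in values:
        try:
            json.loads(v)
        except (TypeError, ValueError):
            return False
    return True


# Hypothetical column data for illustration only.
credit_scores = [512, 700, 849, 300]
payloads = ['{"id": 1}', "[1, 2, 3]", "not json"]

constraints = {
    "credit_score between 300 and 850": (in_range(300, 850), credit_scores),
    "payload is JSON parseable": (is_json_parseable, payloads),
}

# Evaluate each named constraint and collect a pass/fail report.
report = {name: check(values) for name, (check, values) in constraints.items()}
for name, passed in report.items():
    print(f"{name}: {'PASS' if passed else 'FAIL'}")
```

Here the credit score constraint passes and the JSON constraint fails because of the third payload.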

Profile Visualizer

With the profile visualizer, users can generate interactive reports about their profiles (either a single profile or comparing profiles against each other) directly in a Jupyter notebook environment. This enables exploratory data analysis, data drift detection, and data observability.

Usability Refresh

A top priority of the whylogs v1 project has been maximizing usability to ensure that users can get up and running with whylogs v1 quickly and easily. This usability refresh will include an updated GitHub readme, automatically generated documentation of whylogs v1 code, and an updated suite of examples and resources.

The table below highlights these improvements with a comparison between whylogs v0 and v1.

whylogs v0	whylogs v1
Logging large datasets was a time-consuming process because operations took place at the row level.	By introducing columnar operations in the logging process, whylogs v1 substantially improves performance when generating profiles.
When profiling datasets, users needed to initialize a session, then initialize a logger, and then invoke logging methods.	Users can log profiles with a single “log” method.
Users can utilize GitHub Actions to check for simple data constraints.	Data constraints can be implemented in whylogs directly and can include more advanced use cases.
Users can visualize individual profiles using a browser-based visualization tool with limited interactivity.	Users can visualize multiple profiles in an interactive notebook-based visualization tool. This enables exploratory data analysis, data drift detection, and data observability.
whylogs documentation maintenance was mostly manual.	whylogs documentation generation is fully automated. Requirements are built into the deployment pipeline to ensure that all new code is properly documented.
The whylogs core library contained numerous dependencies.	The whylogs v1 core library contains a far more lightweight set of dependencies which means fewer points of failure and conflicts, faster installs and updates, and a smaller memory footprint.
Config (YAML) files are used at various points throughout the whylogs v0 project.	Functions in whylogs v1 rely on user-provided parameters rather than config files.

Following v1 Launch

This section outlines key points and events following the launch of whylogs v1.

whylogs v0 Support

WhyLabs will continue to accept profiles uploaded via whylogs v0 following the v1 release.
whylogs v0 will be put into maintenance mode and will continue to receive bug fixes, but new features and performance improvements will be available in v1 only.
Users installing or upgrading whylogs without version constraints will install a whylogs v1 release by default. If existing whylogs users have automatic package upgrades in place, they are advised to limit upgrades to whylogs v0 releases until they have made the necessary code changes for compatibility with whylogs v1. whylogs package upgrades can be limited to v0 releases with the following pip command:

pip install --upgrade "whylogs<1.0"
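To confirm which major version ended up in an environment, a small check along these lines may help. It is a minimal sketch that uses only the standard library's `importlib.metadata`. The helper names are hypothetical, and the parsing assumes the version string starts with an integer major version followed by a dot.

```python
from importlib import metadata


def major_version(version_string):
    """Return the leading integer of a version string such as '0.7.9' or '1.0.2'."""
    return int(version_string.split(".")[0])


def installed_whylogs_major():
    """Return the installed whylogs major version, or None if it is not installed."""
    try:
        return major_version(metadata.version("whylogs"))
    except metadata.PackageNotFoundError:
        return None


major = installed_whylogs_major()
if major is None:
    print("whylogs is not installed")
elif major >= 1:
    print("whylogs v1 detected: v0-style code needs migrating")
else:
    print("whylogs v0 detected: existing v0 code can run unchanged")
```

Running a check like this at the start of a pipeline makes an unplanned upgrade to v1 visible before v0-style code fails.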

Resources

This page will be iterated upon with additional detail.
Automatically generated whylogs API documentation will be available.
WhyLabs Platform Documentation will be updated to include code snippets from both v0 and v1.
Example notebooks using whylogs v1 will be available in the GitHub repository.
A migration guide will be provided to assist users with migrating from whylogs v0 to v1.
Video tutorials will be provided to walk users through whylogs v1 functionality.

whylogs v1 Preview

The following code example demonstrates several basic operations using whylogs v1.

Read a dataset into a pandas DataFrame and generate a profile.

import pandas as pd
import whylogs

# read in a dataset
df = pd.read_csv('path/to/data.csv')

# log a dataset
results = whylogs.log(pandas=df)

# grab the profile generated above
profile = results.profile()

Read in a second dataset and merge it with the first profile.

# read in a new dataset
df_new = pd.read_csv('path/to/more/data.csv')

# profile a second dataset and merge the result with the first
profile.track(pandas=df_new)

# view profile as dataframe
prof_view = profile.view()
prof_df = prof_view.to_pandas()
prof_df

Write profiles to disk as binary files and read them back.

# write profile to disk
whylogs.write(profile, "profile.bin")

# read a profile from binary file on disk
n_prof = whylogs.read("profile.bin")
