Data Privacy
Data privacy and security are a top priority at WhyLabs. Our principles-based approach aims to go beyond traditional monitoring by applying privacy-preserving techniques at the point of data collection. We understand your concerns when you entrust us with your data, and we always strive to meet your expectations and respect your preferences.
This document describes the privacy and security measures we take to protect your data and your customers' data. Our monitoring tools are data-agnostic: they don't require sensitive data, and many of them don't require any personal data at all.
Ultimately, ensuring data privacy is a shared responsibility. Users are responsible for setting up and configuring their systems so that they don't send inappropriate personal data or sensitive material to WhyLabs monitoring tools.
Privacy by design and by default
WhyLabs follows "privacy by design" principles as part of our overarching security program. Integration starts with whylogs, the open-source library for data logging. The whylogs library emits profile objects that contain summary statistics about customers' data. These summary statistics are designed to provide only aggregated information about the whole dataset or data stream; they don't contain individual records.
Depending on each feature's data type, whylogs captures the following statistics:
- Simple counters: boolean, null values, data types.
- Summary statistics: sum, min, max, median, variance.
- Unique value count (cardinality).
- Histograms for numerical features.
- Top frequent items (128 by default).
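To make the idea of "aggregates only" concrete, here is a minimal, stdlib-only sketch of summarizing a single column into statistics like those listed above. The helper `profile_column` and its `bins` parameter are hypothetical and are not part of the whylogs API; the sketch computes exact values for simplicity and is not how whylogs implements its statistics. The point it illustrates is that the output holds counts, extremes, and distributions, never the list of input records.

```python
from collections import Counter
from statistics import median, pvariance
from typing import Any, Dict, List, Optional


def profile_column(values: List[Optional[Any]], top_k: int = 128, bins: int = 4) -> Dict[str, Any]:
    """Summarize one column into aggregate statistics only (hypothetical helper)."""
    non_null = [v for v in values if v is not None]
    # Simple counters: total rows, nulls, booleans, and counts per Python type name.
    profile: Dict[str, Any] = {
        "count": len(values),
        "null_count": len(values) - len(non_null),
        "bool_count": sum(isinstance(v, bool) for v in non_null),
        "types": dict(Counter(type(v).__name__ for v in non_null)),
        # Cardinality: number of distinct non-null values.
        "cardinality": len(set(non_null)),
    }
    # Numeric summaries and a fixed-width histogram (bool subclasses int, so exclude it).
    nums = [v for v in non_null if isinstance(v, (int, float)) and not isinstance(v, bool)]
    if nums:
        lo, hi = min(nums), max(nums)
        width = (hi - lo) / bins or 1.0  # avoid a zero width when all values are equal
        counts = [0] * bins
        for v in nums:
            # The maximum value would land one past the end, so clamp it into the last bin.
            counts[min(int((v - lo) / width), bins - 1)] += 1
        profile.update(sum=sum(nums), min=lo, max=hi, median=median(nums),
                       variance=pvariance(nums), histogram=counts)
    # Frequent items for categorical (string) values.
    cats = [v for v in non_null if isinstance(v, str)]
    if cats:
        profile["top_items"] = Counter(cats).most_common(top_k)
    return profile


summary = profile_column([34, 51, None, 29, 51, 42])
print(summary["count"], summary["null_count"], summary["cardinality"], summary["histogram"])
```

Note that the frequent-items entry does keep raw categorical values, which is one reason the additional options in the next section (such as tokenizing categorical values) can be useful for highly sensitive data.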
Data privacy: what you can do
By design, all statistics gathered by whylogs are configurable by the user. The resulting whylogs profiles are designed to exclude sensitive information and cannot be manipulated to reconstruct the original data. For an additional layer of privacy when running whylogs on highly sensitive data, the following options are available:
- Tokenize or encrypt feature names and categorical feature values before passing them to whylogs.
- During ingestion, WhyLabs can block fields based on a customer-specified block list.
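The tokenization option can be sketched on the client side with a keyed hash (HMAC-SHA256) from the Python standard library. This is an assumption about one reasonable way to tokenize, not a WhyLabs-provided utility: `tokenize`, `tokenize_record`, and `SECRET_KEY` are hypothetical names. Tokens are deterministic, so the same value always maps to the same token and statistics such as cardinality and frequent-item counts remain meaningful, while anyone without the key cannot map tokens back to the original names or values.

```python
import hashlib
import hmac
from typing import Any, Dict


def tokenize(value: str, key: bytes, length: int = 16) -> str:
    """Deterministic keyed token: same input and key always give the same token (hypothetical helper)."""
    digest = hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return digest[:length]


def tokenize_record(record: Dict[str, Any], key: bytes) -> Dict[str, Any]:
    """Tokenize every feature name and every string (categorical) value; leave numbers unchanged."""
    out: Dict[str, Any] = {}
    for name, value in record.items():
        token_value = tokenize(value, key) if isinstance(value, str) else value
        out[tokenize(name, key)] = token_value
    return out


# Assumption: in practice, load the key from your own secret store and never send it
# to WhyLabs, so the monitoring side cannot map tokens back to the original values.
SECRET_KEY = b"replace-with-a-secret-from-your-vault"

row = {"diagnosis": "hypertension", "age": 57}
safe_row = tokenize_record(row, SECRET_KEY)
print(safe_row)
```

The tokenized rows can then be logged with whylogs as usual. Keeping numeric values untouched preserves their distribution statistics; if numeric values are themselves sensitive, they would need a different treatment than the one sketched here.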
All whylogs statistics are auditable: the library provides utilities for decoding and visualizing the data it collects to support customer audit processes.
If you need guidance on further securing your data, please reach out to your account representative or the WhyLabs Community Slack channel.