Mergeability is a powerful property of whylogs profiles. When multiple profiles are uploaded for the same date (or same hour for hourly models), WhyLabs automatically combines these profiles into one.
This means that fragments of your dataset can be profiled asynchronously. The resulting profiles are automatically aggregated into a single profile, equivalent to the one you would get by profiling the entire dataset in one operation.
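The sketch below shows one way to picture this behavior. It is a toy, in-memory stand-in for the server side, not the WhyLabs implementation. Every upload is keyed by model and by the start of its day (or hour), and uploads that land in the same bucket are combined rather than replaced. The "profile" here is just a `collections.Counter` of value frequencies. The model ID `"model-1"` and the `ProfileStore` / `bucket_start` names are hypothetical.

```python
from collections import Counter
from datetime import datetime, timezone
from typing import Dict, Tuple

BucketKey = Tuple[str, datetime]


def bucket_start(ts: datetime, granularity: str) -> datetime:
    """Truncate a timestamp to the start of its day or hour (hypothetical helper)."""
    if granularity == "daily":
        return ts.replace(hour=0, minute=0, second=0, microsecond=0)
    if granularity == "hourly":
        return ts.replace(minute=0, second=0, microsecond=0)
    raise ValueError(f"unknown granularity: {granularity}")


class ProfileStore:
    """Toy store: uploads to the same (model, bucket) key are merged, never overwritten."""

    def __init__(self) -> None:
        self.profiles: Dict[BucketKey, Counter] = {}

    def upload(self, model_id: str, ts: datetime, granularity: str, profile: Counter) -> None:
        key = (model_id, bucket_start(ts, granularity))
        # Counter addition sums value counts, so the fragments combine.
        self.profiles[key] = self.profiles.get(key, Counter()) + profile


store = ProfileStore()
day = datetime(2023, 5, 1, tzinfo=timezone.utc)
# Two fragments profiled at different times on the same day.
store.upload("model-1", day.replace(hour=3), "daily", Counter({"cat": 2, "dog": 1}))
store.upload("model-1", day.replace(hour=17), "daily", Counter({"cat": 1, "bird": 4}))
```

Both uploads fall into the 2023-05-01 daily bucket, so the store holds a single combined profile with `cat: 3, dog: 1, bird: 4`.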
whylogs profiles are designed so that merging correctly handles subprofiles with different row counts. Merging does not simply take a straight mean of each subprofile's descriptive statistics.
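To see why a straight mean would be wrong, consider a minimal sketch of a mergeable column summary. It stores counts and a running sum instead of a precomputed mean, so merging stays exact for fragments of any size. This is an illustration of the general idea only. It is not how whylogs represents its metrics internally, and `ColumnProfile` is a hypothetical name.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class ColumnProfile:
    """Toy mergeable summary of one numeric column (illustrative, not whylogs internals)."""
    count: int = 0                      # non-null values seen
    null_count: int = 0                 # null values seen
    total: float = 0.0                  # running sum, kept so the mean can be merged exactly
    min_value: Optional[float] = None
    max_value: Optional[float] = None

    @classmethod
    def from_values(cls, values: Iterable[Optional[float]]) -> "ColumnProfile":
        profile = cls()
        for v in values:
            if v is None:
                profile.null_count += 1
                continue
            profile.count += 1
            profile.total += v
            profile.min_value = v if profile.min_value is None else min(profile.min_value, v)
            profile.max_value = v if profile.max_value is None else max(profile.max_value, v)
        return profile

    @property
    def mean(self) -> Optional[float]:
        return self.total / self.count if self.count else None

    def merge(self, other: "ColumnProfile") -> "ColumnProfile":
        # Counts and sums add, so the merged mean is weighted by each fragment's size.
        mins = [m for m in (self.min_value, other.min_value) if m is not None]
        maxs = [m for m in (self.max_value, other.max_value) if m is not None]
        return ColumnProfile(
            count=self.count + other.count,
            null_count=self.null_count + other.null_count,
            total=self.total + other.total,
            min_value=min(mins) if mins else None,
            max_value=max(maxs) if maxs else None,
        )


frag_a = [1.0, 2.0, 3.0]   # fragment with three rows
frag_b = [10.0]            # fragment with one row
merged = ColumnProfile.from_values(frag_a).merge(ColumnProfile.from_values(frag_b))
whole = ColumnProfile.from_values(frag_a + frag_b)

# Naively averaging the two fragment means ignores their row counts.
naive_mean = (ColumnProfile.from_values(frag_a).mean + ColumnProfile.from_values(frag_b).mean) / 2
```

Here the three-row fragment averages 2.0 and the one-row fragment averages 10.0. The merged profile equals the whole-dataset profile, with a mean of 4.0. The naive mean of means, by contrast, is 6.0.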
Mergeability benefits users across a variety of use cases.
Mergeability makes it easy to profile data that lives in distributed pipelines.
In this example, each data partition can be profiled independently, and the resulting profiles combine into a holistic view of your dataset within WhyLabs.
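As a rough sketch of the distributed case, each partition is profiled by its own worker and the fragments are then reduced into one profile. A thread pool stands in for separate machines, the partition contents are made up, and `Profile` / `profile_partition` are hypothetical toy names rather than whylogs APIs.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from functools import reduce
from typing import List, Optional


@dataclass(frozen=True)
class Profile:
    """Toy mergeable profile of a single numeric column (illustrative only)."""
    count: int = 0        # non-null values
    null_count: int = 0
    total: float = 0.0

    def merge(self, other: "Profile") -> "Profile":
        return Profile(self.count + other.count,
                       self.null_count + other.null_count,
                       self.total + other.total)


def profile_partition(rows: List[Optional[float]]) -> Profile:
    # Each worker only ever sees its own partition.
    non_null = [r for r in rows if r is not None]
    return Profile(len(non_null), len(rows) - len(non_null), float(sum(non_null)))


# Hypothetical partitions of one day's data, e.g. one per worker or per file.
partitions = [
    [1.0, 2.0, None],
    [3.0, 4.0, 5.0, 6.0],
    [None, 7.0],
]

# Profile the partitions concurrently, then combine the fragments.
with ThreadPoolExecutor() as pool:
    fragment_profiles = list(pool.map(profile_partition, partitions))

daily_profile = reduce(Profile.merge, fragment_profiles, Profile())
```

Because this toy merge only adds counts and sums, it is associative and commutative. The fragments can therefore arrive in any order and still produce the same profile as profiling all rows at once.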
In the case of multi-modal models, users may have two distinct datasets that feed into the same model. These datasets can be profiled independently and tracked as a single dataset within WhyLabs. In the example below, an image dataset is supplemented with tabular metadata, and the profiles of both are merged within a single WhyLabs model.
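One plausible way to picture the multi-modal case is to treat a dataset profile as a mapping from column name to column statistics. Merging two profiles then keeps the union of their columns and merges any columns the two profiles share. This is a toy model under that assumption, not a description of WhyLabs internals. The column names `image.brightness` and `metadata.exposure_ms` are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class ColumnStats:
    count: int
    total: float

    def merge(self, other: "ColumnStats") -> "ColumnStats":
        return ColumnStats(self.count + other.count, self.total + other.total)


DatasetProfile = Dict[str, ColumnStats]


def profile_rows(rows: List[Dict[str, float]]) -> DatasetProfile:
    """Build a toy per-column profile from a list of rows."""
    profile: DatasetProfile = {}
    for row in rows:
        for column, value in row.items():
            stats = ColumnStats(1, value)
            profile[column] = profile[column].merge(stats) if column in profile else stats
    return profile


def merge_profiles(a: DatasetProfile, b: DatasetProfile) -> DatasetProfile:
    # Columns present in only one profile are carried over; shared columns are merged.
    merged = dict(a)
    for column, stats in b.items():
        merged[column] = merged[column].merge(stats) if column in merged else stats
    return merged


# Hypothetical features derived from images, and tabular metadata about the same images.
image_profile = profile_rows([{"image.brightness": 0.25}, {"image.brightness": 0.75}])
metadata_profile = profile_rows([{"metadata.exposure_ms": 10.0}, {"metadata.exposure_ms": 30.0}])

model_profile = merge_profiles(image_profile, metadata_profile)
```

The resulting `model_profile` contains both the image column and the metadata column, so the two sources can be monitored as one dataset.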
Because mergeability is built into whylogs and WhyLabs, users must follow some best practices when uploading profiles to WhyLabs. Most importantly, a given dataset should be profiled only once per day for daily models and once per hour for hourly models.
If the same dataset is profiled and uploaded more than once within one day (or one hour, for hourly models), the resulting profiles will always be merged; one profile never overwrites another. This inflates the value counts, but it won't result in false alerts, since the distribution shape, null value fraction, and similar statistics remain unchanged.
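The arithmetic behind this can be checked with a small toy profile. Merging a profile with an exact copy of itself doubles every count, while ratio-based statistics such as the null fraction and the mean stay the same. The numbers below are made up, and `Profile` is a hypothetical stand-in, not a whylogs class.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Profile:
    """Toy mergeable profile of one numeric column (illustrative only)."""
    count: int        # non-null values
    null_count: int
    total: float

    def merge(self, other: "Profile") -> "Profile":
        return Profile(self.count + other.count,
                       self.null_count + other.null_count,
                       self.total + other.total)

    @property
    def null_fraction(self) -> float:
        return self.null_count / (self.count + self.null_count)

    @property
    def mean(self) -> float:
        return self.total / self.count


todays_profile = Profile(count=90, null_count=10, total=450.0)
# The same dataset is accidentally profiled and uploaded a second time on the same day.
stored = todays_profile.merge(todays_profile)
```

After the duplicate upload, `stored.count` is 180 instead of 90. `stored.null_fraction` is still 0.1, however, and `stored.mean` is still 5.0.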