Image Data

In addition to tabular and textual data, whylogs can profile image data. It automatically computes a number of metrics for each logged image, including the following.

  • Brightness (mean, standard deviation)
  • Hue (mean, standard deviation)
  • Saturation (mean, standard deviation)
  • Image Pixel Height & Width
  • Colorspace (e.g. RGB, HSV)
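To make the brightness, hue, and saturation statistics concrete, here is a minimal sketch that computes their mean and standard deviation for a handful of pixels using only the Python standard library. It is not whylogs' implementation: the function name `hsv_summary`, the representation of pixels as 8-bit `(r, g, b)` tuples, and the use of the HSV value channel as "brightness" are all assumptions made for illustration. Note also that hue is an angle, so a plain arithmetic mean ignores wraparound.

```python
import colorsys
import statistics


def hsv_summary(pixels):
    """Summarize hue, saturation and brightness for 8-bit RGB pixels.

    Hypothetical helper for illustration; not part of whylogs.
    """
    # Convert each pixel to HSV, with every channel scaled to [0, 1].
    hsv = [colorsys.rgb_to_hsv(r / 255, g / 255, b / 255) for r, g, b in pixels]
    summary = {}
    for index, name in enumerate(("hue", "saturation", "brightness")):
        channel = [pixel[index] for pixel in hsv]
        summary[name] = {
            "mean": statistics.fmean(channel),
            "stddev": statistics.pstdev(channel),  # population standard deviation
        }
    return summary


# A tiny made-up "image": two bright red pixels and two darker red pixels.
pixels = [(255, 0, 0), (255, 0, 0), (128, 0, 0), (128, 0, 0)]
stats = hsv_summary(pixels)
```

In this toy input every pixel has the same hue and full saturation, so only the brightness statistics vary.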

We will demonstrate WhyLabs' image logging and monitoring capabilities using sets of images with some anomalies injected into the raw data. Consider the following sets of images.

Image Sets

Set 1: These images appear roughly uniform. We will treat this set as our baseline.

Set 2: These images appear to have had their color channels switched. This can happen when a different library or library version is used to read the images, or when an issue is introduced into the data pipeline.

Set 3: Some of these images are darker than others, which could result from using different devices to take the photographs or different device settings such as shutter speed.
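The two anomalies described above affect different metrics, which the following sketch illustrates on a single made-up pixel using the standard library's `colorsys` module. The helper names (`hue_and_brightness`, `swap_red_blue`, `darken`) and the pixel value are hypothetical, and "brightness" is taken to be the HSV value channel as an assumption.

```python
import colorsys


def hue_and_brightness(pixel):
    """Return (hue, brightness) in [0, 1] for an 8-bit RGB pixel."""
    r, g, b = (channel / 255 for channel in pixel)
    hue, _saturation, value = colorsys.rgb_to_hsv(r, g, b)
    return hue, value


def swap_red_blue(pixel):
    """Simulate reading an RGB image as BGR (the Set 2 style anomaly)."""
    r, g, b = pixel
    return (b, g, r)


def darken(pixel, factor):
    """Simulate an under-exposed photo (the Set 3 style anomaly)."""
    return tuple(int(channel * factor) for channel in pixel)


original = (200, 120, 40)  # made-up orange-ish pixel
orig_hue, orig_val = hue_and_brightness(original)
swap_hue, swap_val = hue_and_brightness(swap_red_blue(original))
dim_hue, dim_val = hue_and_brightness(darken(original, 0.5))
```

Swapping the red and blue channels moves the hue while leaving brightness unchanged, whereas darkening lowers brightness and keeps the hue, so each anomaly should surface in a different metric.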

The following code demonstrates how to log a profile for a folder of images.

import os

from whylogs.app import Session
from whylogs.app.writers import WhyLabsWriter

# Replace the placeholder strings with your own credentials
os.environ["WHYLABS_API_KEY"] = "<your-api-key>"
os.environ["WHYLABS_DEFAULT_ORG_ID"] = "<your-org-id>"
model_id = "<your-model-id>"

# Create an instance of the WhyLabs writer to send profiles to the WhyLabs platform
writer = WhyLabsWriter()
# Open a session
session = Session(project="demo-project", pipeline="demo-pipeline", writers=[writer])
# Initialize a logger and log each image in the folder
with session.logger(tags={"datasetId": model_id}) as ylog:
    for img_name in os.listdir("image_folder"):
        ylog.log_image(os.path.join("image_folder", img_name))

Note that the log_image method only accepts a single image as input. Users are advised to loop through their collection of images within a single logger session to log an aggregate profile for their image set.
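Because every file in the folder is passed to log_image, it can help to skip anything that is not an image before looping. The sketch below is one way to do that with the standard library; the helper name `iter_image_paths` and the extension list are assumptions, not part of whylogs.

```python
import os

# Assumed set of extensions to treat as images; adjust for your data.
IMAGE_EXTENSIONS = {".png", ".jpg", ".jpeg", ".bmp", ".gif", ".tiff"}


def iter_image_paths(folder):
    """Yield paths of regular files in `folder` with an image-like extension.

    Hypothetical helper for illustration; not part of whylogs.
    """
    for name in sorted(os.listdir(folder)):
        path = os.path.join(folder, name)
        extension = os.path.splitext(name)[1].lower()
        # Skip sub-directories and files such as notes or metadata CSVs.
        if os.path.isfile(path) and extension in IMAGE_EXTENSIONS:
            yield path
```

Inside the logger session, the loop would then iterate over `iter_image_paths("image_folder")` and call `ylog.log_image(path)` for each path.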

Once uploaded to WhyLabs, these metrics are tracked similarly to descriptive statistics generated when profiling tabular data. In the image below, we see that the mean brightness of logged images suddenly dropped on February 23rd, causing a spike in the distribution distance as compared to the reference profile. This anomaly is associated with images from Set 3.

Image Brightness

Similarly, we see a spike in the image hue on February 22nd, which corresponds to the swapped color channels in Set 2.

Image Hue
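The distribution distance mentioned above compares a day's metric distribution against the reference profile. The text does not say which distance WhyLabs computes, so the sketch below uses the Hellinger distance over simple histograms purely as an assumed, illustrative choice. The brightness values are made up, and the function names are hypothetical.

```python
import math


def histogram(values, bins=10, low=0.0, high=1.0):
    """Return normalized bin frequencies for values expected in [low, high]."""
    counts = [0] * bins
    width = (high - low) / bins
    for value in values:
        # Clamp the index so out-of-range values fall into the edge bins.
        index = max(0, min(int((value - low) / width), bins - 1))
        counts[index] += 1
    return [count / len(values) for count in counts]


def hellinger(p, q):
    """Hellinger distance between two discrete distributions (0 = identical)."""
    return math.sqrt(sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q)) / 2)


# Made-up per-image mean brightness values, for illustration only.
reference = [0.62, 0.58, 0.65, 0.61, 0.63, 0.57]
similar_day = [0.61, 0.64, 0.57, 0.62, 0.59, 0.56]
dark_day = [0.21, 0.25, 0.62, 0.18, 0.23, 0.64]

ref_hist = histogram(reference)
small = hellinger(ref_hist, histogram(similar_day))
large = hellinger(ref_hist, histogram(dark_day))
```

A day whose images resemble the baseline yields a small distance, while a day dominated by dark images yields a much larger one, which is the kind of spike a monitor can alert on.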

Multi-Modal Logging

Furthermore, users can combine profiles from different data types into a single WhyLabs model due to the mergeability property of whylogs profiles. This is useful for cases in which multi-modal models are used, or if there is metadata available to supplement your dataset.

Suppose each image set is associated with a tabular dataset containing metadata such as the shutter speed of the camera used for each photograph and the version of the image library used for pre-processing. This tabular metadata can be included in the logged profiles by adjusting the previous code as follows.

import pandas as pd

with session.logger(tags={"datasetId": model_id}) as ylog:
    # Read in the tabular metadata
    metadata_df = pd.read_csv("image_metadata.csv")
    # Log the tabular metadata
    ylog.log_dataframe(metadata_df)
    # Log each image into the same profile
    for img_name in os.listdir("image_folder"):
        ylog.log_image(os.path.join("image_folder", img_name))

When viewing the results in WhyLabs, we now see that the tabular metadata is monitored alongside the image data. In fact, this can be used to help troubleshoot in some cases. In the image below, we find that an anomaly in brightness correlates with an anomaly in the shutter speed used.

Alert Correlation
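The mergeability property that makes this kind of combined logging possible can be illustrated with a small, self-contained sketch. It does not reproduce whylogs' internal data structures. It only shows the general idea that summary statistics collected separately can be merged into the same result as a single pass over all the data. The class name `RunningStats` is hypothetical.

```python
from dataclasses import dataclass


@dataclass
class RunningStats:
    """Mergeable count / mean / variance summary (illustrative only)."""

    count: int = 0
    mean: float = 0.0
    m2: float = 0.0  # sum of squared deviations from the mean

    def update(self, value):
        """Fold one observation into the summary (Welford's update)."""
        self.count += 1
        delta = value - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (value - self.mean)

    def merge(self, other):
        """Return a new summary equivalent to having seen both inputs."""
        total = self.count + other.count
        if total == 0:
            return RunningStats()
        delta = other.mean - self.mean
        mean = self.mean + delta * other.count / total
        m2 = self.m2 + other.m2 + delta ** 2 * self.count * other.count / total
        return RunningStats(total, mean, m2)

    @property
    def variance(self):
        """Population variance of everything summarized so far."""
        return self.m2 / self.count if self.count else 0.0


# Two partial summaries built independently, then merged.
part_a, part_b, full = RunningStats(), RunningStats(), RunningStats()
data = [0.2, 0.4, 0.6, 0.8]
for value in data[:2]:
    part_a.update(value)
for value in data[2:]:
    part_b.update(value)
for value in data:
    full.update(value)
merged = part_a.merge(part_b)
```

The merged summary matches the one built from all values at once, which is why profiles logged at different times or from different sources can be combined without revisiting the raw data.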

Exif Data

Some images contain EXIF data, which typically includes metadata stored by the device that took the photograph. If EXIF data is available for images profiled by whylogs, it will automatically be included in the image profile.

Custom Metrics

Computer vision use cases can be highly specific. For this reason, users are able to define their own custom functions to operate on an image array when logging an image profile. In the following example, a custom function is built to extract the blue channel of the image.

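import numpy as np
from whylogs.features.transforms import ComposeTransforms

class MyBlue:
    def __call__(self, x):
        # split() returns the image's color bands; this assumes three bands (R, G, B)
        _, _, b = x.split()
        return np.array(b).reshape(-1, 1)

    def __repr__(self):
        return self.__class__.__name__

with session.logger(tags={"datasetId": model_id}) as ylog:
    ylog.log_image("filename.png", feature_transforms=[MyBlue(), ComposeTransforms([MyBlue()])])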

See more in this example notebook for image profiling.