Performance Tracing

Overview

Tracing, also known as performance tracing or bias tracing, helps you discover which segments of your data contribute positively or negatively to your model's performance.

Each segment can have its own performance metrics. When a model's overall performance metrics change, you often want to know which segments contributed most to that change. The Tracing Dashboard provides this view.

The Tracing Dashboard allows you to compare up to two model performance metrics, a primary and a secondary, and see the breakdown by segment for those metrics.

Interface

There are two main components on the Tracing Dashboard:

1) The timeseries chart, which shows the selected metric values over the time range. On the right side of this chart you can see a single value for each metric, aggregated over the time range.
2) The comparison bar charts, which show the metrics broken down by segment.

Figure: Performance tracing with accuracy

Classification models default to Accuracy as the primary metric (dark blue) and Prediction count as the secondary metric (light blue). With these defaults you can see the volume of data points in each segment (the secondary metric) next to that segment's Accuracy (the primary metric), which is useful because a segment's volume may affect its Accuracy. Regression models default to Root Mean Square Error (RMSE) as the primary metric.
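To make the two default metrics concrete, the following is a minimal sketch of how Accuracy and Prediction count could be computed per segment. It is an illustration in plain Python, not the dashboard's implementation; the function name `segment_accuracy`, the column names, and the sample rows are all hypothetical.

```python
from collections import defaultdict


def segment_accuracy(rows, segment_key, target_key, prediction_key):
    """Return {segment: (accuracy, prediction_count)} for a list of dict rows."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for row in rows:
        seg = row[segment_key]
        total[seg] += 1
        if row[target_key] == row[prediction_key]:
            correct[seg] += 1
    return {seg: (correct[seg] / total[seg], total[seg]) for seg in total}


# Hypothetical sample data: one segment column, one target, one prediction.
rows = [
    {"region": "north", "target": 1, "prediction": 1},
    {"region": "north", "target": 0, "prediction": 0},
    {"region": "north", "target": 1, "prediction": 0},
    {"region": "north", "target": 0, "prediction": 0},
    {"region": "south", "target": 1, "prediction": 0},
    {"region": "south", "target": 0, "prediction": 1},
]

per_segment = segment_accuracy(rows, "region", "target", "prediction")
for seg, (acc, count) in sorted(per_segment.items()):
    print(f"{seg}: accuracy={acc:.2f}, prediction_count={count}")
```

In this sample the "south" segment has both the lowest Accuracy and the smallest Prediction count, which is the kind of relationship the default metric pairing is meant to surface.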

Comparing profiles

When you want to know how one time range compares to another, or how a single profile compares to a time range, you can select a comparison profile or time range. Comparison data is shown in orange: the primary metric is dark orange and the secondary metric is light orange.

Figure: Performance tracing profile comparison
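One way to read a comparison is to ask which segments moved the most between the two ranges. The sketch below ranks segments by the change in a metric between a baseline and a comparison range. It assumes you already have per-segment metric values for both ranges; the function name `rank_segment_changes` and the numbers are hypothetical, and the dashboard itself shows the two ranges side by side rather than computing this ranking.

```python
def rank_segment_changes(baseline, comparison):
    """Return (segment, delta) pairs sorted from largest drop to largest gain.

    Only segments present in both ranges are compared.
    """
    shared = baseline.keys() & comparison.keys()
    deltas = {seg: comparison[seg] - baseline[seg] for seg in shared}
    return sorted(deltas.items(), key=lambda item: item[1])


# Hypothetical per-segment Accuracy for two time ranges.
baseline = {"north": 0.90, "south": 0.85, "west": 0.80}
comparison = {"north": 0.88, "south": 0.60, "west": 0.82}

ranked = rank_segment_changes(baseline, comparison)
for seg, delta in ranked:
    print(f"{seg}: {delta:+.2f}")
```

For a metric where lower is better, such as RMSE, the interpretation flips: the largest positive delta marks the biggest regression.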

Reference threshold

The reference threshold is set automatically when you select a primary metric, and you can adjust it afterwards. By default it is set to the aggregated value of that metric over the time range or profile. This default acts as a midline: it separates the segments whose metric falls below the value for the whole time range from those whose metric falls above it.

The reference threshold highlights the segments whose metric falls on one side of the reference threshold. Accuracy, Precision, Recall, and F1 score will highlight when the segment's metric is less than the threshold. False positive rate and Prediction count will highlight when the metric is greater.
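The highlighting rule described above can be sketched as a small function. This is an illustration of the stated behavior only, not the dashboard's code; the metric name strings, the function name `highlighted_segments`, and the sample values are hypothetical.

```python
# Direction in which each metric is highlighted, following the text above.
# The string keys are hypothetical labels chosen for this sketch.
HIGHLIGHT_BELOW = {"accuracy", "precision", "recall", "f1"}
HIGHLIGHT_ABOVE = {"false_positive_rate", "prediction_count"}


def highlighted_segments(metric_name, segment_values, threshold):
    """Return the sorted segments that fall on the highlighted side of the threshold."""
    if metric_name in HIGHLIGHT_BELOW:
        return sorted(s for s, v in segment_values.items() if v < threshold)
    if metric_name in HIGHLIGHT_ABOVE:
        return sorted(s for s, v in segment_values.items() if v > threshold)
    raise ValueError(f"no highlight direction defined for {metric_name!r}")


# Hypothetical per-segment values and thresholds.
accuracy = {"north": 0.75, "south": 0.50, "west": 0.90}
counts = {"north": 400, "south": 200, "west": 900}

low_accuracy = highlighted_segments("accuracy", accuracy, threshold=0.72)
high_volume = highlighted_segments("prediction_count", counts, threshold=500)
print(low_accuracy)
print(high_volume)
```

Here the threshold is passed in explicitly; in the dashboard it defaults to the metric's aggregated value and can then be adjusted.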

Uploading whylogs segmented performance data

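from whylogs import log_classification_metrics
from whylogs.core.schema import DatasetSchema
from whylogs.core.segmentation_partition import segment_on_column

# For classification models
# df is a pandas DataFrame that contains the target, prediction, and segment columns
segmented_classification_results = log_classification_metrics(
    df,
    target_column="TARGET_COLUMN_NAME",
    prediction_column="PREDICTION_COLUMN_NAME",
    schema=DatasetSchema(segments=segment_on_column("SEGMENTED_COLUMN_NAME")),
)
# Upload the results to WhyLabs
segmented_classification_results.writer("whylabs").write()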