whylogs can help monitor your ML datasets as part of your GitOps CI/CD pipeline. This document is an example of how to use GitHub Actions to check whether your data satisfies the constraints you want to enforce.
See this example in our GitHub repo here.
See the full GitHub action here.
NOTE: This example uses a Docker container that installs whylogs v0. An example using whylogs v1 is coming soon!
The GitHub workflow in this example should be defined in a file with the path .github/workflows/constraints.yml. GitHub only recognizes workflow files placed in the .github/workflows directory, so it is important to follow this convention.
This workflow defines what actions to take whenever commits are pushed to your repo.
This directory contains whylogs constraints that are applied to a dataset as part of the GitHub action. Constraints assert that a logged value or summary statistic is within an expected range.
We define two steps in our action. The first runs a set of constraints that are expected to fail; it exists only to verify that the constraint logic works as expected. The second step applies a set of constraints that are expected to succeed.
uses: references the prepackaged action in the
That tells GitHub how to run whylogs on the parameters you supply. This is a tag common to all GitHub actions.
constraintsfile: points to a file of constraints defined in this repo.
datafile: points to a file containing the data to which the constraints should be applied. The format can be anything that the pandas package can load, but CSV works well.
expect_failure: indicates whether the action is expected to fail. Actions are usually written to expect success; we include this flag for completeness.
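The effect of expect_failure can be pictured with a few lines of Python. This is only a sketch of the logic described above, not the action's actual code; step_succeeds is a hypothetical helper name introduced for illustration.

```python
def step_succeeds(constraints_passed: bool, expect_failure: bool = False) -> bool:
    """Return True when the workflow step should be reported as successful.

    Hypothetical helper: the step succeeds when the observed outcome of the
    constraint check matches the expected one.
    """
    # With expect_failure set, a failing constraint check is the desired result.
    return constraints_passed != expect_failure


# First step in the example: the constraints are expected to fail.
print(step_succeeds(constraints_passed=False, expect_failure=True))
# Second step: the constraints are expected to succeed.
print(step_succeeds(constraints_passed=True, expect_failure=False))
```

Both calls print True, because in each case the outcome matches the expectation. A step whose constraints pass while a failure is expected would be reported as unsuccessful.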
whylogs constraints are specified in JSON. Each constraint is bound to a column in the data, and each column may have multiple constraints. The standard comparison operators are supported: LT, LE, EQ, NE, GE, GT. We are actively extending whylogs to support other constraint operators, for example to match a regex on strings or to test image features.
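To make the operator names concrete, the sketch below maps each one to the corresponding function in Python's standard operator module and checks a single value. The OPS table and the check helper are illustrative assumptions and are not part of the whylogs API.

```python
import operator

# Assumed mapping from the operator names listed above to Python comparisons.
OPS = {
    "LT": operator.lt,
    "LE": operator.le,
    "EQ": operator.eq,
    "NE": operator.ne,
    "GE": operator.ge,
    "GT": operator.gt,
}


def check(value, op_name, literal):
    """Return True if `value <op> literal` holds (hypothetical helper)."""
    return OPS[op_name](value, literal)


print(check(2.5, "LT", 3.0))  # True: 2.5 < 3.0
print(check(5, "GE", 10))     # False: 5 is not >= 10
```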
This example shows the definition of two types of constraints:
Value constraints are applied to every value that is logged for a feature. At a minimum, Value constraints must specify a comparison operator and a literal value.
Summary constraints are applied to whylogs feature summaries. They compare fields of the summary to static literals or to another field in the summary.
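The difference between the two constraint types can be sketched in plain Python. The code below is an illustration only: the column values, the summary fields (min, max, mean), and the helper names are assumptions, and real whylogs summaries are computed by the library rather than by hand.

```python
import operator
from statistics import mean

OPS = {"LT": operator.lt, "LE": operator.le, "EQ": operator.eq,
       "NE": operator.ne, "GE": operator.ge, "GT": operator.gt}


def value_constraint_failures(values, op_name, literal):
    """Apply the comparison to every logged value and count the failures."""
    return sum(1 for v in values if not OPS[op_name](v, literal))


def summary_constraint_holds(summary, first_field, op_name,
                             value=None, second_field=None):
    """Compare one summary field to a literal or to another summary field."""
    right = summary[second_field] if second_field is not None else value
    return OPS[op_name](summary[first_field], right)


# Made-up values for a single column, and a hand-built summary of them.
values = [1200.0, 3500.0, 800.0, 15000.0]
summary = {"min": min(values), "max": max(values), "mean": mean(values)}

# Value constraint: every value must be less than 10000 (15000.0 violates it).
print(value_constraint_failures(values, "LT", 10000.0))
# Summary constraint against a literal: min >= 0.
print(summary_constraint_holds(summary, "min", "GE", value=0))
# Summary constraint against another summary field: mean <= max.
print(summary_constraint_holds(summary, "mean", "LE", second_field="max"))
```

The value constraint is evaluated once per logged value, while each summary constraint is evaluated once against the aggregated statistics.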
Constraints may be marked 'verbose', which logs every failure.
Verbose logging helps identify why a constraint is failing to validate, but it can be excessive if there are many failures.
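A rough picture of what the verbose flag does, using Python's standard logging module. The constraint dictionary layout and the apply_constraint helper are hypothetical; they do not reproduce the whylogs JSON schema or its log format.

```python
import logging
import operator

logging.basicConfig(level=logging.INFO, format="%(message)s")
log = logging.getLogger("constraints-sketch")

OPS = {"LT": operator.lt, "LE": operator.le, "EQ": operator.eq,
       "NE": operator.ne, "GE": operator.ge, "GT": operator.gt}


def apply_constraint(values, constraint):
    """Return the number of failures; log each one only when verbose is set."""
    compare = OPS[constraint["op"]]
    failures = 0
    for v in values:
        if not compare(v, constraint["value"]):
            failures += 1
            if constraint.get("verbose", False):
                log.info("constraint %s failed on value %r", constraint["name"], v)
    return failures


# Hypothetical constraint definitions: identical except for the verbose flag.
quiet = {"name": "value LT 100", "op": "LT", "value": 100}
noisy = dict(quiet, verbose=True)
values = [5, 250, 40, 900]

print(apply_constraint(values, quiet))  # counts failures, logs nothing
print(apply_constraint(values, noisy))  # same count, one log line per failure
```

Both calls report the same number of failures; the verbose variant additionally emits one log line per failing value, which is why it can become noisy on large datasets.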