This solution works well for anyone who already has a Kafka cluster set up to manage their data and doesn't want to integrate whylogs-specific code into their application. The container is configured to listen to one or more topics in a cluster, with one or more consumers on each instance of the container. The container can be configured to send profiles to S3, disk, or WhyLabs.
The Kafka container is technically the same container that we use for our REST offering, but configured to consume from a Kafka cluster (optionally in addition to functioning as a REST service). It takes the same configuration options as the REST container. The options that matter for running it as a Kafka consumer are highlighted below. See the REST container page for the full configuration docs, and the full run instructions for details on running it.
The container expects each Kafka message to be a possibly nested JSON map of key-value pairs. Nested values will either be ignored or flattened based on configuration. Take the following payload as an example.
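As a rough illustration of the message shape, here is a minimal Python sketch with a hypothetical payload (the field names are made up for this example). It also shows what "ignoring" nested values could look like, assuming it means that entries whose values are themselves maps are dropped. The helper `drop_nested` is illustrative and not part of the container.

```python
import json

# Hypothetical Kafka message value; field names are illustrative only.
raw_message = b'{"age": 42, "name": "alice", "address": {"city": "Seattle", "zip": "98101"}}'


def drop_nested(record: dict) -> dict:
    """Keep only top-level entries whose values are not themselves maps.

    Assumption: this mirrors the "ignore nested values" behavior; the
    container's actual handling of lists and other types is not shown here.
    """
    return {key: value for key, value in record.items() if not isinstance(value, dict)}


record = json.loads(raw_message)
top_level_only = drop_nested(record)
print(top_level_only)  # {'age': 42, 'name': 'alice'}
```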
If the container is configured to flatten data, it effectively executes the following code against a whylogs profile.
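The flattening step can be sketched in Python as follows. This is an approximation under stated assumptions, not the container's implementation: the `.` separator used to join nested keys, the `flatten` helper, and the sample payload are all hypothetical, and the printed `track(...)` lines are only a stand-in for recording each feature value on a profile.

```python
from typing import Any, Dict


def flatten(record: Dict[str, Any], prefix: str = "", sep: str = ".") -> Dict[str, Any]:
    """Collapse nested maps into a single-level map with joined key names.

    Assumption: nested keys are joined with "." (the real separator may differ).
    """
    flat: Dict[str, Any] = {}
    for key, value in record.items():
        name = f"{prefix}{sep}{key}" if prefix else key
        if isinstance(value, dict):
            # Recurse into nested maps, carrying the joined key as the prefix.
            flat.update(flatten(value, name, sep))
        else:
            flat[name] = value
    return flat


# Hypothetical payload; field names are illustrative only.
payload = {"age": 42, "address": {"city": "Seattle", "geo": {"lat": 47.6}}}

for feature, value in flatten(payload).items():
    # Stand-in for tracking one feature value on a whylogs profile.
    print(f"track({feature!r}, {value!r})")
```

Each flattened key becomes its own feature, so a deeply nested message produces one feature per leaf value.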
The purpose of the container is to group data into buckets of profiles for you
outside of your application, so that you don't have to embed whylogs-specific
code in your main application path. If your container is configured to group
data into hourly profiles, then the timestamp that Kafka stores for each message
is used to determine which profile a data point gets rolled into, rather
than any timestamp field that might be present in the message itself.
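
The hourly grouping described above can be sketched in Python. This is a simplified illustration, not the container's code: it assumes the Kafka timestamp is in milliseconds since the epoch, that hour boundaries are taken in UTC, and that the hypothetical `hourly_bucket` helper stands in for whatever the container does internally. Note that the `timestamp` field inside each sample message is never consulted.

```python
from collections import defaultdict
from datetime import datetime, timezone


def hourly_bucket(kafka_timestamp_ms: int) -> datetime:
    """Truncate a Kafka message timestamp (ms since epoch) to the start of its UTC hour."""
    ts = datetime.fromtimestamp(kafka_timestamp_ms / 1000, tz=timezone.utc)
    return ts.replace(minute=0, second=0, microsecond=0)


# (Kafka timestamp in ms, message value). The values are hypothetical, and
# their own "timestamp" fields are deliberately ignored for bucketing.
messages = [
    (1700000000000, {"timestamp": "2020-01-01T00:00:00Z", "age": 42}),
    (1700001000000, {"timestamp": "2020-01-01T00:00:00Z", "age": 37}),
    (1700003000000, {"timestamp": "2020-01-01T00:00:00Z", "age": 29}),
]

buckets = defaultdict(list)
for kafka_ts_ms, value in messages:
    buckets[hourly_bucket(kafka_ts_ms)].append(value)

for bucket_start, values in sorted(buckets.items()):
    print(bucket_start.isoformat(), len(values))
```

Here the first two messages land in the same hourly bucket and the third falls into the next one, even though all three carry the same embedded `timestamp` field.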