Ray
⚠️ See the v0 docs for using Ray with whylogs v0.
There are many ways to use Ray, but in general you'll create whylogs profiles on each of the nodes or processes that the Ray scheduler creates and return them as DatasetProfileView instances, which can be merged with other profiles. Make sure your functions return DatasetProfileView rather than DatasetProfile: DatasetProfile can't be serialized, and Ray needs to be able to serialize anything that moves between nodes.
Here is a simple example that makes use of Ray's remote decorator to execute a function that profiles a dataframe.
Here is an example that uses Ray's actor and pipeline abstractions to split up a dataset into 8 shards, profile them, and merge the results into a single profile.
Ray also has Ray Serve, a higher-level serving library that builds on Ray's core functionality to serve models (or anything else, really). You can use the examples above to integrate whylogs into Ray Serve as well, or check out our container-based integration if you would rather keep whylogs separate from your Ray setup. That will allow you to forward data to a dedicated container endpoint instead of managing whylogs on Ray.
The examples are a bit contrived, but Ray is a very flexible platform. Stop by our community Slack to tell us more about how you're using Ray, and we can try to tailor the advice to your use case.