There are many ways to use Ray, but in general you'll create whylogs profiles on each of the nodes or processes spawned by the Ray scheduler and return them as protobuf-serialized strings, which can then be deserialized back into profile objects and merged with other profiles. Currently, whylogs profiles can't be pickled (pickle being the default serialization Ray uses), so it's necessary to serialize them with our protobuf format instead.
The snippet above will get you a profile that you can return from remote calls or actor executions. Once you have many of these profiles, you can merge them down into a single one.
Here is an example that uses Ray functions and pipelines. A few CSVs are wrapped in a pipeline, and then batches of 1,000 rows are converted into whylogs profiles in parallel and returned as serialized profiles that can be merged into a single profile.
Here is another example where pipelines are used to split the data into 8 equal parts, each of which is then sent off to an individual actor for further processing.
Ray also offers Ray Serve, a higher-level serving library that builds on Ray's core functionality to serve models (or anything else, really). You can use the examples above to integrate whylogs into Ray Serve as well, or you can check out our container-based integration if you would rather keep whylogs separate from your Ray setup. That lets you forward data to a dedicated container endpoint instead of managing whylogs inside Ray.
The examples are admittedly a bit contrived, but Ray is a very flexible platform. Stop by our community Slack and tell us more about how you're using Ray, and we can tailor our advice to your use case.