whylogs.util.dsketch

Define functions and classes for interfacing with datasketches

deserialize_kll_floats_sketch#

deserialize_kll_floats_sketch(x: bytes, kind: str = "float")

Deserialize a KLL floats sketch. Compatible with whylogs-java.

whylogs histograms are serialized as KLL floats sketches.

Parameters#

x : bytes
    Serialized sketch
kind : str, optional
    Specify type of sketch: 'float' or 'int'

Returns#

sketch : kll_floats_sketch, kll_ints_sketch, or None
    If x is an empty sketch, return None, else return the deserialized sketch.

deserialize_frequent_strings_sketch#

deserialize_frequent_strings_sketch(x: bytes)

Deserialize a frequent strings sketch. Compatible with whylogs-java.

Wrapper for datasketches.frequent_strings_sketch.deserialize.

Parameters#

x : bytes Serialized sketch

Returns#

sketch : datasketches.frequent_strings_sketch or None
    If x is an empty string sketch, returns None, else returns the deserialized string sketch.

FrequentItemsSketch Objects#

class FrequentItemsSketch()

A class to implement frequent item counting for mixed data types.

Wraps datasketches.frequent_strings_sketch and encodes numbers as strings, because the datasketches Python implementation does not provide frequent number tracking.

Parameters#

lg_max_k : int, optional
    Parameter controlling the size and accuracy of the sketch. A larger number increases both the accuracy and the memory requirements of the sketch.
sketch : datasketches.frequent_strings_sketch, optional
    Initialize with an existing frequent strings sketch
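The string-encoding idea described above can be illustrated with a small, self-contained sketch. This is not the whylogs implementation: `ToyFrequentItems` is a hypothetical stand-in that keeps exact counts in a `collections.Counter` instead of a datasketches sketch, and the use of JSON as the encoding is an assumption made for illustration.

```python
import json
from collections import Counter


class ToyFrequentItems:
    """Hypothetical stand-in: exact counts in a Counter, not a real sketch."""

    def __init__(self):
        self._counts = Counter()

    def update(self, x, weight=1):
        # Assumed encoding: JSON text, so 1 -> '1' and "1" -> '"1"' stay distinct.
        self._counts[json.dumps(x)] += weight

    def get_frequent_items(self, threshold=0, decode=True):
        # Most frequent first; keep items whose count reaches the threshold.
        items = [(k, c) for k, c in self._counts.most_common() if c >= threshold]
        if decode:
            # Undo the string encoding to recover the original values.
            items = [(json.loads(k), c) for k, c in items]
        return items


toy = ToyFrequentItems()
for value in [1, 1, "1", 2.5]:
    toy.update(value)
toy.update(2.5, weight=2)

decoded = toy.get_frequent_items()          # original values
raw = toy.get_frequent_items(decode=False)  # encoded string keys
```

Because every key is stored as a string, the number `1` and the string `"1"` are counted separately, and `decode=False` exposes the encoded keys directly.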

get_apriori_error#

| get_apriori_error(lg_max_map_size: int, estimated_total_weight: int)

Return an a priori estimate of the uncertainty for the given parameters.

Parameters#

lg_max_map_size : int
    The lg_max_k value
estimated_total_weight : int
    Total weight (see :func:FrequentItems.get_total_weight)

Returns#

error : float
    Approximate uncertainty
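To give a feel for how the a priori error scales, here is a minimal sketch. It assumes the bound used by the Apache DataSketches frequent items family, where the error factor is 3.5 divided by the maximum map size; that constant is an assumption about the underlying library, not something stated on this page, and `apriori_error` is a hypothetical helper name.

```python
def apriori_error(lg_max_map_size: int, estimated_total_weight: int) -> float:
    # Assumption: epsilon = 3.5 / 2**lg_max_map_size (DataSketches frequent items).
    epsilon = 3.5 / (1 << lg_max_map_size)
    # The absolute error grows linearly with the total weight seen.
    return epsilon * estimated_total_weight


# Same stream weight, two sketch sizes: the larger map gives a tighter bound.
small_map = apriori_error(5, 1_000_000)
large_map = apriori_error(10, 1_000_000)
```

Under this assumption, each increment of `lg_max_map_size` halves the expected error for the same total weight.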

get_frequent_items#

| get_frequent_items(err_type: datasketches.frequent_items_error_type = None, threshold: int = 0, decode: bool = True)

Retrieve the frequent items.

Parameters#

err_type : datasketches.frequent_items_error_type
    Override default error type
threshold : int
    Minimum count for returned items
decode : bool (default=True)
    Decode the returned values. Internally, all items are encoded as strings.

Returns#

items : list
    A list of tuples of items: [(item, count)]

merge#

| merge(other)

Merge the item counts of this sketch with another.

This object will not be modified. This operation is commutative.

Parameters#

other : FrequentItemsSketch
    The other sketch
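The two properties stated for merge (the receiver is left unmodified, and the order of the operands does not matter) can be illustrated with exact counters. This is only an analogy: `merged` is a hypothetical helper operating on `collections.Counter`, whereas the real sketch keeps approximate counts.

```python
from collections import Counter


def merged(a: Counter, b: Counter) -> Counter:
    # Copy first so that neither input is mutated.
    result = Counter(a)
    # Add the counts from the other side item by item.
    result.update(b)
    return result


left = Counter({"a": 2, "b": 1})
right = Counter({"b": 4, "c": 3})

combined = merged(left, right)
swapped = merged(right, left)  # same counts, operands reversed
```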

copy#

| copy()

Returns#

sketch : FrequentItemsSketch
    A copy of this sketch

serialize#

| serialize()

Serialize this sketch as a bytes string.

See also :func:FrequentItemsSketch.deserialize

Returns#

data : bytes
    Serialized object.

update#

| update(x, weight=1)

Track an item.

Parameters#

x : object
    Item to track
weight : int
    Number of times the item appears

to_summary#

| to_summary(max_items=30, min_count=1)

Generate a protobuf summary. Returns None if there are no frequent items.

Parameters#

max_items : int
    Maximum number of items to return. The most frequent items are returned.
min_count : int
    Minimum count for each returned item

Returns#

summary : FrequentItemsSummary
    Protobuf summary message
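The selection rule implied by `max_items` and `min_count` can be sketched in plain Python. The real method builds a protobuf message; `select_top_items` below is a hypothetical helper that only mirrors the filtering described here, including the None result when nothing qualifies.

```python
from typing import List, Optional, Tuple


def select_top_items(
    items: List[Tuple[object, int]], max_items: int = 30, min_count: int = 1
) -> Optional[List[Tuple[object, int]]]:
    # Drop items that appear fewer than min_count times.
    kept = [pair for pair in items if pair[1] >= min_count]
    # Most frequent first, then truncate to max_items.
    kept.sort(key=lambda pair: pair[1], reverse=True)
    # An empty selection is reported as None rather than an empty list.
    return kept[:max_items] or None


counts = [("a", 5), ("b", 1), ("c", 9), ("d", 3)]
top_two = select_top_items(counts, max_items=2, min_count=2)
nothing = select_top_items(counts, min_count=100)
```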

to_protobuf#

| to_protobuf()

Generate a protobuf representation of this object

from_protobuf#

| @staticmethod
| from_protobuf(message: FrequentItemsSketchMessage)

Initialize a FrequentItemsSketch from a protobuf FrequentItemsSketchMessage

deserialize#

| @staticmethod
| deserialize(x: bytes)

Deserialize a frequent items sketch.

If x is an empty sketch, None is returned.

FrequentNumbersSketch Objects#

class FrequentNumbersSketch(FrequentItemsSketch)

A class to implement frequent number counting

copy#

| copy()

Returns#

self_copy : FrequentNumbersSketch
    A copy of this object

to_summary#

| to_summary(max_items=30, min_count=1)

Generate a protobuf summary. Returns None if there are no frequent items.

Parameters#

max_items : int
    Maximum number of items to return. The most frequent items are returned.
min_count : int
    Minimum count for each returned item

Returns#

summary : FrequentNumbersSummary
    Protobuf summary message

to_protobuf#

| to_protobuf()

Generate a protobuf representation of this object

deserialize#

| @staticmethod
| deserialize(x: bytes)

Deserialize a frequent numbers sketch.

If x is an empty sketch, None is returned.

flatten_summary#

| @staticmethod
| flatten_summary(summary: FrequentItemsSummary)

Flatten a FrequentNumbersSummary