Table of Contents
Define functions and classes for interfacing with datasketches
deserialize_kll_floats_sketch
deserialize_kll_floats_sketch(x: bytes, kind: str = "float")
Deserialize a KLL floats sketch. Compatible with whylogs-java
whylogs histograms are serialized as kll floats sketches
Parameters
x : bytes Serialized sketch kind : str, optional Specify type of sketch: 'float' or 'int'
Returns
sketch : kll_floats_sketch
, kll_ints_sketch
, or None
If x
is an empty sketch, return None, else return the deserialized
sketch.
deserialize_frequent_strings_sketch
deserialize_frequent_strings_sketch(x: bytes)
Deserialize a frequent strings sketch. Compatible with whylogs-java
Wrapper for datasketches.frequent_strings_sketch.deserialize
Parameters
x : bytes Serialized sketch
Returns
sketch : datasketches.frequent_strings_sketch
, None
If x
is an empty string sketch, returns None, else returns the
deserialized string sketch
FrequentItemsSketch Objects
class FrequentItemsSketch()
A class to implement frequent item counting for mixed data types.
Wraps datasketches.frequent_strings_sketch
by encoding numbers as
strings since the datasketches
python implementation does not implement
frequent number tracking.
Parameters
lg_max_k : int, optional Parameter controlling the size and accuracy of the sketch. A larger number increases accuracy and the memory requirements for the sketch sketch : datasketches.frequent_strings_sketch, optional Initialize with an existing frequent strings sketch
get_apriori_error
| get_apriori_error(lg_max_map_size: int, estimated_total_weight: int)
Return an apriori estimate of the uncertainty for various parameters
Parameters
lg_max_map_size : int
The lg_max_k
value
estimated_total_weight
Total weight (see :func:FrequentItems.get_total_weight
)
Returns
error : float Approximate uncertainty
get_frequent_items
| get_frequent_items(err_type: datasketches.frequent_items_error_type = None, threshold: int = 0, decode: bool = True)
Retrieve the frequent items.
Parameters
err_type : datasketches.frequent_items_error_type Override default error type threshold : int Minimum count for returned items decode : bool (default=True) Decode the returned values. Internally, all items are encoded as strings.
Returns
items : list
A list of tuples of items: [(item, count)]
merge
| merge(other)
Merge the item counts of this sketch with another.
This object will not be modified. This operation is commutative.
Parameters
other: FrequentItemsSketch The other sketch
copy
| copy()
Returns
sketch : FrequentItemsSketch A copy of this sketch
serialize
| serialize()
Serialize this sketch as a bytes string.
See also :func:FrequentItemsSketch.deserialize
Returns
data : bytes Serialized object.
update
| update(x, weight=1)
Track an item.
Parameters
x : object Item to track weight : int Number of times the item appears
to_summary
| to_summary(max_items=30, min_count=1)
Generate a protobuf summary. Returns None if there are no frequent items.
Parameters
max_items : int Maximum number of items to return. The most frequent items will be returned min_count : int Minimum number counts for all returned items
Returns
summary : FrequentItemsSummary Protobuf summary message
to_protobuf
| to_protobuf()
Generate a protobuf representation of this object
from_protobuf
| @staticmethod
| from_protobuf(message: FrequentItemsSketchMessage)
Initialize a FrequentItemsSketch from a protobuf FrequentItemsSketchMessage
deserialize
| @staticmethod
| deserialize(x: bytes)
Deserialize a frequent numbers sketch.
If x is an empty sketch, None is returned
FrequentNumbersSketch Objects
class FrequentNumbersSketch(FrequentItemsSketch)
A class to implement frequent number counting
copy
| copy()
Returns
self_copy : FrequentNumbersSketch A copy of this object
to_summary
| to_summary(max_items=30, min_count=1)
Generate a protobuf summary. Returns None if there are no frequent items.
Parameters
max_items : int Maximum number of items to return. The most frequent items will be returned min_count : int Minimum number counts for all returned items
Returns
summary : FrequentNumbersSummary Protobuf summary message
to_protobuf
| to_protobuf()
Generate a protobuf representation of this object
deserialize
| @staticmethod
| deserialize(x: bytes)
Deserialize a frequent numbers sketch.
If x is an empty sketch, None is returned
flatten_summary
| @staticmethod
| flatten_summary(summary: FrequentItemsSummary)
Flatten a FrequentNumbersSummary