Defines the ColumnProfile class for tracking per-column statistics
ColumnProfile Objects
class ColumnProfile()
Statistics tracking for a column (i.e. a feature)
The primary method for
Parameters
name : str (required) Name of the column profile
number_tracker : NumberTracker Implements numeric data statistics tracking
string_tracker : StringTracker Implements string data-type statistics tracking
schema_tracker : SchemaTracker Implements tracking of schema-related information
counters : CountersTracker Keeps count of various things
frequent_items : FrequentItemsSketch Keeps track of all frequent items, even for mixed data type features
cardinality_tracker : HllSketch Tracks feature cardinality (even for mixed data types)
constraints : ValueConstraints Static assertions to be applied to numeric data tracked in this column
TODO:
- Proper TypedDataConverter type checking
- Multi-threading/parallelism
track
| track(value, character_list=None, token_method=None)
Add value to tracking statistics.
to_summary
| to_summary()
Generate a summary of the statistics
Returns
summary : ColumnSummary Protobuf summary message.
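The track-then-summarize pattern described above can be pictured with a small stand-in. This is a minimal sketch under stated assumptions: `ToyColumnProfile` and all of its fields are hypothetical, it uses only the standard library, and it returns a plain dict where the documented method returns a ColumnSummary protobuf message. It is not the library's implementation.

```python
from collections import Counter


class ToyColumnProfile:
    """Hypothetical stand-in for illustration; not the real ColumnProfile."""

    def __init__(self, name):
        self.name = name
        self.count = 0                # total number of values seen
        self.type_counts = Counter()  # rough analogue of schema tracking
        self.minimum = None           # running numeric minimum
        self.maximum = None           # running numeric maximum

    def track(self, value):
        """Add one value to the running statistics."""
        self.count += 1
        self.type_counts[type(value).__name__] += 1
        # bool is a subclass of int, so keep it out of the numeric statistics.
        if isinstance(value, (int, float)) and not isinstance(value, bool):
            self.minimum = value if self.minimum is None else min(self.minimum, value)
            self.maximum = value if self.maximum is None else max(self.maximum, value)

    def to_summary(self):
        """Return a plain dict describing everything tracked so far."""
        return {
            "name": self.name,
            "count": self.count,
            "types": dict(self.type_counts),
            "min": self.minimum,
            "max": self.maximum,
        }


profile = ToyColumnProfile("age")
for value in [31, 45, "unknown", 27.5]:  # mixed data types are allowed
    profile.track(value)
summary = profile.to_summary()
```

The sketch keeps only running aggregates rather than the raw values, which is what allows a profile to stay small no matter how many values are tracked.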
merge
| merge(other)
Merge this column profile with another.
Parameters
other : ColumnProfile
Returns
merged : ColumnProfile A new, merged column profile.
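Merging returns a new profile rather than modifying either input. The following is a minimal sketch of that idea, assuming simple statistics (count, sum, minimum, maximum) that can be combined exactly. `ToyNumberStats` and `profile_values` are hypothetical names used only for illustration; the real trackers rely on sketch data structures whose merges may be approximate.

```python
import math
from dataclasses import dataclass


@dataclass
class ToyNumberStats:
    """Hypothetical mergeable statistics; not the library's NumberTracker."""

    count: int = 0
    total: float = 0.0
    minimum: float = math.inf
    maximum: float = -math.inf

    def track(self, value):
        self.count += 1
        self.total += value
        self.minimum = min(self.minimum, value)
        self.maximum = max(self.maximum, value)

    def merge(self, other):
        # Build a new object so that neither self nor other is modified.
        return ToyNumberStats(
            count=self.count + other.count,
            total=self.total + other.total,
            minimum=min(self.minimum, other.minimum),
            maximum=max(self.maximum, other.maximum),
        )


def profile_values(values):
    """Helper that tracks every value in an iterable."""
    stats = ToyNumberStats()
    for value in values:
        stats.track(value)
    return stats


left = profile_values([3, 9, 4])
right = profile_values([10, 1])
merged = left.merge(right)
whole = profile_values([3, 9, 4, 10, 1])
```

In this toy setting, merging two partial profiles gives the same statistics as profiling all of the data at once, which is the property that makes it possible to profile batches independently and combine them later.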
to_protobuf
| to_protobuf()
Return the object serialized as a protobuf message
Returns
message : ColumnMessage
from_protobuf
| @staticmethod
| from_protobuf(message)
Load from a protobuf message
Returns
column_profile : ColumnProfile
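to_protobuf and from_protobuf form a serialization round trip: an instance method produces a message, and a static method rebuilds an object from one. The sketch below shows only that shape. It assumes JSON bytes as a stand-in for the protobuf ColumnMessage, and `ToyCounter`, `to_message`, and `from_message` are hypothetical names, not part of the documented API.

```python
import json


class ToyCounter:
    """Hypothetical class used only to show the round-trip shape."""

    def __init__(self, name, count=0):
        self.name = name
        self.count = count

    def to_message(self):
        """Serialize to bytes (JSON here; the documented method emits protobuf)."""
        return json.dumps({"name": self.name, "count": self.count}).encode("utf-8")

    @staticmethod
    def from_message(message):
        """Rebuild an object from bytes produced by to_message."""
        fields = json.loads(message.decode("utf-8"))
        return ToyCounter(fields["name"], fields["count"])


original = ToyCounter("age", count=42)
restored = ToyCounter.from_message(original.to_message())
```

Making the loader a static method means a caller needs only the message, not an existing instance, to reconstruct the object.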