SCALAR_NAME_MAPPING
Defines (some of) the mapping from dataset summary to flat table.
NOTE: ordered dicts are used here to control the ordering of generated columns; plain dictionaries are also valid.
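As a rough illustration of how a nested, ordered mapping can drive flattening, the sketch below walks a mapping and pulls matching leaves out of a nested summary. `EXAMPLE_MAPPING`, `flatten_scalars`, and the dict-based summary are hypothetical stand-ins, not the actual contents of `SCALAR_NAME_MAPPING` or the library's implementation.

```python
from collections import OrderedDict

# Hypothetical mapping, NOT the real SCALAR_NAME_MAPPING: nested keys mirror
# the nesting of a summary, leaf values are the flat column names.
EXAMPLE_MAPPING = OrderedDict(
    counters=OrderedDict(count="count", null_count="null_count"),
    number_summary=OrderedDict(min="min", max="max", mean="mean"),
)


def flatten_scalars(summary: dict, mapping: dict) -> OrderedDict:
    """Walk `mapping` and pull the matching leaves out of `summary`."""
    flat = OrderedDict()
    for key, target in mapping.items():
        if key not in summary:
            continue  # fields absent from the summary are skipped
        if isinstance(target, dict):
            # Nested mapping: recurse into the matching sub-summary.
            flat.update(flatten_scalars(summary[key], target))
        else:
            # Leaf: `target` is the flat column name.
            flat[target] = summary[key]
    return flat


# The summary lists its fields in a different order than the mapping.
column_summary = {
    "number_summary": {"max": 9.0, "min": 1.0, "mean": 4.5},
    "counters": {"count": 10, "null_count": 0},
}
row = flatten_scalars(column_summary, EXAMPLE_MAPPING)
print(list(row))
```

The resulting column order follows the mapping rather than the summary, which is the reason for using an ordered mapping here.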
flatten_summary
flatten_summary(dataset_summary: DatasetSummary) -> dict
Flatten a DatasetSummary
Parameters
dataset_summary : DatasetSummary
Summary to flatten
Returns
data : dict
A dictionary with the following keys:
summary : pandas.DataFrame
Per-column summary statistics
hist : pandas.Series
Series of histograms, keyed by column name, as (column name, histogram)
key, value pairs. Each histogram is itself a pandas.Series.
frequent_strings : pandas.Series
Series of frequent string counts, keyed by column name, as (column name, counts)
key, value pairs. Each counts entry is itself a pandas.Series.
Notes
Some relevant info on the summary mapping:
.. code-block:: python

    >>> from whylogs.core.datasetprofile import SCALAR_NAME_MAPPING
    >>> import json
    >>> print(json.dumps(SCALAR_NAME_MAPPING, indent=2))
flatten_dataset_quantiles
flatten_dataset_quantiles(dataset_summary: DatasetSummary)
Flatten quantiles from a dataset summary
flatten_dataset_string_quantiles
flatten_dataset_string_quantiles(dataset_summary: DatasetSummary)
Flatten string quantiles from a dataset summary
flatten_dataset_histograms
flatten_dataset_histograms(dataset_summary: DatasetSummary)
Flatten histograms from a dataset summary
flatten_dataset_frequent_numbers
flatten_dataset_frequent_numbers(dataset_summary: DatasetSummary)
Flatten frequent number counts from a dataset summary
flatten_dataset_frequent_strings
flatten_dataset_frequent_strings(dataset_summary: DatasetSummary)
Flatten frequent strings summaries from a dataset summary
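The `flatten_dataset_*` helpers above each pull one kind of nested statistic out of every column. A minimal sketch of that per-column pattern, using histograms as the example, might look like the following. The plain-dict summary layout, the field names, and `flatten_histograms` are assumptions made for illustration; the real functions operate on a `DatasetSummary`.

```python
def flatten_histograms(columns: dict) -> dict:
    """Collect one histogram per column, keyed by column name."""
    histograms = {}
    for name, column in columns.items():
        # Hypothetical layout: column -> number_summary -> histogram.
        hist = column.get("number_summary", {}).get("histogram")
        if hist is None or len(hist["bins"]) < 2:
            continue  # no usable histogram, e.g. a string column
        histograms[name] = {
            "bin_edges": list(hist["bins"]),
            "counts": list(hist["counts"]),
        }
    return histograms


# Hypothetical summary: one numeric column, one string column.
columns = {
    "age": {
        "number_summary": {
            "histogram": {"bins": [0.0, 50.0, 100.0], "counts": [7, 3]}
        }
    },
    "city": {"string_summary": {}},
}
hists = flatten_histograms(columns)
print(hists)
```

Columns without the requested statistic are left out of the result instead of being filled with placeholders, so the output only contains columns that actually have a histogram.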
get_dataset_frame
get_dataset_frame(dataset_summary: DatasetSummary, mapping: dict = None)
Get a dataframe from scalar values flattened from a dataset summary
Parameters
dataset_summary : DatasetSummary
The dataset summary.
mapping : dict, optional
Override the default variable mapping.
Returns
summary : pandas.DataFrame
Scalar values, flattened and renamed according to ``mapping``
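The optional `mapping` argument can be pictured as follows. This is a simplified sketch with a flat, hypothetical `DEFAULT_MAPPING` and a hypothetical `scalar_rows` helper that builds one row per column. It is not the library's implementation, and it works on plain dicts instead of a `DatasetSummary`.

```python
# Hypothetical default mapping: source field -> flat column name.
DEFAULT_MAPPING = {"count": "count", "min": "min", "max": "max"}


def scalar_rows(columns: dict, mapping: dict = None) -> list:
    """Build one flat row per column, renaming fields according to `mapping`."""
    if mapping is None:
        mapping = DEFAULT_MAPPING  # fall back to the default mapping
    rows = []
    for name, scalars in columns.items():
        row = {"column": name}
        for source_key, flat_name in mapping.items():
            if source_key in scalars:
                row[flat_name] = scalars[source_key]
        rows.append(row)
    return rows


# Hypothetical per-column scalar values.
columns = {
    "age": {"count": 10, "min": 1.0, "max": 9.0},
    "city": {"count": 10},
}
default_rows = scalar_rows(columns)
custom_rows = scalar_rows(columns, mapping={"count": "n"})
print(default_rows)
print(custom_rows)
```

Passing a list of row dicts like these to `pandas.DataFrame` yields one table row per column, with missing fields filled as NaN. Overriding `mapping` changes both which fields are kept and what the resulting columns are called.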