Table of Contents
whylogs logging session
Session Objects
class Session()
Parameters
project : str
The project name. We will default to the project name when logging
a dataset if the dataset name is not specified
pipeline : str
Name of the pipeline associated with this session
writers : list
configuration for the output writers. This is where the log data
will go
verbose : bool
enable verbose logging for not. Default is False
logger
| logger(dataset_name: Optional[str] = None, dataset_timestamp: Optional[datetime.datetime] = None, session_timestamp: Optional[datetime.datetime] = None, tags: Dict[str, str] = None, metadata: Dict[str, str] = None, segments: Optional[Union[List[Dict], List[str], str]] = None, profile_full_dataset: bool = False, with_rotation_time: str = None, cache_size: int = 1, constraints: DatasetConstraints = None) -> Logger
Create a new logger or return an existing one for a given dataset name. If no dataset_name is specified, we default to project name
Arguments:
dataset_name
- name of the datasetdataset_timestamp
- timestamp of the dataset. Default to nowsession_timestamp
- timestamp of the session. Inherits from the sessiontags
- metadata associated with the profilemetadata
- same as tags. Will be deprecatedsegments
- slice of data that the profile belongs toprofile_full_dataset
- when segmenting dataset, an option to keep the full unsegmented profile of the datasetwith_rotation_time
- rotation time in minutes our hours ("1m", "1h")cache_size
- size of the segment cacheconstraints
- whylogs contrainst to monitor against
log_dataframe
| log_dataframe(df: pd.DataFrame, dataset_name: Optional[str] = None, dataset_timestamp: Optional[datetime.datetime] = None, session_timestamp: Optional[datetime.datetime] = None, tags: Dict[str, str] = None, metadata: Dict[str, str] = None, segments: Optional[Union[List[Dict], List[str], str]] = None, profile_full_dataset: bool = False, constraints: DatasetConstraints = None) -> Optional[DatasetProfile]
Perform statistics caluclations and log a pandas dataframe
Arguments:
Can be either:
- Autosegmentation source, one of ["auto", "local"]
- List of tag key value pairs for tracking data segments
- List of tag keys for which we will track every value
- None, no segments will be used
df
: the dataframe to profiledataset_name
: name of the datasetdataset_timestamp
: the timestamp for the datasetsession_timestamp
: the timestamp for the session. Override the default onetags
: the tags for the profile. Useful when mergingmetadata
: information about this current profile. Can be discarded when mergingsegments
:profile_full_dataset
: when segmenting dataset, an option to keep the full unsegmented profile of the dataset
Returns:
a dataset profile if the session is active
profile_dataframe
| profile_dataframe(df: pd.DataFrame, dataset_name: Optional[str] = None, dataset_timestamp: Optional[datetime.datetime] = None, session_timestamp: Optional[datetime.datetime] = None, tags: Dict[str, str] = None, metadata: Dict[str, str] = None) -> Optional[DatasetProfile]
Profile a Pandas dataframe without actually writing data to disk. This is useful when you just want to quickly capture and explore a dataset profile.
Arguments:
df
: the dataframe to profiledataset_name
: name of the datasetdataset_timestamp
: the timestamp for the datasetsession_timestamp
: the timestamp for the session. Override the default onetags
: the tags for the profile. Useful when mergingmetadata
: information about this current profile. Can be discarded when merging
Returns:
a dataset profile if the session is active
new_profile
| new_profile(dataset_name: Optional[str] = None, dataset_timestamp: Optional[datetime.datetime] = None, session_timestamp: Optional[datetime.datetime] = None, tags: Dict[str, str] = None, metadata: Dict[str, str] = None) -> Optional[DatasetProfile]
Create an empty dataset profile with the metadata from the session.
Arguments:
dataset_name
: name of the datasetdataset_timestamp
: the timestamp for the datasetsession_timestamp
: the timestamp for the session. Override the default onetags
: the tags for the profile. Useful when mergingmetadata
: information about this current profile. Can be discarded when merging
Returns:
a dataset profile if the session is active
estimate_segments
| estimate_segments(df: pd.DataFrame, name: str, target_field: str = None, max_segments: int = 30, dry_run: bool = False) -> Optional[Union[List[Dict], List[str]]]
Estimates the most important features and values on which to segment data profiling using entropy-based methods.
Arguments:
to loggers with same dataset_name default 30
df
: the dataframe of data to profilename
: name for discovery in the logger, automatically appliedtarget_field
: target field (optional)max_segments
: upper threshold for total combinations of segments,dry_run
: run calculation but do not write results to metadata
Returns:
a list of segmentation feature names
close
| close()
Deactivate this session and flush all associated loggers
remove_logger
| remove_logger(dataset_name: str)
Remove a logger from the dataset. This is called by the logger when it's being closed
Parameters
dataset_name the name of the dataset. used to identify the logger
Returns None
session_from_config
session_from_config(config: SessionConfig) -> Session
Construct a whylogs session from a SessionConfig
reset_default_session
reset_default_session()
Reset and deactivate the global whylogs logging session.
get_or_create_session
get_or_create_session(path_to_config: Optional[str] = None, report_progress: Optional[bool] = False)
Retrieve the current active global session.
If no active session exists, attempt to load config and create a new session.
If an active session exists, return the session without loading new config.
Returns:
Session
: The global active session
Arguments:
path_to_config
(str
):
get_session
get_session()
Retrieve the logging session without altering or activating it.
Returns
session : Session The global session
get_logger
get_logger()
Retrieve the global session logger
Returns
ylog : whylogs.app.logger.Logger The global session logger