Initialization and Authentication with whylogs
The recommended way to initialize whylogs when you're sending profiles to WhyLabs is to call why.init() before any of your profiling code.
import whylogs as why
import pandas as pd
why.init() # Automatically determines how to authenticate
df = pd.read_csv('data.csv') # get some data
profile = why.log(df) # profile the data and automatically upload to WhyLabs
The intent of why.init is that you can always call it at the start of your program and not worry too much about the details of authentication and initialization.
How why.init works
When you call why.init, it attempts to determine what should happen with your profiles by creating a session of a particular type.
A session isn't something you have to care about. It mostly corresponds to the lifespan of the current program, the current notebook kernel, and so on.
A session can have one of three types:
- WhyLabs Authenticated (WHYLABS) - Assumes you will eventually upload the profiles you generate to WhyLabs.
- WhyLabs Anonymous (WHYLABS_ANONYMOUS) - Also assumes you will eventually upload to WhyLabs, but doesn't require an api key or organization id. It creates a new anonymous session for you that can be viewed by anyone who has the generated links. You can share the links with whoever you like, or with no one.
- Local (LOCAL) - Doesn't do anything with your profiles automatically and doesn't require any credentials or configuration.
The session type is determined by looking at the current environment config, the contents of the whylogs.ini config file, and (optional) hard-coded config in your code. The order is roughly as follows:
- If an api key is supplied directly to init via why.init(whylabs_api_key='...'), then use it and authenticate the session as WHYLABS.
- If there is an api key in the environment variable WHYLABS_API_KEY, then use it and authenticate the session as WHYLABS.
- If there is an api key in the whylogs config file, then use it and authenticate the session as WHYLABS.
- If there is an anonymous session id in the whylogs config file, then use it and authenticate the session as WHYLABS_ANONYMOUS.
- If we're in an interactive environment (notebook, colab, etc.), then prompt to pick a method explicitly.
- If we're not in an interactive environment and allow_anonymous=True, then authenticate the session as WHYLABS_ANONYMOUS.
- If we're not in an interactive environment and allow_local=True, then authenticate the session as LOCAL.
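The fallback order above can be modeled in plain Python. This is an illustrative sketch only, not the whylogs implementation: resolve_session_type and the parameters config_api_key, config_session_id, and interactive are hypothetical names introduced here, and the interactive prompt is represented by returning None.

```python
import os
from enum import Enum
from typing import Mapping, Optional


class SessionType(Enum):
    WHYLABS = "WHYLABS"
    WHYLABS_ANONYMOUS = "WHYLABS_ANONYMOUS"
    LOCAL = "LOCAL"


def resolve_session_type(
    *,
    allow_anonymous: bool,
    allow_local: bool,
    whylabs_api_key: Optional[str] = None,
    env: Optional[Mapping[str, str]] = None,
    config_api_key: Optional[str] = None,      # hypothetical: key read from the config file
    config_session_id: Optional[str] = None,   # hypothetical: anonymous session id from the config file
    interactive: bool = False,                 # hypothetical: True in a notebook, colab, etc.
) -> Optional[SessionType]:
    """Model of the documented fallback order.

    Returns None where the library would show the interactive prompt,
    or where none of the documented rules applies.
    """
    env = os.environ if env is None else env

    # 1-3: an api key from code, then the environment, then the config file.
    if whylabs_api_key:
        return SessionType.WHYLABS
    if env.get("WHYLABS_API_KEY"):
        return SessionType.WHYLABS
    if config_api_key:
        return SessionType.WHYLABS

    # 4: a previously stored anonymous session id.
    if config_session_id:
        return SessionType.WHYLABS_ANONYMOUS

    # 5: interactive environments prompt instead of falling through.
    if interactive:
        return None

    # 6-7: non-interactive fallbacks, controlled by the allow_* flags.
    if allow_anonymous:
        return SessionType.WHYLABS_ANONYMOUS
    if allow_local:
        return SessionType.LOCAL
    return None
```

Each rule is checked only when every earlier rule failed to match, which is why an api key passed directly to init wins over one in the environment or the config file.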
First time use
If this is your first time using whylogs and WhyLabs, you'll probably want to let the interactive prompt guide you. You can do this either by calling why.init in a notebook or by running python -m whylogs.api.whylabs.session.why_init from the command line in an environment where whylogs is installed.
$ python -m whylogs.api.whylabs.session.why_init
Initialing session with config /home/user/.config/whylogs/config.ini
❓ What kind of session do you want to use?
⤷ 1. WhyLabs. Use an api key to upload to WhyLabs.
⤷ 2. WhyLabs Anonymous. Upload data anonymously to WhyLabs and get a viewing url.
Enter a number from the list: 1
Enter your WhyLabs api key. You can find it at https://hub.whylabsapp.com/settings/access-tokens: xxxxxxxxxx.xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx:org-xxxxxx
[OPTIONAL] Enter a default dataset id to upload to: model-54
✅ Using session type: WHYLABS
⤷ org id: org-JpsdM6
⤷ api key: 1y6ltXaa6a
⤷ default dataset: model-54
The interactive prompt is the same whether it's run in a notebook or via the cli. The only difference is that running it from the cli wipes out the current state of the whylogs.ini config file and starts fresh. Once you exit the prompt successfully, you'll have a new whylogs.ini config file containing the information you entered. That file is used to determine your authentication method the next time why.init is run from any environment on that machine.
Logger output
After you initialize and use why.log, you'll notice styled output intended for human consumption that includes links to view your profiled data. This output only appears when you use the top-level why.log method in an interactive environment (like a notebook). The usual method for profiling data in production is the rolling logger, which accumulates data over time and uploads in the background. The rolling logger doesn't output any summaries or links.
Anonymous usage
If you use whylogs with an anonymous session then the generated profiles will automatically be uploaded to WhyLabs under an anonymous org. This anonymous org looks just like a normal org but has restricted features. You can see an example of an anonymous org here.
Anonymous sessions are an easy way to get started with whylogs and WhyLabs without having to create an account or deal with any configuration first. After you create an account, you'll be able to claim the anonymous sessions you generated and import them into your new personal account by clicking on the signup banner at the top of the anonymous session.
If you've already generated an anonymous session, its id is stored locally in your whylogs.ini file, and you'll continue to use it for new data until you claim it into your real account, if you choose to claim it at all.
Remember, anonymous sessions are just that: anyone who has the link can view them without authentication. Only you will have the generated links and session id, and you can share them with whoever you'd like. When you're ready to start profiling data that you don't want to be viewable via a link, create a WhyLabs account, reinitialize with python -m whylogs.api.whylabs.session.why_init, and choose the WhyLabs session type.
Production usage
Production usage works the same way as local usage, except that you likely won't be in an interactive environment, so the session logic won't prompt you. The recommended way to set up your credentials in production is via the environment variables WHYLABS_API_KEY and WHYLABS_DEFAULT_DATASET_ID. why.init picks those up and uses them automatically. You can technically supply these directly as why.init(whylabs_api_key='..', default_dataset_id='...'), but we discourage that because it implies that you would be committing that information to source control, which is a bad security practice.
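Because production credentials come from the environment, it can help to fail fast when they are missing. The following is a minimal sketch and not part of whylogs: missing_whylabs_env is a hypothetical helper, and treating the default dataset id as required is an assumption made for this example.

```python
import os
from typing import List, Mapping, Optional

# Environment variables named in this guide. Treating both as required
# is an assumption of this sketch, not a whylogs rule.
EXPECTED_VARS = ("WHYLABS_API_KEY", "WHYLABS_DEFAULT_DATASET_ID")


def missing_whylabs_env(env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return the names of expected variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in EXPECTED_VARS if not env.get(name)]
```

A service could call this helper before why.init() and exit with a clear message if the returned list is not empty.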
We do need to know your organization id, but our latest api key format includes that id, as in xxxx.xxxxxx:orgId. If you have an older api key and can't easily generate a new one, you can also supply the WHYLABS_DEFAULT_ORG_ID environment variable to set your organization id explicitly.
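To make the key format concrete, here is a small sketch of how an organization id could be recovered from a key of the form xxxx.xxxxxx:orgId, falling back to WHYLABS_DEFAULT_ORG_ID for older keys. It is not the whylogs implementation: resolve_org_id is a hypothetical function, and giving the id embedded in the key precedence over the environment variable is an assumption.

```python
import os
from typing import Mapping, Optional


def resolve_org_id(api_key: str, env: Optional[Mapping[str, str]] = None) -> Optional[str]:
    """Return the org id embedded in the key, else the env fallback, else None."""
    env = os.environ if env is None else env

    # Newer keys end with ":<org id>"; split on the last colon.
    _, separator, org_id = api_key.rpartition(":")
    if separator and org_id:
        return org_id

    # Older keys carry no org id, so fall back to the environment variable.
    return env.get("WHYLABS_DEFAULT_ORG_ID") or None
```

For example, resolve_org_id("xxxx.xxxxxx:org-123", env={}) yields "org-123", while an older key without a colon yields whatever WHYLABS_DEFAULT_ORG_ID holds.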
Customizing init behavior
The fallback logic for why.init can be customized to a small extent. You can enable or disable anonymous WhyLabs sessions and local sessions, which includes them in or removes them from the fallback logic executed in why.init. For example, if you really want to make sure that an anonymous session isn't possible, you can initialize with why.init(allow_anonymous=False).
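One way to apply these flags is to choose them per environment. The sketch below is a suggestion, not a whylogs feature: init_options and init_whylogs are hypothetical helpers, and the flag choices reflect one possible policy. Only the allow_anonymous and allow_local parameters come from this guide.

```python
from typing import Dict


def init_options(production: bool) -> Dict[str, bool]:
    """Pick why.init fallback flags for the given environment (hypothetical policy)."""
    if production:
        # Never fall back to a link-shareable anonymous session in production.
        return {"allow_anonymous": False}
    # For local experiments, allow both fallbacks.
    return {"allow_anonymous": True, "allow_local": True}


def init_whylogs(production: bool) -> None:
    # Imported lazily so init_options can be used without whylogs installed.
    import whylogs as why

    why.init(**init_options(production))
```

Keeping the flag selection in a pure function makes the policy easy to check without touching credentials or the network.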