Profile Store Service

Basic Usage

With the service up and running, you can call its API endpoints to read from and write to the Profile Store. To list all existing profiles with Python's requests library, run:

import requests

response = requests.get(url="http://localhost:8000/v0/profile/list")

print(response.content)

To write a profile, create a whylogs DatasetProfileView object and POST its serialized form to the store's write endpoint:

import requests
import whylogs as why

# df is a pandas DataFrame holding the data you want to profile
profile_view = why.log(df).view()

response = requests.post(
    url="http://localhost:8000/v0/profile/write",
    files={"profile": profile_view.serialize()},
    params={"dataset_id": "my_profile"},
)

print(response.content)

To read a stored DatasetProfileView back, call the v0/profile/get endpoint with the dataset_id plus a start_date and an end_date for the Store to query. To get the all-time merged DatasetProfileView, pass only the dataset_id. To query a single date, pass the dataset_id and only a start_date. In every case the profile is returned as a binary string, which you can deserialize back into an object with whylogs' API, as the following example shows:

from datetime import datetime, timedelta, timezone

import requests
from whylogs.core import DatasetProfileView

url_get = "http://localhost:8000/v0/profile/get"

# 1. Query the all-time merged profile by dataset_id
resp = requests.get(url=url_get, params={"dataset_id": "my_profile"})
print(DatasetProfileView.deserialize(resp.content).to_pandas())

# 2. Query the profile by date range (the last 7 days)
# A timezone-aware "now" is used because .timestamp() on the naive value
# returned by datetime.utcnow() would be interpreted as local time.
now = datetime.now(timezone.utc)
start_date_ts = (now - timedelta(days=7)).timestamp()
end_date_ts = now.timestamp()

resp_with_dates = requests.get(
    url=url_get,
    params={
        "dataset_id": "my_profile",
        "start_date": int(start_date_ts),
        "end_date": int(end_date_ts),
    },
)
print(DatasetProfileView.deserialize(resp_with_dates.content).to_pandas())

# 3. Query the profile from a single date (today)
start_date_ts = datetime.now(timezone.utc).timestamp()

resp_with_today = requests.get(
    url=url_get,
    params={
        "dataset_id": "my_profile",
        "start_date": int(start_date_ts),
    },
)
print(DatasetProfileView.deserialize(resp_with_today.content).to_pandas())
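
The three query shapes above differ only in which date parameters are sent. As a minimal sketch, the helper below assembles the params dictionary for each case. The name build_get_params is hypothetical and not part of whylogs or the service; only the parameter names dataset_id, start_date and end_date come from the examples above. It also assumes, as those examples suggest, that dates are sent as Unix timestamps in whole seconds.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def build_get_params(
    dataset_id: str,
    start: Optional[datetime] = None,
    end: Optional[datetime] = None,
) -> dict:
    """Assemble query parameters for the v0/profile/get endpoint.

    Hypothetical helper. Pass timezone-aware datetimes so that
    .timestamp() does not silently fall back to local time.
    """
    if start is not None and end is not None and end < start:
        raise ValueError("end must not be earlier than start")

    params = {"dataset_id": dataset_id}
    # Dates are assumed to be Unix timestamps in whole seconds.
    if start is not None:
        params["start_date"] = int(start.timestamp())
    if end is not None:
        params["end_date"] = int(end.timestamp())
    return params


now = datetime.now(timezone.utc)

# 1. All-time merged profile: dataset_id only
all_time = build_get_params("my_profile")

# 2. Date range: the last 7 days
last_week = build_get_params("my_profile", start=now - timedelta(days=7), end=now)

# 3. Single date: start_date only
today_only = build_get_params("my_profile", start=now)
```

Each resulting dictionary can be passed as the params argument of requests.get, exactly as in the calls above.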

Configuring the Profile Store Service

To adjust the deployment to your needs, edit the config.ini file at the root of this repository or set the same keys as environment variables. Environment variables take precedence: the service uses the value from config.ini only when the corresponding environment variable is not set.
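
The lookup order described above can be illustrated with a short sketch. This is not the service's own code: the function name get_setting is hypothetical, and the layout of config.ini (keys placed in a [DEFAULT] section) is an assumption made only for this example.

```python
import configparser
import os
from typing import Optional


def get_setting(
    key: str,
    config: configparser.ConfigParser,
    section: str = "DEFAULT",  # assumed section name, not confirmed by the service
) -> Optional[str]:
    """Return the environment value for key, else the config.ini value, else None."""
    env_value = os.environ.get(key)
    if env_value is not None:
        return env_value
    return config.get(section, key, fallback=None)


# Stand-in for the contents of config.ini (layout assumed for illustration)
config = configparser.ConfigParser()
config.read_string("[DEFAULT]\nMERGE_PROFILE_PERIOD_HOURS = 24\n")

# No environment variable set: the value comes from the file
os.environ.pop("MERGE_PROFILE_PERIOD_HOURS", None)
from_file = get_setting("MERGE_PROFILE_PERIOD_HOURS", config)

# Environment variable set: it overrides the file
os.environ["MERGE_PROFILE_PERIOD_HOURS"] = "6"
from_env = get_setting("MERGE_PROFILE_PERIOD_HOURS", config)

# Clean up so the example leaves the environment unchanged
os.environ.pop("MERGE_PROFILE_PERIOD_HOURS", None)
```

Both values come back as strings, so a caller would still need to convert numeric settings before using them.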

You can configure the following variables for your project:

Variable	Description
MERGE_PROFILE_PERIOD_HOURS	Interval, in hours, at which the service syncs the DB with your S3 account. Default is 24.
SQLITE_STORE_LOCATION	Path where the DB file is placed. Default is /tmp/profile_store.db.
PROFILE_STORE_TYPE	Type of Profile Store to use. Default is sqlite.
SQLITE_BACKUP_PATH	S3 prefix under which the SQLite file is stored. Default is store_backup_path.
STORE_BUCKET_NAME	Name of the S3 bucket where the Profile Store is saved. The bucket must exist before the service starts. Default is profile-store-bucket.
S3_PROFILE_NAME	Optional S3 profile name, in case you use one. Defaults to None.
AWS_ACCESS_KEY_ID	Optional access key, in case you authenticate with S3 through environment variables. Defaults to None.
AWS_SECRET_ACCESS_KEY	Optional secret access key, in case you authenticate with S3 through environment variables. Defaults to None.
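
Before starting the service, it can help to sanity-check the values you plan to set. The sketch below is a hypothetical pre-flight helper, not something the service provides. The defaults are copied from the table above, while the checks themselves are assumptions: that the merge period is a positive whole number of hours, and that the two AWS credential variables are only useful as a pair.

```python
from typing import Dict, List, Optional

# Defaults copied from the table above; None marks an optional value left unset.
DEFAULTS: Dict[str, Optional[str]] = {
    "MERGE_PROFILE_PERIOD_HOURS": "24",
    "SQLITE_STORE_LOCATION": "/tmp/profile_store.db",
    "PROFILE_STORE_TYPE": "sqlite",
    "SQLITE_BACKUP_PATH": "store_backup_path",
    "STORE_BUCKET_NAME": "profile-store-bucket",
    "S3_PROFILE_NAME": None,
    "AWS_ACCESS_KEY_ID": None,
    "AWS_SECRET_ACCESS_KEY": None,
}


def check_settings(overrides: Dict[str, Optional[str]]) -> List[str]:
    """Return a list of problems found after applying overrides to the defaults."""
    problems: List[str] = []

    # Catch misspelled variable names early
    unknown = sorted(set(overrides) - set(DEFAULTS))
    if unknown:
        problems.append("unknown keys: " + ", ".join(unknown))

    settings = {**DEFAULTS, **overrides}

    # Assumption: the merge period is a positive whole number of hours
    hours = settings["MERGE_PROFILE_PERIOD_HOURS"]
    if hours is None or not hours.isdigit() or int(hours) == 0:
        problems.append("MERGE_PROFILE_PERIOD_HOURS must be a positive whole number")

    # Assumption: an access key without its secret (or vice versa) is a mistake
    has_key = settings["AWS_ACCESS_KEY_ID"] is not None
    has_secret = settings["AWS_SECRET_ACCESS_KEY"] is not None
    if has_key != has_secret:
        problems.append("AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY must be set together")

    return problems


# The defaults alone pass every check
default_problems = check_settings({})

# An access key without its secret is reported
partial_credentials = check_settings({"AWS_ACCESS_KEY_ID": "example-key-id"})
```

An empty list means none of these assumed checks failed. It does not guarantee that the service will accept the configuration.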
