Profile Store Service
Basic Usage
With the service up and running, you can call its API endpoints to read from and write to the Profile Store. To list all existing profiles with Python's requests library, run:
import requests

response = requests.get(
    url="http://localhost:8000/v0/profile/list"
)
print(response.content)
To write a profile, create a whylogs DatasetProfileView object and post it to the store's write endpoint, as in the following example:
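import pandas as pd
import requests
import whylogs as why

# Example data; replace with your own DataFrame
df = pd.DataFrame({"col": [1, 2, 3]})
profile_view = why.log(df).view()

# Send the serialized profile as a file upload, keyed by dataset_id
response = requests.post(
    url="http://localhost:8000/v0/profile/write",
    files={"profile": profile_view.serialize()},
    params={"dataset_id": "my_profile"},
)
print(response.content)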
To retrieve a written DatasetProfileView object, call the v0/profile/get endpoint and pass
the dataset_id together with the start_date and end_date that bound the query. If you want the all-time
merged DatasetProfileView, pass only the dataset_id. In both cases the profile
comes back as a binary string, which you can deserialize back into an object with the whylogs API, as the following
example demonstrates:
from datetime import datetime, timedelta, timezone

import requests
from whylogs.core import DatasetProfileView

url_get = "http://localhost:8000/v0/profile/get"

# 1. Query the profile by name
resp = requests.get(url=url_get, params={"dataset_id": "my_profile"})
print(DatasetProfileView.deserialize(resp.content).to_pandas())

# 2. Query the profile by date range (timestamps in epoch seconds)
# Use a timezone-aware "now": calling .timestamp() on a naive utcnow()
# value would interpret it as local time.
now = datetime.now(timezone.utc)
start_date_ts = (now - timedelta(days=7)).timestamp()
end_date_ts = now.timestamp()
resp_with_dates = requests.get(
    url=url_get,
    params={
        "dataset_id": "my_profile",
        "start_date": int(start_date_ts),
        "end_date": int(end_date_ts),
    },
)
print(DatasetProfileView.deserialize(resp_with_dates.content).to_pandas())

# 3. Query the profile from a single date: today
start_date_ts = datetime.now(timezone.utc).timestamp()
resp_with_today = requests.get(
    url=url_get,
    params={"dataset_id": "my_profile", "start_date": int(start_date_ts)},
)
print(DatasetProfileView.deserialize(resp_with_today.content).to_pandas())
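The queries above send start_date and end_date as integer epoch seconds. As a minimal sketch, a small helper can build the parameter dictionary from datetime objects so the conversion lives in one place. The name build_profile_query is hypothetical and not part of the service or of whylogs; it only assumes the parameter names used in the example above, and it treats naive datetimes as UTC.

```python
from datetime import datetime, timezone
from typing import Dict, Optional, Union


def build_profile_query(
    dataset_id: str,
    start: Optional[datetime] = None,
    end: Optional[datetime] = None,
) -> Dict[str, Union[str, int]]:
    """Build query parameters for v0/profile/get (hypothetical helper)."""
    if start is not None and end is not None and start > end:
        raise ValueError("start must not be later than end")

    params: Dict[str, Union[str, int]] = {"dataset_id": dataset_id}
    for key, value in (("start_date", start), ("end_date", end)):
        if value is None:
            continue  # omit the parameter entirely, as in the examples above
        if value.tzinfo is None:
            # Assumption: naive datetimes are meant as UTC
            value = value.replace(tzinfo=timezone.utc)
        params[key] = int(value.timestamp())
    return params
```

The result can be passed straight to requests, for example `requests.get(url=url_get, params=build_profile_query("my_profile", start, end))`.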
Configuring the Profile Store Service
To adjust the deployment to your needs, you can edit the config.ini file at the root of this repository or set the same keys as environment variables. Environment variables take precedence: the service reads a key from the environment first and falls back to the value in config.ini only if the variable is not set.
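The lookup order described above can be sketched as follows. This is an illustration only, not the service's actual code: the function name get_setting is hypothetical, and the use of the DEFAULT section is an assumption about how config.ini is laid out.

```python
import configparser
import os
from typing import Mapping, Optional


def get_setting(
    key: str,
    config: configparser.ConfigParser,
    section: str = "DEFAULT",  # assumption: the real file may use another section
    env: Mapping[str, str] = os.environ,
) -> Optional[str]:
    """Return the environment value if set, else the config.ini value, else None."""
    if key in env:
        return env[key]
    # ConfigParser lowercases option names, so this lookup is case-insensitive
    return config.get(section, key, fallback=None)


# Stand-in for reading the real file with config.read("config.ini")
config = configparser.ConfigParser()
config.read_string("[DEFAULT]\nMERGE_PROFILE_PERIOD_HOURS = 24\n")

print(get_setting("MERGE_PROFILE_PERIOD_HOURS", config))
```

Passing `env` explicitly, as in `get_setting(key, config, env={"MERGE_PROFILE_PERIOD_HOURS": "6"})`, makes the precedence easy to check without touching the real environment.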
You can configure the following variables for your project:
| Variable | Description |
|---|---|
| MERGE_PROFILE_PERIOD_HOURS | Interval, in hours, at which the service syncs the DB with your S3 account. Default is 24. |
| SQLITE_STORE_LOCATION | Path where the DB file is placed. Default is /tmp/profile_store.db. |
| PROFILE_STORE_TYPE | Profile Store type to use. Default is sqlite. |
| SQLITE_BACKUP_PATH | S3 prefix under which the SQLite file is stored. Default is store_backup_path. |
| STORE_BUCKET_NAME | Name of the S3 bucket where the Profile Store is saved. The bucket must exist before you spin up the service. Default is profile-store-bucket. |
| S3_PROFILE_NAME | Optional S3 profile name, in case you use one. Defaults to None. |
| AWS_ACCESS_KEY_ID | Optional access key, in case you authenticate with S3 through environment variables. Defaults to None. |
| AWS_SECRET_ACCESS_KEY | Optional secret access key, in case you authenticate with S3 through environment variables. Defaults to None. |
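For reference, the defaults in the table can be collected into a typed settings object. This is a sketch under stated assumptions: the StoreSettings class and its from_mapping method are hypothetical and are not the service's own configuration code. The AWS credential variables are left out on purpose, since the AWS SDK typically reads them directly from the environment.

```python
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class StoreSettings:
    """Hypothetical container; default values are copied from the table above."""

    merge_profile_period_hours: int = 24
    sqlite_store_location: str = "/tmp/profile_store.db"
    profile_store_type: str = "sqlite"
    sqlite_backup_path: str = "store_backup_path"
    store_bucket_name: str = "profile-store-bucket"
    s3_profile_name: Optional[str] = None

    @classmethod
    def from_mapping(cls, values: Mapping[str, str]) -> "StoreSettings":
        """Build settings from string values, falling back to the defaults."""
        d = cls()
        return cls(
            # Values from env vars or config.ini arrive as strings
            merge_profile_period_hours=int(
                values.get("MERGE_PROFILE_PERIOD_HOURS", d.merge_profile_period_hours)
            ),
            sqlite_store_location=values.get("SQLITE_STORE_LOCATION", d.sqlite_store_location),
            profile_store_type=values.get("PROFILE_STORE_TYPE", d.profile_store_type),
            sqlite_backup_path=values.get("SQLITE_BACKUP_PATH", d.sqlite_backup_path),
            store_bucket_name=values.get("STORE_BUCKET_NAME", d.store_bucket_name),
            s3_profile_name=values.get("S3_PROFILE_NAME", d.s3_profile_name),
        )
```

Calling `StoreSettings.from_mapping(os.environ)` after `import os` would then pick up any variables that are set and keep the documented defaults for the rest.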