Profile Store Service
Basic Usage
With the service up and running, you can call its API endpoints to read from and write to the Profile Store. To list all existing profiles with Python's requests library, run:
import requests

response = requests.get(url="http://localhost:8000/v0/profile/list")
print(response.content)
To write profiles, you need a whylogs DatasetProfileView object, which you serialize and post to the Store's write endpoint, as follows:
import requests
import whylogs as why

# df is assumed to be an existing pandas DataFrame holding the data to profile
profile_view = why.log(df).view()

response = requests.post(
    url="http://localhost:8000/v0/profile/write",
    files={"profile": profile_view.serialize()},
    params={"dataset_id": "my_profile"},
)
print(response.content)
To retrieve a written DatasetProfileView object, call the v0/profile/get endpoint and pass the dataset_id, the start_date and the end_date for the Store to query. If you want the all-time merged DatasetProfileView, pass only the dataset_id. In both cases the profile is returned as a binary string, which you can deserialize back into an object with the whylogs API, as the following example demonstrates:
from datetime import datetime, timedelta, timezone

import requests
from whylogs.core import DatasetProfileView

url_get = "http://localhost:8000/v0/profile/get"

# 1. Query the all-time merged profile by dataset_id only
resp = requests.get(url=url_get, params={"dataset_id": "my_profile"})
print(DatasetProfileView.deserialize(resp.content).to_pandas())

# 2. Query the profile by date range (last 7 days)
# Timezone-aware datetimes are used here: calling .timestamp() on the naive
# value returned by datetime.utcnow() would interpret it as local time.
now = datetime.now(timezone.utc)
start_date_ts = (now - timedelta(days=7)).timestamp()
end_date_ts = now.timestamp()
resp_with_dates = requests.get(
    url=url_get,
    params={
        "dataset_id": "my_profile",
        "start_date": int(start_date_ts),
        "end_date": int(end_date_ts),
    },
)
print(DatasetProfileView.deserialize(resp_with_dates.content).to_pandas())

# 3. Query the profile from a single date - today
start_date_ts = datetime.now(timezone.utc).timestamp()
resp_with_today = requests.get(
    url=url_get,
    params={
        "dataset_id": "my_profile",
        "start_date": int(start_date_ts),
    },
)
print(DatasetProfileView.deserialize(resp_with_today.content).to_pandas())
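The three queries above differ only in which parameters they send, so the parameter handling can be pulled into a small helper. The following is a minimal sketch, not part of the service: build_get_params and to_epoch_seconds are hypothetical names, and treating naive datetimes as UTC is an assumption made here for illustration.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, Optional, Union


def to_epoch_seconds(moment: datetime) -> int:
    """Convert a datetime to whole epoch seconds.

    Assumption: naive datetimes are treated as UTC rather than local time.
    """
    if moment.tzinfo is None:
        moment = moment.replace(tzinfo=timezone.utc)
    return int(moment.timestamp())


def build_get_params(
    dataset_id: str,
    start: Optional[datetime] = None,
    end: Optional[datetime] = None,
) -> Dict[str, Union[str, int]]:
    """Build query parameters for v0/profile/get (hypothetical helper)."""
    if start is not None and end is not None and start > end:
        raise ValueError("start must not be later than end")

    params: Dict[str, Union[str, int]] = {"dataset_id": dataset_id}
    # Only send the date bounds that were actually provided.
    if start is not None:
        params["start_date"] = to_epoch_seconds(start)
    if end is not None:
        params["end_date"] = to_epoch_seconds(end)
    return params


week_ago = datetime.now(timezone.utc) - timedelta(days=7)
print(build_get_params("my_profile"))  # {'dataset_id': 'my_profile'}
print(build_get_params("my_profile", start=week_ago))
```

The returned dictionary can be passed directly as the params argument of requests.get.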
Configuring the Profile Store Service
To adjust the deployment to your needs, you can edit the config.ini file at the root of this repository or set the same keys as environment variables. Environment variables take precedence: if a key is not set in the environment, the value from config.ini is used instead.
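The lookup order described above can be sketched with the standard library. This is an illustration of the precedence rule only, not the service's actual loader: resolve_setting is a hypothetical name, and the [DEFAULT] section is an assumption, since the real section name in config.ini may differ.

```python
import configparser
import os
from typing import Mapping, Optional


def resolve_setting(
    key: str,
    config: configparser.ConfigParser,
    section: str = "DEFAULT",  # assumption: the real section name may differ
    environ: Mapping[str, str] = os.environ,
) -> Optional[str]:
    """Return the environment value for key if set, else the config.ini value."""
    if key in environ:
        return environ[key]
    # configparser matches option names case-insensitively by default.
    return config.get(section, key, fallback=None)


# Illustrative config.ini content, parsed from a string for the example.
config = configparser.ConfigParser()
config.read_string(
    "[DEFAULT]\n"
    "PROFILE_STORE_TYPE = sqlite\n"
    "MERGE_PROFILE_PERIOD_HOURS = 24\n"
)

fake_env = {"MERGE_PROFILE_PERIOD_HOURS": "6"}
print(resolve_setting("MERGE_PROFILE_PERIOD_HOURS", config, environ=fake_env))  # 6
print(resolve_setting("PROFILE_STORE_TYPE", config, environ=fake_env))  # sqlite
```

Passing the environment as an argument keeps the lookup easy to exercise without touching the real process environment.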
You can configure the following variables for your project:
Variable | Description |
---|---|
MERGE_PROFILE_PERIOD_HOURS | Interval, in hours, at which the service syncs the DB with your S3 account. Default is 24. |
SQLITE_STORE_LOCATION | Path where the DB file is placed. Default is /tmp/profile_store.db. |
PROFILE_STORE_TYPE | Type of Profile Store to use. Default is sqlite. |
SQLITE_BACKUP_PATH | Prefix under which the SQLite file is stored on S3. Default is store_backup_path. |
STORE_BUCKET_NAME | Name of the S3 bucket where the Profile Store is saved. The bucket must exist before the service is started. Default is profile-store-bucket. |
S3_PROFILE_NAME | Optional S3 profile name, in case you use one. Defaults to None. |
AWS_ACCESS_KEY_ID | Optional access key, in case you authenticate with S3 using environment variables. Defaults to None. |
AWS_SECRET_ACCESS_KEY | Optional secret access key, in case you authenticate with S3 using environment variables. Defaults to None. |