Data Model - Advanced
In this tutorial, you will explore advanced data model topics that are required to prepare your process data model for further analyses. More specifically, you will learn:
- How to create relationships between data model tables via foreign keys
- How to set up a process configuration in your data model
- How to import a name mapping file into your data model
- How to specify different data model reload options
Prerequisites
To follow this tutorial, you should have created a data model and uploaded data into it. As we continue working with the SAP Purchase-to-Pay (P2P) tables from the Data Upload tutorial, we recommend completing that tutorial first. We also recommend completing the Data Export tutorial to gain a basic understanding of how data is retrieved from a data model via PQL.
Tutorial
1. Import PyCelonis and connect to Celonis API
from pycelonis import get_celonis
celonis = get_celonis(permissions=False)
[2023-07-04 13:27:58,065] INFO: No `base_url` given. Using environment variable 'CELONIS_URL'
[2023-07-04 13:27:58,066] INFO: No `api_token` given. Using environment variable 'CELONIS_API_TOKEN'
[2023-07-04 13:27:58,145] WARNING: KeyType is not set. Defaulted to 'APP_KEY'.
[2023-07-04 13:27:58,147] INFO: Initial connect successful! PyCelonis Version: 2.3.2
2. Find data model tables
Let's start by locating the data model and the corresponding Purchase-to-Pay (P2P) tables, which we created in the Data Upload tutorial:
data_pool = celonis.data_integration.get_data_pools().find("PyCelonis Tutorial Data Pool")
data_pool
DataPool(id='6c178afe-21e2-4f77-b862-e37653ae0b2e', name='PyCelonis Tutorial Data Pool')
data_model = data_pool.get_data_models().find("PyCelonis Tutorial Data Model")
data_model
DataModel(id='0caea823-104c-4555-9b58-678a727c62b2', name='PyCelonis Tutorial Data Model', pool_id='6c178afe-21e2-4f77-b862-e37653ae0b2e')
tables = data_model.get_tables()
tables
[
    DataModelTable(id='9379c106-9e44-4e7a-b9a3-2edd24f2a9cb', data_model_id='0caea823-104c-4555-9b58-678a727c62b2', name='ACTIVITIES', alias='ACTIVITIES', data_pool_id='6c178afe-21e2-4f77-b862-e37653ae0b2e'),
    DataModelTable(id='4ce53164-90cb-4743-b828-58440e69c606', data_model_id='0caea823-104c-4555-9b58-678a727c62b2', name='EKPO', alias='EKPO', data_pool_id='6c178afe-21e2-4f77-b862-e37653ae0b2e'),
    DataModelTable(id='76eb78e3-3cdc-4100-b741-39bfcb0cae4e', data_model_id='0caea823-104c-4555-9b58-678a727c62b2', name='EKKO', alias='EKKO', data_pool_id='6c178afe-21e2-4f77-b862-e37653ae0b2e'),
    DataModelTable(id='141c3b90-17f0-4581-8182-f3f8842f1154', data_model_id='0caea823-104c-4555-9b58-678a727c62b2', name='LFA1', alias='LFA1', data_pool_id='6c178afe-21e2-4f77-b862-e37653ae0b2e')
]
activities = tables.find("ACTIVITIES")
ekpo = tables.find("EKPO")
ekko = tables.find("EKKO")
lfa1 = tables.find("LFA1")
3. Create relationships between data model tables via foreign keys
As shown in the Data Export tutorial, it is possible to combine columns from different data model tables inside a PQL query. These columns are combined into a single result table using PQL's implicit join functionality. For this to work, however, we need to specify how the tables in a data model are connected. This is achieved by creating foreign key relationships between pairs of tables.
3.1 The Celonis data model
Tables in Celonis are organized in a snowflake schema with 1:N relationships between tables. The activity table serves as the central fact table around which all other tables are organized, and it is also the base table when the single result table is created during PQL's implicit join. Other tables, such as the case table or master data tables, are then merged via a left-outer join, with the N-table on the left and the 1-table on the right side.
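To make the join direction concrete, here is a minimal pure-Python sketch of a left-outer join with the N-table on the left and the 1-table on the right. The `left_outer_join` helper, the `MATERIAL` column, and the sample rows are invented for illustration; they are not part of PyCelonis and do not show how the PQL engine is actually implemented.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]


def left_outer_join(n_rows: List[Row], one_rows: List[Row], key: str) -> List[Row]:
    """Keep every row of the N-table and attach the matching 1-table row, if any."""
    # The 1-table holds at most one row per key, so a dict lookup is sufficient.
    lookup = {row[key]: row for row in one_rows}
    # Column names of the 1-table, used to fill unmatched rows with None.
    one_columns = [c for c in (one_rows[0] if one_rows else {}) if c != key]
    joined = []
    for row in n_rows:
        match = lookup.get(row[key])
        extra = {c: (match[c] if match else None) for c in one_columns}
        joined.append({**row, **extra})
    return joined


# Hypothetical sample data: two activities for a known case, one for an unknown case.
activities_rows = [
    {"_CASE_KEY": "A1", "ACTIVITY_EN": "Receive Goods"},
    {"_CASE_KEY": "A1", "ACTIVITY_EN": "Scan Invoice"},
    {"_CASE_KEY": "B2", "ACTIVITY_EN": "Book Invoice"},
]
case_rows = [{"_CASE_KEY": "A1", "MATERIAL": "Shafting assembly"}]

joined_rows = left_outer_join(activities_rows, case_rows, key="_CASE_KEY")
for row in joined_rows:
    print(row)
```

Every activity row survives the join; the row for case `B2` has no match in the 1-table and therefore gets `None` for the attached column.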
3.2 Create foreign key relationships
To create a new relationship between two tables that can be used for implicit joins, we call the create_foreign_key() method of the data model. The method takes the following arguments:
Name | Type | Description | Default |
---|---|---|---|
source_table_id | str | ID of the source table (i.e. 1-table; right table in implicit join) | Required |
target_table_id | str | ID of the target table (i.e. N-table; left table in implicit join) | Required |
columns | List[Tuple[str,str]] | List of tuples in the format ("sourceColumn", "targetColumn") that specifies the foreign keys (i.e. over which columns the tables are connected) | Required |
Let's start by creating a relationship between our activity table ACTIVITIES and our case table EKPO (i.e. Purchase Order Items). Here, EKPO (1-table) is the source and ACTIVITIES (N-table) is the target. The tables are connected via the foreign key _CASE_KEY. During an implicit join, the 1-table EKPO (right side) is then connected to the N-table ACTIVITIES (left side) via a left-outer join:
ekpo_activities_fk = data_model.create_foreign_key(
source_table_id=ekpo.id,
target_table_id=activities.id,
columns=[("_CASE_KEY", "_CASE_KEY")]
)
ekpo_activities_fk
[2023-07-04 13:27:58,220] INFO: Successfully created foreign key with id '8cb6fa49-7423-4a86-9156-8710908d7c4e'
ForeignKey(id='8cb6fa49-7423-4a86-9156-8710908d7c4e', data_model_id='0caea823-104c-4555-9b58-678a727c62b2', source_table_id='4ce53164-90cb-4743-b828-58440e69c606', target_table_id='9379c106-9e44-4e7a-b9a3-2edd24f2a9cb')
Next, we create a relationship between EKKO (i.e. Purchase Order Header) and EKPO (i.e. Purchase Order Items). Here, EKKO (1-table) is the source and EKPO (N-table) is the target. The tables are connected via the foreign keys EBELN and MANDT. During an implicit join, the 1-table EKKO (right side) is then connected to the N-table EKPO (left side) via a left-outer join:
ekko_ekpo_fk = data_model.create_foreign_key(
source_table_id=ekko.id,
target_table_id=ekpo.id,
columns=[("EBELN", "EBELN"), ("MANDT", "MANDT")]
)
ekko_ekpo_fk
[2023-07-04 13:27:58,231] INFO: Successfully created foreign key with id '5a75050f-a1c8-496a-8897-9aa460969d72'
ForeignKey(id='5a75050f-a1c8-496a-8897-9aa460969d72', data_model_id='0caea823-104c-4555-9b58-678a727c62b2', source_table_id='76eb78e3-3cdc-4100-b741-39bfcb0cae4e', target_table_id='4ce53164-90cb-4743-b828-58440e69c606')
Lastly, we create a relationship between EKKO (i.e. Purchase Order Header) and LFA1 (i.e. Vendor Master Data). Here, LFA1 (1-table) is the source and EKKO (N-table) is the target. The tables are connected via the foreign keys LIFNR and MANDT. During an implicit join, the 1-table LFA1 (right side) is then connected to the N-table EKKO (left side) via a left-outer join:
lfa1_ekko_fk = data_model.create_foreign_key(
source_table_id=lfa1.id,
target_table_id=ekko.id,
columns=[("LIFNR", "LIFNR"), ("MANDT", "MANDT")]
)
lfa1_ekko_fk
[2023-07-04 13:27:58,242] INFO: Successfully created foreign key with id 'dce12120-1129-4bcc-9fb6-aafaf4172083'
ForeignKey(id='dce12120-1129-4bcc-9fb6-aafaf4172083', data_model_id='0caea823-104c-4555-9b58-678a727c62b2', source_table_id='141c3b90-17f0-4581-8182-f3f8842f1154', target_table_id='76eb78e3-3cdc-4100-b741-39bfcb0cae4e')
We can verify that the newly created foreign key relationships exist by calling the get_foreign_keys() method of the data model:
data_model.get_foreign_keys()
[
    ForeignKey(id='8cb6fa49-7423-4a86-9156-8710908d7c4e', data_model_id='0caea823-104c-4555-9b58-678a727c62b2', source_table_id='4ce53164-90cb-4743-b828-58440e69c606', target_table_id='9379c106-9e44-4e7a-b9a3-2edd24f2a9cb'),
    ForeignKey(id='5a75050f-a1c8-496a-8897-9aa460969d72', data_model_id='0caea823-104c-4555-9b58-678a727c62b2', source_table_id='76eb78e3-3cdc-4100-b741-39bfcb0cae4e', target_table_id='4ce53164-90cb-4743-b828-58440e69c606'),
    ForeignKey(id='dce12120-1129-4bcc-9fb6-aafaf4172083', data_model_id='0caea823-104c-4555-9b58-678a727c62b2', source_table_id='141c3b90-17f0-4581-8182-f3f8842f1154', target_table_id='76eb78e3-3cdc-4100-b741-39bfcb0cae4e')
]
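The three calls above follow the same pattern, so they could also be driven from a small declarative list. The sketch below is one possible way to do that; `create_foreign_keys`, `FOREIGN_KEY_SPECS`, the `RecordingDataModel` stand-in, and the table IDs are hypothetical names introduced here so the example runs without a Celonis connection. Only `create_foreign_key()` and its three arguments come from the tutorial.

```python
from typing import Any, Dict, List, Tuple

# (source table name, target table name, [(source column, target column), ...])
ForeignKeySpec = Tuple[str, str, List[Tuple[str, str]]]

FOREIGN_KEY_SPECS: List[ForeignKeySpec] = [
    ("EKPO", "ACTIVITIES", [("_CASE_KEY", "_CASE_KEY")]),
    ("EKKO", "EKPO", [("EBELN", "EBELN"), ("MANDT", "MANDT")]),
    ("LFA1", "EKKO", [("LIFNR", "LIFNR"), ("MANDT", "MANDT")]),
]


def create_foreign_keys(
    data_model: Any, table_ids: Dict[str, str], specs: List[ForeignKeySpec]
) -> List[Any]:
    """Create one foreign key per spec; the source is the 1-table, the target the N-table."""
    created = []
    for source_name, target_name, columns in specs:
        created.append(
            data_model.create_foreign_key(
                source_table_id=table_ids[source_name],
                target_table_id=table_ids[target_name],
                columns=columns,
            )
        )
    return created


class RecordingDataModel:
    """Stand-in for a PyCelonis data model that only records the calls it receives."""

    def __init__(self) -> None:
        self.calls: List[Dict[str, Any]] = []

    def create_foreign_key(
        self, source_table_id: str, target_table_id: str, columns: List[Tuple[str, str]]
    ) -> int:
        self.calls.append(
            {"source": source_table_id, "target": target_table_id, "columns": columns}
        )
        return len(self.calls)


stub = RecordingDataModel()
hypothetical_ids = {"ACTIVITIES": "id-act", "EKPO": "id-ekpo", "EKKO": "id-ekko", "LFA1": "id-lfa1"}
created = create_foreign_keys(stub, hypothetical_ids, FOREIGN_KEY_SPECS)
print(len(created), stub.calls[0])
```

With the objects from this tutorial, the ID mapping would presumably be built from the table objects, for example `{"EKPO": ekpo.id, "ACTIVITIES": activities.id, "EKKO": ekko.id, "LFA1": lfa1.id}`, and `data_model` would be passed instead of the stand-in.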
Important:
For the table relationships to take effect, we have to reload the data model. Otherwise, we will receive a No common table error when querying across columns in multiple tables.
data_model.reload()
[2023-07-04 13:27:58,269] INFO: Successfully triggered data model reload for data model with id '0caea823-104c-4555-9b58-678a727c62b2'
[2023-07-04 13:27:58,269] INFO: Wait for execution of data model reload for data model with id '0caea823-104c-4555-9b58-678a727c62b2'
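Whether a set of tables can be queried together can be reasoned about offline. The sketch below assumes a simplified reading of the join rule: tables have a common table when some table reaches all of them by following foreign keys from the N-side to the 1-side. Both this rule and the `find_common_table` helper are illustrative assumptions, not the PQL engine's actual algorithm.

```python
from typing import Dict, List, Optional, Set, Tuple

# Each foreign key is (source = 1-table, target = N-table), as in create_foreign_key().
FOREIGN_KEYS: List[Tuple[str, str]] = [
    ("EKPO", "ACTIVITIES"),
    ("EKKO", "EKPO"),
    ("LFA1", "EKKO"),
]


def reachable_one_tables(start: str, foreign_keys: List[Tuple[str, str]]) -> Set[str]:
    """Return start plus every table reachable by walking from the N-side to the 1-side."""
    n_to_one: Dict[str, List[str]] = {}
    for source, target in foreign_keys:
        n_to_one.setdefault(target, []).append(source)
    seen = {start}
    stack = [start]
    while stack:
        for nxt in n_to_one.get(stack.pop(), []):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def find_common_table(queried: Set[str], foreign_keys: List[Tuple[str, str]]) -> Optional[str]:
    """Return the closest table that reaches all queried tables, or None if there is none."""
    candidates = {table for fk in foreign_keys for table in fk} | queried
    best: Optional[str] = None
    best_size: Optional[int] = None
    for candidate in sorted(candidates):
        reach = reachable_one_tables(candidate, foreign_keys)
        # Prefer the candidate with the smallest reachable set, i.e. the closest one.
        if queried <= reach and (best_size is None or len(reach) < best_size):
            best, best_size = candidate, len(reach)
    return best


with_keys = find_common_table({"ACTIVITIES", "LFA1"}, FOREIGN_KEYS)
without_keys = find_common_table({"ACTIVITIES", "LFA1"}, [])
print(with_keys)     # ACTIVITIES
print(without_keys)  # None
```

Under this assumption, the three foreign keys created above give ACTIVITIES and LFA1 a common table, while the same query without any foreign keys has none.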
3.3 Querying across multiple data model tables
Let's revisit the PQL query from the Data Export tutorial, which we used to get data from the EMS. At that point, we could only query columns from a single table, i.e. ACTIVITIES, as the tables in the data model were not connected to each other:
from pycelonis.pql import PQL, PQLColumn, PQLFilter, OrderByColumn
query = PQL(distinct=False, limit=3, offset=3)
query += PQLColumn(name="_CASE_KEY", query=""" "ACTIVITIES"."_CASE_KEY" """)
query += PQLColumn(name="ACTIVITY_EN", query=""" "ACTIVITIES"."ACTIVITY_EN" """)
query += PQLColumn(name="EVENTTIME", query=""" "ACTIVITIES"."EVENTTIME" """)
query += PQLColumn(name="_SORTING", query=""" "ACTIVITIES"."_SORTING" """)
query += PQLFilter(query=""" FILTER "ACTIVITIES"."_CASE_KEY" = '800000000006800001'; """)
query += OrderByColumn(query=""" "ACTIVITIES"."EVENTTIME" """)
query += OrderByColumn(query=""" "ACTIVITIES"."_SORTING" """)
result_df = data_model.export_data_frame(query)
result_df
[2023-07-04 13:27:58,315] INFO: Successfully created data export with id '598a2f3b-583b-42cf-9b00-0d9b3beba012'
[2023-07-04 13:27:58,316] INFO: Wait for execution of data export with id '598a2f3b-583b-42cf-9b00-0d9b3beba012'
[2023-07-04 13:27:58,331] INFO: Export result chunks for data export with id '598a2f3b-583b-42cf-9b00-0d9b3beba012'
| | _CASE_KEY | ACTIVITY_EN | EVENTTIME | _SORTING |
|---|---|---|---|---|
| 0 | 800000000006800001 | Receive Goods | 2009-01-12 07:44:05 | 30.0 |
| 1 | 800000000006800001 | Scan Invoice | 2009-01-20 07:44:05 | NaN |
| 2 | 800000000006800001 | Book Invoice | 2009-01-30 07:44:05 | NaN |
Since the tables in the data model are now connected via foreign key relationships, we can also add columns from other tables to the result table. Let's add the material text from the purchase order item table EKPO, the document type from the purchase order header table EKKO, and the vendor name from the vendor master data table LFA1:
query += PQLColumn(name="Material Text (MAKT_MAKTX)", query=""" "EKPO"."Material Text (MAKT_MAKTX)" """)
query += PQLColumn(name="Document Type Text (EKKO_BSART)", query=""" "EKKO"."Document Type Text (EKKO_BSART)" """)
query += PQLColumn(name="NAME1", query=""" "LFA1"."NAME1" """)
result_df = data_model.export_data_frame(query)
result_df
[2023-07-04 13:27:58,386] INFO: Successfully created data export with id 'cee31eb7-0ac7-4380-b754-b770843b804f'
[2023-07-04 13:27:58,387] INFO: Wait for execution of data export with id 'cee31eb7-0ac7-4380-b754-b770843b804f'
[2023-07-04 13:27:58,402] INFO: Export result chunks for data export with id 'cee31eb7-0ac7-4380-b754-b770843b804f'
| | _CASE_KEY | ACTIVITY_EN | EVENTTIME | _SORTING | Material Text (MAKT_MAKTX) | Document Type Text (EKKO_BSART) | NAME1 |
|---|---|---|---|---|---|---|---|
| 0 | 800000000006800001 | Receive Goods | 2009-01-12 07:44:05 | 30.0 | Shafting assembly | Electronic commerce | eSupplier, Inc |
| 1 | 800000000006800001 | Scan Invoice | 2009-01-20 07:44:05 | NaN | Shafting assembly | Electronic commerce | eSupplier, Inc |
| 2 | 800000000006800001 | Book Invoice | 2009-01-30 07:44:05 | NaN | Shafting assembly | Electronic commerce | eSupplier, Inc |
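The column references in these queries all share the quoted "TABLE"."COLUMN" form, which is easy to mistype by hand. A small helper can build them; `pql_column_ref` is a hypothetical convenience function, not part of PyCelonis, and it simply rejects names that contain a double quote because escaping rules are not covered in this tutorial.

```python
def pql_column_ref(table: str, column: str) -> str:
    """Return a double-quoted PQL reference of the form TABLE.COLUMN."""
    for name in (table, column):
        # Escaping of embedded quotes is out of scope here, so such names are rejected.
        if '"' in name:
            raise ValueError(f"Unsupported double quote in name: {name!r}")
    return f'"{table}"."{column}"'


refs = [
    pql_column_ref("EKPO", "Material Text (MAKT_MAKTX)"),
    pql_column_ref("EKKO", "Document Type Text (EKKO_BSART)"),
    pql_column_ref("LFA1", "NAME1"),
]
for ref in refs:
    print(ref)
```

The resulting string could then be passed as the query argument, for example `PQLColumn(name="NAME1", query=pql_column_ref("LFA1", "NAME1"))`.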
4. Set up process configuration
After having defined the foreign key relationships, the next step in preparing the Celonis data model for further analyses is to set up a process configuration. This involves specifying:
- Which tables in our data model are the activity and case table
- Which columns of our activity table denote the Case ID, Activity Name, Timestamp, and Sorting columns
This can be done by calling the create_process_configuration() method of our data model:
process_configuration = data_model.create_process_configuration(
activity_table_id=activities.id,
case_id_column="_CASE_KEY",
activity_column="ACTIVITY_EN",
timestamp_column="EVENTTIME",
sorting_column="_SORTING",
case_table_id=ekpo.id
)
process_configuration
[2023-07-04 13:27:58,431] INFO: Successfully created process configuration with id '519b2169-a1ed-4fb8-818c-e00295a06ef7'
ProcessConfiguration(id='519b2169-a1ed-4fb8-818c-e00295a06ef7', data_model_id='0caea823-104c-4555-9b58-678a727c62b2', activity_table_id='9379c106-9e44-4e7a-b9a3-2edd24f2a9cb', case_table_id='4ce53164-90cb-4743-b828-58440e69c606', case_id_column='_CASE_KEY', activity_column='ACTIVITY_EN', timestamp_column='EVENTTIME', sorting_column='_SORTING')
In order for the changes to be effective, we need to reload the data model again:
data_model.reload()
[2023-07-04 13:27:58,450] INFO: Successfully triggered data model reload for data model with id '0caea823-104c-4555-9b58-678a727c62b2'
[2023-07-04 13:27:58,451] INFO: Wait for execution of data model reload for data model with id '0caea823-104c-4555-9b58-678a727c62b2'
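To illustrate what the four column roles mean, the following sketch rebuilds per-case activity sequences from plain Python rows: the case ID groups events, the timestamp orders them, and the sorting column breaks ties between events with the same timestamp. The `build_traces` helper and the sample events are invented for illustration, and placing missing sorting values last is an assumption rather than documented Celonis behavior.

```python
from collections import defaultdict
from typing import Any, Dict, List

# Column roles, mirroring the arguments passed to create_process_configuration().
CASE_ID, ACTIVITY, TIMESTAMP, SORTING = "_CASE_KEY", "ACTIVITY_EN", "EVENTTIME", "_SORTING"


def build_traces(rows: List[Dict[str, Any]]) -> Dict[str, List[str]]:
    """Group events by case and order them by timestamp, then by sorting value."""
    by_case: Dict[str, List[Dict[str, Any]]] = defaultdict(list)
    for row in rows:
        by_case[row[CASE_ID]].append(row)
    traces = {}
    for case_id, events in by_case.items():
        # Events without a sorting value (None) are placed after numeric ones (an assumption).
        events.sort(key=lambda e: (e[TIMESTAMP], e[SORTING] is None, e[SORTING] or 0))
        traces[case_id] = [e[ACTIVITY] for e in events]
    return traces


# Hypothetical events; two of them share a timestamp and differ only in _SORTING.
sample_events = [
    {"_CASE_KEY": "C1", "ACTIVITY_EN": "Scan Invoice", "EVENTTIME": "2009-01-20 07:44:05", "_SORTING": None},
    {"_CASE_KEY": "C1", "ACTIVITY_EN": "Receive Goods", "EVENTTIME": "2009-01-12 07:44:05", "_SORTING": 30.0},
    {"_CASE_KEY": "C1", "ACTIVITY_EN": "Create Purchase Order Item", "EVENTTIME": "2009-01-12 07:44:05", "_SORTING": 10.0},
]
traces = build_traces(sample_events)
print(traces)
```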
5. Data Model Reload Advanced
5.1 Partial data model reload
The last topic in setting up our data model is deciding how the tables should be loaded from the data pool into the data model. We can reload either all tables with reload() or only selected tables with partial_reload():
data_model.reload()
[2023-07-04 13:27:58,500] INFO: Successfully triggered data model reload for data model with id '0caea823-104c-4555-9b58-678a727c62b2'
[2023-07-04 13:27:58,500] INFO: Wait for execution of data model reload for data model with id '0caea823-104c-4555-9b58-678a727c62b2'
data_model.partial_reload(data_model_table_ids=[lfa1.id, ekko.id])
[2023-07-04 13:27:58,553] INFO: Successfully triggered data model reload for data model with id '0caea823-104c-4555-9b58-678a727c62b2'
[2023-07-04 13:27:58,553] INFO: Wait for execution of data model reload for data model with id '0caea823-104c-4555-9b58-678a727c62b2'
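If a script sometimes knows which tables changed and sometimes does not, the choice between the two calls can be wrapped in one function. This is a minimal sketch: `reload_tables` and the `StubDataModel` stand-in are hypothetical names that let the example run offline, and the table IDs are made up. Only `reload()` and `partial_reload(data_model_table_ids=...)` come from the tutorial.

```python
from typing import Any, List, Optional, Tuple


def reload_tables(data_model: Any, table_ids: Optional[List[str]] = None) -> str:
    """Trigger a partial reload when table IDs are given, otherwise a full reload."""
    if table_ids:
        data_model.partial_reload(data_model_table_ids=table_ids)
        return "partial"
    data_model.reload()
    return "full"


class StubDataModel:
    """Stand-in for a PyCelonis data model that only records which method was called."""

    def __init__(self) -> None:
        self.log: List[Tuple[str, Optional[List[str]]]] = []

    def reload(self) -> None:
        self.log.append(("reload", None))

    def partial_reload(self, data_model_table_ids: List[str]) -> None:
        self.log.append(("partial_reload", list(data_model_table_ids)))


stub = StubDataModel()
modes = [reload_tables(stub), reload_tables(stub, ["id-lfa1", "id-ekko"])]
print(modes, stub.log)
```

An empty list is treated like `None` here and falls back to a full reload, which is a design choice of this sketch rather than library behavior.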
5.2 Data model reload without wait
Lastly, we can specify whether we want to wait for the data model reload via the wait argument. If wait=True, the method waits until the reload has completed and raises an error if the reload fails. If wait=False, the method returns without waiting and does not raise an error if the reload fails:
data_model.reload(wait=False)
[2023-07-04 13:27:58,611] INFO: Successfully triggered data model reload for data model with id '0caea823-104c-4555-9b58-678a727c62b2'
data_model.partial_reload(wait=False, data_model_table_ids=[ekko.id])
[2023-07-04 13:28:10,758] INFO: Successfully triggered data model reload for data model with id '0caea823-104c-4555-9b58-678a727c62b2'
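Because wait=False hides reload failures, one alternative is to keep wait=True but run the call in a worker thread, so the script can continue while errors still surface later. The sketch below uses Python's standard `concurrent.futures` module; `reload_in_background` and the `StubDataModel` stand-in are hypothetical, and whether PyCelonis objects are safe to use from a worker thread is an assumption not confirmed by this tutorial.

```python
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Any


def reload_in_background(data_model: Any, executor: ThreadPoolExecutor) -> Future:
    """Start a blocking reload in a worker thread and return a Future for it."""
    # wait=True makes reload() raise on failure; the Future stores that exception.
    return executor.submit(data_model.reload, wait=True)


class StubDataModel:
    """Stand-in for a PyCelonis data model so the sketch runs offline."""

    def __init__(self, fail: bool = False) -> None:
        self.fail = fail

    def reload(self, wait: bool = True) -> None:
        if self.fail:
            raise RuntimeError("reload failed")


with ThreadPoolExecutor(max_workers=1) as executor:
    ok_future = reload_in_background(StubDataModel(), executor)
    bad_future = reload_in_background(StubDataModel(fail=True), executor)
    # Other work could run here while the reloads are in progress.
    reload_ok = ok_future.exception() is None
    reload_error = bad_future.exception()

print(reload_ok, type(reload_error).__name__)
```

Calling `future.result()` instead of `future.exception()` would re-raise the stored error in the main thread.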
Conclusion
Congratulations! You have successfully learned how to prepare your process data model for further analyses. In the next tutorial, Data Upload & Export Advanced, you will dive deeper into advanced topics of data pushes and exports, such as chunking, different export/import types, and specifying custom column configurations.