Adding new data end-to-end¶
#To get a copy of this notebook in your current working dirtectory, run:
from pycelonis.notebooks import api_tutorial
1. Connect to Celonis¶
from pycelonis import get_celonis
celonis = get_celonis()
2. Find Data Model¶
We can now add this table to a Data Model and make it available to the Celonis Analyses, Stduio etc.
celonis.datamodels
[<Datamodel, id bac5d83f-5849-40a6-9bc8-c090219f07ba, name TEST_DATAMODELzzz_pycelonis_test5575704>, <Datamodel, id 475e45fc-8e18-4dfc-be5e-8d8d3cd3bda1, name SAP ECC - Accounts Payable Data Model>, <Datamodel, id ce8f08c9-457e-451f-aaae-b12ecf062b26, name TEST_DATAMODELzzz_pycelonis_test3086618>, <Datamodel, id f404c22d-885d-4a7d-b728-0961e229c64a, name SAP ECC - Accounts Payable Data Model>, <Datamodel, id 30c9a417-b4c2-40e6-b1c5-9145253b5269, name SAP ECC - Accounts Payable Data Model>, <Datamodel, id 77a336d8-a1b8-4a8a-9863-2a4004def82b, name SAP ECC - Accounts Payable Data Model>, <Datamodel, id a40c7f9d-6893-46ed-9dbb-6a356e621444, name SAP ECC - Accounts Payable Data Model>, <Datamodel, id 17dcad74-3fab-4e18-ae21-0efd186ea9b6, name SAP ECC - Accounts Payable Data Model>, <Datamodel, id 47a1929d-6c0e-47bd-bd70-fa201b2775f8, name SAP ECC - Accounts Payable Data Model>, <Datamodel, id b5e4f8c2-71fd-4ffa-9b1f-d6efa4adf864, name SAP ECC - Accounts Payable Data Model>,]
data_model = celonis.datamodels.find("b5e4f8c2-71fd-4ffa-9b1f-d6efa4adf864")
data_model
<Datamodel, id b5e4f8c2-71fd-4ffa-9b1f-d6efa4adf864, name SAP ECC - Accounts Payable Data Model>
data_model.tables
[<DatamodelTable, id 43fe10d1-1de4-417f-8f67-6720c66bbd4d, name ACTIVITY_TABLE>, <DatamodelTable, id 552c3a26-7370-4a21-81fb-0417c53cadcd, name CASE_TABLE>,]
3. Connect to Data Pool¶
To add a new table to the Data Model you first need to create this table in the respective Data Pool. The pool can be accessed as seen below.
data_pool = data_model.pool
data_pool
<Pool, id ac350720-a9d7-4683-a3c8-d7fb98c7d8bb, name zzz_pycelonis_test9300460>
4. Create a pandas dataframe¶
import pandas as pd
df = pd.DataFrame({'A': [2, 4, 8, 0], 'B': [2, 0, 0, 0], 'C_KEY': ["A", "B", "C", "D"]})
df.head()
A | B | C_KEY | |
---|---|---|---|
0 | 2 | 2 | A |
1 | 4 | 0 | B |
2 | 8 | 0 | C |
3 | 0 | 0 | D |
5. Creat Table in Data Pool¶
Pushing and editing data in the current data pool
The following functions have required and optional parameters. The required parameters are listed and explained in detail. The optional parameters can be checked by pressing SHIFT + TAB, while the curser is in the function
Create new table
The create_table(...) function creates a new table in the current data pool. In order to execute this operation the data pool does not have an existing table with the same table name. Otherwise an error is raised. In case the target table already exists, use replace_table(), which is explained in the next few steps.
Important parameters:
- df_or_path: Either a pandas data frame or a path to a parquet file(s) that should be pushed to Celonis.
- table_name: The name that the table in the data pool should have.
- if_exists: One of 'error' or 'drop'. If you dont want to overwrite an existing table, leave at default: 'error'
data_pool.create_table(df_or_path=df, table_name="MY_NEW_TABLE")
2020-12-11 10:54:53 - pycelonis: Data push job started... 2020-12-11 10:55:09 - pycelonis: Data push job status: RUNNING...
{'id': '17ea88f7-6421-4e56-b1ba-25d58def5089', 'targetName': 'MY_NEW_TABLE', 'lastModified': 1607684093606, 'lastPing': None, 'status': 'DONE', 'type': None, 'fileType': None, 'targetSchema': 'ac350720-a9d7-4683-a3c8-d7fb98c7d8bb', 'upsertStrategy': 'UPSERT_WITH_UNCHANGED_METADATA', 'fallbackVarcharLength': None, 'dataPoolId': 'ac350720-a9d7-4683-a3c8-d7fb98c7d8bb', 'connectionId': None, 'keys': [], 'tableSchema': None, 'mirrorTargetNames': [], 'logs': ['2020-12-11T10:54:53 - Starting execution, status set to RUNNING', '2020-12-11T10:55:18 - Done loading chunks', '2020-12-11T10:55:18 - Status set to DONE', '2020-12-11T10:55:18 - Execution done'], 'csvParsingOptions': None, 'optionalTenantId': None}
The table is now in the data pool and can be added to any data model in that pool.
6. Adding a table from Pool to Data Model.¶
In order to add the table to the datamodel, you can use the function "add_table_from_pool", which can execute the following steps:
- Add table from Pool to Data Model
- Set an alias for the table name
- Connect the new table to an existing table of the Data Model (Optional)
- Reload the datamodel (Optional)
Required parameters:
- table_name: Name of the pool table to be added.
Optional parameters:
- alias: The name that the table has in the Data Model. If not provided alias will be equals to table name.
- added_table_join_type: str, optional Add table to be pushed as 'source' table or as 'target' table regarding the connection. The possible entry values are ['source', 'target'], by default None
- new_foreign_key_to_table: str, optional Set connection to or from this table, by default None
- foreign_key_columns: List of 2-dimensional tuples [(str, str), ..], optional Each 2D-tuple consists of a 'sourceColumnName' and 'targetColumnName' which represents the foreign_key e.g. foreign_key_columns=[('Col1', 'Col3'), ('Col2', 'Col2'), ..] between two tables, by default None
- connection: DataConnection or str, optional
The :class:
DataConnection
that the table is in, uses Global if not specified, by default None - reload: str, optional Reload can be one of "FROM_CACHE", "FORCE_COMPLETE", or "PARTIAL_ON_TABLE", by default None
Option 1
# OPTION 1:
data_model.add_table_from_pool(table_name="MY_NEW_TABLE",
alias= "MY_NEW_TABLE",
added_table_join_type="source",
new_foreign_key_to_table="CASE_TABLE",
foreign_key_columns=[("C_KEY","KEY")],
reload="PARTIAL_ON_TABLE")
2020-12-11 10:56:14 - pycelonis: Partial Data Model reload started...
[{'id': '7923f271-8340-4ff5-9237-7479dd7353f8', 'dataModelId': 'b5e4f8c2-71fd-4ffa-9b1f-d6efa4adf864', 'dataSourceId': None, 'name': 'MY_NEW_TABLE', 'alias': 'MY_NEW_TABLE', 'columns': [], 'useDirectStorage': False, 'aliasOrName': 'MY_NEW_TABLE'}]
Option 2
# OPTION 2: Same result as OPTION 1, but broken down in multiple steps
# only add table
data_model.add_table_from_pool(table_name="MY_NEW_TABLE",
alias= "MY_NEW_TABLE")
# create foreign key
data_model.create_foreign_key(source_table="MY_NEW_TABLE",target_table="CASE_TABLE",columns=[("C_KEY","KEY")])
# reload datamodel
data_model.reload(tables='MY_TABLE')