Data Model: Add new data¶

In [1]:

            
                Copied!
                
#To get a copy of this notebook in your current working dirtectory, run:
from pycelonis.notebooks import api_tutorial
#To get a copy of this notebook in your current working dirtectory, run:
from pycelonis.notebooks import api_tutorial

1. Connect to Celonis¶

In [2]:

            
                Copied!
                
from pycelonis import get_celonis
celonis = get_celonis()
from pycelonis import get_celonis
celonis = get_celonis()

2. Find Data Model¶

We can now add this table to a Data Model and make it available to the Celonis Analyses, Stduio etc.

In [3]:

            
                Copied!
                
celonis.datamodels
celonis.datamodels

Out[3]:

[<Datamodel, id bac5d83f-5849-40a6-9bc8-c090219f07ba, name TEST_DATAMODELzzz_pycelonis_test5575704>,
<Datamodel, id 475e45fc-8e18-4dfc-be5e-8d8d3cd3bda1, name SAP ECC - Accounts Payable Data Model>,
<Datamodel, id ce8f08c9-457e-451f-aaae-b12ecf062b26, name TEST_DATAMODELzzz_pycelonis_test3086618>,
<Datamodel, id f404c22d-885d-4a7d-b728-0961e229c64a, name SAP ECC - Accounts Payable Data Model>,
<Datamodel, id 30c9a417-b4c2-40e6-b1c5-9145253b5269, name SAP ECC - Accounts Payable Data Model>,
<Datamodel, id 77a336d8-a1b8-4a8a-9863-2a4004def82b, name SAP ECC - Accounts Payable Data Model>,
<Datamodel, id a40c7f9d-6893-46ed-9dbb-6a356e621444, name SAP ECC - Accounts Payable Data Model>,
<Datamodel, id 17dcad74-3fab-4e18-ae21-0efd186ea9b6, name SAP ECC - Accounts Payable Data Model>,
<Datamodel, id 47a1929d-6c0e-47bd-bd70-fa201b2775f8, name SAP ECC - Accounts Payable Data Model>,
<Datamodel, id b5e4f8c2-71fd-4ffa-9b1f-d6efa4adf864, name SAP ECC - Accounts Payable Data Model>,]

In [4]:

            
                Copied!
                
data_model = celonis.datamodels.find("b5e4f8c2-71fd-4ffa-9b1f-d6efa4adf864")
data_model
data_model = celonis.datamodels.find("b5e4f8c2-71fd-4ffa-9b1f-d6efa4adf864")
data_model

Out[4]:

<Datamodel, id b5e4f8c2-71fd-4ffa-9b1f-d6efa4adf864, name SAP ECC - Accounts Payable Data Model>

In [5]:

            
                Copied!
                
data_model.tables
data_model.tables

Out[5]:

[<DatamodelTable, id 43fe10d1-1de4-417f-8f67-6720c66bbd4d, name ACTIVITY_TABLE>,
<DatamodelTable, id 552c3a26-7370-4a21-81fb-0417c53cadcd, name CASE_TABLE>,]

3. Connect to Data Pool¶

To add a new table to the Data Model you first need to create this table in the respective Data Pool. The pool can be accessed as seen below.

In [6]:

            
                Copied!
                
data_pool = data_model.pool
data_pool
data_pool = data_model.pool
data_pool

Out[6]:

<Pool, id ac350720-a9d7-4683-a3c8-d7fb98c7d8bb, name zzz_pycelonis_test9300460>

4. Create a pandas dataframe¶

In [7]:

            
                Copied!
                
import pandas as pd

df = pd.DataFrame({'A': [2, 4, 8, 0], 'B': [2, 0, 0, 0], 'C_KEY': ["A", "B", "C", "D"]})
df.head()
import pandas as pd

df = pd.DataFrame({'A': [2, 4, 8, 0], 'B': [2, 0, 0, 0], 'C_KEY': ["A", "B", "C", "D"]})
df.head()

Out[7]:

	A	B	C_KEY
0	2	2	A
1	4	0	B
2	8	0	C
3	0	0	D

5. Creat Table in Data Pool¶

Pushing and editing data in the current data pool

The following functions have required and optional parameters. The required parameters are listed and explained in detail. The optional parameters can be checked by pressing SHIFT + TAB, while the curser is in the function

Create new table

The create_table(...) function creates a new table in the current data pool. In order to execute this operation the data pool does not have an existing table with the same table name. Otherwise an error is raised. In case the target table already exists, use replace_table(), which is explained in the next few steps.

Important parameters:

df_or_path: Either a pandas data frame or a path to a parquet file(s) that should be pushed to Celonis.
table_name: The name that the table in the data pool should have.
if_exists: One of 'error' or 'drop'. If you dont want to overwrite an existing table, leave at default: 'error'

In [8]:

            
                Copied!
                
data_pool.create_table(df_or_path=df, table_name="MY_NEW_TABLE")
data_pool.create_table(df_or_path=df, table_name="MY_NEW_TABLE")

2020-12-11 10:54:53 - pycelonis: Data push job started...
2020-12-11 10:55:09 - pycelonis: Data push job status: RUNNING...

Out[8]:

{'id': '17ea88f7-6421-4e56-b1ba-25d58def5089',
 'targetName': 'MY_NEW_TABLE',
 'lastModified': 1607684093606,
 'lastPing': None,
 'status': 'DONE',
 'type': None,
 'fileType': None,
 'targetSchema': 'ac350720-a9d7-4683-a3c8-d7fb98c7d8bb',
 'upsertStrategy': 'UPSERT_WITH_UNCHANGED_METADATA',
 'fallbackVarcharLength': None,
 'dataPoolId': 'ac350720-a9d7-4683-a3c8-d7fb98c7d8bb',
 'connectionId': None,
 'keys': [],
 'tableSchema': None,
 'mirrorTargetNames': [],
 'logs': ['2020-12-11T10:54:53 - Starting execution, status set to RUNNING',
  '2020-12-11T10:55:18 - Done loading chunks',
  '2020-12-11T10:55:18 - Status set to DONE',
  '2020-12-11T10:55:18 - Execution done'],
 'csvParsingOptions': None,
 'optionalTenantId': None}

The table is now in the data pool and can be added to any data model in that pool.

6. Adding a table from Pool to Data Model.¶

In order to add the table to the datamodel, you can use the function "add_table_from_pool", which can execute the following steps:

Add table from Pool to Data Model
Set an alias for the table name
Connect the new table to an existing table of the Data Model (Optional)
Reload the datamodel (Optional)

Required parameters:

table_name: Name of the pool table to be added.

Optional parameters:

alias: The name that the table has in the Data Model. If not provided alias will be equals to table name.
added_table_join_type: str, optional Add table to be pushed as 'source' table or as 'target' table regarding the connection. The possible entry values are ['source', 'target'], by default None
new_foreign_key_to_table: str, optional Set connection to or from this table, by default None
foreign_key_columns: List of 2-dimensional tuples [(str, str), ..], optional Each 2D-tuple consists of a 'sourceColumnName' and 'targetColumnName' which represents the foreign_key e.g. foreign_key_columns=[('Col1', 'Col3'), ('Col2', 'Col2'), ..] between two tables, by default None
connection: DataConnection or str, optional The :class:DataConnection that the table is in, uses Global if not specified, by default None
reload: str, optional Reload can be one of "FROM_CACHE", "FORCE_COMPLETE", or "PARTIAL_ON_TABLE", by default None

Option 1

In [9]:

            
                Copied!
                
                    
                    
                
                

        
# OPTION 1:
data_model.add_table_from_pool(table_name="MY_NEW_TABLE",
                               alias= "MY_NEW_TABLE",
                               added_table_join_type="source",
                               new_foreign_key_to_table="CASE_TABLE",
                               foreign_key_columns=[("C_KEY","KEY")],
                               reload="PARTIAL_ON_TABLE")
# OPTION 1:
data_model.add_table_from_pool(table_name="MY_NEW_TABLE",
                               alias= "MY_NEW_TABLE",
                               added_table_join_type="source",
                               new_foreign_key_to_table="CASE_TABLE",
                               foreign_key_columns=[("C_KEY","KEY")],
                               reload="PARTIAL_ON_TABLE")

2020-12-11 10:56:14 - pycelonis: Partial Data Model reload started...

Out[9]:

[{'id': '7923f271-8340-4ff5-9237-7479dd7353f8',
  'dataModelId': 'b5e4f8c2-71fd-4ffa-9b1f-d6efa4adf864',
  'dataSourceId': None,
  'name': 'MY_NEW_TABLE',
  'alias': 'MY_NEW_TABLE',
  'columns': [],
  'useDirectStorage': False,
  'aliasOrName': 'MY_NEW_TABLE'}]

Option 2

In [10]:

            
                Copied!
                
# OPTION 2: Same result as OPTION 1, but broken down in multiple steps

# only add table
data_model.add_table_from_pool(table_name="MY_NEW_TABLE",
                               alias= "MY_NEW_TABLE")

# create foreign key
data_model.create_foreign_key(source_table="MY_NEW_TABLE",target_table="CASE_TABLE",columns=[("C_KEY","KEY")])

# reload datamodel
data_model.reload(tables='MY_TABLE')
# OPTION 2: Same result as OPTION 1, but broken down in multiple steps

# only add table
data_model.add_table_from_pool(table_name="MY_NEW_TABLE",
                               alias= "MY_NEW_TABLE")

# create foreign key
data_model.create_foreign_key(source_table="MY_NEW_TABLE",target_table="CASE_TABLE",columns=[("C_KEY","KEY")])

# reload datamodel
data_model.reload(tables='MY_TABLE')