Data Upload¶
In this tutorial, you will learn how to upload existing data from your local Python project into the EMS. More specifically, you will learn:
- How to create a new table in the EMS with your data
- How to append and upsert your data to existing tables in the EMS
- How to add a table from your data pool to your data model
- How to reload a data model
Prerequisites¶
To follow this tutorial, you should have already created a data pool and a data model inside your pool. If you haven't done this yet, please complete the Data Integration - Introduction tutorial first.
Tutorial¶
1. Import PyCelonis and connect to Celonis API¶
from pycelonis import get_celonis
celonis = get_celonis(permissions=False)
[2024-11-12 15:04:52,659] INFO: No `base_url` given. Using environment variable 'CELONIS_URL'
[2024-11-12 15:04:52,660] INFO: No `api_token` given. Using environment variable 'CELONIS_API_TOKEN'
[2024-11-12 15:04:52,661] INFO: No `key_type` given. Using environment variable 'CELONIS_KEY_TYPE'
[2024-11-12 15:04:52,700] INFO: Initial connect successful! PyCelonis Version: 2.11.1
2. Select data pool to upload data into¶
Before we can upload data into the EMS, we first have to select the data pool into which the data should be uploaded. Here, we use the data pool we created in the Data Integration - Introduction tutorial:
data_pool = celonis.data_integration.get_data_pools().find("PyCelonis Tutorial Data Pool")
data_pool
DataPool(id='be065bab-bc94-4f2e-81d3-f4df1e126e2a', name='PyCelonis Tutorial Data Pool')
With the `get_tables()` method, we can verify that we currently don't have any tables in the data pool:
data_pool.get_tables()
[]
3. Upload data into the EMS¶
Data can be uploaded into Celonis in two formats: as Pandas dataframes or as Parquet files. In this tutorial, we will focus on pushing data as Pandas dataframes. If you want to push data as Parquet files, please refer to the Data Push & Export Advanced tutorial.
In this tutorial, we will use a sample dataset for the SAP Purchase-to-Pay (P2P) process, which depicts the process of procuring materials from vendors. Below is an overview of the most important tables in this process:
Table | Description |
---|---|
_CEL_P2P_ACTIVITIES_EN | Activity Table |
EKPO | Purchasing Document Item (i.e. Case Table) |
EKKO | Purchasing Document Header |
LFA1 | Vendor Master Data |
Let's start by importing the tables of this dataset as Pandas dataframes:
import pandas as pd
activity_df = pd.read_parquet("../../../assets/_CEL_P2P_ACTIVITIES_EN.parquet", engine="pyarrow")
print(activity_df.shape)
activity_df.head()
(60, 13)
_CASE_KEY | ACTIVITY_EN | ACTIVITY_DE | EVENTTIME | _SORTING | USER_TYPE | CHANGED_TABLE | CHANGED_FIELD | CHANGED_FROM | CHANGED_TO | CHANGED_FROM_FLOAT | CHANGED_TO_FLOAT | CHANGE_NUMBER | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 800000000006800001 | Create Purchase Requisition Item | Lege BANF Position an | 2008-12-31 07:44:05 | 0.0 | B | None | None | None | None | NaN | NaN | None |
1 | 800000000006800001 | Create Purchase Order Item | Lege Bestellposition an | 2009-01-02 07:44:05 | 10.0 | B | None | None | None | None | NaN | NaN | None |
2 | 800000000006800001 | Print and Send Purchase Order | Sende Bestellung | 2009-01-05 07:44:05 | NaN | B | None | None | None | None | NaN | NaN | None |
3 | 800000000006800001 | Receive Goods | Wareneingang | 2009-01-12 07:44:05 | 30.0 | A | None | None | None | None | NaN | NaN | None |
4 | 800000000006800001 | Scan Invoice | Scanne Rechnung | 2009-01-20 07:44:05 | NaN | A | None | None | None | None | NaN | NaN | None |
item_df = pd.read_parquet("../../../assets/EKPO.parquet", engine="pyarrow")
print(item_df.shape)
item_df.head()
(10, 34)
_CASE_KEY | MANDT | LOEKZ | STATU | AEDAT | MATNR | BUKRS | WERKS | LGORT | MATKL | ... | AUDAT | Material Text (MAKT_MAKTX) | Company Code Text (EKPO_BUKRS) | Plant Text (EKPO_WERKS) | Stor Location Text (EKPO_LGORT) | EBELN | EBELP | Item Category Text(EKPO_PSTYP) | Material Group Text (MATKL_TEXT) | Net Value(NETWR_EUR) | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 800000000006800001 | 800 | None | None | 2009-01-02 | WL-1000 | 3000 | 3200 | 0001 | 001 | ... | NaT | Shafting assembly | IDES US INC | Atlanta | Warehouse 0001 | 0000000068 | 00001 | Standard | Metal processing | 3.48000 |
1 | 800000000006800002 | 800 | None | None | 2009-01-02 | None | 3000 | 3200 | 0001 | 00202 | ... | NaT | IDES US INC | Atlanta | Warehouse 0001 | 0000000068 | 00002 | Standard | Motherboards | 3.46260 | |
2 | 800000000006800003 | 800 | None | None | 2009-01-02 | DG-1000 | 3000 | 3200 | 0001 | 001 | ... | NaT | Rubber Seal | IDES US INC | Atlanta | Warehouse 0001 | 0000000068 | 00003 | Standard | Metal processing | 0.29000 |
3 | 800000000006800004 | 800 | None | None | 2009-01-02 | I-1100 | 3000 | 3200 | 0001 | 007 | ... | NaT | Pump Installation | IDES US INC | Atlanta | Warehouse 0001 | 0000000068 | 00004 | Standard | Services | 2.90000 |
4 | 800000000006800005 | 800 | None | None | 2009-01-02 | None | 3000 | 3200 | 0001 | 001 | ... | NaT | IDES US INC | Atlanta | Warehouse 0001 | 0000000068 | 00005 | Standard | Metal processing | 0.27608 |
5 rows × 34 columns
header_df = pd.read_parquet("../../../assets/EKKO.parquet", engine="pyarrow")
print(header_df.shape)
header_df.head()
(2, 33)
MANDT | BUKRS | BSTYP | BSART | LOEKZ | STATU | AEDAT | ERNAM | LIFNR | ZTERM | ... | FRGZU | Document Category Text (EKKO_BSTYP) | RFQ status Text(EKKO_STATU) | Document Type Text (EKKO_BSART) | Purchasing Organization Text (EKKO_EKORG) | Company Name (EKKO_BUTXT) | Country Key (EKKO_LAND1) | Currency Key (EKKO_WAERS) | EBELN | Company Code Text (EKKO_BUKRS) | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
14312 | 800 | 3000 | F | EC | None | I | 2009-01-02 | PURCHMANAGER | 0000003701 | NT30 | ... | None | Purchase order | None | Electronic commerce | IDES Deutschland | IDES US INC | US | USD | 0000000068 | IDES US INC |
14338 | 800 | 3000 | F | EC | None | I | 2009-02-03 | MILLERJ | 0000003701 | NT30 | ... | None | Purchase order | None | Electronic commerce | IDES Deutschland | IDES US INC | US | USD | 0000000069 | IDES US INC |
2 rows × 33 columns
master_df = pd.read_parquet("../../../assets/LFA1.parquet", engine="pyarrow")
print(master_df.shape)
master_df.head()
(1, 22)
MANDT | LIFNR | LAND1 | NAME1 | ORT01 | PFACH | PSTL2 | PSTLZ | REGIO | SORTL | ... | BEGRU | ERDAT | ERNAM | SPRAS | TELBX | TELF1 | TELF2 | TELFX | TELTX | TELX1 | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
263 | 800 | 0000003701 | US | eSupplier, Inc | WILMINGTON | None | None | 19801 | DE | EBP | ... | None | 2001-12-07 | STANKOVICH | E | None | 302-656-0196 | None | 302-656-0001 | None | None |
1 rows × 22 columns
With the dataframes in place, we can upload them into the EMS, either by creating new tables in the data pool or by appending/upserting the data into existing tables of the data pool.
3.1 Create new table in the EMS¶
New tables can be created in the EMS with the `create_table()` method. The method takes the following input arguments:
Name | Type | Description | Default |
---|---|---|---|
`df` | `DataFrame` | A pandas dataframe containing the data | Required |
`table_name` | `str` | Name that the table in the data pool should have | Required |
`drop_if_exists` | `bool` | Specifies how to handle situations when a table with the same name already exists in the data pool (`True` = replace existing table, `False` = raise error and keep existing table) | `False` |
Further, we can pass a `column_config` input argument, which specifies the names and column types of our table. Specifying a custom `column_config` is especially important for tables with longer text values, as by default, strings are cut off after 80 characters during the data upload. For a guide on how to specify a custom `column_config`, refer to the Data Push & Export Advanced tutorial.
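Before uploading, it can be useful to know which string columns would actually be affected by the default 80-character limit. The following is a local pandas sketch (using a small hypothetical frame standing in for `activity_df`; it does not call PyCelonis) that computes the maximum string length per text column:

```python
import pandas as pd

# Hypothetical sample frame standing in for activity_df.
df = pd.DataFrame({
    "ACTIVITY_EN": ["Create Purchase Requisition Item", "Receive Goods"],
    "CHANGED_TO": ["x" * 120, None],
})

# Maximum string length per object (text) column; columns exceeding
# 80 characters would be truncated by the default VARCHAR(80) mapping.
max_lengths = {
    col: int(df[col].dropna().astype(str).str.len().max())
    for col in df.select_dtypes(include="object").columns
    if df[col].notna().any()
}
too_long = {col: n for col, n in max_lengths.items() if n > 80}
print(too_long)  # columns that need a custom field length
```

Columns that show up in `too_long` are the ones worth covering with a custom `column_config` (see the Data Push & Export Advanced tutorial for the exact format).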
data_pool.create_table(df=activity_df, table_name="ACTIVITIES", drop_if_exists=False)
data_pool.create_table(df=item_df, table_name="EKPO", drop_if_exists=False)
data_pool.create_table(df=header_df, table_name="EKKO", drop_if_exists=False)
data_pool.create_table(df=master_df, table_name="LFA1", drop_if_exists=False)
[2024-11-12 15:04:52,865] WARNING: STRING columns are by default stored as VARCHAR(80) and therefore cut after 80 characters. You can specify a custom field length for each column using the `column_config` parameter.
[2024-11-12 15:04:52,869] INFO: Successfully created data push job with id '01bc64bc-ea84-4725-9cff-2970eb259df9'
[2024-11-12 15:04:52,870] INFO: Add data frame as file chunks to data push job with id '01bc64bc-ea84-4725-9cff-2970eb259df9'
[2024-11-12 15:04:52,880] INFO: Successfully upserted file chunk to data push job with id '01bc64bc-ea84-4725-9cff-2970eb259df9'
[2024-11-12 15:04:52,886] INFO: Successfully triggered execution for data push job with id '01bc64bc-ea84-4725-9cff-2970eb259df9'
[2024-11-12 15:04:52,887] INFO: Wait for execution of data push job with id '01bc64bc-ea84-4725-9cff-2970eb259df9'
[2024-11-12 15:04:52,898] INFO: Successfully created table 'ACTIVITIES' in data pool
[2024-11-12 15:04:52,901] INFO: Successfully deleted data push job with id '01bc64bc-ea84-4725-9cff-2970eb259df9'
[2024-11-12 15:04:52,907] WARNING: STRING columns are by default stored as VARCHAR(80) and therefore cut after 80 characters. You can specify a custom field length for each column using the `column_config` parameter.
[2024-11-12 15:04:52,911] INFO: Successfully created data push job with id 'd631ebc5-27b6-4284-af96-de40d52bb400'
[2024-11-12 15:04:52,912] INFO: Add data frame as file chunks to data push job with id 'd631ebc5-27b6-4284-af96-de40d52bb400'
[2024-11-12 15:04:52,927] INFO: Successfully upserted file chunk to data push job with id 'd631ebc5-27b6-4284-af96-de40d52bb400'
[2024-11-12 15:04:52,935] INFO: Successfully triggered execution for data push job with id 'd631ebc5-27b6-4284-af96-de40d52bb400'
[2024-11-12 15:04:52,936] INFO: Wait for execution of data push job with id 'd631ebc5-27b6-4284-af96-de40d52bb400'
[2024-11-12 15:04:52,956] INFO: Successfully created table 'EKPO' in data pool
[2024-11-12 15:04:52,960] INFO: Successfully deleted data push job with id 'd631ebc5-27b6-4284-af96-de40d52bb400'
[2024-11-12 15:04:52,970] WARNING: STRING columns are by default stored as VARCHAR(80) and therefore cut after 80 characters. You can specify a custom field length for each column using the `column_config` parameter.
[2024-11-12 15:04:52,975] INFO: Successfully created data push job with id '5df0be50-16ac-4ce1-92ed-6ac246949e18'
[2024-11-12 15:04:52,976] INFO: Add data frame as file chunks to data push job with id '5df0be50-16ac-4ce1-92ed-6ac246949e18'
[2024-11-12 15:04:52,993] INFO: Successfully upserted file chunk to data push job with id '5df0be50-16ac-4ce1-92ed-6ac246949e18'
[2024-11-12 15:04:53,003] INFO: Successfully triggered execution for data push job with id '5df0be50-16ac-4ce1-92ed-6ac246949e18'
[2024-11-12 15:04:53,004] INFO: Wait for execution of data push job with id '5df0be50-16ac-4ce1-92ed-6ac246949e18'
[2024-11-12 15:04:53,030] INFO: Successfully created table 'EKKO' in data pool
[2024-11-12 15:04:53,036] INFO: Successfully deleted data push job with id '5df0be50-16ac-4ce1-92ed-6ac246949e18'
[2024-11-12 15:04:53,047] WARNING: STRING columns are by default stored as VARCHAR(80) and therefore cut after 80 characters. You can specify a custom field length for each column using the `column_config` parameter.
[2024-11-12 15:04:53,053] INFO: Successfully created data push job with id 'dfc63881-116b-4986-aeb4-ba8ea48e6baa'
[2024-11-12 15:04:53,054] INFO: Add data frame as file chunks to data push job with id 'dfc63881-116b-4986-aeb4-ba8ea48e6baa'
[2024-11-12 15:04:53,069] INFO: Successfully upserted file chunk to data push job with id 'dfc63881-116b-4986-aeb4-ba8ea48e6baa'
[2024-11-12 15:04:53,081] INFO: Successfully triggered execution for data push job with id 'dfc63881-116b-4986-aeb4-ba8ea48e6baa'
[2024-11-12 15:04:53,082] INFO: Wait for execution of data push job with id 'dfc63881-116b-4986-aeb4-ba8ea48e6baa'
[2024-11-12 15:04:53,113] INFO: Successfully created table 'LFA1' in data pool
[2024-11-12 15:04:53,120] INFO: Successfully deleted data push job with id 'dfc63881-116b-4986-aeb4-ba8ea48e6baa'
DataPoolTable(name='LFA1', data_source_id=None, columns=[], schema_name='be065bab-bc94-4f2e-81d3-f4df1e126e2a', data_pool_id='be065bab-bc94-4f2e-81d3-f4df1e126e2a')
We can verify that the newly created tables exist in our data pool by calling the `get_tables()` method:
data_pool.get_tables()
[ DataPoolTable(name='ACTIVITIES', data_source_id=None, columns=[], schema_name='be065bab-bc94-4f2e-81d3-f4df1e126e2a', data_pool_id='be065bab-bc94-4f2e-81d3-f4df1e126e2a'), DataPoolTable(name='EKKO', data_source_id=None, columns=[], schema_name='be065bab-bc94-4f2e-81d3-f4df1e126e2a', data_pool_id='be065bab-bc94-4f2e-81d3-f4df1e126e2a'), DataPoolTable(name='EKPO', data_source_id=None, columns=[], schema_name='be065bab-bc94-4f2e-81d3-f4df1e126e2a', data_pool_id='be065bab-bc94-4f2e-81d3-f4df1e126e2a'), DataPoolTable(name='LFA1', data_source_id=None, columns=[], schema_name='be065bab-bc94-4f2e-81d3-f4df1e126e2a', data_pool_id='be065bab-bc94-4f2e-81d3-f4df1e126e2a') ]
3.2 Append data to existing table in the EMS¶
We can also append our data to an already existing table in our data pool with the `append()` method. The method takes the following input arguments:
Name | Type | Description | Default |
---|---|---|---|
`df` | `DataFrame` | A pandas dataframe containing the data | Required |
Important:
The column names and types of our dataframe must match those of the target table in the data pool; otherwise, the append operation will fail.
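Since a schema mismatch only surfaces when the append fails, it can pay off to compare the new batch against the originally uploaded frame locally first. A minimal pandas sketch (with hypothetical stand-in frames, not a PyCelonis call) for such a check:

```python
import pandas as pd

# Hypothetical stand-ins: the frame originally uploaded and a new batch.
existing = pd.DataFrame({
    "_CASE_KEY": ["800000000006800001"],
    "EVENTTIME": pd.to_datetime(["2009-01-02"]),
})
new_batch = pd.DataFrame({
    "_CASE_KEY": ["800000000006800002"],
    "EVENTTIME": ["2009-01-03"],  # plain string, not datetime
})

def schema_matches(a: pd.DataFrame, b: pd.DataFrame) -> bool:
    """True if both frames have identical column names and dtypes."""
    return list(a.columns) == list(b.columns) and list(a.dtypes) == list(b.dtypes)

print(schema_matches(existing, new_batch))  # False: EVENTTIME dtypes differ
```

Here the column names agree, but `EVENTTIME` is a datetime in one frame and a string in the other, which is exactly the kind of mismatch that would make the append fail.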
Let's create a new activity table `ACTIVITIES_APPEND` in our data pool, to which we want to append a new dataframe:
data_pool_table = data_pool.create_table(df=activity_df, table_name="ACTIVITIES_APPEND", drop_if_exists=False)
[2024-11-12 15:04:53,157] WARNING: STRING columns are by default stored as VARCHAR(80) and therefore cut after 80 characters. You can specify a custom field length for each column using the `column_config` parameter.
[2024-11-12 15:04:53,165] INFO: Successfully created data push job with id '86c43a0c-2d77-40f8-88a1-b6b639a87a77'
[2024-11-12 15:04:53,166] INFO: Add data frame as file chunks to data push job with id '86c43a0c-2d77-40f8-88a1-b6b639a87a77'
[2024-11-12 15:04:53,178] INFO: Successfully upserted file chunk to data push job with id '86c43a0c-2d77-40f8-88a1-b6b639a87a77'
[2024-11-12 15:04:53,193] INFO: Successfully triggered execution for data push job with id '86c43a0c-2d77-40f8-88a1-b6b639a87a77'
[2024-11-12 15:04:53,193] INFO: Wait for execution of data push job with id '86c43a0c-2d77-40f8-88a1-b6b639a87a77'
[2024-11-12 15:04:53,222] INFO: Successfully created table 'ACTIVITIES_APPEND' in data pool
[2024-11-12 15:04:53,230] INFO: Successfully deleted data push job with id '86c43a0c-2d77-40f8-88a1-b6b639a87a77'
We can now append another dataframe to the already existing table by calling the `append()` method:
data_pool_table.append(activity_df)
[2024-11-12 15:04:53,245] WARNING: No column configuration set. String columns are cropped to 80 characters if not configured
[2024-11-12 15:04:53,254] INFO: Successfully created data push job with id '72ab0f31-7030-4ad7-a69e-cd78e9e59885'
[2024-11-12 15:04:53,255] INFO: Add data frame as file chunks to data push job with id '72ab0f31-7030-4ad7-a69e-cd78e9e59885'
[2024-11-12 15:04:53,268] INFO: Successfully upserted file chunk to data push job with id '72ab0f31-7030-4ad7-a69e-cd78e9e59885'
[2024-11-12 15:04:53,284] INFO: Successfully triggered execution for data push job with id '72ab0f31-7030-4ad7-a69e-cd78e9e59885'
[2024-11-12 15:04:53,285] INFO: Wait for execution of data push job with id '72ab0f31-7030-4ad7-a69e-cd78e9e59885'
[2024-11-12 15:04:53,326] INFO: Successfully deleted data push job with id '72ab0f31-7030-4ad7-a69e-cd78e9e59885'
[2024-11-12 15:04:53,327] INFO: Successfully appended rows to table 'ACTIVITIES_APPEND' in data pool
3.3 Upsert data to existing table in the EMS¶
Lastly, we can upsert our data into an already existing data pool table with the `upsert()` method. Upsert works similarly to the append operation (i.e. it adds rows from a dataframe to a table) but replaces rows that already exist. For this, we specify in `keys` a list of column names to check for equality: if two rows have the same values in all columns listed in `keys`, they are considered duplicates, and the existing row is replaced.
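The key-based replacement logic can be illustrated locally with plain pandas (this is only a sketch of the semantics on hypothetical frames, not what PyCelonis executes server-side):

```python
import pandas as pd

# Hypothetical existing table and an incoming frame sharing one key pair.
existing = pd.DataFrame({
    "_CASE_KEY": ["A", "A"],
    "ACTIVITY_EN": ["Create Purchase Order Item", "Receive Goods"],
    "USER_TYPE": ["B", "A"],
})
incoming = pd.DataFrame({
    "_CASE_KEY": ["A", "B"],
    "ACTIVITY_EN": ["Receive Goods", "Scan Invoice"],
    "USER_TYPE": ["X", "A"],
})

keys = ["_CASE_KEY", "ACTIVITY_EN"]

# Incoming rows win on key collisions; all other rows are kept.
result = (
    pd.concat([incoming, existing])
    .drop_duplicates(subset=keys, keep="first")
    .sort_values(keys)
    .reset_index(drop=True)
)
print(result)
```

The pair `("A", "Receive Goods")` exists in both frames, so the existing row is replaced by the incoming one (`USER_TYPE` becomes `"X"`), while all non-colliding rows survive.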
The `upsert()` method takes the following input arguments:
Name | Type | Description | Default |
---|---|---|---|
`df` | `DataFrame` | A pandas dataframe containing the data | Required |
`keys` | `List[str]` | List of column names to check for equality | Required |
Let's create a new activity table `ACTIVITIES_UPSERT` in our data pool, to which we want to upsert a new dataframe:
data_pool_table = data_pool.create_table(df=activity_df, table_name="ACTIVITIES_UPSERT", drop_if_exists=False)
[2024-11-12 15:04:53,341] WARNING: STRING columns are by default stored as VARCHAR(80) and therefore cut after 80 characters. You can specify a custom field length for each column using the `column_config` parameter.
[2024-11-12 15:04:53,351] INFO: Successfully created data push job with id 'fb606aac-ce9d-4694-807f-9f48a6f96841'
[2024-11-12 15:04:53,352] INFO: Add data frame as file chunks to data push job with id 'fb606aac-ce9d-4694-807f-9f48a6f96841'
[2024-11-12 15:04:53,366] INFO: Successfully upserted file chunk to data push job with id 'fb606aac-ce9d-4694-807f-9f48a6f96841'
[2024-11-12 15:04:53,385] INFO: Successfully triggered execution for data push job with id 'fb606aac-ce9d-4694-807f-9f48a6f96841'
[2024-11-12 15:04:53,385] INFO: Wait for execution of data push job with id 'fb606aac-ce9d-4694-807f-9f48a6f96841'
[2024-11-12 15:04:53,423] INFO: Successfully created table 'ACTIVITIES_UPSERT' in data pool
[2024-11-12 15:04:53,434] INFO: Successfully deleted data push job with id 'fb606aac-ce9d-4694-807f-9f48a6f96841'
We can now upsert another dataframe by calling the `upsert()` method. Here, we specify `_CASE_KEY` and `ACTIVITY_EN` as the columns to check for equality:
data_pool_table.upsert(activity_df, keys=["_CASE_KEY", "ACTIVITY_EN"])
[2024-11-12 15:04:53,450] WARNING: No column configuration set. String columns are cropped to 80 characters if not configured
[2024-11-12 15:04:53,461] INFO: Successfully created data push job with id '8cf419e2-1b63-4c6b-bc51-1ebb0a8fd8bc'
[2024-11-12 15:04:53,462] INFO: Add data frame as file chunks to data push job with id '8cf419e2-1b63-4c6b-bc51-1ebb0a8fd8bc'
[2024-11-12 15:04:53,479] INFO: Successfully upserted file chunk to data push job with id '8cf419e2-1b63-4c6b-bc51-1ebb0a8fd8bc'
[2024-11-12 15:04:53,501] INFO: Successfully triggered execution for data push job with id '8cf419e2-1b63-4c6b-bc51-1ebb0a8fd8bc'
[2024-11-12 15:04:53,502] INFO: Wait for execution of data push job with id '8cf419e2-1b63-4c6b-bc51-1ebb0a8fd8bc'
[2024-11-12 15:04:53,575] INFO: Successfully deleted data push job with id '8cf419e2-1b63-4c6b-bc51-1ebb0a8fd8bc'
[2024-11-12 15:04:53,576] INFO: Successfully upserted rows to table 'ACTIVITIES_UPSERT' in data pool
4. Add table to a data model¶
After uploading our data into the data pool (either as a new table or by appending/upserting into an existing one), we can add the table to a data model.
For this, we navigate to the data model we created in the Data Integration - Introduction tutorial:
data_model = data_pool.get_data_models().find("PyCelonis Tutorial Data Model")
data_model
DataModel(id='68682a56-5bc4-4bfb-be4e-2e588335549c', name='PyCelonis Tutorial Data Model', pool_id='be065bab-bc94-4f2e-81d3-f4df1e126e2a')
To add a table from the data pool to the data model, we call the `add_table()` method. The method takes the following input arguments:
Name | Type | Description | Default |
---|---|---|---|
`name` | `str` | Name of the table inside our data pool | Required |
`alias` | `str` | Alias for the table, i.e. how the name should be displayed inside our data model | `None` |
data_model.add_table(name="ACTIVITIES", alias="ACTIVITIES")
data_model.add_table(name="EKPO", alias="EKPO")
data_model.add_table(name="EKKO", alias="EKKO")
data_model.add_table(name="LFA1", alias="LFA1")
[2024-11-12 15:04:53,616] INFO: Successfully added data model table with id 'e86d10ab-6636-4e36-9695-0e29b4a4f5c3' to data model
[2024-11-12 15:04:53,629] INFO: Successfully added data model table with id '913addb7-aeda-4645-9508-75a183283095' to data model
[2024-11-12 15:04:53,642] INFO: Successfully added data model table with id '4a59374a-bdd6-4a08-a036-87429391cbbf' to data model
[2024-11-12 15:04:53,655] INFO: Successfully added data model table with id 'a29074bc-240a-4bc3-b050-0b35e7f0000f' to data model
DataModelTable(id='a29074bc-240a-4bc3-b050-0b35e7f0000f', data_model_id='68682a56-5bc4-4bfb-be4e-2e588335549c', name='LFA1', alias='LFA1', data_pool_id='be065bab-bc94-4f2e-81d3-f4df1e126e2a')
The method takes the table from our data pool and creates a reference to it inside our data model, including all properties such as column names and data types. To verify that the tables exist in our data model, we can call the `get_tables()` method:
data_model.get_tables()
[ DataModelTable(id='4a59374a-bdd6-4a08-a036-87429391cbbf', data_model_id='68682a56-5bc4-4bfb-be4e-2e588335549c', name='EKKO', alias='EKKO', data_pool_id='be065bab-bc94-4f2e-81d3-f4df1e126e2a'), DataModelTable(id='e86d10ab-6636-4e36-9695-0e29b4a4f5c3', data_model_id='68682a56-5bc4-4bfb-be4e-2e588335549c', name='ACTIVITIES', alias='ACTIVITIES', data_pool_id='be065bab-bc94-4f2e-81d3-f4df1e126e2a'), DataModelTable(id='913addb7-aeda-4645-9508-75a183283095', data_model_id='68682a56-5bc4-4bfb-be4e-2e588335549c', name='EKPO', alias='EKPO', data_pool_id='be065bab-bc94-4f2e-81d3-f4df1e126e2a'), DataModelTable(id='a29074bc-240a-4bc3-b050-0b35e7f0000f', data_model_id='68682a56-5bc4-4bfb-be4e-2e588335549c', name='LFA1', alias='LFA1', data_pool_id='be065bab-bc94-4f2e-81d3-f4df1e126e2a') ]
5. Reload data model¶
Note that the `add_table()` method only creates a reference to the data pool table inside our data model but does not reload the data model. For that, we have to call the `reload()` method:
data_model.reload()
[2024-11-12 15:04:53,711] INFO: Successfully triggered data model reload for data model with id '68682a56-5bc4-4bfb-be4e-2e588335549c'
[2024-11-12 15:04:53,712] INFO: Wait for execution of data model reload for data model with id '68682a56-5bc4-4bfb-be4e-2e588335549c'
This method loads the data for all tables inside our data model. If we only want to load the data for selected tables, we can instead perform a `partial_reload()` and specify in `data_model_table_ids` the IDs of the tables for which we want to load the data:
tables = data_model.get_tables()
ekko = tables.find("EKKO")
ekpo = tables.find("EKPO")
data_model.partial_reload(data_model_table_ids=[ekko.id, ekpo.id])
[2024-11-12 15:04:53,857] INFO: Successfully triggered data model reload for data model with id '68682a56-5bc4-4bfb-be4e-2e588335549c'
[2024-11-12 15:04:53,858] INFO: Wait for execution of data model reload for data model with id '68682a56-5bc4-4bfb-be4e-2e588335549c'
Conclusion¶
Congratulations! You have learned how to upload data from your local Python project as tables inside your data pool, how to add those tables into a data model, and how to reload a data model in order to populate the model tables with data from the pool tables. In the next tutorial Data Export, you will learn how to export data from the Celonis EMS into your local Python project.