Migration from PyCelonis 1.X to PyCelonis 2.X

In this tutorial, you will learn how to migrate existing scripts from PyCelonis 1.X to PyCelonis 2.X. In PyCelonis 2.X we set out to improve many things. Unfortunately, this was only possible through breaking changes, so we decided to rewrite PyCelonis completely. Below are some of the major improvements compared to PyCelonis 1.X:

New interfaces aligned with the EMS navigation: apps, data integration, studio...
Explicit update and sync operations to improve performance
Improved semantics for data upload
Additional features for PQL handling
Streamlined data export through data model
Clearer separation between Studio and Apps
Removal of redundant APIs
Consistent interface for classes and methods

Tutorial

1. Improved interface

For better alignment with the EMS, we updated the whole interface of PyCelonis. In PyCelonis 1.X most objects were accessible through the celonis object:

# PyCelonis 1.X
from pycelonis import get_celonis

celonis = get_celonis(permissions=False)

data_pools = celonis.pools
spaces = celonis.spaces
datamodels = celonis.datamodels
...

In PyCelonis 2.X these can be accessed through their respective submodules (similar to how it's done in the UI). The available submodules are:

data_integration
studio
apps
team

Instead of accessing resources through a property (e.g. celonis.pools), a method has to be called (e.g. celonis.data_integration.get_data_pools()):

from pycelonis import get_celonis

celonis = get_celonis(permissions=False)

data_pools = celonis.data_integration.get_data_pools()
spaces = celonis.studio.get_spaces()

To avoid performance issues, accessing objects within other resources (e.g. data models) has to be done through their parent object in PyCelonis 2.X:

data_pool = celonis.data_integration.get_data_pools().find("PyCelonis Tutorial Data Pool")
data_pool = celonis.data_integration.get_data_pool(data_pool.id)
data_models = data_pool.get_data_models()

space = celonis.studio.get_spaces().find("PyCelonis Tutorial Space")
space = celonis.studio.get_space(space.id)
packages = space.get_packages()
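
When migrating a larger script, it can help to locate the 1.X property accesses first. The following is a minimal sketch that scans source text for the three properties from the 1.X example and prints the 2.X call used in this tutorial. The helper name find_legacy_accessors and the mapping table are assumptions made for illustration; they are not part of PyCelonis, and the mapping covers only the cases shown above.

```python
import re
from typing import List, Tuple

# Replacements taken from the examples in this tutorial; nothing else is covered.
LEGACY_ACCESSORS = {
    "pools": "celonis.data_integration.get_data_pools()",
    "spaces": "celonis.studio.get_spaces()",
    "datamodels": "data_pool.get_data_models() on the parent data pool",
}

# Matches `celonis.<name>` only when <name> is one of the whole attribute names above.
LEGACY_PATTERN = re.compile(r"\bcelonis\.(pools|spaces|datamodels)\b")


def find_legacy_accessors(source: str) -> List[Tuple[int, str, str]]:
    """Return (line number, 1.X accessor, suggested 2.X call) for each match."""
    findings = []
    for line_number, line in enumerate(source.splitlines(), start=1):
        for match in LEGACY_PATTERN.finditer(line):
            accessor = match.group(1)
            findings.append((line_number, f"celonis.{accessor}", LEGACY_ACCESSORS[accessor]))
    return findings


legacy_script = """\
data_pools = celonis.pools
spaces = celonis.spaces
datamodels = celonis.datamodels
"""

for line_number, old, new in find_legacy_accessors(legacy_script):
    print(f"line {line_number}: {old} -> {new}")
```

The scan only flags candidates; each hit still needs a manual rewrite, in particular for data models, which now have to be fetched through their data pool.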

We have kept the find method as a convenient way to find specific objects based on their name.

# PyCelonis 1.X
data_pool = data_pools.find("PyCelonis Tutorial Data Pool")  # lookup by name
data_pool = data_pools.find("341b450c-7f92-26e6-b79c-2d38a1a87d38")  # lookup by id

In PyCelonis 2.X the find method by default searches only by name and not by id. If you intend to retrieve an object by its id, use the corresponding get_<object> function instead, which also improves performance:

data_pool = data_pools.find("PyCelonis Tutorial Data Pool")
data_pool = celonis.data_integration.get_data_pool(data_pool.id)
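
To make the difference between a name lookup and an id lookup concrete, here is a toy model of a collection with a find method. Resource, ResourceCollection and the LookupError it raises are stand-ins invented for this sketch; they are not PyCelonis classes, and the real find may behave differently in detail (for example in the exception it raises). The search_attribute parameter mirrors the keyword used with find later in this tutorial.

```python
from dataclasses import dataclass


@dataclass
class Resource:
    """Stand-in for a data pool or space; not a PyCelonis class."""
    id: str
    name: str


class ResourceCollection(list):
    """Toy list with a find method that matches on the name by default."""

    def find(self, value: str, search_attribute: str = "name") -> Resource:
        for resource in self:
            if getattr(resource, search_attribute) == value:
                return resource
        raise LookupError(f"No resource with {search_attribute}={value!r}")


pools = ResourceCollection([
    Resource(id="341b450c-7f92-26e6-b79c-2d38a1a87d38", name="PyCelonis Tutorial Data Pool"),
])

by_name = pools.find("PyCelonis Tutorial Data Pool")  # matches on name by default

try:
    pools.find("341b450c-7f92-26e6-b79c-2d38a1a87d38")  # an id is not a name
except LookupError as error:
    print("Lookup failed:", error)
```

A 1.X script that passed ids to find will therefore need to switch to the get_<object> call shown above.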

2. Working with EMS resource objects

In PyCelonis 1.X, resources were always in sync with the EMS and whenever a property was accessed an API call was made in the background to fetch the latest state:

# PyCelonis 1.X
print(data_pool.name) # Triggers API call to fetch latest data pool name

Since this behaviour triggers a lot of API calls it can have a detrimental impact on performance and also might cause stability issues. Therefore, in PyCelonis 2.X resource objects are not automatically synced with the EMS anymore:

print("Data pool name:", data_pool.name) # Does not trigger an API call and only reads the name from memory

Data pool name: PyCelonis Tutorial Data Pool

If you want to fetch the latest updates from the EMS you can explicitly call the sync method to get the latest state for any resource:

data_pool.sync() # Triggers API call to fetch latest data pool data
print("Data pool name:", data_pool.name)

Data pool name: PyCelonis Tutorial Data Pool

Setting a property in PyCelonis 1.X automatically triggered an update in the EMS:

# PyCelonis 1.X
data_pool.name = "NEW_NAME" # Triggers API call to update data pool name

For PyCelonis 2.X we introduced the update method, which explicitly pushes the current state of a resource to the EMS. This avoids accidental updates and allows several properties to be changed with a single call:

data_pool.name = "PyCelonis Tutorial Pool" # Only updates the name of the data pool locally
data_pool.update() # Pushes the current state of the data pool to the EMS to update the name
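
The effect of explicit sync and update calls on the number of requests can be illustrated without a Celonis connection. The sketch below is a toy in-memory model: FakeBackend and CachedResource are hypothetical names, and the code only mimics the pattern described in this section, not how PyCelonis implements it.

```python
class FakeBackend:
    """In-memory stand-in for the EMS that counts requests."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.requests = 0

    def fetch_name(self) -> str:
        self.requests += 1
        return self.name

    def store_name(self, name: str) -> None:
        self.requests += 1
        self.name = name


class CachedResource:
    """Toy resource with explicit sync/update; not PyCelonis code."""

    def __init__(self, backend: FakeBackend) -> None:
        self._backend = backend
        self.name = backend.fetch_name()  # one request when the object is fetched

    def sync(self) -> None:
        """Overwrite the local state with the remote state."""
        self.name = self._backend.fetch_name()

    def update(self) -> None:
        """Push the local state to the remote side."""
        self._backend.store_name(self.name)


backend = FakeBackend("PyCelonis Tutorial Data Pool")
pool = CachedResource(backend)

for _ in range(100):
    _ = pool.name  # plain attribute reads, no requests
print(backend.requests)  # 1 (only the initial fetch)

pool.name = "PyCelonis Tutorial Pool"  # local change only
print(backend.name)  # PyCelonis Tutorial Data Pool
pool.update()
print(backend.name)  # PyCelonis Tutorial Pool
```

The flip side of this pattern is that local state can become stale, so a long-running script should call sync before relying on values that others may have changed.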

Lastly, accessing specific properties of a resource in PyCelonis 1.X required using the data dictionary:

# PyCelonis 1.X
print(data_pool.data["timeStamp"])

For PyCelonis 2.X, we introduced Pydantic data classes for resource objects and made all properties directly accessible. With this change we now fully support type hinting, which results in a much better development experience because code completion is fully supported:

print(data_pool.time_stamp)

2024-09-16 12:59:21.277000+00:00

It is also possible to get a dictionary or JSON representation of any resource if needed:

data_pool.dict()

Out[9]:

{'permissions': ['EDIT_DATA_POOL_RESTRICTED',
  'USE_ALL_DATA_MODELS',
  'DATA_PUSH_API',
  'ADMIN',
  'VIEW_DATA_POOL',
  'CONTINUOUS_DATA_PUSH_API',
  'CREATE_DATA_POOL',
  'EDIT_ALL_DATA_POOLS'],
 'id': 'be065bab-bc94-4f2e-81d3-f4df1e126e2a',
 'name': 'PyCelonis Tutorial Pool',
 'description': None,
 'time_stamp': datetime.datetime(2024, 9, 16, 12, 59, 21, 277000, tzinfo=datetime.timezone.utc),
 'configuration_status': 'CONFIGURED',
 'locked': False,
 'content_id': None,
 'content_version': 0,
 'tags': [],
 'original_id': None,
 'monitoring_target': False,
 'custom_monitoring_target': False,
 'custom_monitoring_target_active': False,
 'exported': False,
 'monitoring_message_columns_migrated': False,
 'creator_user_id': 'ba629456-ff60-4acb-a8c9-b92703954e7e',
 'object_id': 'be065bab-bc94-4f2e-81d3-f4df1e126e2a'}

data_pool.json()

Out[10]:

'{"permissions": ["EDIT_DATA_POOL_RESTRICTED", "USE_ALL_DATA_MODELS", "DATA_PUSH_API", "ADMIN", "VIEW_DATA_POOL", "CONTINUOUS_DATA_PUSH_API", "CREATE_DATA_POOL", "EDIT_ALL_DATA_POOLS"], "id": "be065bab-bc94-4f2e-81d3-f4df1e126e2a", "name": "PyCelonis Tutorial Pool", "description": null, "time_stamp": 1726491561277, "configuration_status": "CONFIGURED", "locked": false, "content_id": null, "content_version": 0, "tags": [], "original_id": null, "monitoring_target": false, "custom_monitoring_target": false, "custom_monitoring_target_active": false, "exported": false, "monitoring_message_columns_migrated": false, "creator_user_id": "ba629456-ff60-4acb-a8c9-b92703954e7e", "object_id": "be065bab-bc94-4f2e-81d3-f4df1e126e2a"}'
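
Note that the two representations differ for time_stamp: the dictionary holds a datetime, while the JSON string holds an integer. Comparing the two outputs above suggests the integer is the number of milliseconds since the Unix epoch; that reading is an assumption based on this one sample. Under that assumption, a minimal sketch for turning the JSON value back into a datetime, using only the standard library and a shortened copy of the JSON string, could look like this (from_epoch_millis is a hypothetical helper):

```python
import json
from datetime import datetime, timedelta, timezone

# Shortened copy of the JSON output above (two of the fields only).
raw = '{"name": "PyCelonis Tutorial Pool", "time_stamp": 1726491561277}'

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def from_epoch_millis(millis: int) -> datetime:
    """Convert epoch milliseconds to a timezone-aware UTC datetime."""
    return EPOCH + timedelta(milliseconds=millis)


record = json.loads(raw)
record["time_stamp"] = from_epoch_millis(record["time_stamp"])
print(record["time_stamp"])  # 2024-09-16 12:59:21.277000+00:00
```

Using integer milliseconds with timedelta avoids the rounding that can occur when dividing by 1000 and passing a float to datetime.fromtimestamp.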

3. Uploading data into a data pool

In PyCelonis 1.X you can create, append, and upsert data to a table by using a data pool:

# PyCelonis 1.X
data_pool.create_table(df, "TABLE_NAME")
data_pool.append_table(df, "TABLE_NAME")
data_pool.upsert_table(df, "TABLE_NAME", keys=["ID"])

We considered this a bad interface because it allowed you to attempt an upload to a table that didn't exist. PyCelonis 2.X offers the same functionality, with the difference that append and upsert must be called directly on the table object:

import pandas as pd

df = pd.DataFrame({"ID": [1,2,3,4]})

table = data_pool.create_table(df, "DATA_PUSH_TEST")
table.append(df)
table.upsert(df, keys=["ID"])

STRING columns are by default stored as VARCHAR(80) and therefore cut after 80 characters. You can specify a custom field length for each column using the `column_config` parameter.

No column configuration set. String columns are cropped to 80 characters if not configured

No column configuration set. String columns are cropped to 80 characters if not configured
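
Because of the 80-character default, it can be worth checking before an upload whether any string column would be cropped. The following sketch does that check on plain Python data; columns_exceeding is a hypothetical helper, not a PyCelonis function, and it says nothing about how column_config itself has to be constructed.

```python
from typing import Dict, List

DEFAULT_FIELD_LENGTH = 80  # default VARCHAR length mentioned above


def columns_exceeding(
    columns: Dict[str, List[object]], limit: int = DEFAULT_FIELD_LENGTH
) -> Dict[str, int]:
    """Return {column name: longest string length} for columns with strings longer than limit."""
    too_long = {}
    for name, values in columns.items():
        lengths = [len(value) for value in values if isinstance(value, str)]
        if lengths and max(lengths) > limit:
            too_long[name] = max(lengths)
    return too_long


columns = {
    "ID": [1, 2, 3],
    "COMMENT": ["short", "x" * 120, "also short"],
}
print(columns_exceeding(columns))  # {'COMMENT': 120}
```

For a pandas DataFrame, df.to_dict("list") produces the same column-to-values mapping, so the helper can be applied to the data frame that is about to be uploaded.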

It is also possible to create a table based on a parquet file in PyCelonis 2.X by manually creating and executing a data push job:

from pycelonis.ems import JobType

data_push_job = data_pool.create_data_push_job(
    target_name="DATA_PUSH_TEST_JOB",
    type_=JobType.REPLACE,
)

with open("../../../assets/_CEL_P2P_ACTIVITIES_EN.parquet", "rb") as file:
    data_push_job.add_file_chunk(file)
    
data_push_job.execute(wait=True)

4. PQL Handling

In PyCelonis 1.X you can query data using the PQL class:

# PyCelonis 1.X
from pycelonis.pql import PQL, PQLColumn, PQLFilter

q = PQL()
q += PQLColumn(name="ACTIVITY_EN", query=""" "ACTIVITIES"."ACTIVITY_EN" """)
q += PQLColumn(name="EVENTTIME", query=""" "ACTIVITIES"."EVENTTIME" """)
q += PQLFilter(""" FILTER "ACTIVITIES"."_CASE_KEY" = '800000000006800001'; """)

PyCelonis 2.X offers a similar interface but now additionally supports OrderByColumn and additional operations through SaolaPy. To migrate existing queries simply use the from_pql method:

import pycelonis.pql as pql

data_model = data_pool.get_data_models().find("PyCelonis Tutorial Data Model")

query = pql.PQL(distinct=True)
query += pql.PQLColumn(name="ACTIVITY_EN", query=""" "ACTIVITIES"."ACTIVITY_EN" """)
query += pql.PQLColumn(name="EVENTTIME", query=""" "ACTIVITIES"."EVENTTIME" """)

query += pql.PQLFilter(query=""" FILTER "ACTIVITIES"."_CASE_KEY" = '800000000006800001'; """)
query += pql.OrderByColumn(query=""" "ACTIVITIES"."EVENTTIME" """)

df = pql.DataFrame.from_pql(query, data_model=data_model)
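
The query strings above repeat the same "TABLE"."COLUMN" quoting by hand, which is easy to mistype when many columns are migrated. As an optional convenience, a small helper can build these references; pql_ref is a hypothetical function written for this sketch and is not part of PyCelonis.

```python
def pql_ref(table: str, column: str) -> str:
    """Build a quoted "TABLE"."COLUMN" reference for use in a PQL query string."""
    for identifier in (table, column):
        if '"' in identifier:
            raise ValueError(f"Identifier must not contain double quotes: {identifier!r}")
    return f'"{table}"."{column}"'


# Column name -> PQL expression for the two columns used in this tutorial.
column_queries = {name: pql_ref("ACTIVITIES", name) for name in ("ACTIVITY_EN", "EVENTTIME")}

for name, query in column_queries.items():
    print(name, "=>", query)
```

Each resulting string can then be passed as the query argument of a PQLColumn in the same way as the hand-written strings.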

5. Exporting data from a data model

With PyCelonis 1.X you can export data through both an analysis and the data model:

# PyCelonis 1.X

# Export using data model
df = data_model.get_data_frame(q)

# Export using analysis
component = analysis.draft.components.find("Vendors")
df = component.get_data_frame()

This has changed in PyCelonis 2.X since the analysis export had several limitations. Therefore, all data exports must be triggered via the data model using SaolaPy:

from pycelonis.pql.saola_connector import KnowledgeModelSaolaConnector

df = pql.DataFrame(
    {
        "ACTIVITIES": """ "ACTIVITIES"."_CASE_KEY" """,
    },
    data_model=data_model
)
df.head()

Out[14]:

	ACTIVITIES
Index
0	800000000006800001
1	800000000006800001
2	800000000006800001
3	800000000006800001
4	800000000006800001

In PyCelonis 2.X, it is now possible to get the queries of existing knowledge models and analyses and also query data using any KPI or variables from these components.

For knowledge models, you can get the query of knowledge model filters, record attributes and identifiers.

In this example, we get the PQLColumn of the record attribute with the get_column method.

km_record = knowledge_model.get_content().records.find_by_id('ACTIVITIES')
km_attribute = km_record.attributes.find_by_id('ACTIVITY_EN')

attribute_query = km_attribute.get_column()
attribute_query

Out[17]:

PQLColumn(name='ACTIVITY_EN', query='"ACTIVITIES"."ACTIVITY_EN"')

Instead of passing a data model to our data frame, we have to specify a KnowledgeModelSaolaConnector, which resolves knowledge model variables and KPIs:

from pycelonis.pql.saola_connector import KnowledgeModelSaolaConnector

df = pql.DataFrame(
    {
        "ACTIVITY_EN": attribute_query,
    },
    saola_connector=KnowledgeModelSaolaConnector(data_model, knowledge_model)
)
df.head()

Out[18]:

	ACTIVITY_EN
Index
0	Create Purchase Requisition Item
1	Create Purchase Order Item
2	Print and Send Purchase Order
3	Receive Goods
4	Scan Invoice

Similar to knowledge models, we can query data using any KPI or variables from the analysis. To get the analysis component query, we first need to specify the analysis sheet and component.

published_sheet = published_analysis.get_content().draft.document.sheets[0]

In this example, we find an OLAP table component on the analysis sheet.

olap_table = published_sheet.components.find("#{OLAP Table}", search_attribute="title")

The get_query method returns the query of the OLAP table.

olap_query = olap_table.get_query()
olap_query

Out[22]:

PQL(columns=[PQLColumn(name='ACTIVITY_EN', query='"ACTIVITIES"."ACTIVITY_EN"'), PQLColumn(name='Count Table', query='COUNT_TABLE("ACTIVITIES")')], filters=[PQLFilter(query='FILTER "ACTIVITIES"."ACTIVITY_EN" = \'ACTIVITY1\';\nFILTER "ACTIVITIES"."ACTIVITY_EN" != \'ACTIVITY2\';')], order_by_columns=[OrderByColumn(query='"ACTIVITIES"."ACTIVITY_EN"', ascending=True)], distinct=False, limit=None, offset=None)

We resolve all analysis variables and KPIs of this query using the AnalysisSaolaConnector and by creating a DataFrame from the full olap_query:

from pycelonis.pql.saola_connector import AnalysisSaolaConnector

df = pql.DataFrame.from_pql(
    olap_query,
    saola_connector=AnalysisSaolaConnector(data_model, published_analysis)
)
df.head()

Out[23]:

	ACTIVITY_EN	Count Table
Index

Furthermore, in PyCelonis 2.X we added a new option to export data directly from a data model if you only have "USE" permissions (on the data model) by specifying the pool id and data model id:

from pycelonis.ems import DataModel

df = pql.DataFrame(
    {
        "ACTIVITY_EN": attribute_query,
    },
    data_model=DataModel(client=celonis.client, pool_id=data_model.pool_id, id=data_model.id)
)
df.head()

Out[24]:

	ACTIVITY_EN
Index
0	Create Purchase Requisition Item
1	Create Purchase Order Item
2	Print and Send Purchase Order
3	Receive Goods
4	Scan Invoice

6. Studio and apps

PyCelonis 1.X only allows you to work with the draft of an asset from Studio but not with the published version of the asset via Apps. Implementing scripts based on unpublished drafts is not considered a best practice because they can be changed at any time, while working with published versions ensures consistency. Therefore, in PyCelonis 2.X we introduced support for Apps and implemented a clear separation between the two. You can still work with both an unpublished draft in Studio as well as a published version in Apps:

studio_space = celonis.studio.get_spaces().find("PyCelonis Tutorial Space")
app_space = celonis.apps.get_space(space.id)

The studio space contains all packages and the draft version of the different assets:

studio_space.get_packages()

Out[26]:

[
	Package(id='6b929009-3d19-43ef-a0ce-16d7a0491e9a', key='pycelonis_tutorial_package', name='PyCelonis Tutorial Package', root_node_key='pycelonis_tutorial_package', space_id='21393dac-210a-4038-b1c4-8533e4fa136e')
]

package = studio_space.get_packages().find("PyCelonis Tutorial Package")

The app space contains only packages that have been published, together with the published versions of their assets. After the package is published, it shows up in Apps as well:

package.publish()

app_space.get_packages()

Out[29]:

[
	PublishedPackage(id='6b929009-3d19-43ef-a0ce-16d7a0491e9a', key='pycelonis_tutorial_package', name='PyCelonis Tutorial Package', root_node_key='pycelonis_tutorial_package', space_id='21393dac-210a-4038-b1c4-8533e4fa136e')
]

Conclusion

Congratulations! In this guide we only covered the most important differences between PyCelonis 1.X and 2.X and explained how to migrate the different functionalities. For further information refer to the API Reference and the other PyCelonis 2.X tutorials.