Migration from PyCelonis 1.X to PyCelonis 2.X¶
In this tutorial, you will learn how to migrate your existing scripts from PyCelonis 1.X to PyCelonis 2.X. With PyCelonis 2.X we set out to improve many things, which was only possible through breaking changes; therefore, we decided to completely rewrite PyCelonis. Below are some of the major improvements compared to PyCelonis 1.X:
- New interfaces aligned with the EMS navigation: apps, data integration, studio...
- Explicit update and sync operations to improve performance
- Improved semantics for data upload
- Additional features for PQL handling
- Streamlined data export through data model
- Clearer separation between Studio and Apps
- Removal of redundant APIs
- Consistent interface for classes and methods
Tutorial¶
1. Improved interface¶
For better alignment with the EMS, we updated the whole interface of PyCelonis. In PyCelonis 1.X, most objects were accessible through the celonis object:
# PyCelonis 1.X
from pycelonis import get_celonis
celonis = get_celonis(permissions=False)
data_pools = celonis.pools
spaces = celonis.spaces
datamodels = celonis.datamodels
...
In PyCelonis 2.X, these are accessed through their respective submodules (similar to how it's done in the UI). The available submodules are:
data_integration
studio
apps
team
Instead of accessing resources through a property (e.g. celonis.pools), a method has to be called (e.g. celonis.data_integration.get_data_pools()):
from pycelonis import get_celonis
celonis = get_celonis(permissions=False)
data_pools = celonis.data_integration.get_data_pools()
spaces = celonis.studio.get_spaces()
KeyType is not set. Defaulted to 'APP_KEY'.
No access key provided. Please set the CELONIS_MLOPS_ACCESS_KEY environment variable.
To avoid performance issues, accessing objects within other resources (e.g. data models) has to be done through their parent object in PyCelonis 2.X:
data_pool = celonis.data_integration.get_data_pools().find("PyCelonis Tutorial Data Pool")
data_pool = celonis.data_integration.get_data_pool(data_pool.id)
data_models = data_pool.get_data_models()
space = celonis.studio.get_spaces().find("PyCelonis Tutorial Space")
space = celonis.studio.get_space(space.id)
packages = space.get_packages()
We have kept the find method as a convenient way to find specific objects based on their name.
# PyCelonis 1.X
data_pool = data_pools.find("PyCelonis Tutorial Data Pool")
data_pool = data_pools.find("341b450c-7f92-26e6-b79c-2d38a1a87d38")
In PyCelonis 2.X, the find method by default only searches for a given name, not for the id. If you intend to retrieve an object by its id, use the get_<object> method instead to improve performance:
data_pool = data_pools.find("PyCelonis Tutorial Data Pool")
data_pool = celonis.data_integration.get_data_pool(data_pool.id)
2. Working with EMS resource objects¶
In PyCelonis 1.X, resources were always in sync with the EMS and whenever a property was accessed an API call was made in the background to fetch the latest state:
# PyCelonis 1.X
print(data_pool.name) # Triggers API call to fetch latest data pool name
Since this behaviour triggers a lot of API calls it can have a detrimental impact on performance and also might cause stability issues. Therefore, in PyCelonis 2.X resource objects are not automatically synced with the EMS anymore:
print("Data pool name:", data_pool.name) # Does not trigger an API call and only reads the name from memory
Data pool name: PyCelonis Tutorial Data Pool
If you want to fetch the latest updates from the EMS, you can explicitly call the sync method to get the latest state for any resource:
data_pool.sync() # Triggers API call to fetch latest data pool data
print("Data pool name:", data_pool.name)
Data pool name: PyCelonis Tutorial Data Pool
Setting a property in PyCelonis 1.X automatically triggered an update in the EMS:
# PyCelonis 1.X
data_pool.name = "NEW_NAME" # Triggers API call to update data pool name
In PyCelonis 2.X, we introduced the update method, which explicitly pushes the current state of a resource to the EMS. This avoids accidental updates and allows bulk updates of multiple properties:
data_pool.name = "PyCelonis Tutorial Pool" # Only updates the name of the data pool locally
data_pool.update() # Pushes the current state of the data pool to the EMS to update the name
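Because the update is explicit, you can also change several properties locally and push them together in a single request. A minimal sketch (the description text below is only a placeholder):
data_pool.name = "PyCelonis Tutorial Pool"
data_pool.description = "Pool used in the migration tutorial"  # placeholder description
data_pool.update()  # one API call pushes both changes to the EMS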
Lastly, accessing specific properties of a resource in PyCelonis 1.X required using the data dictionary:
# PyCelonis 1.X
print(data_pool.data["timeStamp"])
For PyCelonis 2.X, we introduced Pydantic data classes for resource objects and made all properties directly accessible. With this change, we now fully support type hinting, which results in a much better development experience through code completion in your IDE.
print(data_pool.time_stamp)
2024-05-15 13:06:25.119000+00:00
Also, it's possible to get a dictionary or json representation of any resource if needed:
data_pool.dict()
{'permissions': ['EDIT_DATA_POOL_RESTRICTED', 'USE_ALL_DATA_MODELS', 'DATA_PUSH_API', 'ADMIN', 'VIEW_DATA_POOL', 'CONTINUOUS_DATA_PUSH_API', 'EDIT_ALL_DATA_POOLS', 'CREATE_DATA_POOL'], 'id': '1bcc67dc-935a-4b7f-940a-9c0b81a7135a', 'name': 'PyCelonis Tutorial Pool', 'description': None, 'time_stamp': datetime.datetime(2024, 5, 15, 13, 6, 25, 119000, tzinfo=datetime.timezone.utc), 'configuration_status': 'CONFIGURED', 'locked': False, 'content_id': None, 'content_version': 0, 'tags': [], 'original_id': None, 'monitoring_target': False, 'custom_monitoring_target': False, 'custom_monitoring_target_active': False, 'exported': False, 'monitoring_message_columns_migrated': False, 'creator_user_id': 'ba629456-ff60-4acb-a8c9-b92703954e7e', 'object_id': '1bcc67dc-935a-4b7f-940a-9c0b81a7135a'}
data_pool.json()
'{"permissions": ["EDIT_DATA_POOL_RESTRICTED", "USE_ALL_DATA_MODELS", "DATA_PUSH_API", "ADMIN", "VIEW_DATA_POOL", "CONTINUOUS_DATA_PUSH_API", "EDIT_ALL_DATA_POOLS", "CREATE_DATA_POOL"], "id": "1bcc67dc-935a-4b7f-940a-9c0b81a7135a", "name": "PyCelonis Tutorial Pool", "description": null, "time_stamp": 1715778385119, "configuration_status": "CONFIGURED", "locked": false, "content_id": null, "content_version": 0, "tags": [], "original_id": null, "monitoring_target": false, "custom_monitoring_target": false, "custom_monitoring_target_active": false, "exported": false, "monitoring_message_columns_migrated": false, "creator_user_id": "ba629456-ff60-4acb-a8c9-b92703954e7e", "object_id": "1bcc67dc-935a-4b7f-940a-9c0b81a7135a"}'
3. Uploading data into a data pool¶
In PyCelonis 1.X you can create, append, and upsert data to a table by using a data pool:
# PyCelonis 1.X
data_pool.create_table(df, "TABLE_NAME")
data_pool.append_table(df, "TABLE_NAME")
data_pool.upsert_table(df, "TABLE_NAME", keys=["ID"])
We considered this a bad interface because it allowed you to attempt uploading data to a table that didn't exist. PyCelonis 2.X offers the same functionality, with the difference that append and upsert must be called directly on the table object:
import pandas as pd
df = pd.DataFrame({"ID": [1,2,3,4]})
table = data_pool.create_table(df, "DATA_PUSH_TEST")
table.append(df)
table.upsert(df, keys=["ID"])
STRING columns are by default stored as VARCHAR(80) and therefore cut after 80 characters. You can specify a custom field length for each column using the `column_config` parameter.
No column configuration set. String columns are cropped to 80 characters if not configured
No column configuration set. String columns are cropped to 80 characters if not configured
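If 80 characters are not enough, you can pass a column configuration when creating the table. The sketch below assumes the ColumnTransport and ColumnType classes exported by pycelonis.ems and an illustrative COMMENT column that is not part of the tutorial data:
from pycelonis.ems import ColumnTransport, ColumnType

# Illustrative data frame with a long string column
df_text = pd.DataFrame({"ID": [1, 2], "COMMENT": ["a rather long comment", "another long comment"]})

column_config = [
    ColumnTransport(column_name="COMMENT", field_type=ColumnType.STRING, field_length=200),  # allow up to 200 characters
]

table = data_pool.create_table(df_text, "DATA_PUSH_CONFIG_TEST", column_config=column_config)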
It is also possible to create a table based on a parquet file in PyCelonis 2.X by manually creating and executing a data push job:
from pycelonis.ems import JobType
data_push_job = data_pool.create_data_push_job(
    target_name="DATA_PUSH_TEST_JOB",
    type_=JobType.REPLACE,
)

with open("../../../assets/_CEL_P2P_ACTIVITIES_EN.parquet", "rb") as file:
    data_push_job.add_file_chunk(file)

data_push_job.execute(wait=True)
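To confirm the push succeeded, you can look up the new table on the data pool afterwards. A minimal sketch, assuming the data pool's get_tables method:
tables = data_pool.get_tables()
print(tables.find("DATA_PUSH_TEST_JOB"))  # the table created by the data push job above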
4. PQL Handling¶
In PyCelonis 1.X you can query data using the PQL class:
# PyCelonis 1.X
from pycelonis.pql import PQL, PQLColumn, PQLFilter
query = PQL()
query += PQLColumn(name="ACTIVITY_EN", query=""" "ACTIVITIES"."ACTIVITY_EN" """)
query += PQLColumn(name="EVENTTIME", query=""" "ACTIVITIES"."EVENTTIME" """)
query += PQLFilter(""" FILTER "ACTIVITIES"."_CASE_KEY" = '800000000006800001'; """)
PyCelonis 2.X offers a similar interface but now additionally supports OrderByColumn and further operations through SaolaPy. To migrate existing queries, simply use the from_pql method:
import pycelonis.pql as pql
data_model = data_pool.get_data_models().find("PyCelonis Tutorial Data Model")
query = pql.PQL(distinct=True)
query += pql.PQLColumn(name="ACTIVITY_EN", query=""" "ACTIVITIES"."ACTIVITY_EN" """)
query += pql.PQLColumn(name="EVENTTIME", query=""" "ACTIVITIES"."EVENTTIME" """)
query += pql.PQLFilter(query=""" FILTER "ACTIVITIES"."_CASE_KEY" = '800000000006800001'; """)
query += pql.OrderByColumn(query=""" "ACTIVITIES"."EVENTTIME" """)
df = pql.DataFrame.from_pql(query, data_model=data_model)
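The resulting SaolaPy DataFrame is evaluated lazily; to run the query and pull the result into pandas you can export it, here assuming SaolaPy's to_pandas method:
result = df.to_pandas()  # executes the PQL query and returns a regular pandas DataFrame
print(result.head())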
5. Exporting data from a data model¶
With PyCelonis 1.X you can export data both through an analysis as well as the data model:
# PyCelonis 1.X
# Export using data model
df = data_model.get_data_frame(q)
# Export using analysis
component = analysis.draft.components.find("Vendors")
df = component.get_data_frame()
This has changed in PyCelonis 2.X since the analysis export had several limitations. Therefore, all data exports must be triggered via the data model using SaolaPy:
from pycelonis.pql.saola_connector import KnowledgeModelSaolaConnector
df = pql.DataFrame(
    {
        "ACTIVITIES": """ "ACTIVITIES"."_CASE_KEY" """,
    },
    data_model=data_model
)
df.head()
| Index | ACTIVITIES |
|---|---|
| 0 | 800000000006800001 |
| 1 | 800000000006800001 |
| 2 | 800000000006800001 |
| 3 | 800000000006800001 |
| 4 | 800000000006800001 |
In PyCelonis 2.X, it is now possible to get the queries of existing knowledge models and analyses and to query data using any KPI or variable from these components. For knowledge models, you can get the query of knowledge model filters, record attributes, and identifiers. In this example, we get the PQLColumn of the record attribute with the get_column method.
km_record = knowledge_model.get_content().records.find_by_id('ACTIVITIES')
km_attribute = km_record.attributes.find_by_id('ACTIVITY_EN')
attribute_query = km_attribute.get_column()
attribute_query
PQLColumn(name='ACTIVITY_EN', query='"ACTIVITIES"."ACTIVITY_EN"')
Instead of passing a data model to our data frame, we have to specify a KnowledgeModelSaolaConnector which resolves knowledge model variables and KPIs:
from pycelonis.pql.saola_connector import KnowledgeModelSaolaConnector
df = pql.DataFrame(
    {
        "ACTIVITY_EN": attribute_query,
    },
    saola_connector=KnowledgeModelSaolaConnector(data_model, knowledge_model)
)
df.head()
| Index | ACTIVITY_EN |
|---|---|
| 0 | Create Purchase Requisition Item |
| 1 | Create Purchase Order Item |
| 2 | Print and Send Purchase Order |
| 3 | Receive Goods |
| 4 | Scan Invoice |
Similar to knowledge models, we can query data using any KPI or variable from the analysis. To get the analysis component query, we first need to specify the analysis sheet and component.
published_sheet = published_analysis.get_content().draft.document.sheets[0]
In this example, we find an OLAP table component on the analysis sheet.
olap_table = published_sheet.components.find("#{OLAP Table}", search_attribute="title")
The get_query method returns the query of the OLAP table.
olap_query = olap_table.get_query()
olap_query
PQL(columns=[PQLColumn(name='ACTIVITY_EN', query='"ACTIVITIES"."ACTIVITY_EN"'), PQLColumn(name='Count Table', query='COUNT_TABLE("ACTIVITIES")')], filters=[PQLFilter(query='FILTER "ACTIVITIES"."ACTIVITY_EN" = \'ACTIVITY1\';\nFILTER "ACTIVITIES"."ACTIVITY_EN" != \'ACTIVITY2\';')], order_by_columns=[OrderByColumn(query='"ACTIVITIES"."ACTIVITY_EN"', ascending=True)], distinct=False, limit=None, offset=None)
We resolve all analysis variables and KPIs of this query by using the AnalysisSaolaConnector and creating a DataFrame from the full olap_query:
from pycelonis.pql.saola_connector import AnalysisSaolaConnector
df = pql.DataFrame.from_pql(
    olap_query,
    saola_connector=AnalysisSaolaConnector(data_model, published_analysis)
)
df.head()
| Index | ACTIVITY_EN | Count Table |
|---|---|---|
Furthermore, in PyCelonis 2.X we added a new option to export data directly from a data model when you only have "USE" permissions on the data model, by specifying the pool id and data model id:
from pycelonis.ems import DataModel
df = pql.DataFrame(
    {
        "ACTIVITY_EN": attribute_query,
    },
    data_model=DataModel(client=celonis.client, pool_id=data_model.pool_id, id=data_model.id)
)
df.head()
| Index | ACTIVITY_EN |
|---|---|
| 0 | Create Purchase Requisition Item |
| 1 | Create Purchase Order Item |
| 2 | Print and Send Purchase Order |
| 3 | Receive Goods |
| 4 | Scan Invoice |
6. Studio and apps¶
PyCelonis 1.X only allows you to work with the draft of an asset from Studio but not with the published version of the asset via Apps. Implementing scripts based on unpublished drafts is not considered a best practice because they can be changed at any time, while working with published versions ensures consistency. Therefore, in PyCelonis 2.X we introduced support for Apps and implemented a clear separation between the two. You can still work with both an unpublished draft in Studio as well as a published version in Apps:
studio_space = celonis.studio.get_spaces().find("PyCelonis Tutorial Space")
app_space = celonis.apps.get_space(space.id)
The studio space contains all packages and the draft version of the different assets:
studio_space.get_packages()
[ Package(id='bb68960d-4f9d-4367-83a1-f9d888e4ce60', key='pycelonis_tutorial_package', name='PyCelonis Tutorial Package', root_node_key='pycelonis_tutorial_package', space_id='d9618285-da25-4131-81ba-9d45e82b1724') ]
package = studio_space.get_packages().find("PyCelonis Tutorial Package")
The app space only contains packages that have been published, together with the published versions of their assets. After publishing the package, it shows up in Apps as well:
package.publish()
app_space.get_packages()
[ PublishedPackage(id='bb68960d-4f9d-4367-83a1-f9d888e4ce60', key='pycelonis_tutorial_package', name='PyCelonis Tutorial Package', root_node_key='pycelonis_tutorial_package', space_id='d9618285-da25-4131-81ba-9d45e82b1724') ]
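The published package can then be retrieved from the app space by name, just like its Studio counterpart:
published_package = app_space.get_packages().find("PyCelonis Tutorial Package")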
Conclusion¶
Congratulations! In this guide, we covered only the most important differences between PyCelonis 1.X and 2.X and explained how to migrate the different functionalities. For further information, refer to the API Reference and the other PyCelonis 2.X tutorials.