Celonis Basics¶
In this tutorial, you will learn how to use PyCelonis for basic interactions with the Celonis EMS. More specifically, you will learn:
- How to establish a connection to the Celonis API
- How to perform basic interactions with Celonis objects, such as data models or data jobs
- How to set up additional options, such as logging
Prerequisites¶
To follow this tutorial, you should have PyCelonis installed inside your ML Workbench app. If you have not done this yet, please follow the tutorial Installation first.
Tutorial¶
1. Establish connection to the Celonis EMS API¶
1.1 Specify API credentials and permissions¶
Before we can establish a connection to the Celonis EMS via PyCelonis, we need to have login credentials for the Celonis API in place. More specifically, the following information is required:
- A base URL for the API call:
https://<team>.<realm>.celonis.cloud/
- An API token for authentication, with the permissions needed for the Celonis resources we want to access
PyCelonis supports two types of tokens:
Key Type | Description |
---|---|
USER_KEY | User-specific API key. Grants our Python script the same permissions as we have as a Celonis EMS user. To create a new API key, we need to go to Edit Profile -> API-Keys. |
APP_KEY | User-independent API key. Allows us to specify custom permissions for our Python script. If using the ML Workbench, an API key is automatically generated for us. Otherwise, we can create a new API key under Admin & Settings -> Applications. We can then assign permissions to our app under Admin & Settings -> Permissions depending on our use case. |
1.2 Connect to the EMS¶
To establish a connection to the Celonis API and interact with its services, we have to create an instance of the Celonis class. The Celonis instance serves as the central entry point to the EMS, through which we can access all other objects of the Celonis API.
To create a new Celonis instance, we call the get_celonis() function of PyCelonis. This function performs an initial connection to the Celonis API with the specified login credentials and returns the Celonis instance, which can be stored in a variable. If working inside the ML Workbench, get_celonis() automatically reads the API credentials from the environment variables CELONIS_URL, CELONIS_API_TOKEN, and CELONIS_KEY_TYPE, so we don't have to specify them explicitly (if no key_type is given, it is automatically inferred):
from pycelonis import get_celonis
celonis = get_celonis()
[2024-08-09 08:57:36,687] INFO: No `base_url` given. Using environment variable 'CELONIS_URL'
[2024-08-09 08:57:36,688] INFO: No `api_token` given. Using environment variable 'CELONIS_API_TOKEN'
[2024-08-09 08:57:36,723] WARNING: KeyType is not set. Defaulted to 'APP_KEY'.
[2024-08-09 08:57:36,724] INFO: Initial connect successful! PyCelonis Version: 2.10.1
No access key provided. Please set the CELONIS_MLOPS_ACCESS_KEY environment variable.
[2024-08-09 08:57:36,740] INFO: `semantic-layer` permissions: []
[2024-08-09 08:57:36,740] INFO: `package-manager` permissions: ['$ACCESS_CHILD', 'EDIT_ALL_SPACES', 'MANAGE_PERMISSIONS', 'CREATE_SPACE', 'DELETE_ALL_SPACES']
[2024-08-09 08:57:36,741] INFO: `compute-live` permissions: []
[2024-08-09 08:57:36,742] INFO: `action-engine` permissions: []
[2024-08-09 08:57:36,743] INFO: `team` permissions: []
[2024-08-09 08:57:36,744] INFO: `process-repository` permissions: []
[2024-08-09 08:57:36,745] INFO: `transformation-center` permissions: []
[2024-08-09 08:57:36,746] INFO: `storage-manager` permissions: ['DELETE', 'CREATE', 'GET', '$ACCESS_CHILD', 'ADMIN', 'LIST']
[2024-08-09 08:57:36,747] INFO: `event-collection` permissions: ['USE_ALL_DATA_MODELS', '$ACCESS_CHILD', 'EDIT_ALL_DATA_POOLS', 'CREATE_DATA_POOL']
[2024-08-09 08:57:36,747] INFO: `user-provisioning` permissions: []
[2024-08-09 08:57:36,748] INFO: `ml-workbench` permissions: ['DELETE_SCHEDULERS', 'EDIT_SCHEDULERS', 'USE_ALL_SCHEDULERS', '$ACCESS_CHILD', 'USE_ALL_APPS', 'CREATE_SCHEDULERS', 'MANAGE_ALL_APPS', 'CREATE_WORKSPACES', 'MANAGE_SCHEDULERS_PERMISSIONS', 'VIEW_CONFIGURATION', 'CREATE_APPS', 'MANAGE_ALL_MLFLOWS', 'CREATE_MLFLOWS', 'USE_ALL_MLFLOWS', 'MANAGE_ALL_WORKSPACES']
[2024-08-09 08:57:36,749] INFO: `workflows` permissions: []
If working outside the ML Workbench or if we want to use custom API credentials, we can also specify them as input parameters of the get_celonis() method:
url = "https://<team>.<realm>.celonis.cloud/"
api_token = "<your-api-token>"
key_type = "APP_KEY"
from pycelonis import get_celonis
celonis = get_celonis(base_url=url, api_token=api_token, key_type=key_type)
[2024-08-09 08:57:36,797] INFO: Initial connect successful! PyCelonis Version: 2.10.1
No access key provided. Please set the CELONIS_MLOPS_ACCESS_KEY environment variable.
[2024-08-09 08:57:36,801] INFO: `semantic-layer` permissions: []
[2024-08-09 08:57:36,801] INFO: `package-manager` permissions: ['$ACCESS_CHILD', 'EDIT_ALL_SPACES', 'MANAGE_PERMISSIONS', 'CREATE_SPACE', 'DELETE_ALL_SPACES']
[2024-08-09 08:57:36,802] INFO: `compute-live` permissions: []
[2024-08-09 08:57:36,803] INFO: `action-engine` permissions: []
[2024-08-09 08:57:36,803] INFO: `team` permissions: []
[2024-08-09 08:57:36,804] INFO: `process-repository` permissions: []
[2024-08-09 08:57:36,805] INFO: `transformation-center` permissions: []
[2024-08-09 08:57:36,805] INFO: `storage-manager` permissions: ['DELETE', 'CREATE', 'GET', '$ACCESS_CHILD', 'ADMIN', 'LIST']
[2024-08-09 08:57:36,806] INFO: `event-collection` permissions: ['USE_ALL_DATA_MODELS', '$ACCESS_CHILD', 'EDIT_ALL_DATA_POOLS', 'CREATE_DATA_POOL']
[2024-08-09 08:57:36,806] INFO: `user-provisioning` permissions: []
[2024-08-09 08:57:36,807] INFO: `ml-workbench` permissions: ['DELETE_SCHEDULERS', 'EDIT_SCHEDULERS', 'USE_ALL_SCHEDULERS', '$ACCESS_CHILD', 'USE_ALL_APPS', 'CREATE_SCHEDULERS', 'MANAGE_ALL_APPS', 'CREATE_WORKSPACES', 'MANAGE_SCHEDULERS_PERMISSIONS', 'VIEW_CONFIGURATION', 'CREATE_APPS', 'MANAGE_ALL_MLFLOWS', 'CREATE_MLFLOWS', 'USE_ALL_MLFLOWS', 'MANAGE_ALL_WORKSPACES']
[2024-08-09 08:57:36,807] INFO: `workflows` permissions: []
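To keep the token out of the script itself, one option is to read the credentials from the environment variables named above and fail early when one is missing. The following is a minimal sketch; resolve_credentials is a hypothetical helper and not part of PyCelonis, and the get_celonis() call is left as a comment because it needs valid credentials:

```python
import os
from typing import Dict, Mapping, Optional


def resolve_credentials(env: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Build keyword arguments for get_celonis() from environment variables.

    Hypothetical helper for illustration; not part of PyCelonis.
    """
    env = os.environ if env is None else env

    # base_url and api_token are mandatory, so report all missing ones at once.
    required = ("CELONIS_URL", "CELONIS_API_TOKEN")
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise KeyError(f"Missing environment variables: {', '.join(missing)}")

    kwargs = {
        "base_url": env["CELONIS_URL"],
        "api_token": env["CELONIS_API_TOKEN"],
    }
    # key_type is optional; only pass it on when it is actually set.
    if env.get("CELONIS_KEY_TYPE"):
        kwargs["key_type"] = env["CELONIS_KEY_TYPE"]
    return kwargs


# Usage (requires PyCelonis and valid credentials):
# from pycelonis import get_celonis
# celonis = get_celonis(**resolve_credentials())
```

Passing a plain dictionary as env makes the helper easy to try out without touching the real environment.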
Further, the get_celonis() method runs a sanity check to confirm that the api_token is valid and the correct key_type is set. If we are sure that our login details are correct, we can also disable this check:
celonis = get_celonis(connect=False)
[2024-08-09 08:57:36,818] INFO: No `base_url` given. Using environment variable 'CELONIS_URL'
[2024-08-09 08:57:36,819] INFO: No `api_token` given. Using environment variable 'CELONIS_API_TOKEN'
[2024-08-09 08:57:36,855] WARNING: KeyType is not set. Defaulted to 'APP_KEY'.
No access key provided. Please set the CELONIS_MLOPS_ACCESS_KEY environment variable.
[2024-08-09 08:57:36,859] INFO: `semantic-layer` permissions: []
[2024-08-09 08:57:36,859] INFO: `package-manager` permissions: ['$ACCESS_CHILD', 'EDIT_ALL_SPACES', 'MANAGE_PERMISSIONS', 'CREATE_SPACE', 'DELETE_ALL_SPACES']
[2024-08-09 08:57:36,860] INFO: `compute-live` permissions: []
[2024-08-09 08:57:36,861] INFO: `action-engine` permissions: []
[2024-08-09 08:57:36,862] INFO: `team` permissions: []
[2024-08-09 08:57:36,862] INFO: `process-repository` permissions: []
[2024-08-09 08:57:36,863] INFO: `transformation-center` permissions: []
[2024-08-09 08:57:36,864] INFO: `storage-manager` permissions: ['DELETE', 'CREATE', 'GET', '$ACCESS_CHILD', 'ADMIN', 'LIST']
[2024-08-09 08:57:36,864] INFO: `event-collection` permissions: ['USE_ALL_DATA_MODELS', '$ACCESS_CHILD', 'EDIT_ALL_DATA_POOLS', 'CREATE_DATA_POOL']
[2024-08-09 08:57:36,865] INFO: `user-provisioning` permissions: []
[2024-08-09 08:57:36,865] INFO: `ml-workbench` permissions: ['DELETE_SCHEDULERS', 'EDIT_SCHEDULERS', 'USE_ALL_SCHEDULERS', '$ACCESS_CHILD', 'USE_ALL_APPS', 'CREATE_SCHEDULERS', 'MANAGE_ALL_APPS', 'CREATE_WORKSPACES', 'MANAGE_SCHEDULERS_PERMISSIONS', 'VIEW_CONFIGURATION', 'CREATE_APPS', 'MANAGE_ALL_MLFLOWS', 'CREATE_MLFLOWS', 'USE_ALL_MLFLOWS', 'MANAGE_ALL_WORKSPACES']
[2024-08-09 08:57:36,866] INFO: `workflows` permissions: []
The get_celonis() method outputs the permissions that are associated with our API token. If we don't want to output the permissions, we can also disable this functionality:
celonis = get_celonis(permissions=False)
[2024-08-09 08:57:36,877] INFO: No `base_url` given. Using environment variable 'CELONIS_URL'
[2024-08-09 08:57:36,878] INFO: No `api_token` given. Using environment variable 'CELONIS_API_TOKEN'
[2024-08-09 08:57:36,916] WARNING: KeyType is not set. Defaulted to 'APP_KEY'.
[2024-08-09 08:57:36,920] INFO: Initial connect successful! PyCelonis Version: 2.10.1
No access key provided. Please set the CELONIS_MLOPS_ACCESS_KEY environment variable.
Note:
A common issue when using PyCelonis is missing permissions for Celonis services, such as data integration, studio, apps, etc. We therefore need to make sure to assign the correct permissions to our API token before proceeding. For a more detailed overview of permissions and how to assign them, refer to the following help page.
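One way to spot missing permissions before running a longer script is to compare the permissions shown in the log output with the ones the script needs. The sketch below assumes the granted permissions have been copied by hand from the log into a dictionary; missing_permissions is a hypothetical helper, and the required dictionary is only a placeholder for whatever our use case demands:

```python
from typing import Dict, Iterable, List


def missing_permissions(
    granted: Dict[str, List[str]],
    required: Dict[str, Iterable[str]],
) -> Dict[str, List[str]]:
    """Return, per service, the required permissions that are not granted.

    Hypothetical helper for illustration; not part of PyCelonis.
    """
    missing = {}
    for service, needed in required.items():
        have = set(granted.get(service, []))
        lacking = sorted(p for p in needed if p not in have)
        if lacking:
            missing[service] = lacking
    return missing


# Permissions copied from the log output shown earlier in this tutorial.
granted = {
    "package-manager": ["$ACCESS_CHILD", "EDIT_ALL_SPACES", "MANAGE_PERMISSIONS",
                        "CREATE_SPACE", "DELETE_ALL_SPACES"],
    "event-collection": ["USE_ALL_DATA_MODELS", "$ACCESS_CHILD",
                         "EDIT_ALL_DATA_POOLS", "CREATE_DATA_POOL"],
    "workflows": [],
}

# Placeholder requirements; adapt these to the actual use case.
required = {
    "event-collection": ["CREATE_DATA_POOL"],
    "package-manager": ["CREATE_SPACE"],
    "workflows": ["$ACCESS_CHILD"],
}

print(missing_permissions(granted, required))
# {'workflows': ['$ACCESS_CHILD']}
```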
2. Interact with Celonis objects¶
Interactions with Celonis objects, such as data pools, tables, or analyses, via PyCelonis all follow a similar syntax: celonis.<api-service>.<object-path>.<method>
Here, the Celonis instance serves as the entry point to the EMS, followed by a specific API service. The table below shows which Celonis objects can be accessed through which API service:
API Service | Celonis Object |
---|---|
data_integration | Data pools, data models, data jobs, tables, tasks |
studio | Spaces, packages, analyses, action flows, views, folders, knowledge models, simulations, skills |
apps | Same objects as in Studio but in read-only mode |
Next, an <object-path> is defined that determines which particular Celonis object to access and, lastly, a <method> is called on that object. In the subsequent sections, we will learn exactly how to specify the <object-path> to access a particular Celonis object and which standard methods most Celonis objects support, namely create, get, update, sync, and delete.
2.1 Create new Celonis objects inside the EMS¶
New objects inside the EMS are created via the create_<object>() method in PyCelonis, whereby the input arguments depend on the type of Celonis object. For an overview of which arguments are required to create which object, refer to the API Reference. The method creates the new Celonis object inside the EMS and returns it, so that it can be stored in a variable.
The create_<object>() method is typically called from the preceding parent-object. For instance, data pools are created by calling the method from the preceding Data Integration object:
data_pool = celonis.data_integration.create_data_pool("PyCelonis Tutorial Pool")
data_pool
[2024-08-09 08:57:36,933] INFO: Successfully created data pool with id 'a25a9f07-8c83-4e9c-9a7e-b4a7143a94f3'
DataPool(id='a25a9f07-8c83-4e9c-9a7e-b4a7143a94f3', name='PyCelonis Tutorial Pool')
Data models are created by calling the method from the preceding data pool object, in which the data model should be stored:
data_model = data_pool.create_data_model("PyCelonis Tutorial Model")
data_model
[2024-08-09 08:57:36,945] INFO: Successfully created data model with id '30add069-ddb9-4d16-8089-04ffcd4788f6'
DataModel(id='30add069-ddb9-4d16-8089-04ffcd4788f6', name='PyCelonis Tutorial Model', pool_id='a25a9f07-8c83-4e9c-9a7e-b4a7143a94f3')
This logic can be applied to any type of Celonis object. For instance:
- Data jobs are created by calling the method from the preceding data pool object, in which the data job should be stored
- Spaces are created by calling the method from the preceding studio object
- etc.
2.2 Retrieve Celonis objects from the EMS¶
Resources in Celonis are based on an object-relational mapping. Each object contains an ID that uniquely identifies it and a reference to the parent-object in which it is stored. For instance, the data model created in the previous section has a unique ID as well as a reference to the ID of the data pool, in which it is stored. This unique ID and reference can be used to retrieve Celonis objects from the EMS and store them in local variables for further processing.
data_model
DataModel(id='30add069-ddb9-4d16-8089-04ffcd4788f6', name='PyCelonis Tutorial Model', pool_id='a25a9f07-8c83-4e9c-9a7e-b4a7143a94f3')
In general, Celonis objects can be accessed in two ways:
- Retrieve a specific Celonis object by its ID via the get_<object>("id") method
- Retrieve all Celonis objects of a specific type as a CelonisCollection via the get_<object>s() method
2.2.1 Retrieve a specific Celonis object¶
A specific Celonis object can be accessed by calling the get_<object>() method and passing the unique ID as input argument. The method is called from the preceding parent-object and returns the Celonis resource as an object, which can be stored in a variable:
data_pool_id = "<your-data-pool-id>"
data_pool = celonis.data_integration.get_data_pool(data_pool_id)
data_pool
DataPool(id='a25a9f07-8c83-4e9c-9a7e-b4a7143a94f3', name='PyCelonis Tutorial Pool')
When accessing a specific Celonis object, it is possible to specify the entire object path starting from the Celonis instance:
data_pool_id = "<your-data-pool-id>"
data_model_id = "<your-data-model-id>"
data_model = celonis.data_integration\
.get_data_pool(data_pool_id)\
.get_data_model(data_model_id)
data_model
DataModel(id='30add069-ddb9-4d16-8089-04ffcd4788f6', name='PyCelonis Tutorial Model', pool_id='a25a9f07-8c83-4e9c-9a7e-b4a7143a94f3')
Or we can specify a relative object path starting from a preceding parent-object, in which the Celonis object is stored:
data_pool = celonis.data_integration.get_data_pool(data_pool_id)
data_model = data_pool.get_data_model(data_model_id)
data_model
DataModel(id='30add069-ddb9-4d16-8089-04ffcd4788f6', name='PyCelonis Tutorial Model', pool_id='a25a9f07-8c83-4e9c-9a7e-b4a7143a94f3')
If all required parameters of an object are known, it can also be initialized directly and then synced to fetch the latest state. For example, for a data model:
from pycelonis.ems.data_integration.data_model import DataModel
data_model = DataModel(client=celonis.client, id=data_model.id, pool_id=data_pool_id)
data_model.sync()
2.2.2 Retrieve all Celonis objects of a certain type¶
It is also possible to retrieve all Celonis objects that are stored in a specific parent-object (e.g. all data models inside a data pool) as a CelonisCollection via the get_<object>s() method. A CelonisCollection is a list-like data structure, which stores Celonis objects of a specific type.
data_models = data_pool.get_data_models()
data_models
[ DataModel(id='30add069-ddb9-4d16-8089-04ffcd4788f6', name='PyCelonis Tutorial Model', pool_id='a25a9f07-8c83-4e9c-9a7e-b4a7143a94f3') ]
The CelonisCollection offers a method find(), with which we can search for a specific Celonis object based on different search criteria. By default, the find() method searches for Celonis objects with a certain name:
result = data_models.find("PyCelonis Tutorial Model")
result
DataModel(id='30add069-ddb9-4d16-8089-04ffcd4788f6', name='PyCelonis Tutorial Model', pool_id='a25a9f07-8c83-4e9c-9a7e-b4a7143a94f3')
However, we can also specify another search attribute, according to which we want to search the CelonisCollection:
result = data_models.find(search_term=data_pool_id, search_attribute="pool_id")
result
DataModel(id='30add069-ddb9-4d16-8089-04ffcd4788f6', name='PyCelonis Tutorial Model', pool_id='a25a9f07-8c83-4e9c-9a7e-b4a7143a94f3')
Note:
While it is possible to access a specific Celonis object by retrieving a CelonisCollection via get_<object>s() and then searching for that object via find("name"), it is recommended to access the object directly via its ID by calling the get_<object>() method. This performs better, and IDs are unique and cannot change, whereas other search attributes are not unique and may change over time.
It is also possible to search for multiple Celonis objects in a CelonisCollection via the find_all() method. This method returns another CelonisCollection with all Celonis objects that fulfill the specified search criterion:
results = data_models.find_all(search_term="PyCelonis Tutorial Model", search_attribute="name")
results
[ DataModel(id='30add069-ddb9-4d16-8089-04ffcd4788f6', name='PyCelonis Tutorial Model', pool_id='a25a9f07-8c83-4e9c-9a7e-b4a7143a94f3') ]
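Combining find_all() with a create_<object>() call gives a simple "get or create" pattern, which avoids creating a second data pool with the same name when a script is run twice. This is a minimal sketch: get_or_create_data_pool is a hypothetical helper, and it assumes that the Data Integration object offers get_data_pools(), following the get_<object>s() pattern described above, and that the returned collection can be indexed like a list:

```python
def get_or_create_data_pool(data_integration, name):
    """Return the first data pool called `name`, creating one if none exists.

    Hypothetical helper for illustration; not part of PyCelonis.
    `data_integration` is expected to be celonis.data_integration.
    """
    # Assumption: get_data_pools() exists, following the get_<object>s() pattern.
    matches = data_integration.get_data_pools().find_all(
        search_term=name, search_attribute="name"
    )
    if len(matches) > 0:
        # Names are not unique, so there may be several matches; take the first.
        return matches[0]
    return data_integration.create_data_pool(name)


# Usage (requires an established connection):
# data_pool = get_or_create_data_pool(celonis.data_integration, "PyCelonis Tutorial Pool")
```

Because names are not unique, this pattern suits setup scripts best. Once the ID of an object is known, retrieving it by ID remains the more robust choice.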
2.3 Access properties of Celonis objects¶
Each Celonis object has a set of properties that depends on the specific object type. Please refer to the API Reference for an overview of which Celonis object has which properties. To access a property of an object, we can use the <object>.<property> command:
data_pool.name
'PyCelonis Tutorial Pool'
To print out all properties of a Celonis object, we can use the <object>.dict() command:
data_pool.dict()
{'permissions': ['EDIT_DATA_POOL_RESTRICTED', 'USE_ALL_DATA_MODELS', 'DATA_PUSH_API', 'ADMIN', 'VIEW_DATA_POOL', 'CONTINUOUS_DATA_PUSH_API', 'EDIT_ALL_DATA_POOLS', 'CREATE_DATA_POOL'], 'id': 'a25a9f07-8c83-4e9c-9a7e-b4a7143a94f3', 'name': 'PyCelonis Tutorial Pool', 'description': None, 'time_stamp': datetime.datetime(2024, 5, 15, 13, 6, 18, 394000, tzinfo=datetime.timezone.utc), 'configuration_status': 'CONFIGURED', 'locked': False, 'content_id': None, 'content_version': 0, 'tags': [], 'original_id': None, 'monitoring_target': False, 'custom_monitoring_target': False, 'custom_monitoring_target_active': False, 'exported': False, 'monitoring_message_columns_migrated': False, 'creator_user_id': 'ba629456-ff60-4acb-a8c9-b92703954e7e', 'object_id': 'a25a9f07-8c83-4e9c-9a7e-b4a7143a94f3'}
2.4 Update properties of Celonis objects¶
We can also assign new values to these properties via the command <object>.<property> = <value>:
print("Name Before Update: " + data_pool.name)
data_pool.name = "PyCelonis Tutorial Data Pool"
print("Name After Update: " + data_pool.name)
Name Before Update: PyCelonis Tutorial Pool
Name After Update: PyCelonis Tutorial Data Pool
However, these changes have so far only been made in our local object of the Celonis resource, not in the EMS. To push these changes into the EMS, we have to call the <object>.update() method:
data_pool.update()
data_pool
[2024-08-09 08:57:37,150] INFO: Successfully updated data pool with id 'a25a9f07-8c83-4e9c-9a7e-b4a7143a94f3'
DataPool(id='a25a9f07-8c83-4e9c-9a7e-b4a7143a94f3', name='PyCelonis Tutorial Data Pool')
2.5 Sync properties of Celonis objects with data from the EMS¶
Conversely, if we want to pull property values from the EMS for a specific resource and update our local object with them, we can call the <object>.sync() method. This method will overwrite the local properties of our Celonis objects with values from the EMS.
As an example, let's change the name of our data pool object. This updated name is currently only available in our local object but not in the Celonis EMS:
data_pool.name = "PyCelonis Tutorial Pool"
data_pool.name
'PyCelonis Tutorial Pool'
When we now call the sync() method, the name property of our local object will be overwritten with the value from the EMS object:
data_pool.sync()
print(f"Name After Sync: {data_pool.name}")
Name After Sync: PyCelonis Tutorial Data Pool
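The difference between update() and sync() can be pictured with a small toy model in which a dictionary stands in for the EMS. This is only an illustration of the direction in which data flows; RemoteStore and LocalObject are made-up classes and say nothing about how PyCelonis is implemented internally:

```python
class RemoteStore:
    """Stand-in for the EMS: maps object IDs to their stored properties."""

    def __init__(self):
        self.records = {}


class LocalObject:
    """Toy local object that mirrors one record of the remote store."""

    def __init__(self, store, object_id, name):
        self._store = store
        self.id = object_id
        self.name = name
        store.records[object_id] = {"name": name}

    def update(self):
        # Push: the local value overwrites the remote value.
        self._store.records[self.id]["name"] = self.name

    def sync(self):
        # Pull: the remote value overwrites the local value.
        self.name = self._store.records[self.id]["name"]


store = RemoteStore()
pool = LocalObject(store, "pool-1", "Tutorial Pool")

pool.name = "Tutorial Data Pool"
pool.update()   # remote now holds "Tutorial Data Pool"

pool.name = "Scratch Name"
pool.sync()     # the unsaved local change is discarded

print(pool.name)
# Tutorial Data Pool
```

The practical consequence is that any local change that has not been pushed with update() is lost when sync() is called.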
2.6 Delete Celonis objects from the EMS¶
Lastly, we can delete Celonis objects from the EMS via the <object>.delete() method:
print(f"Data Models Before Deletion: {data_pool.get_data_models()}")
data_model.delete()
print(f"Data Models After Deletion: {data_pool.get_data_models()}")
Data Models Before Deletion: [ DataModel(id='30add069-ddb9-4d16-8089-04ffcd4788f6', name='PyCelonis Tutorial Model', pool_id='a25a9f07-8c83-4e9c-9a7e-b4a7143a94f3') ]
[2024-08-09 08:57:37,191] INFO: Successfully deleted data model with id '30add069-ddb9-4d16-8089-04ffcd4788f6'
Data Models After Deletion: []
data_pool.delete()
[2024-08-09 08:57:37,206] INFO: Successfully deleted data pool with id 'a25a9f07-8c83-4e9c-9a7e-b4a7143a94f3'
Note that if we delete a certain object, all child-objects are deleted as well. For instance:
- If we delete a data pool, all objects inside the pool (e.g. data models, data jobs, tables) will be deleted.
- If we delete a space, all objects inside the space (e.g. packages, knowledge models, views) will be deleted.
- etc.
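Because deleting a parent-object also removes its children, it can be worth guarding the call in scripts that run unattended. The following sketch only deletes a data pool when it contains no data models; delete_data_pool_if_empty is a hypothetical helper built from the get_data_models() and delete() methods shown above, and it deliberately ignores other child objects such as data jobs or tables:

```python
def delete_data_pool_if_empty(data_pool):
    """Delete `data_pool` only if it holds no data models.

    Hypothetical helper for illustration; not part of PyCelonis.
    Returns True if the pool was deleted, False otherwise.
    Note: only data models are checked, not data jobs or tables.
    """
    data_models = data_pool.get_data_models()
    if len(data_models) > 0:
        # Leave the pool untouched so no data model is deleted by accident.
        return False
    data_pool.delete()
    return True


# Usage (requires an established connection):
# deleted = delete_data_pool_if_empty(data_pool)
```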
2.7 Concluding Remarks¶
Note that some Celonis objects may not support all of the methods described above. Further, some Celonis objects have additional methods not described here. During the subsequent tutorials, we will learn more about which methods can be applied to specific Celonis objects, such as data models, spaces, or analyses. For an overview of all commands for each Celonis object, refer to the API Reference.
3. Set up additional options¶
3.1 Logging¶
By default, PyCelonis runs in the logging mode INFO in the ML Workbench and in the logging mode WARNING outside the ML Workbench. We can adjust the verbosity of the log output with the command below. This can be useful, for instance, to debug our Python script when using PyCelonis.
The following logging modes are available for PyCelonis:
Logging Mode | Description |
---|---|
WARNING | Shows deprecations, new versions available, default behavior that might not be obvious |
INFO | Shows changes to resources in the EMS, longer-running functions |
DEBUG | Shows exceptions and API requests |
import logging
logging.getLogger("pycelonis").setLevel(logging.DEBUG)
logging.basicConfig(level=logging.DEBUG)
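Setting the root logger to DEBUG via basicConfig() also makes every other library verbose. If only the PyCelonis output is of interest, a possible variation is to keep the root logger at WARNING and raise the level for the "pycelonis" logger alone. This sketch uses only the standard logging module; the format string is an assumption chosen to resemble the log lines shown in this tutorial, and force=True requires Python 3.8 or newer:

```python
import logging

# Root logger stays at WARNING so other libraries remain quiet.
# force=True replaces any handlers that were configured earlier.
logging.basicConfig(
    level=logging.WARNING,
    format="[%(asctime)s] %(levelname)s: %(message)s",
    force=True,
)

# Only the "pycelonis" logger (and its child loggers) emit DEBUG messages.
pycelonis_logger = logging.getLogger("pycelonis")
pycelonis_logger.setLevel(logging.DEBUG)
```

To return to the quieter default later, the same logger can be set back to logging.INFO or logging.WARNING.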
Conclusion¶
Congratulations! You have learned how to establish a connection to the Celonis EMS API via PyCelonis, how to perform basic interactions with Celonis objects (create, get, update, sync, and delete), and how to activate additional options for debugging. In the next tutorial Data Integration - Introduction, you will dive deeper into how to use PyCelonis to interact with Celonis objects in Data Integration, such as data pools, data models, and data jobs.