Introduction
In this tutorial, you will dive deeper into objects of the Celonis Data Integration service. More specifically, you will learn how to use PyCelonis to interact with data pools, data models, and data jobs. These three objects serve as places to organize other Celonis objects, such as tables and tasks, and are thus a good starting point when working on a new PyCelonis project.
Prerequisites
To follow this tutorial, you should have set up an app inside the ML Workbench and should have PyCelonis installed. Further, you should know how to establish a connection to the Celonis API and how to perform basic interactions with Celonis objects, such as create, get, update, sync, and delete. If you don't know how to do this, please complete the Celonis Basics tutorial first.
Tutorial
1. Import PyCelonis and connect to the Celonis API
from pycelonis import get_celonis
celonis = get_celonis(permissions=False)
[2024-08-07 12:11:49,713] INFO: No `base_url` given. Using environment variable 'CELONIS_URL'
[2024-08-07 12:11:49,714] INFO: No `api_token` given. Using environment variable 'CELONIS_API_TOKEN'
[2024-08-07 12:11:49,746] WARNING: KeyType is not set. Defaulted to 'APP_KEY'.
[2024-08-07 12:11:49,747] INFO: Initial connect successful! PyCelonis Version: 2.10.0
No access key provided. Please set the CELONIS_MLOPS_ACCESS_KEY environment variable.
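The log above shows that PyCelonis falls back to the CELONIS_URL and CELONIS_API_TOKEN environment variables and defaults the key type to 'APP_KEY'. Outside the ML Workbench these variables may not be set, so the following minimal sketch collects them explicitly and fails early if one is missing. The helper names connection_kwargs_from_env and connect are hypothetical, and passing key_type as a plain string is an assumption based on the log message, not a confirmed part of the tutorial.

```python
import os


def connection_kwargs_from_env(environ=None, key_type="APP_KEY"):
    """Collect explicit connection settings, failing early if one is missing.

    Hypothetical helper: it only reads the two environment variables that
    the log output above mentions.
    """
    environ = os.environ if environ is None else environ
    missing = [k for k in ("CELONIS_URL", "CELONIS_API_TOKEN") if not environ.get(k)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {
        "base_url": environ["CELONIS_URL"],
        "api_token": environ["CELONIS_API_TOKEN"],
        # Assumption: mirrors the default reported in the log ('APP_KEY').
        "key_type": key_type,
    }


def connect(environ=None):
    """Connect with explicit settings instead of relying on implicit defaults."""
    # Imported lazily so the helper above works without PyCelonis installed.
    from pycelonis import get_celonis

    return get_celonis(permissions=False, **connection_kwargs_from_env(environ))
```

Calling celonis = connect() should then behave like the call above, while making the URL, token, and key type visible in the code.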
2. Create a data pool
When we start a new PyCelonis project and don't have any Data Integration objects inside our EMS, the first step is to create a new data pool. Data pools are the main structural element of the Data Integration service and are used to organize other Celonis objects, such as data models, tables, data jobs, and tasks. A data pool must exist before any other Data Integration objects can be created.
data_pool = celonis.data_integration.create_data_pool("PyCelonis Tutorial Pool")
data_pool
[2024-08-07 12:11:49,766] INFO: Successfully created data pool with id '1bcc67dc-935a-4b7f-940a-9c0b81a7135a'
DataPool(id='1bcc67dc-935a-4b7f-940a-9c0b81a7135a', name='PyCelonis Tutorial Pool')
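Re-running the cell above creates a second pool with the same name. A minimal sketch of a get-or-create pattern is shown below; get_or_create_data_pool is a hypothetical helper, and it assumes that celonis.data_integration.get_data_pools() returns an iterable of pools with a name attribute.

```python
def get_or_create_data_pool(data_integration, name):
    """Return the first data pool called `name`, creating one if none exists.

    Hypothetical helper; `data_integration` is expected to behave like
    `celonis.data_integration`.
    """
    for pool in data_integration.get_data_pools():
        if pool.name == name:
            return pool
    # No pool with that name yet, so create it as in the tutorial step above.
    return data_integration.create_data_pool(name)


# Usage in the tutorial context:
# data_pool = get_or_create_data_pool(celonis.data_integration, "PyCelonis Tutorial Pool")
```

Matching on the name is a convenience only: names are not guaranteed to be unique, so the id remains the reliable way to refer to a specific pool.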
3. Add Data Integration objects (data model and job) to the data pool
Once we have a data pool in place, we can start adding other Data Integration objects to it. For now, we will create a new (empty) data model and a new (empty) data job in the data pool. These two objects will be populated with tasks and tables in subsequent tutorials.
data_model = data_pool.create_data_model("PyCelonis Tutorial Model")
data_model
[2024-08-07 12:11:49,777] INFO: Successfully created data model with id 'a13de415-ab63-4faf-a392-daab5ec74795'
DataModel(id='a13de415-ab63-4faf-a392-daab5ec74795', name='PyCelonis Tutorial Model', pool_id='1bcc67dc-935a-4b7f-940a-9c0b81a7135a')
data_job = data_pool.create_job("PyCelonis Tutorial Job")
data_job
[2024-08-07 12:11:49,787] INFO: Successfully created job with id 'a2c77d73-4758-47f1-a59e-157f061fa35f'
Job(id='a2c77d73-4758-47f1-a59e-157f061fa35f', name='PyCelonis Tutorial Job', data_pool_id='1bcc67dc-935a-4b7f-940a-9c0b81a7135a')
We can always verify which objects are currently in our data pool by calling the corresponding get_<object>s() method:
data_pool.get_data_models()
[ DataModel(id='a13de415-ab63-4faf-a392-daab5ec74795', name='PyCelonis Tutorial Model', pool_id='1bcc67dc-935a-4b7f-940a-9c0b81a7135a') ]
data_pool.get_jobs()
[ Job(id='a2c77d73-4758-47f1-a59e-157f061fa35f', name='PyCelonis Tutorial Job', data_pool_id='1bcc67dc-935a-4b7f-940a-9c0b81a7135a') ]
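The two listing calls above can be combined into a compact overview of a pool. The sketch below relies only on get_data_models() and get_jobs() as used in this tutorial; summarize_data_pool is a hypothetical helper name.

```python
def summarize_data_pool(data_pool):
    """Return the names of all data models and jobs in a data pool.

    Hypothetical helper built on the two getters shown above.
    """
    return {
        "data_models": sorted(model.name for model in data_pool.get_data_models()),
        "jobs": sorted(job.name for job in data_pool.get_jobs()),
    }


# Usage in the tutorial context:
# summarize_data_pool(data_pool)
```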
4. Change Data Integration objects
As covered in the Celonis Basics tutorial, we can update, sync, and delete objects inside Data Integration at any time. For now, we will only rename the objects and push the changes to the Celonis EMS:
data_pool.name = "PyCelonis Tutorial Data Pool"
data_pool.update()
data_model.name = "PyCelonis Tutorial Data Model"
data_model.update()
data_job.name = "PyCelonis Tutorial Data Job"
data_job.update()
[2024-08-07 12:11:49,817] INFO: Successfully updated data pool with id '1bcc67dc-935a-4b7f-940a-9c0b81a7135a'
[2024-08-07 12:11:49,822] INFO: Successfully updated data model with id 'a13de415-ab63-4faf-a392-daab5ec74795'
[2024-08-07 12:11:49,825] INFO: Successfully updated job with id 'a2c77d73-4758-47f1-a59e-157f061fa35f'
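The assign-then-update pattern above repeats for every object, so it can be wrapped in a small loop. The following is a sketch only; rename_all is a hypothetical helper that assumes each object exposes a writable name attribute and an update() method, as the three objects in this step do.

```python
def rename_all(renames):
    """Apply (object, new_name) pairs and push each change with update().

    Hypothetical helper. Objects whose name already matches are skipped,
    which avoids unnecessary API calls. Returns the objects that changed.
    """
    changed = []
    for obj, new_name in renames:
        if obj.name != new_name:
            obj.name = new_name
            obj.update()  # pushes the new name to the EMS
            changed.append(obj)
    return changed


# Usage in the tutorial context:
# rename_all([
#     (data_pool, "PyCelonis Tutorial Data Pool"),
#     (data_model, "PyCelonis Tutorial Data Model"),
#     (data_job, "PyCelonis Tutorial Data Job"),
# ])
```

A list of pairs is used instead of a dictionary so that the objects do not need to be hashable.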
5. Access a data model with only use permissions
In general, accessing a data model through its data pool requires admin permissions for the pool. If you only have use permissions for the data model itself, you can instantiate it directly from its ID and the ID of its pool, and then call sync() to load its current state from the EMS:
from pycelonis.ems.data_integration.data_model import DataModel
data_model = DataModel(client=celonis.client, id=data_model.id, pool_id=data_pool.id)
data_model.sync()
Conclusion
Congratulations! You have learned how to use PyCelonis to create data pools, data models, and data jobs inside Data Integration. These objects will serve as places to organize your other Data Integration objects throughout the subsequent tutorials. In the next tutorial, Data Upload, you will learn how to populate two of these objects, namely data pools and data models, by creating tables and uploading data from your local Python project into the EMS.