data_pool.py
Pool (CelonisApiObject)
Pool object to interact with the Celonis Event Collection API.
url: str
property
readonly
API
/integration/api/pools/{pool_id}
url_data_push
property
readonly
API
/integration/api/v1/data-push/{pool_id}/jobs/
url_connection_creation
property
readonly
API
/integration/api/datasource/
datamodels: CelonisCollection
property
readonly
Get all Datamodels of the Pool.
API
GET: /integration/api/pools/{pool_id}/data-models
Returns:
Type | Description
---|---
CelonisCollection | Collection of Pool Datamodels.
tables: List[Dict]
property
readonly
Get all Pool Tables.
API
GET: /integration/api/pools/{pool_id}/tables
Returns:
Type | Description
---|---
List[Dict] | A List of dictionaries containing Pool tables.
data_connections: CelonisCollection[DataConnection]
property
readonly
Get all Pool Data Connections.
API
GET: /integration/api/pools/{pool_id}/data-sources/
Returns:
Type | Description
---|---
CelonisCollection[DataConnection] | A Collection of Pool Data Connections.
data_jobs: CelonisCollection[DataJob]
property
readonly
Get all Pool Data Jobs.
API
GET: /integration/api/pools/{pool_id}/jobs
Returns:
Type | Description
---|---
CelonisCollection[DataJob] | A Collection of Pool Data Jobs.
variables: CelonisCollection[PoolParameter]
property
readonly
Get all Pool Variables.
API
GET: /integration/api/pools/{pool_id}/variables
Returns:
Type | Description
---|---
CelonisCollection[PoolParameter] | A Collection of Pool Variables.
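Example (a minimal sketch; it assumes `pool` is a Pool object already obtained through the PyCelonis client, which is not covered on this page):

```python
# Read-only collections exposed by the Pool (endpoints documented above).
print(pool.tables)            # list of dicts describing the Pool tables
print(pool.datamodels)        # CelonisCollection of Datamodels
print(pool.data_connections)  # CelonisCollection[DataConnection]
print(pool.data_jobs)         # CelonisCollection[DataJob]
print(pool.variables)         # CelonisCollection[PoolParameter]
```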
find_table(self, table_name, data_source_id=None)
Find a Table in the Pool.
Parameters:
Name | Type | Description | Default
---|---|---|---
table_name | str | Name of the Pool Table. | required
data_source_id | str | ID of the Data Source. | None
Returns:
Type | Description
---|---
Optional[Dict] | The Pool Table, if found.
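Example (minimal sketch; `pool` is an assumed Pool object and 'ACTIVITIES' a made-up table name):

```python
# Returns the table description as a dict, or None if no such table exists.
table = pool.find_table(table_name="ACTIVITIES")
if table is None:
    print("Table not found in this Pool")
```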
create_table(self, df_or_path, table_name, if_exists='error', column_config=None, connection=None, wait_for_finish=True, chunksize=100000)
Creates a new Table in the Pool from a pandas.DataFrame or pyarrow.parquet.ParquetFile.
Parameters:
Name | Type | Description | Default
---|---|---|---
df_or_path | Union[pandas.core.frame.DataFrame, pathlib.Path, str] | | required
table_name | str | Name of Table. | required
if_exists | str | | 'error'
column_config | List[Dict[str, Any]] | Can be used to specify column types and string field length, with columnType one of [INTEGER, DATE, TIME, DATETIME, FLOAT, BOOLEAN, STRING]. | None
connection | Union[DataConnection, str] | The Data Connection to upload to, else uploads to Global. | None
wait_for_finish | bool | Waits for the upload to finish processing; set to False to trigger only. | True
chunksize | int | If a DataFrame is passed, the value is used to chunk the DataFrame into multiple parquet files that are uploaded. If set to a value <1, no chunking is applied. | 100000
Returns:
Type | Description
---|---
Dict | The Data Job Status.
Exceptions:
Type | Description
---|---
PyCelonisValueError | If Table already exists and if_exists='error'.
PyCelonisTypeError | When connection is not a DataConnection object or the ID of a Data Connection.
PyCelonisTypeError | If Path is not a valid file or folder.
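Example (minimal sketch; `pool` is an assumed Pool object, the DataFrame and table name are illustrative):

```python
import pandas as pd

df = pd.DataFrame({"CASE_KEY": ["A", "B"], "AMOUNT": [10.0, 20.5]})

# Uploads the DataFrame (chunked into parquet files) and creates the table
# in the Global source, since no connection is given. Raises
# PyCelonisValueError if the table already exists (if_exists='error').
status = pool.create_table(df_or_path=df, table_name="MY_TABLE")
print(status)  # Data Push Job status dict
```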
append_table(self, df_or_path, table_name, column_config=None, connection=None, wait_for_finish=True, chunksize=100000)
Appends a pandas.DataFrame or pyarrow.parquet.ParquetFile to an existing Table in the Pool.
Parameters:
Name | Type | Description | Default
---|---|---|---
df_or_path | Union[pandas.core.frame.DataFrame, pathlib.Path, str] | | required
table_name | str | Name of Table. | required
column_config | List[Dict[str, Any]] | Can be used to specify column types and string field length, with columnType one of [INTEGER, DATE, TIME, DATETIME, FLOAT, BOOLEAN, STRING]. | None
connection | Union[DataConnection, str] | The Data Connection to upload to, else uploads to Global. | None
wait_for_finish | bool | Waits for the upload to finish processing; set to False to trigger only. | True
chunksize | int | If a DataFrame is passed, the value is used to chunk the DataFrame into multiple parquet files that are uploaded. If set to a value <1, no chunking is applied. | 100000
Returns:
Type | Description
---|---
Dict | The Data Job Status.
Exceptions:
Type | Description
---|---
PyCelonisValueError | If Table already exists and
PyCelonisTypeError | When connection is not a DataConnection object or the ID of a Data Connection.
PyCelonisTypeError | If Path is not a valid file or folder.
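Example (minimal sketch; continues the hypothetical MY_TABLE from the create_table example above):

```python
import pandas as pd

new_rows = pd.DataFrame({"CASE_KEY": ["C"], "AMOUNT": [5.0]})

# Appends rows to the existing table; the columns are expected to match
# the target table's schema.
status = pool.append_table(df_or_path=new_rows, table_name="MY_TABLE")
```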
upsert_table(self, df_or_path, table_name, primary_keys, column_config=None, connection=None, wait_for_finish=True, chunksize=100000)
Upserts a pandas.DataFrame or pyarrow.parquet.ParquetFile into an existing Table in the Pool.
Parameters:
Name | Type | Description | Default
---|---|---|---
df_or_path | Union[pandas.core.frame.DataFrame, pathlib.Path, str] | | required
table_name | str | Name of Table. | required
primary_keys | List[str] | List of Table primary keys. | required
column_config | List[Dict[str, Any]] | Can be used to specify column types and string field length, with columnType one of [INTEGER, DATE, TIME, DATETIME, FLOAT, BOOLEAN, STRING]. | None
connection | Union[DataConnection, str] | The Data Connection to upload to, else uploads to Global. | None
wait_for_finish | bool | Waits for the upload to finish processing; set to False to trigger only. | True
chunksize | int | If a DataFrame is passed, the value is used to chunk the DataFrame into multiple parquet files that are uploaded. If set to a value <1, no chunking is applied. | 100000
Returns:
Type | Description
---|---
Dict | The Data Job Status.
Exceptions:
Type | Description
---|---
PyCelonisValueError | If Table already exists and
PyCelonisTypeError | When connection is not a DataConnection object or the ID of a Data Connection.
PyCelonisTypeError | If Path is not a valid file or folder.
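Example (minimal sketch; primary key and values are illustrative):

```python
import pandas as pd

updates = pd.DataFrame({"CASE_KEY": ["A", "C"], "AMOUNT": [11.0, 5.0]})

# Rows whose primary keys already exist are updated; new keys are inserted.
status = pool.upsert_table(
    df_or_path=updates,
    table_name="MY_TABLE",
    primary_keys=["CASE_KEY"],
)
```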
push_table(self, df_or_path, table_name, if_exists='error', primary_keys=None, column_config=None, connection=None, wait_for_finish=True, chunksize=100000)
Pushes a pandas.DataFrame or pyarrow.parquet.ParquetFile to the specified Table in the Pool.
Warning
Deprecation: The method 'push_table' is deprecated and will be removed in the next release. Use one of: create_table, append_table, upsert_table.
Parameters:
Name | Type | Description | Default
---|---|---|---
df_or_path | Union[pandas.core.frame.DataFrame, pathlib.Path, str] | | required
table_name | str | Name of Table. | required
if_exists | str | | 'error'
primary_keys | Optional[List[str]] | List of Table primary keys. | None
column_config | List[Dict[str, Any]] | Can be used to specify column types and string field length, with columnType one of [INTEGER, DATE, TIME, DATETIME, FLOAT, BOOLEAN, STRING]. | None
connection | Union[DataConnection, str] | The Data Connection to upload to, else uploads to Global. | None
wait_for_finish | bool | Waits for the upload to finish processing; set to False to trigger only. | True
chunksize | int | If a DataFrame is passed, the value is used to chunk the DataFrame into multiple parquet files that are uploaded. If set to a value <1, no chunking is applied. | 100000
Returns:
Type | Description
---|---
Dict | The Data Job Status.
get_column_config(self, table, raise_error=False)
Get a Column Configuration of a Pool Table.
Column Config List:
[
    {'columnName': 'colA', 'columnType': 'DATETIME'},
    {'columnName': 'colB', 'columnType': 'FLOAT'},
    {'columnName': 'colC', 'columnType': 'STRING', 'fieldLength': 80}
]
Parameters:
Name | Type | Description | Default
---|---|---|---
table | Union[str, Dict] | Name of the Pool Table or dictionary with | required
raise_error | bool | Raises a celonis_api.errors.PyCelonisValueError if Table data types are | False
Returns:
Type | Description
---|---
Optional[List[Dict[str, Any]]] | The Column Configuration of the Pool Table (always ignoring '_CELONIS_CHANGE_DATE').
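Example (sketch; reuses the column configuration of an existing table when creating a new one with the same column types, table names are illustrative):

```python
import pandas as pd

df = pd.DataFrame({"CASE_KEY": ["D"], "AMOUNT": [7.5]})

# e.g. [{'columnName': 'AMOUNT', 'columnType': 'FLOAT'}, ...]
config = pool.get_column_config("MY_TABLE")

# Pass it on so the new table gets the same column types and field lengths.
pool.create_table(df_or_path=df, table_name="MY_TABLE_COPY", column_config=config)
```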
check_push_status(self, job_id='')
Checks the Status of a Data Push Job.
API
GET: /integration/api/v1/data-push/{pool_id}/jobs/{job_id}
Parameters:
Name | Type | Description | Default
---|---|---|---
job_id | str | The ID of the job to check. If empty, the status of all jobs is returned. | ''
Returns:
Type | Description
---|---
Dict | Status of Data Push Job(s).
check_data_job_execution_status(self)
Checks the Status of Data Job Executions.
API
GET: /integration/api/pools/{pool_id}/logs/status
Returns:
Type | Description
---|---
List | Status of all Data Job Executions.
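Example (sketch; `pool` is an assumed Pool object and "<job_id>" a placeholder, e.g. the id of a Data Push Job started by one of the upload methods above):

```python
# Status of all Data Push Jobs in the Pool.
all_push_jobs = pool.check_push_status()

# Status of a single Data Push Job ("<job_id>" is a placeholder).
job_status = pool.check_push_status(job_id="<job_id>")

# Status of all Data Job executions in the Pool.
executions = pool.check_data_job_execution_status()
```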
create_datamodel(self, name)
Creates a new Datamodel in the Pool.
Parameters:
Name | Type | Description | Default
---|---|---|---
name | str | Name of the Datamodel. | required
Returns:
Type | Description
---|---
Datamodel | The newly created Datamodel object.
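Example (minimal sketch; the name is illustrative):

```python
# Creates an empty Datamodel in the Pool; tables are added to it separately
# (not shown here).
datamodel = pool.create_datamodel(name="Purchase-to-Pay Datamodel")
```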
create_data_connection(self, client, host, password, system_number, user, name, connector_type, uplink_id=None, use_uplink=True, compression_type='GZIP', **kwargs)
Creates a new Data Connection (currently, only SAP connections are supported).
Warning
This method is deprecated and will be removed in the next release. Use the online wizard to set up Data Connections.
Parameters:
Name | Type | Description | Default
---|---|---|---
client | str | Client. | required
host | str | Host. | required
user | str | Username. | required
password | str | Password. | required
system_number | str | System Number. | required
name | str | Name of the Data Connection. | required
connector_type | str | Type of the Data Connection. One of ['SAP']. | required
uplink_id | str | ID of an Uplink Connection. | None
use_uplink | bool | Whether to use an Uplink Connection or not. | True
compression_type | str | Compression Type. | 'GZIP'
**kwargs | | | {}
Returns:
Type | Description
---|---
DataConnection | The newly created Data Connection.
move(self, to)
Moves the Pool to another team.
API
POST: /integration/api/pools/move
Parameters:
Name | Type | Description | Default
---|---|---|---
to | str | Name of the host domain (e.g. | required
create_pool_parameter(self, pool_variable=None, name=None, placeholder=None, description=None, data_type='STRING', var_type='PUBLIC_CONSTANT', values=None)
Creates a new Variable with the specified properties in the Pool.
API
POST: /integration/api/pools/{pool_id}/variables/
Parameters:
Name | Type | Description | Default
---|---|---|---
pool_variable | Union[Dict, PoolParameter] | Pool Parameter object or dictionary (see API), if | None
name | str | Name of the Variable (same as | None
placeholder | str | Placeholder of the Variable. | None
description | str | Description of the Variable. | None
data_type | str | Data type of the Variable (see options | 'STRING'
var_type | str | Type of the Variable (see options | 'PUBLIC_CONSTANT'
values | List | List of Variable values. | None
Returns:
Type | Description
---|---
PoolParameter | The newly created Pool Parameter object.
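Example (minimal sketch; the variable name, placeholder, and value are illustrative, and the chosen data_type/var_type combination is an assumption):

```python
# Public constant STRING variable with a single value.
param = pool.create_pool_parameter(
    name="country_filter",
    placeholder="country_filter",
    description="Country used in extraction filters",
    data_type="STRING",
    var_type="PUBLIC_CONSTANT",
    values=["DE"],
)
```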
create_data_job(self, name, data_source_id=None)
Creates a new Data Job with the specified name in the Pool.
API
POST: /integration/api/pools/{pool_id}/jobs/
Parameters:
Name | Type | Description | Default
---|---|---|---
name | str | Name of the Data Job. | required
data_source_id | str | ID of the Data Source that the new Data Job will be connected to. If not specified, the Data Job is connected to the default global source. | None
Returns:
Type | Description
---|---
DataJob | The newly created Data Job object.
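Example (minimal sketch; "<data_source_id>" is a placeholder, e.g. the id of an entry in pool.data_connections):

```python
# Data Job connected to the default global source.
job = pool.create_data_job(name="Nightly Transformations")

# Data Job connected to a specific Data Source.
sap_job = pool.create_data_job(
    name="SAP Extractions",
    data_source_id="<data_source_id>",
)
```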