data_pool_table

Module to interact with data pool tables.

This module contains a class to interact with data pool tables in EMS data integration.

Typical usage example:
tables = data_pool.get_tables()
data_pool_table = data_pool.create_table(df, "TEST_TABLE")
data_pool_table.append(df)
data_pool_table.upsert(df, keys=["PRIMARY_KEY_COLUMN"])

DataPoolTable

Bases: PoolTable

Data pool table object to interact with data pool table specific data integration endpoints.

client instance-attribute class-attribute

client: Client = Field(Ellipsis, exclude=True)

Client used to make API calls for this data pool table.

name instance-attribute

name: str

Name of data pool table.

data_pool_id instance-attribute

data_pool_id: str

Id of data pool where table is located.

data_source_id instance-attribute

data_source_id: typing.Optional[str]

Id of data connection where table is located.

columns instance-attribute

columns: typing.Optional[
    typing.List[typing.Optional[PoolColumn]]
]

Columns of data pool table.
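
For example, to list column names (a minimal sketch; it assumes PoolColumn exposes a name attribute, which is not documented on this page):

if data_pool_table.columns:
    for column in data_pool_table.columns:
        if column is not None:
            print(column.name)  # assumes PoolColumn has a 'name' attribute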

from_transport classmethod

from_transport(client, data_pool_id, pool_table_transport)

Creates a high-level data pool table object from the given PoolTable transport object.

Parameters:

  • client (Client) –

    Client to use to make API calls for given data pool table.

  • data_pool_id (str) –

    Id of data pool where table is located.

  • pool_table_transport (PoolTable) –

    PoolTable object containing properties of data pool table.

Returns:

  • DataPoolTable

    A DataPoolTable object with properties from transport and given client.
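
A minimal sketch of using this factory method (pool_table_transport stands for a hypothetical PoolTable transport object, e.g. parsed from a raw API response; the sketch also assumes the data pool object exposes an id attribute):

table = DataPoolTable.from_transport(
    client=client,
    data_pool_id=data_pool.id,  # assumed attribute on the data pool object
    pool_table_transport=pool_table_transport,  # hypothetical transport object
)
print(table.name, table.data_pool_id)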

sync

sync()

Syncs data pool table properties with EMS.
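
For example, to refresh the local object after the table changed in EMS (a minimal sketch):

data_pool_table.sync()  # re-fetches table properties from EMS
print(data_pool_table.name)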

get_columns

get_columns()

Gets all table columns of given table.

Returns:

  • typing.List[PoolColumn]

    All columns of the data pool table.

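A minimal sketch of calling it (again assuming PoolColumn exposes a name attribute, which this page does not document):

columns = data_pool_table.get_columns()
for column in columns:
    print(column.name)
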
upsert

upsert(df, keys, chunk_size=100000, index=False, **kwargs)

Upserts data frame to existing table in data pool.

Parameters:

  • df (pd.DataFrame) –

    DataFrame to push to existing table.

  • keys (typing.List[str]) –

    Primary keys of table.

  • chunk_size (int) –

    Number of rows to push in one chunk.

  • index (typing.Optional[bool]) –

    Whether the index is included in the parquet file that is pushed. Defaults to False. See the pandas documentation.

  • **kwargs (typing.Any) –

    Additional parameters set for DataPushJob object.

Returns:

  • None

    This method returns None; the table itself is updated in place.

Raises:

  • PyCelonisTableDoesNotExistError

    Raised if table does not exist in data pool.

  • PyCelonisDataPushExecutionFailedError

    Raised when the data push execution fails.

Examples:

Upsert new data to table:

df = pd.DataFrame({"ID": ["aa", "bb", "cc"], "TEST_COLUMN": [1, 2, 3]})

data_pool_table = data_pool.get_table("TEST_TABLE")
data_pool_table.upsert(df, keys=["ID"])
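
Large frames can be pushed in smaller chunks via the documented chunk_size parameter (the value below is illustrative):

data_pool_table.upsert(df, keys=["ID"], chunk_size=10000)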

append

append(df, chunk_size=100000, index=False, **kwargs)

Appends data frame to existing table in data pool.

Parameters:

  • df (pd.DataFrame) –

    DataFrame to push to existing table.

  • chunk_size (int) –

    Number of rows to push in one chunk.

  • index (typing.Optional[bool]) –

    Whether the index is included in the parquet file that is pushed. Defaults to False. See the pandas documentation.

  • **kwargs (typing.Any) –

    Additional parameters set for DataPushJob object.

Returns:

  • None

    This method returns None; the table itself is updated in place.

Raises:

  • PyCelonisTableDoesNotExistError

    Raised if table does not exist in data pool.

  • PyCelonisDataPushExecutionFailedError

    Raised when the data push execution fails.

Examples:

Append new data to table:

df = pd.DataFrame({"ID": ["aa", "bb", "cc"], "TEST_COLUMN": [1, 2, 3]})

data_pool_table = data_pool.get_table("TEST_TABLE")
data_pool_table.append(df)
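
The documented chunk_size parameter can likewise be tuned when appending large frames (illustrative value):

data_pool_table.append(df, chunk_size=10000)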