data_pool_table

Module to interact with data pool tables.

This module contains a class to interact with data pool tables in EMS data integration.

Typical usage example:
tables = data_pool.get_tables()
data_pool_table = data_pool.create_table(df, "TEST_TABLE")
data_pool_table.append(df)
data_pool_table.upsert(df, keys=["PRIMARY_KEY_COLUMN"])

DataPoolTable

Bases: PoolTable

Data pool table object to interact with data pool table specific data integration endpoints.

client instance-attribute class-attribute

client: Client = Field(Ellipsis, exclude=True)

Client used to make API calls for this data pool table.

name instance-attribute

name: str

Name of data pool table.

data_pool_id instance-attribute

data_pool_id: str

Id of data pool where table is located.

data_source_id instance-attribute

data_source_id: typing.Optional[str]

Id of data connection where table is located.

columns instance-attribute

columns: typing.Optional[
    typing.List[typing.Optional[PoolColumn]]
]

Columns of data pool table.
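
For example, to list column names (a minimal sketch; it assumes PoolColumn exposes a name attribute, which is not documented on this page):

if data_pool_table.columns:
    for column in data_pool_table.columns:
        if column is not None:
            print(column.name)  # assumes PoolColumn has a 'name' attribute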

from_transport classmethod

from_transport(client, data_pool_id, pool_table_transport)

Creates a high-level data pool table object from the given PoolTable transport object.

Parameters:

  • client (Client) –

    Client to use to make API calls for given data pool table.

  • data_pool_id (str) –

    Id of data pool where table is located.

  • pool_table_transport (PoolTable) –

    PoolTable object containing properties of data pool table.

Returns:

  • DataPoolTable

    A DataPoolTable object with properties from transport and given client.
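
A minimal sketch of using this factory method (pool_table_transport stands for a hypothetical PoolTable transport object, e.g. parsed from a raw API response; the sketch also assumes the data pool object exposes an id attribute):

table = DataPoolTable.from_transport(
    client=client,
    data_pool_id=data_pool.id,  # assumed attribute on the data pool object
    pool_table_transport=pool_table_transport,  # hypothetical transport object
)
print(table.name, table.data_pool_id)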

sync

sync()

Syncs data pool table properties with EMS.
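
For example, to refresh the local object after the table changed in EMS (a minimal sketch):

data_pool_table.sync()  # re-fetches table properties from EMS
print(data_pool_table.name)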

get_columns

get_columns()

Gets all table columns of given table.

Returns:

  • typing.List[PoolColumn]

    All columns of the data pool table.

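A minimal sketch of calling it (again assuming PoolColumn exposes a name attribute, which this page does not document):

columns = data_pool_table.get_columns()
for column in columns:
    print(column.name)
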
upsert

upsert(df, keys, chunk_size=100000, index=False, **kwargs)

Upserts data frame to existing table in data pool.

Parameters:

  • df (pd.DataFrame) –

    DataFrame to push to existing table.

  • keys (typing.List[str]) –

    Primary keys of table.

  • chunk_size (int) –

    Number of rows to push in one chunk.

  • index (typing.Optional[bool]) –

    Whether the index is included in the parquet file that is pushed. Defaults to False. See the pandas documentation.

  • **kwargs (typing.Any) –

    Additional parameters set for DataPushJob object.

Returns:

  • None

    This method returns None; the table itself is updated in place.

Raises:

  • PyCelonisTableDoesNotExistError

    Raised if table does not exist in data pool.

  • PyCelonisDataPushExecutionFailedError

    Raised when the data push execution fails.

Examples:

Upsert new data to table:

df = pd.DataFrame({"ID": ["aa", "bb", "cc"], "TEST_COLUMN": [1, 2, 3]})

data_pool_table = data_pool.get_table("TEST_TABLE")
data_pool_table.upsert(df, keys=["ID"])
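
Large frames can be pushed in smaller chunks via the documented chunk_size parameter (the value below is illustrative):

data_pool_table.upsert(df, keys=["ID"], chunk_size=10000)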

append

append(df, chunk_size=100000, index=False, **kwargs)

Appends data frame to existing table in data pool.

Parameters:

  • df (pd.DataFrame) –

    DataFrame to push to existing table.

  • chunk_size (int) –

    Number of rows to push in one chunk.

  • index (typing.Optional[bool]) –

    Whether the index is included in the parquet file that is pushed. Defaults to False. See the pandas documentation.

  • **kwargs (typing.Any) –

    Additional parameters set for DataPushJob object.

Returns:

  • None

    This method returns None; the table itself is updated in place.

Raises:

  • PyCelonisTableDoesNotExistError

    Raised if table does not exist in data pool.

  • PyCelonisDataPushExecutionFailedError

    Raised when the data push execution fails.

Examples:

Append new data to table:

df = pd.DataFrame({"ID": ["aa", "bb", "cc"], "TEST_COLUMN": [1, 2, 3]})

data_pool_table = data_pool.get_table("TEST_TABLE")
data_pool_table.append(df)
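
The documented chunk_size parameter can likewise be tuned when appending large frames (illustrative value):

data_pool_table.append(df, chunk_size=10000)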