Arithmetic Operations and Aggregations¶

In this tutorial, you will learn:

How to apply arithmetic operations such as + or -
How to apply aggregation functions
How to apply pull up aggregation functions

Prerequisites:¶

To follow this tutorial, you need to have a Celonis EMS team set up and be familiar with the basic functionalities of PyCelonis.

Tutorial¶

1. Connect to your data model¶

To get started with SaolaPy, we first need to connect to our data model:

In [ ]:

Copied!





from pycelonis import get_celonis
import pycelonis.pql as pql
import pandas as pd

c = get_celonis()
data_pool = c.data_integration.get_data_pools().find("<YOUR_DATA_POOL>")
data_model = data_pool.get_data_models().find("<YOUR_DATA_MODEL>")
from pycelonis import get_celonis
import pycelonis.pql as pql
import pandas as pd

c = get_celonis()
data_pool = c.data_integration.get_data_pools().find("")
data_model = data_pool.get_data_models().find("")

To avoid always having to specify the data model we set it as our global default data model:

In [2]:

Copied!

from pycelonis.config import Config

Config.DEFAULT_DATA_MODEL = data_model
from pycelonis.config import Config

Config.DEFAULT_DATA_MODEL = data_model

For convenience, we extract the columns of tables we want to use:

In [3]:

Copied!

ekpo_columns = data_model.get_tables().find("EKPO").get_columns()
ekbe_columns = data_model.get_tables().find("EKBE").get_columns()
ekpo_columns = data_model.get_tables().find("EKPO").get_columns()
ekbe_columns = data_model.get_tables().find("EKBE").get_columns()

2. Arithmetic Operations¶

Let's first create two series to work with:

In [4]:

Copied!

net_price_series = pql.Series(ekpo_columns.find("NETPR"))
menge_series = pql.Series(ekpo_columns.find("MENGE"))
net_price_series = pql.Series(ekpo_columns.find("NETPR"))
menge_series = pql.Series(ekpo_columns.find("MENGE"))

SaolaPy supports all basic arithmetic operations supported by pandas. Arithmetic operations can be applied to combine series, data frames, or scalars:

In [5]:

Copied!

result = net_price_series * menge_series
result = result * 5
result = net_price_series * menge_series
result = result * 5

Applying arithmetic operations does not perform any computations yet and only adjusts the query of the series to include the operation:

In [6]:

Copied!

result.data
result.data

Out[6]:

( ( "EKPO"."NETPR" * "EKPO"."MENGE" ) * 5 )

Operations can be chained as needed:

In [7]:

Copied!

result = (net_price_series + 10) * ((menge_series / 3) - 2)
result.data
result = (net_price_series + 10) * ((menge_series / 3) - 2)
result.data

Out[7]:

( ( "EKPO"."NETPR" + 10 ) * ( ( "EKPO"."MENGE" / 3 ) - 2 ) )

Other supported arithmetic operations are:

In [8]:

Copied!





# Addition
result = net_price_series + 10

# Subtraction
result = net_price_series - 10
 
# Multiplication
result = net_price_series * 10

# Division
result = net_price_series / 10

# Floor Division
result = net_price_series // 10

# Power
result = net_price_series ** 2

# Modulo
result = net_price_series % 2
# Addition
result = net_price_series + 10

# Subtraction
result = net_price_series - 10
 
# Multiplication
result = net_price_series * 10

# Division
result = net_price_series / 10

# Floor Division
result = net_price_series // 10

# Power
result = net_price_series ** 2

# Modulo
result = net_price_series % 2

SaolaPy also supports assignment operators:

In [9]:

Copied!





# Addition
result += 10

# Subtraction
result -= 10
 
# Multiplication
result *= 10

# Division
result /= 10

# Floor Division
result //= 10

# Power
result **= 2

# Modulo
result %= 2
# Addition
result += 10

# Subtraction
result -= 10
 
# Multiplication
result *= 10

# Division
result /= 10

# Floor Division
result //= 10

# Power
result **= 2

# Modulo
result %= 2

3. Aggregations¶

SaolaPy supports the most common pandas aggregation functions as well. Unlike arithmetic operations, aggregation functions are directly executed and the result is exported:

Supported aggregation functions include:

In [19]:

Copied!





print("Mean:", net_price_series.mean())
print("Minimum:", net_price_series.min())
print("Maximum:", net_price_series.max())
print("Sum:", net_price_series.sum())
print("Product:", net_price_series.product())
print("Count:", net_price_series.count())
print("Mode:", net_price_series.mode().iloc[0])
print("Median:", net_price_series.median())
print("Quantile:", net_price_series.quantile(0.3))
print("Variance:", net_price_series.var())
print("Standard Deviation:", net_price_series.std())
print("Mean:", net_price_series.mean())
print("Minimum:", net_price_series.min())
print("Maximum:", net_price_series.max())
print("Sum:", net_price_series.sum())
print("Product:", net_price_series.product())
print("Count:", net_price_series.count())
print("Mode:", net_price_series.mode().iloc[0])
print("Median:", net_price_series.median())
print("Quantile:", net_price_series.quantile(0.3))
print("Variance:", net_price_series.var())
print("Standard Deviation:", net_price_series.std())

Mean: 5989.272307177111
Minimum: 0.0
Maximum: 76417000.0
Sum: 1949466211.0799925
Product: nan
Count: 325493
Mode: 47.2
Median: 63.54
Quantile: 18.91
Variance: 45592292389.19145
Standard Deviation: 213523.51718064092

4. Pull Up Aggregations¶

The Pull-Up-functions in PQL allow you to aggregate a column based on another table. You can define the target table to which the entries of a column from another table are pulled.

SaolaPy also supports pull up functions for series. You only have to specify the table to pull the results to and the aggregation function:

In [11]:

Copied!

menge_series = pql.Series(ekbe_columns.find("MENGE"))

pu_menge_series = menge_series.pu("EKPO").mean()
menge_series = pql.Series(ekbe_columns.find("MENGE"))

pu_menge_series = menge_series.pu("EKPO").mean()

Afterwards the series can for example be added to a DataFrame of the EKPO table:

In [12]:

Copied!





df = pql.DataFrame({
    "BUKRS": ekpo_columns.find("BUKRS"),
    "PU_MENGE": pu_menge_series
})
df.head()
df = pql.DataFrame({
    "BUKRS": ekpo_columns.find("BUKRS"),
    "PU_MENGE": pu_menge_series
})
df.head()

Out[12]:

	BUKRS	PU_MENGE
Index
0	GB03	6.0
1	GB03	20.0
2	5100	1.0
3	5100	1.0
4	5100	3.0

Other supported PU functions include:

In [13]:

Copied!





# Mean
pu_menge_series = menge_series.pu("EKPO").mean()

# Minimum
pu_menge_series = menge_series.pu("EKPO").min()

# Maximum
pu_menge_series = menge_series.pu("EKPO").max()

# Sum
pu_menge_series = menge_series.pu("EKPO").sum()

# Product
pu_menge_series = menge_series.pu("EKPO").product()

# Count
pu_menge_series = menge_series.pu("EKPO").count()

# Count distinct
pu_menge_series = menge_series.pu("EKPO").count_distinct()

# Mode
pu_menge_series = menge_series.pu("EKPO").mode()

# Median
pu_menge_series = menge_series.pu("EKPO").median()

# Standard deviation
pu_menge_series = menge_series.pu("EKPO").std()
# Mean
pu_menge_series = menge_series.pu("EKPO").mean()

# Minimum
pu_menge_series = menge_series.pu("EKPO").min()

# Maximum
pu_menge_series = menge_series.pu("EKPO").max()

# Sum
pu_menge_series = menge_series.pu("EKPO").sum()

# Product
pu_menge_series = menge_series.pu("EKPO").product()

# Count
pu_menge_series = menge_series.pu("EKPO").count()

# Count distinct
pu_menge_series = menge_series.pu("EKPO").count_distinct()

# Mode
pu_menge_series = menge_series.pu("EKPO").mode()

# Median
pu_menge_series = menge_series.pu("EKPO").median()

# Standard deviation
pu_menge_series = menge_series.pu("EKPO").std()

5. Arithmetic Operations and Aggregations for DataFrames¶

Most arithmetic operations and aggregations can also be applied to DataFrames. The operation is then applied to every column of the DataFrame:

In [14]:

Copied!





df = pql.DataFrame({
    "MENGE": ekpo_columns.find("MENGE"),
    "NETPR": ekpo_columns.find("NETPR"),
})
df = pql.DataFrame({
    "MENGE": ekpo_columns.find("MENGE"),
    "NETPR": ekpo_columns.find("NETPR"),
})

In [15]:

Copied!

result_df = df * 100

print(result_df)
result_df.head()
result_df = df * 100

print(result_df)
result_df.head()

DataFrame(data={'MENGE': '( "EKPO"."MENGE" * 100 )', 'NETPR': '( "EKPO"."NETPR" * 100 )'}, index=RangeIndex(name='Index', start=0, step=1), filters=[], order_by_columns=[])

Out[15]:

	MENGE	NETPR
Index
0	600.0	16560.0
1	2000.0	11101.0
2	100.0	14950.0
3	100.0	37260.0
4	300.0	26580.0

If the other operand is a series it will be applied to the column with matching index:

In [16]:

Copied!

result_df = df + pd.Series([-1000, 1000], index=["MENGE", "NETPR"])

print(result_df)
result_df.head()
result_df = df + pd.Series([-1000, 1000], index=["MENGE", "NETPR"])

print(result_df)
result_df.head()

DataFrame(data={'MENGE': '( "EKPO"."MENGE" + -1000 )', 'NETPR': '( "EKPO"."NETPR" + 1000 )'}, index=RangeIndex(name='Index', start=0, step=1), filters=[], order_by_columns=[])

Out[16]:

	MENGE	NETPR
Index
0	-994.0	1165.60
1	-980.0	1111.01
2	-999.0	1149.50
3	-999.0	1372.60
4	-997.0	1265.80

Applying aggregation methods returns a pandas Series with the aggregated results:

In [17]:

Copied!

df.mean()
df.mean()

Out[17]:

MENGE    1753.993904
NETPR    5989.272307
Name: 0, dtype: float64

Bringing everything together, one can for example standardize the data:

In [18]:

Copied!

standardized_df = (df - df.mean()) / df.std()
standardized_df.head()
standardized_df = (df - df.mean()) / df.std()
standardized_df.head()

Out[18]:

	MENGE	NETPR
Index
0	-0.018224	-0.027274
1	-0.018078	-0.027530
2	-0.018276	-0.027350
3	-0.018276	-0.026305
4	-0.018256	-0.026805

Conclusion¶

Congratulations! You have learned how to apply arithmetic operations and aggregations to SaolaPy Series and DataFrames.