Arithmetic Operations and Aggregations¶
In this tutorial, you will learn:
- How to apply arithmetic operations such as
+
or-
- How to apply aggregation functions
- How to apply pull up aggregation functions
Prerequisites:¶
To follow this tutorial, you need to have a Celonis EMS team set up and be familiar with the basic functionalities of PyCelonis.
Tutorial¶
1. Connect to your data model¶
To get started with SaolaPy, we first need to connect to our data model:
from pycelonis import get_celonis
import pycelonis.pql as pql
import pandas as pd
c = get_celonis()
data_pool = c.data_integration.get_data_pools().find("<YOUR_DATA_POOL>")
data_model = data_pool.get_data_models().find("<YOUR_DATA_MODEL>")
To avoid always having to specify the data model we set it as our global default data model:
from pycelonis.config import Config
Config.DEFAULT_DATA_MODEL = data_model
For convenience, we extract the columns of tables we want to use:
ekpo_columns = data_model.get_tables().find("EKPO").get_columns()
ekbe_columns = data_model.get_tables().find("EKBE").get_columns()
2. Arithmetic Operations¶
Let's first create two series to work with:
net_price_series = pql.Series(ekpo_columns.find("NETPR"))
menge_series = pql.Series(ekpo_columns.find("MENGE"))
SaolaPy supports all basic arithmetic operations supported by pandas. Arithmetic operations can be applied to combine series, data frames, or scalars:
result = net_price_series * menge_series
result = result * 5
Applying arithmetic operations does not perform any computations yet and only adjusts the query of the series to include the operation:
result.data
( ( "EKPO"."NETPR" * "EKPO"."MENGE" ) * 5 )
Operations can be chained as needed:
result = (net_price_series + 10) * ((menge_series / 3) - 2)
result.data
( ( "EKPO"."NETPR" + 10 ) * ( ( "EKPO"."MENGE" / 3 ) - 2 ) )
Other supported arithmetic operations are:
# Addition
result = net_price_series + 10
# Subtraction
result = net_price_series - 10
# Multiplication
result = net_price_series * 10
# Division
result = net_price_series / 10
# Floor Division
result = net_price_series // 10
# Power
result = net_price_series ** 2
# Modulo
result = net_price_series % 2
SaolaPy also supports assignment operators:
# Addition
result += 10
# Subtraction
result -= 10
# Multiplication
result *= 10
# Division
result /= 10
# Floor Division
result //= 10
# Power
result **= 2
# Modulo
result %= 2
3. Aggregations¶
SaolaPy supports the most common pandas aggregation functions as well. Unlike arithmetic operations, aggregation functions are directly executed and the result is exported:
Supported aggregation functions include:
print("Mean:", net_price_series.mean())
print("Minimum:", net_price_series.min())
print("Maximum:", net_price_series.max())
print("Sum:", net_price_series.sum())
print("Product:", net_price_series.product())
print("Count:", net_price_series.count())
print("Mode:", net_price_series.mode().iloc[0])
print("Median:", net_price_series.median())
print("Quantile:", net_price_series.quantile(0.3))
print("Variance:", net_price_series.var())
print("Standard Deviation:", net_price_series.std())
Mean: 5989.272307177111 Minimum: 0.0 Maximum: 76417000.0 Sum: 1949466211.0799925 Product: nan Count: 325493 Mode: 47.2 Median: 63.54 Quantile: 18.91 Variance: 45592292389.19145 Standard Deviation: 213523.51718064092
4. Pull Up Aggregations¶
The Pull-Up-functions in PQL allow you to aggregate a column based on another table. You can define the target table to which the entries of a column from another table are pulled.
SaolaPy also supports pull up functions for series. You only have to specify the table to pull the results to and the aggregation function:
menge_series = pql.Series(ekbe_columns.find("MENGE"))
pu_menge_series = menge_series.pu("EKPO").mean()
Afterwards the series can for example be added to a DataFrame of the EKPO table:
df = pql.DataFrame({
"BUKRS": ekpo_columns.find("BUKRS"),
"PU_MENGE": pu_menge_series
})
df.head()
BUKRS | PU_MENGE | |
---|---|---|
Index | ||
0 | GB03 | 6.0 |
1 | GB03 | 20.0 |
2 | 5100 | 1.0 |
3 | 5100 | 1.0 |
4 | 5100 | 3.0 |
Other supported PU functions include:
# Mean
pu_menge_series = menge_series.pu("EKPO").mean()
# Minimum
pu_menge_series = menge_series.pu("EKPO").min()
# Maximum
pu_menge_series = menge_series.pu("EKPO").max()
# Sum
pu_menge_series = menge_series.pu("EKPO").sum()
# Product
pu_menge_series = menge_series.pu("EKPO").product()
# Count
pu_menge_series = menge_series.pu("EKPO").count()
# Count distinct
pu_menge_series = menge_series.pu("EKPO").count_distinct()
# Mode
pu_menge_series = menge_series.pu("EKPO").mode()
# Median
pu_menge_series = menge_series.pu("EKPO").median()
# Standard deviation
pu_menge_series = menge_series.pu("EKPO").std()
5. Arithmetic Operations and Aggregations for DataFrames¶
Most arithmetic operations and aggregations can also be applied to DataFrames. The operation is then applied to every column of the DataFrame:
df = pql.DataFrame({
"MENGE": ekpo_columns.find("MENGE"),
"NETPR": ekpo_columns.find("NETPR"),
})
result_df = df * 100
print(result_df)
result_df.head()
DataFrame(data={'MENGE': '( "EKPO"."MENGE" * 100 )', 'NETPR': '( "EKPO"."NETPR" * 100 )'}, index=RangeIndex(name='Index', start=0, step=1), filters=[], order_by_columns=[])
MENGE | NETPR | |
---|---|---|
Index | ||
0 | 600.0 | 16560.0 |
1 | 2000.0 | 11101.0 |
2 | 100.0 | 14950.0 |
3 | 100.0 | 37260.0 |
4 | 300.0 | 26580.0 |
If the other operand is a series it will be applied to the column with matching index:
result_df = df + pd.Series([-1000, 1000], index=["MENGE", "NETPR"])
print(result_df)
result_df.head()
DataFrame(data={'MENGE': '( "EKPO"."MENGE" + -1000 )', 'NETPR': '( "EKPO"."NETPR" + 1000 )'}, index=RangeIndex(name='Index', start=0, step=1), filters=[], order_by_columns=[])
MENGE | NETPR | |
---|---|---|
Index | ||
0 | -994.0 | 1165.60 |
1 | -980.0 | 1111.01 |
2 | -999.0 | 1149.50 |
3 | -999.0 | 1372.60 |
4 | -997.0 | 1265.80 |
Applying aggregation methods returns a pandas Series with the aggregated results:
df.mean()
MENGE 1753.993904 NETPR 5989.272307 Name: 0, dtype: float64
Bringing everything together, one can for example standardize the data:
standardized_df = (df - df.mean()) / df.std()
standardized_df.head()
MENGE | NETPR | |
---|---|---|
Index | ||
0 | -0.018224 | -0.027274 |
1 | -0.018078 | -0.027530 |
2 | -0.018276 | -0.027350 |
3 | -0.018276 | -0.026305 |
4 | -0.018256 | -0.026805 |
Conclusion¶
Congratulations! You have learned how to apply arithmetic operations and aggregations to SaolaPy Series and DataFrames.