Series¶
Series are a core concept of Pandas and SaolaPy and represent a single column of one-dimensional data.
In this tutorial, you will learn:
- How to create a SaolaPy Series
- How to apply basic functions to a SaolaPy Series
Prerequisites:¶
To follow this tutorial, you need to have a Celonis EMS team set up and be familiar with the basic functionalities of PyCelonis.
Tutorial¶
1. Connect to your data model¶
To get started with SaolaPy, we first need to connect to our data model:
from pycelonis import get_celonis
import pycelonis.pql as pql
c = get_celonis()
data_pool = c.data_integration.get_data_pools().find("<YOUR_DATA_POOL>")
data_model = data_pool.get_data_models().find("<YOUR_DATA_MODEL>")
To avoid always having to specify the data model we set it as our global default data model:
from pycelonis.config import Config
Config.DEFAULT_DATA_MODEL = data_model
For convenience, we extract the columns of tables we want to use:
ekpo_columns = data_model.get_tables().find("EKPO").get_columns()
2. SaolaPy Series Initialization¶
Now it's time to start working with SaolaPy Series. There are several ways to instantiate a SaolaPy Series.
First, we can directly pass a column from the default data model:
net_price_series = pql.Series(ekpo_columns.find("NETPR"))
net_price_series
Series(data='"EKPO"."NETPR"', index=RangeIndex(name='Index', start=0, step=1), filters=[], order_by_columns=[], name='NETPR')
This will initialize a Series with the given PQL column and a regular RangeIndex
.
Alternatively, we can specify the data model explicitly (e.g. in case we want to use a different data model than the default):
net_price_series = pql.Series(ekpo_columns.find("NETPR"), data_model=data_model)
net_price_series
Series(data='"EKPO"."NETPR"', index=RangeIndex(name='Index', start=0, step=1), filters=[], order_by_columns=[], name='NETPR')
It is also possible to instantiate a series using a PQL query string (generally all PQL query strings can be used but there might be some restrictions based on the PQL language):
net_price_series = pql.Series('"EKPO"."NETPR" + 10')
net_price_series
Series(data='"EKPO"."NETPR" + 10', index=RangeIndex(name='Index', start=0, step=1), filters=[], order_by_columns=[], name='Series')
We can also give the series a custom name by setting the name
attribute:
bukrs_series = pql.Series('"EKPO"."BUKRS"', name="Cost Center")
bukrs_series
Series(data='"EKPO"."BUKRS"', index=RangeIndex(name='Index', start=0, step=1), filters=[], order_by_columns=[], name='Cost Center')
3. Basic Series Functions¶
Now that we have a series we can examine it's content with the following functions.
To get the number of rows we can call the shape
function:
net_price_series.shape
(325493,)
For a sample of the data we can get the first rows using the head
function:
net_price_series.head()
Index 0 175.60 1 121.01 2 159.50 3 382.60 4 275.80 Name: Series, dtype: float64
You can export the full series using the to_pandas
function:
net_price_series.to_pandas()
Index 0 175.60 1 121.01 2 159.50 3 382.60 4 275.80 ... 325488 46.94 325489 1127.50 325490 124.00 325491 229.51 325492 505.81 Name: Series, Length: 325493, dtype: float64
We can also sort the data using the sort_values
function:
net_price_series.sort_values(ascending=False).to_pandas()
Index 0 76417010.00 1 50000010.00 2 31846575.50 3 25000010.00 4 24643442.50 ... 325488 10.01 325489 10.01 325490 10.00 325491 10.00 325492 10.00 Name: Series, Length: 325493, dtype: float64
If we want to know all unique values of a series we can use the unique
function:
bukrs_series.unique()
array(['4575', '5100', 'AT01', 'CN01', 'FR01', 'GB03', 'GB99', 'HU01', 'MY01', 'SE01', 'SG01'], dtype=object)
And for their distribution we can use the value_counts
function:
bukrs_series.value_counts()
Cost Center 5100 90984 SG01 69590 CN01 57458 GB03 44413 4575 28862 MY01 25939 HU01 7537 FR01 394 GB99 219 AT01 96 SE01 1 Name: COUNT_Cost Center, dtype: int64
In the data science workflow it is common to exclude none values which can be achieved using the dropna
function:
net_price_series = net_price_series.dropna()
Lastly, you can convert your series to a different data type:
net_price_series.astype(str)
Note: The conversion uses the PQL operators TO_STRING
, TO_INT
, and TO_FLOAT
and is therefore only possible for the allowed input data types.
Conclusion¶
Congratulations! You have learned how to use the basic functionalities of a SaolaPy Series.