Celonis Root Cause Analysis Tutorial¶
This automated root cause analysis module searches in all dimensions (First Level root causes) of the selected tables for single dimensions that have a bad performance with respect to the defined KPIs . E.g. if you have a certain vendor, plan, country, city , customer etc where the kpi is particularly low.
0. Import pycelonis modules and connect to the datamodel
from pycelonis import get_celonis
from pycelonis.root_cause_analysis import root_cause_analysis
celonis = get_celonis()
datamodel = celonis.datamodels.find("your_datamodel_id")
2021-01-25 10:59:29 - pycelonis: Login successful! The Application Key currently has access to 416 Analyses and to 1 Data Pools.
1. Define KPIs
Please include your kpis in the dictionary kpis
. The key
should be name of kpi and value
the pql query. You can include multiple kpis at once.
- Important: The pql query needs to be in the following format:
CASE WHEN "WANTED_BEHAVIOUR" THEN 0 WHEN "UNWANTED_BEHAVIOUR" THEN 1 ELSE NULL END
# Enter your kpis here, the one in the notebook is only an example:
kpis = {'late_payment': """
CASE WHEN PU_COUNT(EKPO, "_CEL_P2P_ACTIVITIES"."ACTIVITY_EN", "_CEL_P2P_ACTIVITIES"."ACTIVITY_EN" = 'Clear Invoice')=0
AND
DATEDIFF(dd, PU_MAX("EKPO", ROUND_DAY("_CEL_P2P_ACTIVITIES"."EVENTTIME"), "_CEL_P2P_ACTIVITIES"."ACTIVITY_EN" =
'Send Overdue Notice'), TODAY()) > 0
AND
DATEDIFF(dd, PU_MAX("EKPO", ROUND_DAY("_CEL_P2P_ACTIVITIES"."EVENTTIME"), "_CEL_P2P_ACTIVITIES"."ACTIVITY_EN" =
'Send Overdue Notice'), TODAY()) <= 60
THEN NULL
WHEN PU_COUNT(EKPO, "_CEL_P2P_ACTIVITIES"."ACTIVITY_EN", "_CEL_P2P_ACTIVITIES"."ACTIVITY_EN" = 'Clear Invoice')>0
AND
DATEDIFF(dd, PU_MAX("EKPO", ROUND_DAY("_CEL_P2P_ACTIVITIES"."EVENTTIME"), "_CEL_P2P_ACTIVITIES"."ACTIVITY_EN" =
'Send Overdue Notice'),
PU_MAX("EKPO", ROUND_DAY("_CEL_P2P_ACTIVITIES"."EVENTTIME"), "_CEL_P2P_ACTIVITIES"."ACTIVITY_EN" =
'Clear Invoice')) > 30
THEN
1
WHEN PU_COUNT(EKPO, "_CEL_P2P_ACTIVITIES"."ACTIVITY_EN", "_CEL_P2P_ACTIVITIES"."ACTIVITY_EN" = 'Clear Invoice')>0
AND
DATEDIFF(dd, PU_MAX("EKPO", ROUND_DAY("_CEL_P2P_ACTIVITIES"."EVENTTIME"), "_CEL_P2P_ACTIVITIES"."ACTIVITY_EN" =
'Send Overdue Notice'),
PU_MAX("EKPO", ROUND_DAY("_CEL_P2P_ACTIVITIES"."EVENTTIME"), "_CEL_P2P_ACTIVITIES"."ACTIVITY_EN" =
'Clear Invoice')) <30
THEN 0
ELSE NULL
END
"""}
1.1 Check query
Below you can check if the provided kpi queries are valid pql statement. Just enter the kpi name an execute. If a no error occurs it works.
from pycelonis.celonis_api.pql.pql import PQL, PQLColumn
kpi_name = "...."
datamodel.get_data_frame(PQL(PQLColumn(kpis[kpi_name], kpi_name)))
2. Root Cause Analysis
The Pycelonis Root Cause Analysis searches the selected tables for dimensions, categories/field values which have a high correlation with the given kpis, and returns those in an ordered table.
2.1 Define Parameters
Uncomment the cell below and run it to look for tables in the model
# datamodel.tables
Please define the following parameters:
SELECTED_TABLES
: A list of the datamodel table names, which you want to include in the search.CHUNK_SIZE
: Size of chunks to be extracted (best leave as is).CELONIS_FILTER
: Filter string or list of filter strings used if you want to limit the search to e.g. a specific plant. (optional)
# Define parameters
SELECTED_TABLES=['EKPO']
CELONIS_FILTER=None
CHUNK_SIZE=20
2.2 Run Root Cause Analysis
# Create root cause analysis object
RCA = root_cause_analysis.RCA(kpis=kpis, datamodel=datamodel, selected_tables=SELECTED_TABLES, celonis_filter=CELONIS_FILTER, chunk_size=CHUNK_SIZE)
# Run RCA
rca_result = RCA.apply()
0%| | 0/15 [00:00<?, ?it/s]
2021-01-25 11:15:38 - pycelonis: PQL Export running, status: {'id': '4eb1d82b-f6cc-4527-99a2-f0c8d1db2d1e', 'exportStatus': 'RUNNING', 'created': None, 'message': None, 'exportType': 'PARQUET'} 2021-01-25 11:15:47 - numexpr.utils: Note: NumExpr detected 16 cores but "NUMEXPR_MAX_THREADS" not set, so enforcing safe limit of 8. 2021-01-25 11:15:47 - numexpr.utils: NumExpr defaulting to 8 threads.
100%|██████████| 15/15 [01:04<00:00, 4.30s/it]
Interpreting the results
Below you can see the results of the root cause search:
Interpretation
Field Name
: Name of table + column where Field Value is located in.Field Value
: A Category in the Field Name column.KPI Name
: Name of the kpi that the row is corresponding to.KPI Value
: avg. KPI for all items of category Field Value (%)Lift
: The lift measures the deviation of the category KPI from the base line KPI value (averaged on the whole datamodel). This is the most important measure in this table. A lift > 1 means the category is performing WORSE than the average. A lift < 1 means the category is performing BETTER than the average. Refer to this article to better understand the Lift (https://www.kdnuggets.com/2016/03/lift-analysis-data-scientist-secret-weapon.html)Percentage of Cases
: Percentage of Cases that fall into the cateogory in Field Value, if this is high its a more frequent cateogry.Percentage of Bad Cases
: Percentage of Cases where kpi = "unwanted behaviour" that fall into the cateogory in Field Value, if this is high you know that a lot of the cases with unwanted hehaviour fall into this category.Number of Cases
: Total of Cases that fall into the cateogory in Field Value, if this is high its a more frequent cateogry.Number of Bad Cases
: Total of Cases where kpi = "unwanted behaviour" that fall into the cateogory in Field Value, if this is high you know that a lot of the cases with unwanted hehaviour fall into this category.
# Show results
rca_result
Field Name | Field Value | KPI Name | KPI Value | Lift | Percentage of Cases | Percentage of Bad Cases | Number Bad Cases | Number Cases | |
---|---|---|---|---|---|---|---|---|---|
43 | EKPO.TXZ01 | Color display | late_payment | 50.000000 | 12.625000 | 0.990099 | 12.5 | 4.0 | 8 |
86 | EKPO.PRDAT | 2002-02-04 00:00:00 | late_payment | 40.000000 | 10.100000 | 2.475248 | 25.0 | 8.0 | 20 |
27 | EKPO.MATKL | 008 | late_payment | 33.333333 | 8.416667 | 1.485149 | 12.5 | 4.0 | 12 |
34 | EKPO.MATNR | DPC1016 | late_payment | 25.000000 | 6.312500 | 1.980198 | 12.5 | 4.0 | 16 |
30 | EKPO.MATKL | R1114 | late_payment | 20.000000 | 5.050000 | 2.475248 | 12.5 | 4.0 | 20 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
41 | EKPO.WERKS | R310 | late_payment | 0.000000 | 0.000000 | 4.455446 | 0.0 | 0.0 | 36 |
54 | EKPO.MTART | FRIP | late_payment | 0.000000 | 0.000000 | 7.425743 | 0.0 | 0.0 | 60 |
5 | EKPO.AEDAT | Weekday 1 | late_payment | 0.000000 | 0.000000 | 9.900990 | 0.0 | 0.0 | 80 |
66 | EKPO.MEINS | KAR | late_payment | 0.000000 | 0.000000 | 12.871287 | 0.0 | 0.0 | 104 |
64 | EKPO.LGORT | missing | late_payment | 0.000000 | 0.000000 | 8.415842 | 0.0 | 0.0 | 68 |
89 rows × 9 columns
3. (Optional) Push root cause analysis result table back to the data pool for further analysis
datamodel.push_table(result, "ROOT_CAUSE_RESULT", if_exists="replace", reload_datamodel=False)
datamodel.reload(from_cache=False)