Celonis Root Cause Analysis Tutorial¶
This automated root cause analysis module searches in all dimensions (First Level root causes) of the selected tables for single dimensions that have a bad performance with respect to the defined KPIs . E.g. if you have a certain vendor, plan, country, city , customer etc where the kpi is particularly low.
0. Import pycelonis modules and connect to the datamodel
from pycelonis import get_celonis
from pycelonis.root_cause_analysis import root_cause_analysis
celonis = get_celonis()
datamodel = celonis.datamodels.find("your_datamodel_id")
2021-01-25 10:59:29 - pycelonis: Login successful! The Application Key currently has access to 416 Analyses and to 1 Data Pools.
1. Define KPIs
Please include your kpis in the dictionary kpis. The key should be name of kpi and value the pql query. You can include multiple kpis at once.
- Important: The pql query needs to be in the following format:
CASE WHEN "WANTED_BEHAVIOUR" THEN 0 WHEN "UNWANTED_BEHAVIOUR" THEN 1 ELSE NULL END
# Enter your kpis here, the one in the notebook is only an example:
kpis = {'late_payment': """
CASE WHEN PU_COUNT(EKPO, "_CEL_P2P_ACTIVITIES"."ACTIVITY_EN", "_CEL_P2P_ACTIVITIES"."ACTIVITY_EN" = 'Clear Invoice')=0
AND
DATEDIFF(dd, PU_MAX("EKPO", ROUND_DAY("_CEL_P2P_ACTIVITIES"."EVENTTIME"), "_CEL_P2P_ACTIVITIES"."ACTIVITY_EN" =
'Send Overdue Notice'), TODAY()) > 0
AND
DATEDIFF(dd, PU_MAX("EKPO", ROUND_DAY("_CEL_P2P_ACTIVITIES"."EVENTTIME"), "_CEL_P2P_ACTIVITIES"."ACTIVITY_EN" =
'Send Overdue Notice'), TODAY()) <= 60
THEN NULL
WHEN PU_COUNT(EKPO, "_CEL_P2P_ACTIVITIES"."ACTIVITY_EN", "_CEL_P2P_ACTIVITIES"."ACTIVITY_EN" = 'Clear Invoice')>0
AND
DATEDIFF(dd, PU_MAX("EKPO", ROUND_DAY("_CEL_P2P_ACTIVITIES"."EVENTTIME"), "_CEL_P2P_ACTIVITIES"."ACTIVITY_EN" =
'Send Overdue Notice'),
PU_MAX("EKPO", ROUND_DAY("_CEL_P2P_ACTIVITIES"."EVENTTIME"), "_CEL_P2P_ACTIVITIES"."ACTIVITY_EN" =
'Clear Invoice')) > 30
THEN
1
WHEN PU_COUNT(EKPO, "_CEL_P2P_ACTIVITIES"."ACTIVITY_EN", "_CEL_P2P_ACTIVITIES"."ACTIVITY_EN" = 'Clear Invoice')>0
AND
DATEDIFF(dd, PU_MAX("EKPO", ROUND_DAY("_CEL_P2P_ACTIVITIES"."EVENTTIME"), "_CEL_P2P_ACTIVITIES"."ACTIVITY_EN" =
'Send Overdue Notice'),
PU_MAX("EKPO", ROUND_DAY("_CEL_P2P_ACTIVITIES"."EVENTTIME"), "_CEL_P2P_ACTIVITIES"."ACTIVITY_EN" =
'Clear Invoice')) <30
THEN 0
ELSE NULL
END
"""}
1.1 Check query
Below you can check if the provided kpi queries are valid pql statement. Just enter the kpi name an execute. If a no error occurs it works.
from pycelonis.celonis_api.pql.pql import PQL, PQLColumn
kpi_name = "...."
datamodel.get_data_frame(PQL(PQLColumn(kpis[kpi_name], kpi_name)))
2. Root Cause Analysis
The Pycelonis Root Cause Analysis searches the selected tables for dimensions, categories/field values which have a high correlation with the given kpis, and returns those in an ordered table.
2.1 Define Parameters
Uncomment the cell below and run it to look for tables in the model
# datamodel.tables
Please define the following parameters:
SELECTED_TABLES: A list of the datamodel table names, which you want to include in the search.CHUNK_SIZE: Size of chunks to be extracted (best leave as is).CELONIS_FILTER: Filter string or list of filter strings used if you want to limit the search to e.g. a specific plant. (optional)
# Define parameters
SELECTED_TABLES=['EKPO']
CELONIS_FILTER=None
CHUNK_SIZE=20
2.2 Run Root Cause Analysis
# Create root cause analysis object
RCA = root_cause_analysis.RCA(kpis=kpis, datamodel=datamodel, selected_tables=SELECTED_TABLES, celonis_filter=CELONIS_FILTER, chunk_size=CHUNK_SIZE)
# Run RCA
rca_result = RCA.apply()
0%| | 0/15 [00:00<?, ?it/s]
2021-01-25 11:15:38 - pycelonis: PQL Export running, status: {'id': '4eb1d82b-f6cc-4527-99a2-f0c8d1db2d1e', 'exportStatus': 'RUNNING', 'created': None, 'message': None, 'exportType': 'PARQUET'}
2021-01-25 11:15:47 - numexpr.utils: Note: NumExpr detected 16 cores but "NUMEXPR_MAX_THREADS" not set, so enforcing safe limit of 8.
2021-01-25 11:15:47 - numexpr.utils: NumExpr defaulting to 8 threads.
100%|██████████| 15/15 [01:04<00:00, 4.30s/it]
Interpreting the results
Below you can see the results of the root cause search:
Interpretation
Field Name: Name of table + column where Field Value is located in.Field Value: A Category in the Field Name column.KPI Name: Name of the kpi that the row is corresponding to.KPI Value: avg. KPI for all items of category Field Value (%)Lift: The lift measures the deviation of the category KPI from the base line KPI value (averaged on the whole datamodel). This is the most important measure in this table. A lift > 1 means the category is performing WORSE than the average. A lift < 1 means the category is performing BETTER than the average. Refer to this article to better understand the Lift (https://www.kdnuggets.com/2016/03/lift-analysis-data-scientist-secret-weapon.html)Percentage of Cases: Percentage of Cases that fall into the cateogory in Field Value, if this is high its a more frequent cateogry.Percentage of Bad Cases: Percentage of Cases where kpi = "unwanted behaviour" that fall into the cateogory in Field Value, if this is high you know that a lot of the cases with unwanted hehaviour fall into this category.Number of Cases: Total of Cases that fall into the cateogory in Field Value, if this is high its a more frequent cateogry.Number of Bad Cases: Total of Cases where kpi = "unwanted behaviour" that fall into the cateogory in Field Value, if this is high you know that a lot of the cases with unwanted hehaviour fall into this category.
# Show results
rca_result
| Field Name | Field Value | KPI Name | KPI Value | Lift | Percentage of Cases | Percentage of Bad Cases | Number Bad Cases | Number Cases | |
|---|---|---|---|---|---|---|---|---|---|
| 43 | EKPO.TXZ01 | Color display | late_payment | 50.000000 | 12.625000 | 0.990099 | 12.5 | 4.0 | 8 |
| 86 | EKPO.PRDAT | 2002-02-04 00:00:00 | late_payment | 40.000000 | 10.100000 | 2.475248 | 25.0 | 8.0 | 20 |
| 27 | EKPO.MATKL | 008 | late_payment | 33.333333 | 8.416667 | 1.485149 | 12.5 | 4.0 | 12 |
| 34 | EKPO.MATNR | DPC1016 | late_payment | 25.000000 | 6.312500 | 1.980198 | 12.5 | 4.0 | 16 |
| 30 | EKPO.MATKL | R1114 | late_payment | 20.000000 | 5.050000 | 2.475248 | 12.5 | 4.0 | 20 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 41 | EKPO.WERKS | R310 | late_payment | 0.000000 | 0.000000 | 4.455446 | 0.0 | 0.0 | 36 |
| 54 | EKPO.MTART | FRIP | late_payment | 0.000000 | 0.000000 | 7.425743 | 0.0 | 0.0 | 60 |
| 5 | EKPO.AEDAT | Weekday 1 | late_payment | 0.000000 | 0.000000 | 9.900990 | 0.0 | 0.0 | 80 |
| 66 | EKPO.MEINS | KAR | late_payment | 0.000000 | 0.000000 | 12.871287 | 0.0 | 0.0 | 104 |
| 64 | EKPO.LGORT | missing | late_payment | 0.000000 | 0.000000 | 8.415842 | 0.0 | 0.0 | 68 |
89 rows × 9 columns
3. (Optional) Push root cause analysis result table back to the data pool for further analysis
datamodel.push_table(result, "ROOT_CAUSE_RESULT", if_exists="replace", reload_datamodel=False)
datamodel.reload(from_cache=False)