String Methods¶
In this tutorial, you will learn:
- How to use string methods to transform and filter string columns
Prerequisites:¶
To follow this tutorial, you need to have a Celonis EMS team set up and be familiar with the basic functionalities of PyCelonis.
Tutorial¶
1. Connect to your data model¶
To get started with SaolaPy, we first need to connect to our data model:
from pycelonis import get_celonis
import pycelonis.pql as pql
c = get_celonis()
data_pool = c.data_integration.get_data_pools().find("<YOUR_DATA_POOL>")
data_model = data_pool.get_data_models().find("<YOUR_DATA_MODEL>")
To avoid always having to specify the data model we set it as our global default data model:
from pycelonis.config import Config
Config.DEFAULT_DATA_MODEL = data_model
For convenience, we extract the columns of tables we want to use:
ekpo_columns = data_model.get_tables().find("AP_EKPO").get_columns()
2. String Function Basics¶
Let's first create a series to work with that contains country codes:
country_code_series = pql.Series(ekpo_columns.find("BUKRS"))
country_code_series.head()
Index 0 GB03 1 GB03 2 5100 3 5100 4 5100 Name: BUKRS, dtype: object
To access any string method, we have to call series.str.<method_name>
. For example to get the length use:
result = country_code_series.str.len()
result.head()
Index 0 4 1 4 2 4 3 4 4 4 Name: BUKRS, dtype: int64
Substrings¶
To extract a substring one can use regular Python indices:
country_code_series.str[0:2].head()
Index 0 GB 1 GB 2 51 3 51 4 51 Name: BUKRS, dtype: object
It's also possible to access single characters and use negative indices to get e.g. the last 3 characters:
country_code_series.str[0].head()
Index 0 G 1 G 2 5 3 5 4 5 Name: BUKRS, dtype: object
country_code_series.str[-3:].head()
Index 0 B03 1 B03 2 100 3 100 4 100 Name: BUKRS, dtype: object
String concatenation¶
Strings can be concatenated using the add operator on the str
property:
result = country_code_series.str + "_SUFFIX"
result.head()
Index 0 GB03_SUFFIX 1 GB03_SUFFIX 2 5100_SUFFIX 3 5100_SUFFIX 4 5100_SUFFIX Name: BUKRS, dtype: object
NOTE: To concatenate multiple strings make sure to use the str
property again as otherwise the regular add operator for integers and floats would be applied:
result = ("PREFIX_" + country_code_series.str).str + "_SUFFIX"
result.head()
Index 0 PREFIX_GB03_SUFFIX 1 PREFIX_GB03_SUFFIX 2 PREFIX_5100_SUFFIX 3 PREFIX_5100_SUFFIX 4 PREFIX_5100_SUFFIX Name: BUKRS, dtype: object
3. String Manipulation¶
SaolaPy offers several methods for string manipulation.
You can for example convert all characters to lower or upper case:
result = country_code_series.str.upper()
result.head()
Index 0 GB03 1 GB03 2 5100 3 5100 4 5100 Name: BUKRS, dtype: object
result = country_code_series.str.lower()
result.head()
Index 0 gb03 1 gb03 2 5100 3 5100 4 5100 Name: BUKRS, dtype: object
It's also possible to strip whitespace using lstrip
, rstrip
, and strip
:
result = country_code_series.str.strip()
result.head()
Index 0 GB03 1 GB03 2 5100 3 5100 4 5100 Name: BUKRS, dtype: object
Lastly, it is possible to replace a substring with a different string:
result = country_code_series.str.replace("GB", "UK")
result.head()
Index 0 UK03 1 UK03 2 5100 3 5100 4 5100 Name: BUKRS, dtype: object
4. String Filtering¶
String methods can also be used to filter data using the startswith
, endswith
, and contains
method:
result = country_code_series[country_code_series.str.startswith("GB")]
result.head()
Index 0 GB03 1 GB03 2 GB03 3 GB03 4 GB03 Name: BUKRS, dtype: object
result = country_code_series[country_code_series.str.endswith("03")]
result.head()
Index 0 GB03 1 GB03 2 GB03 3 GB03 4 GB03 Name: BUKRS, dtype: object
result = country_code_series[country_code_series.str.contains("10")]
result.head()
Index 0 5100 1 5100 2 5100 3 5100 4 5100 Name: BUKRS, dtype: object
Conclusion¶
Congratulations! You have learned how to apply string methods to transform and filter string columns.