data_frame

DataFrame

DataFrame(
    data,
    index=None,
    filters=None,
    order_by_columns=None,
    saola_connector=None,
)

Bases: ImmutableObject

Two dimensional PQL DataFrame.

Parameters:

  • data (MutableMapping[str, SeriesLike]) –

    Dictionary with the data for the data frame. Keys are column names; each value is a Series, a PQL query string, a PQLColumn, or a PQLOperator.

  • index (Optional[BaseIndex]) –

    Index to be used. Default is RangeIndex.

  • filters (Optional[FiltersLike]) –

    Filters to be used. Default is none.

  • order_by_columns (Optional[List[OrderByColumn]]) –

    OrderByColumns to be used to sort data frame. Default is none.

  • saola_connector (Optional[SaolaConnector]) –

    Saola connector used to export data.

index property

index: BaseIndex

Returns index of data frame.

filters property

filters: Filters

Returns filters of data frame.

order_by_columns property

order_by_columns: List[OrderByColumn]

Returns order by columns of data frame.

saola_connector property

saola_connector: SaolaConnector

Returns saola connector of data frame.

query property

query: PQL

Returns PQL query of data frame.

query_columns property

query_columns: Sequence[PQLColumn]

Returns list of PQL columns of data frame.

query_filters property

query_filters: List[PQLFilter]

Returns PQL filters of data frame.

query_order_by_columns property

query_order_by_columns: List[OrderByColumn]

Returns order by columns of data frame.

columns property

columns: Sequence[str]

Returns column names of data frame.

ndim property

ndim: int

Returns an int representing the number of axes.

nrows property

nrows: int

Returns an int representing the number of rows of this data frame.

ncolumns property

ncolumns: int

Returns an int representing the number of columns of this data frame.

size property

size: int

Returns an int representing the number of elements in this data frame.

shape property

shape: Tuple[int, int]

Returns a tuple representing the shape of this data frame.
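
The size-related properties can be related to one another as in the sketch below. It uses a plain dictionary of equal-length lists as a stand-in for a data frame and assumes, as in pandas, that shape is (nrows, ncolumns) and that size is nrows * ncolumns. Neither assumption is confirmed by this reference.

```python
# Hypothetical stand-in for a data frame: column name -> list of values.
table = {
    "case_id": [1, 2, 3],
    "amount": [10.0, 20.5, 7.25],
}

ncolumns = len(table)
nrows = len(next(iter(table.values()))) if table else 0
shape = (nrows, ncolumns)  # assumed order: (rows, columns)
size = nrows * ncolumns    # assumed: total number of cells
ndim = len(shape)          # two axes: rows and columns

print(shape, size, ndim)  # (3, 2) 6 2
```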

empty property

empty: bool

Returns whether data frame is empty.

object_str staticmethod

object_str(class_name, properties)

Returns string representation of object with given class name and properties.

Parameters:

  • class_name (str) –

    Name of object class.

  • properties (OrderedDict[str, Any]) –

    Properties to include.

Returns:

  • str

    String representation.

shorten_string staticmethod

shorten_string(string, max_length=None)

Shortens string to at most max_length characters.

from_pql classmethod

from_pql(query, **kwargs)

Creates data frame from PQL query.

Parameters:

  • query (PQL) –

    PQL query to create data frame from.

Returns:

  • DataFrame

    DataFrame created from PQL query.

head

head(n=5)

Returns the first n rows, based on position, as a pandas DataFrame.

Parameters:

  • n (int) –

    Number of rows to return.

Returns:

  • pd.DataFrame

    First n rows as pandas DataFrame.

to_pandas

to_pandas(distinct=None, limit=None, offset=None)

Exports data using saola_connector.

add

add(other)

Return addition of data frame and other.

Applies ADD operator to column.

Parameters:

Returns:

sub

sub(other)

Return subtraction of data frame and other.

Applies SUB operator to column.

Parameters:

Returns:

mul

mul(other)

Return multiplication of data frame and other.

Applies MULT operator to column.

Parameters:

Returns:

div

div(other)

Return division of data frame and other.

Applies DIV operator to column.

Parameters:

Returns:

floordiv

floordiv(other)

Return floor division of data frame and other.

Applies FLOOR operator and DIV operator to column.

Parameters:

Returns:
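
Composing FLOOR with DIV differs from truncating the quotient once negative values are involved. The sketch below shows the difference on plain Python numbers; that the engine's FLOOR rounds toward negative infinity exactly like math.floor is an assumption, and both helper names are hypothetical.

```python
import math


def floordiv(a: float, b: float) -> int:
    """Floor of the true quotient, i.e. FLOOR(a / b)."""
    return math.floor(a / b)


def truncdiv(a: float, b: float) -> int:
    """Quotient truncated toward zero, shown only for contrast."""
    return math.trunc(a / b)


print(floordiv(7, 2))   # 3
print(floordiv(-7, 2))  # -4
print(truncdiv(-7, 2))  # -3
```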

mod

mod(other)

Return modulo of data frame and other.

Applies MODULO operator to column.

Parameters:

Returns:

pow

pow(other)

Return the data frame raised to the power of other.

Applies POWER operator to column.

Parameters:

Returns:

abs

abs()

Return the DataFrame with the absolute value of its elements.

Applies ABS operator to column.

round

round(decimals=0)

Round data frame to the given number of decimals.

Applies ROUND operator to column.

lt

lt(other)

Return a DataFrame of booleans indicating whether each element is less than the other.

Applies LOWER_THAN operator to column.

Parameters:

Returns:

le

le(other)

Return a DataFrame of booleans indicating whether each element is less than or equal to the other.

Applies LOWER_EQUALS operator to column.

Parameters:

Returns:

eq

eq(other)

Return a DataFrame of booleans indicating whether each element is equal to the other.

Applies EQUALS operator to column.

Parameters:

Returns:

ne

ne(other)

Return a DataFrame of booleans indicating whether each element is not equal to the other.

Applies NOT_EQUALS operator to column.

Parameters:

Returns:

ge

ge(other)

Return a DataFrame of booleans indicating whether each element is greater than or equal to the other.

Applies GREATER_EQUALS operator to column.

Parameters:

Returns:

gt

gt(other)

Return a DataFrame of booleans indicating whether each element is greater than the other.

Applies GREATER_THAN operator to column.

Parameters:

Returns:

isnull

isnull()

Return a boolean same-sized DataFrame indicating if the values are null.

Applies IS NULL operator to column.

Returns:

  • DataFrame

    A DataFrame of masked bool values for each element that indicates whether an element is a null value.

isin

isin(values)

Returns whether elements of data frame are in values.

Applies IN operator to column.

Parameters:

Returns:

dropna

dropna()

Return DataFrame with filter for null values. Rows are removed if any column is null.

Returns:

  • DataFrame

    A DataFrame with null values filtered out.
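
The "any column is null" rule can be illustrated on plain Python data. This is a minimal sketch rather than the library's implementation (per the description above, the data frame adds a filter instead of removing rows eagerly); drop_null_rows and the sample table are hypothetical.

```python
from typing import Any, Dict, List


def drop_null_rows(table: Dict[str, List[Any]]) -> Dict[str, List[Any]]:
    """Keep only the rows in which every column holds a non-null value."""
    columns = list(table)
    rows = zip(*(table[name] for name in columns))
    kept = [row for row in rows if all(value is not None for value in row)]
    return {name: [row[i] for row in kept] for i, name in enumerate(columns)}


table = {"activity": ["A", None, "C"], "cost": [1.0, 2.0, None]}
result = drop_null_rows(table)
print(result)  # {'activity': ['A'], 'cost': [1.0]}
```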

mean

mean()

Return the mean of each column.

Applies AVG operator to column.

Returns:

  • pd.Series

    Mean of column values.

median

median()

Return the median of each column.

Applies MEDIAN operator to column.

Returns:

  • pd.Series

    Median of column values.

quantile

quantile(q=0.5)

Return the quantile of each column.

Applies QUANTILE operator to column.

Parameters:

  • q (float) –

    Quantile to compute. 0 <= q <= 1.

Returns:

  • pd.Series

    Quantile of column values.

mode

mode()

Return the mode of each column.

Applies MODE operator to column.

Returns:

  • pd.DataFrame

    Mode of column values.

max

max()

Return the max of each column.

Applies MAX operator to column.

Returns:

  • pd.Series

    Max of column values.

min

min()

Return the min of each column.

Applies MIN operator to column.

Returns:

  • pd.Series

    Min of column values.

sum

sum()

Return the sum of each column.

Applies SUM operator to column.

Returns:

  • pd.Series

    Sum of column values.

product

product()

Return the product of each column. Null values are skipped.

Applies PRODUCT operator to column. In case of an overflow the result will be null.

Returns:

  • pd.Series

    Product of column values.
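
The two rules above (nulls are skipped, an overflow yields null) can be sketched for a single column of integers. The 64-bit bounds are an assumption, since the actual overflow limit is not stated in this reference, and product_skip_nulls is a hypothetical helper.

```python
import math
from typing import List, Optional

# Assumed overflow bounds (signed 64-bit); the real limits are not documented here.
INT64_MIN = -(2**63)
INT64_MAX = 2**63 - 1


def product_skip_nulls(values: List[Optional[int]]) -> Optional[int]:
    """Multiply the non-null values; return None if the result overflows."""
    non_null = [value for value in values if value is not None]
    result = math.prod(non_null)
    if not INT64_MIN <= result <= INT64_MAX:
        return None
    return result


print(product_skip_nulls([2, None, 3, 4]))  # 24
print(product_skip_nulls([2**40, 2**40]))   # None
```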

count

count()

Return the number of non-null values per column of data frame.

Applies COUNT operator to column.

Returns:

  • pd.Series

    Number of non-null values per column.
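
As a small illustration of counting non-null values per column, the sketch below uses a plain dictionary of lists as a hypothetical stand-in for a data frame, with None playing the role of null.

```python
# Hypothetical stand-in for a data frame: column name -> list of values.
table = {
    "activity": ["A", None, "C"],
    "cost": [1.0, 2.0, None],
    "user": [None, None, "u1"],
}

# One count per column, ignoring None entries.
counts = {
    name: sum(value is not None for value in values)
    for name, values in table.items()
}

print(counts)  # {'activity': 2, 'cost': 2, 'user': 1}
```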

groupby

groupby(by)

Return a group-by object that provides all aggregation methods.

Parameters:

  • by (Union[str, List[str]]) –

    Used to determine the groups the aggregation method is applied on.

Returns:
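
Conceptually, by partitions the rows into groups and an aggregation is then computed once per group. The sketch below illustrates this with a sum on plain Python data; group_sum and the sample table are hypothetical and do not reflect the interface of the returned object.

```python
from collections import defaultdict
from typing import Any, Dict, List


def group_sum(table: Dict[str, List[Any]], by: str, value: str) -> Dict[Any, float]:
    """Sum the `value` column separately for each distinct key in `by`."""
    totals: Dict[Any, float] = defaultdict(float)
    for key, amount in zip(table[by], table[value]):
        totals[key] += amount
    return dict(totals)


table = {"vendor": ["X", "Y", "X"], "amount": [10.0, 5.0, 2.5]}
totals = group_sum(table, by="vendor", value="amount")
print(totals)  # {'X': 12.5, 'Y': 5.0}
```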

var

var()

Return the variance of each column using the n-1 method. Null values are ignored.

Applies VAR operator to column.

Returns:

  • pd.Series

    Variance of column values.

std

std()

Return the standard deviation of each column using the n-1 method. Null values are ignored.

Applies STDEV operator to column.

Returns:

  • pd.Series

    Standard deviation of column values.
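
The "n-1 method" used by var and std refers to the sample variance, which divides the sum of squared deviations by n - 1 rather than n. A minimal sketch for one column, using the standard library and skipping None values, is given below; the helper names are hypothetical.

```python
import math
import statistics
from typing import List, Optional


def sample_variance(values: List[Optional[float]]) -> float:
    """Sample variance (divides by n - 1) of the non-null values."""
    non_null = [value for value in values if value is not None]
    return statistics.variance(non_null)


def sample_std(values: List[Optional[float]]) -> float:
    """Sample standard deviation: square root of the sample variance."""
    return math.sqrt(sample_variance(values))


values = [1.0, None, 2.0, 3.0, 4.0]
# Mean is 2.5, squared deviations sum to 5.0, and n - 1 is 3.
print(sample_variance(values))
print(sample_std(values))
```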

to_int

to_int()

Converts columns of given data frame to int.

Applies TO_INT operator to column.

to_float

to_float()

Converts columns of given data frame to float.

Applies TO_FLOAT operator to column.

to_string

to_string(format_=None)

Converts columns of given data frame to string.

Applies TO_STRING operator to column.

Parameters:

  • format_ (Optional[str]) –

    Optional, defines how dates are converted to string.

Returns:

  • DataFrame

    DataFrame converted to string.

to_date

to_date(format_)

Converts columns of given data frame to date.

Applies TO_DATE operator to column.

Parameters:

  • format_ (str) –

    Defines how strings are converted to date.

Returns:

astype

astype(type_, **kwargs)

Converts columns of given data frame to type.

Parameters:

  • type_ (Type[Union[str, int, float]]) –

    Type to convert to. Supported types are str, int, float.

  • **kwargs (Any) –

    Passed to conversion function.

Returns:
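
One way to picture astype is as a dispatch from the requested type to the matching conversion, with the keyword arguments forwarded unchanged. The sketch below works on single Python values; the converters are hypothetical stand-ins for to_string, to_int and to_float, and the strftime format syntax is an assumption that may differ from the format strings the engine expects.

```python
import datetime
from typing import Any, Callable, Dict, Optional


def to_string(value: Any, format_: Optional[str] = None) -> str:
    """Convert to str; format_ is only applied to date-like values."""
    if format_ is not None and isinstance(value, datetime.date):
        return value.strftime(format_)
    return str(value)


# Supported target types mapped to their conversion functions.
CONVERTERS: Dict[type, Callable[..., Any]] = {
    str: to_string,
    int: int,
    float: float,
}


def astype(value: Any, type_: type, **kwargs: Any) -> Any:
    """Look up the converter for type_ and pass kwargs through to it."""
    if type_ not in CONVERTERS:
        raise TypeError(f"Unsupported type: {type_!r}")
    return CONVERTERS[type_](value, **kwargs)


print(astype("42", int))                                          # 42
print(astype(datetime.date(2024, 1, 31), str, format_="%Y-%m"))  # 2024-01
```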

nunique

nunique(dropna=True)

Returns number of unique elements per column of data frame.

Parameters:

  • dropna (bool) –

    Whether to exclude null values from the count.

Returns:

  • pd.Series

    Number of unique elements per column.
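
The effect of dropna can be sketched for a single column. The example assumes pandas-like semantics (with dropna=True, null values are not counted as a distinct element), which this reference does not state explicitly; the function below is a hypothetical stand-in operating on a plain list.

```python
from typing import Any, List


def nunique(values: List[Any], dropna: bool = True) -> int:
    """Number of distinct values; None is excluded when dropna is true."""
    distinct = set(values)
    if dropna:
        distinct.discard(None)
    return len(distinct)


values = ["A", "B", None, "A"]
print(nunique(values))                # 2
print(nunique(values, dropna=False))  # 3
```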

set_filters

set_filters(filters)

Sets filters of data frame.

reset_filters

reset_filters()

Removes all filters.

reindex

reindex(index)

Updates index of data frame.

reset_index

reset_index()

Resets index to default and drops original index.

drop

drop(labels)

Drop labels from columns.

Parameters:

  • labels (Union[str, List[str]]) –

    Name of columns to drop.

Returns:

  • DataFrame

    DataFrame without given columns.

sort_values

sort_values(by, ascending=True)

Sorts data frame by given columns.

Parameters:

  • by (Union[str, List[str]]) –

    Name or list of names of columns to sort by.

  • ascending (Union[bool, List[bool]]) –

    Sort ascending or descending. Specify list for multiple sort orders.

Returns:

  • DataFrame

    DataFrame with OrderByColumns set.
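
When by and ascending are both lists, each column gets its own sort direction and earlier columns take precedence. The sketch below illustrates that ordering on a list of row dictionaries. It sorts eagerly, whereas the method above only records OrderByColumns on the returned data frame; sort_rows is a hypothetical helper.

```python
from typing import Any, Dict, List, Union


def sort_rows(
    rows: List[Dict[str, Any]],
    by: Union[str, List[str]],
    ascending: Union[bool, List[bool]] = True,
) -> List[Dict[str, Any]]:
    """Sort rows by one or more columns, each with its own direction."""
    columns = [by] if isinstance(by, str) else list(by)
    if isinstance(ascending, bool):
        flags = [ascending] * len(columns)
    else:
        flags = list(ascending)
    if len(flags) != len(columns):
        raise ValueError("ascending must have one entry per column in by")

    result = list(rows)
    # Sort by the least significant column first; list.sort is stable,
    # so earlier (more significant) columns end up taking precedence.
    for column, asc in reversed(list(zip(columns, flags))):
        result.sort(key=lambda row: row[column], reverse=not asc)
    return result


rows = [
    {"vendor": "X", "amount": 10},
    {"vendor": "Y", "amount": 5},
    {"vendor": "X", "amount": 2},
]
ordered = sort_rows(rows, by=["vendor", "amount"], ascending=[True, False])
print([(row["vendor"], row["amount"]) for row in ordered])
# [('X', 10), ('X', 2), ('Y', 5)]
```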

apply_unary_operator

apply_unary_operator(operator, **kwargs)

Applies given unary operator to data frame.

Parameters:

Returns:

  • DataFrame

    DataFrame with operator applied.

apply_binary_operator

apply_binary_operator(other, operator, reverse=False)

Applies given binary operator to data frame and exports result.

Parameters:

  • other (Union[DataFrame, Series, pd.Series, ScalarValue]) –

    Other operand to apply binary operator on.

  • operator (Type[BinaryPQLOperator]) –

    Operator to apply.

  • reverse (bool) –

    If true order of operands is reversed.

Returns:

  • DataFrame

    DataFrame with operator applied.
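
The reverse flag matters for non-commutative operators such as subtraction or division, much like Python's reflected methods (for example __rsub__). The sketch below shows the operand swap on plain numbers; apply_binary is a hypothetical helper, not the method documented above.

```python
import operator
from typing import Any, Callable


def apply_binary(
    left: Any, right: Any, op: Callable[[Any, Any], Any], reverse: bool = False
) -> Any:
    """Apply op(left, right), or op(right, left) when reverse is true."""
    if reverse:
        left, right = right, left
    return op(left, right)


print(apply_binary(10, 3, operator.sub))                # 7
print(apply_binary(10, 3, operator.sub, reverse=True))  # -7
```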

apply_binary_operator_dunder

apply_binary_operator_dunder(
    other, operator, reverse=False
)

Combines data frame with other by applying function for each column for dunder methods.

apply_aggregation_operator

apply_aggregation_operator(operator, **kwargs)

Applies given aggregation operator to data frame and exports result.

Parameters:

Returns:

  • pd.Series

    Series with operator applied.

copy

copy(
    data=None,
    index=None,
    filters=None,
    order_by_columns=None,
    saola_connector=None,
)

Copies given data frame and overrides properties given as parameters.
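
Because the class derives from ImmutableObject, changes are expressed as copies with selected properties overridden. The same pattern can be sketched with a frozen dataclass; FrameSketch and its two fields are hypothetical, and treating None as "keep the current value" is an assumption based on the default arguments in the signature above.

```python
from dataclasses import dataclass, replace
from typing import Optional, Tuple


@dataclass(frozen=True)
class FrameSketch:
    """Hypothetical immutable stand-in with two of the properties."""

    columns: Tuple[str, ...]
    filters: Tuple[str, ...] = ()

    def copy(
        self,
        columns: Optional[Tuple[str, ...]] = None,
        filters: Optional[Tuple[str, ...]] = None,
    ) -> "FrameSketch":
        """Return a new object; arguments left as None keep their value."""
        given = {"columns": columns, "filters": filters}
        overrides = {name: value for name, value in given.items() if value is not None}
        return replace(self, **overrides)


original = FrameSketch(columns=("a", "b"))
filtered = original.copy(filters=("a > 1",))
print(filtered)  # FrameSketch(columns=('a', 'b'), filters=('a > 1',))
print(original)  # FrameSketch(columns=('a', 'b'), filters=())
```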

verify_columns_contained

verify_columns_contained(columns)

Verifies that the data frame contains the given columns.

Parameters:

  • columns (List[str]) –

    List of columns to verify.

Returns:

  • typing.Set[str]

    Set of verified column names.
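
A minimal sketch of such a check on plain Python data is shown below. How the method reacts to a missing column is not documented here, so raising KeyError is an assumption, and the extra available parameter is hypothetical (the real method reads the columns from the data frame itself).

```python
from typing import List, Sequence, Set


def verify_columns_contained(available: Sequence[str], columns: List[str]) -> Set[str]:
    """Return the requested names as a set if all of them are available."""
    requested = set(columns)
    missing = requested - set(available)
    if missing:
        # Assumed failure mode; the actual behavior is not documented here.
        raise KeyError(f"Columns not found: {sorted(missing)}")
    return requested


verified = verify_columns_contained(["a", "b", "c"], ["a", "b"])
print(sorted(verified))  # ['a', 'b']
```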