data_frame

DataFrame

DataFrame(
    data,
    index=None,
    filters=None,
    order_by_columns=None,
    saola_connector=None,
)

Bases: ImmutableObject

Two-dimensional PQL DataFrame.

Parameters:

data (MutableMapping[str, SeriesLike]) –

Dictionary with the data of the data frame. Keys are column names and values can be a Series, a PQL query string, a PQLColumn, or a PQLOperator.
index (Optional[BaseIndex], default: None ) –

Index to use. Defaults to RangeIndex.
filters (Optional[FiltersLike], default: None ) –

Filters to apply. Defaults to no filters.
order_by_columns (Optional[List[OrderByColumn]], default: None ) –

OrderByColumns used to sort the data frame. Defaults to no sorting.
saola_connector (Optional[SaolaConnector], default: None ) –

Saola connector used to export data.
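A minimal sketch of assembling the data mapping from PQL query strings, assuming columns are referenced with the usual PQL "TABLE"."COLUMN" syntax. The helper pql_column and the table and column names are hypothetical placeholders, not part of this API.

```python
def pql_column(table: str, column: str) -> str:
    """Build a PQL column reference of the form "TABLE"."COLUMN" (hypothetical helper)."""
    return f'"{table}"."{column}"'


# Keys become the column names of the data frame; values are PQL query strings.
# The table and column names below are placeholders.
data = {
    "case_id": pql_column("ACTIVITIES", "_CASE_KEY"),
    "activity": pql_column("ACTIVITIES", "ACTIVITY_EN"),
}

print(data["case_id"])  # "ACTIVITIES"."_CASE_KEY"
```

Passing such a mapping as data, together with a saola_connector, would describe a data frame with the columns case_id and activity.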

index `property`

index

Returns index of data frame.

filters `property`

filters

Returns filters of data frame.

order_by_columns `property`

order_by_columns

Returns order by columns of data frame.

saola_connector `property`

saola_connector

Returns saola connector of data frame.

query `property`

query

Returns PQL query of data frame.

query_columns `property`

query_columns

Returns list of PQL columns of data frame.

query_filters `property`

query_filters

Returns PQL filters of data frame.

query_order_by_columns `property`

query_order_by_columns

Returns order by columns of data frame.

columns `property`

columns

Returns column names of data frame.

ndim `property`

ndim

Returns an int representing the number of axes.

nrows `property`

nrows

Returns an int representing the number of rows of this data frame.

ncolumns `property`

ncolumns

Returns an int representing the number of columns of this data frame.

size `property`

size

Returns an int representing the number of elements in this data frame.

shape `property`

shape

Returns a tuple representing the shape of this data frame.

empty `property`

empty

Returns whether data frame is empty.
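The size-related properties above are tied to one another. The following sketch derives them from plain row and column counts, assuming pandas-like semantics; in particular, treating empty as "at least one axis has length zero" is an assumption, since the description above does not define it.

```python
from typing import Tuple


def describe_shape(nrows: int, ncolumns: int) -> dict:
    """Derive ndim, shape, size and empty from row and column counts."""
    shape: Tuple[int, int] = (nrows, ncolumns)
    return {
        "ndim": 2,                             # a data frame always has two axes
        "shape": shape,                        # (number of rows, number of columns)
        "size": nrows * ncolumns,              # total number of elements
        "empty": nrows == 0 or ncolumns == 0,  # assumption: pandas-like definition
    }


print(describe_shape(1000, 3))
# {'ndim': 2, 'shape': (1000, 3), 'size': 3000, 'empty': False}
```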

object_str `staticmethod`

object_str(class_name, properties)

Returns string representation of object with given class name and properties.

Parameters:

class_name (str) –

Name of object class.
properties (OrderedDict[str, Any]) –

Properties to include.

Returns:

str –

String representation.

shorten_string `staticmethod`

shorten_string(string, max_length=None)

Shortens string to at most max_length characters.
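A minimal sketch of such truncation, assuming that max_length=None leaves the string unchanged and that truncated strings end with an ellipsis counted towards the limit. The ellipsis is an assumption for illustration; the actual marker, if any, is not documented here.

```python
from typing import Optional


def shorten_string(string: str, max_length: Optional[int] = None) -> str:
    """Truncate string to at most max_length characters (sketch, not the library code)."""
    if max_length is None or len(string) <= max_length:
        return string
    ellipsis = "..."  # assumption: truncated strings end with an ellipsis
    if max_length <= len(ellipsis):
        return string[:max_length]  # too short to fit the marker, cut hard
    return string[: max_length - len(ellipsis)] + ellipsis


print(shorten_string("abcdefghij", 8))  # abcde...
```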

from_pql `classmethod`

from_pql(query, **kwargs)

Creates data frame from PQL query.

Parameters:

query (PQL) –

PQL query to create data frame from.

Returns:

DataFrame –

DataFrame created from PQL query.

head

head(n=5)

Returns the first n rows based on position as a pandas DataFrame.

Parameters:

n (int, default: 5 ) –

Number of rows to return.

Returns:

DataFrame –

First n rows as pandas DataFrame.

to_pandas

to_pandas(distinct=None, limit=None, offset=None)

Exports the data of the data frame to a pandas DataFrame using saola_connector.

add

add(other)

Return addition of data frame and other.

Applies the ADD operator to each column.

Parameters:

other (Union[DataFrame, Series, NumericValue]) –

DataFrame, Series or numeric scalar to be added.

Returns:

DataFrame –

The result of the operation.

sub

sub(other)

Return subtraction of data frame and other.

Applies the SUB operator to each column.

Parameters:

other (Union[DataFrame, Series, NumericValue]) –

DataFrame, Series or numeric scalar to be subtracted.

Returns:

DataFrame –

The result of the operation.

mul

mul(other)

Return multiplication of data frame and other.

Applies the MULT operator to each column.

Parameters:

other (Union[DataFrame, Series, NumericValue]) –

DataFrame, Series or numeric scalar to multiply by.

Returns:

DataFrame –

The result of the operation.

div

div(other)

Return division of data frame and other.

Applies the DIV operator to each column.

Parameters:

other (Union[DataFrame, Series, NumericValue]) –

DataFrame, Series or numeric scalar to divide by.

Returns:

DataFrame –

The result of the operation.

floordiv

floordiv(other)

Return floor division of data frame and other.

Applies the DIV operator followed by the FLOOR operator to each column.

Parameters:

other (Union[DataFrame, Series, NumericValue]) –

DataFrame, Series or numeric scalar to floor divide by.

Returns:

DataFrame –

The result of the operation.
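Because floor division takes the floor of the quotient, negative quotients are rounded towards negative infinity rather than towards zero. A minimal Python sketch of that composition, assuming PQL's FLOOR follows the usual mathematical definition; division by zero is not modeled here.

```python
import math


def floordiv(a: float, b: float) -> int:
    """Floor division as FLOOR(a / b), i.e. a division followed by a floor."""
    return math.floor(a / b)


print(floordiv(7, 2))   # 3
print(floordiv(-7, 2))  # -4 (floor), not -3
print(int(-7 / 2))      # -3, truncation towards zero, for comparison
```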

mod

mod(other)

Return modulo of data frame and other.

Applies the MODULO operator to each column.

Parameters:

other (Union[DataFrame, Series, NumericValue]) –

DataFrame, Series or numeric scalar to use as the divisor.

Returns:

DataFrame –

The result of the operation.
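The description does not state which sign convention MODULO uses for negative operands, and languages differ on this. The sketch below contrasts the two common conventions in Python; which one PQL follows is not specified here and should be checked before relying on negative inputs.

```python
import math


def floored_mod(a: int, b: int) -> int:
    """Result takes the sign of the divisor (Python's % operator)."""
    return a % b


def truncated_mod(a: int, b: int) -> int:
    """Result takes the sign of the dividend (C-style remainder)."""
    return int(math.fmod(a, b))


print(floored_mod(-7, 3))    # 2
print(truncated_mod(-7, 3))  # -1
```

Both conventions agree when the operands are non-negative.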

pow

pow(other)

Return the data frame raised to the power of other.

Applies the POWER operator to each column.

Parameters:

other (Union[DataFrame, Series, NumericValue]) –

DataFrame, Series or numeric scalar to use as the exponent.

Returns:

DataFrame –

The result of the operation.

abs

abs()

Return the DataFrame with the absolute value of its elements.

Applies the ABS operator to each column.

round

round(decimals=0)

Round the data frame to the given number of decimals.

Applies the ROUND operator to each column.

lt

lt(other)

Return a DataFrame of booleans indicating whether each element is less than the other.

Applies the LOWER_THAN operator to each column.

Parameters:

other (Union[DataFrame, Series, Series, ScalarValue]) –

DataFrame, Series or scalar to be compared.

Returns:

DataFrame –

The result of the operation.

le

le(other)

Return a DataFrame of booleans indicating whether each element is less than or equal to the other.

Applies the LOWER_EQUALS operator to each column.

Parameters:

other (Union[DataFrame, Series, Series, ScalarValue]) –

DataFrame, Series or scalar to be compared.

Returns:

DataFrame –

The result of the operation.

eq

eq(other)

Return a DataFrame of booleans indicating whether each element is equal to the other.

Applies the EQUALS operator to each column.

Parameters:

other (Union[DataFrame, Series, Series, ScalarValue]) –

DataFrame, Series or scalar to be compared.

Returns:

DataFrame –

The result of the operation.

ne

ne(other)

Return a DataFrame of booleans indicating whether each element is not equal to the other.

Applies the NOT_EQUALS operator to each column.

Parameters:

other (Union[DataFrame, Series, Series, ScalarValue]) –

DataFrame, Series or scalar to be compared.

Returns:

DataFrame –

The result of the operation.

ge

ge(other)

Return a DataFrame of booleans indicating whether each element is greater than or equal to the other.

Applies the GREATER_EQUALS operator to each column.

Parameters:

other (Union[DataFrame, Series, Series, ScalarValue]) –

DataFrame, Series or scalar to be compared.

Returns:

DataFrame –

The result of the operation.

gt

gt(other)

Return a DataFrame of booleans indicating whether each element is greater than the other.

Applies the GREATER_THAN operator to each column.

Parameters:

other (Union[DataFrame, Series, Series, ScalarValue]) –

DataFrame, Series or scalar to be compared.

Returns:

DataFrame –

The result of the operation.

isnull

isnull()

Return a same-sized boolean DataFrame indicating whether the values are null.

Applies the IS NULL operator to each column.

Returns:

DataFrame –

A DataFrame of boolean values indicating, for each element, whether it is null.

isin

isin(values)

Returns whether elements of data frame are in values.

Applies the IN operator to each column.

Parameters:

values (List[Union[Series, ScalarValue]]) –

List of values to test.

Returns:

DataFrame –

The result of the operation.

dropna

dropna()

Return a DataFrame with a filter for null values applied. A row is removed if any of its columns is null.

Returns:

DataFrame –

A DataFrame with null values filtered out.
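The row-wise rule of dropna (a row is dropped as soon as any column is null) can be illustrated on plain Python rows, with None standing in for null. This mirrors the documented behavior only; it is not the PQL filter the method actually builds.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]


def dropna(rows: List[Row]) -> List[Row]:
    """Keep only rows in which no column is null (None)."""
    return [row for row in rows if all(value is not None for value in row.values())]


rows = [
    {"case_id": "A", "amount": 10.0},
    {"case_id": "B", "amount": None},  # dropped: amount is null
    {"case_id": None, "amount": 5.0},  # dropped: case_id is null
]
print(dropna(rows))  # [{'case_id': 'A', 'amount': 10.0}]
```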

mean

mean()

Return the mean of each column.

Applies the AVG operator to each column.

Returns:

Series –

Mean of column values.

median

median()

Return the median of each column.

Applies the MEDIAN operator to each column.

Returns:

Series –

Median of column values.

quantile

quantile(q=0.5)

Return the quantile of each column.

Applies the QUANTILE operator to each column.

Parameters:

q (float, default: 0.5 ) –

Quantile to compute. 0 <= q <= 1.

Returns:

Series –

Quantile of column values.

mode

mode()

Return the mode of each column.

Applies the MODE operator to each column.

Returns:

DataFrame –

Mode of column values.

max

max()

Return the max of each column.

Applies the MAX operator to each column.

Returns:

Series –

Max of column values.

min

min()

Return the min of each column.

Applies the MIN operator to each column.

Returns:

Series –

Min of column values.

sum

sum()

Return the sum of each column.

Applies the SUM operator to each column.

Returns:

Series –

Sum of column values.

product

product()

Return the product of each column. Null values are skipped.

Applies the PRODUCT operator to each column. If the product overflows, the result is null.

Returns:

Series –

Product of column values.
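A minimal sketch of the two documented rules (null values are skipped, overflow yields null), assuming a 64-bit signed integer result type. The int64 bounds are an assumption; the actual result type and the behavior for an all-null column are not documented here.

```python
from typing import List, Optional

INT64_MIN, INT64_MAX = -(2**63), 2**63 - 1  # assumption: 64-bit signed result type


def product(values: List[Optional[int]]) -> Optional[int]:
    """Multiply non-null values; return None (null) if the result leaves the int64 range."""
    result = 1
    for value in values:
        if value is None:
            continue  # null values are skipped
        result *= value
        if not INT64_MIN <= result <= INT64_MAX:
            return None  # overflow yields null
    # Note: an all-null column returns 1 here; the PQL result for that case may differ.
    return result


print(product([2, None, 3, 4]))  # 24
print(product([2**40, 2**40]))   # None
```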

count

count()

Return the number of non-null values per column of data frame.

Applies the COUNT operator to each column.

Returns:

Series –

Number of non-null values per column.

groupby

groupby(by)

Return a GroupByAggregationMethods object that provides the aggregation methods for the grouped data frame.

Parameters:

by (Union[str, List[str]]) –

Column name or list of column names that define the groups the aggregation is applied to.

Returns:

GroupByAggregationMethods –

GroupByAggregationMethods object
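Conceptually, groupby(by) partitions the rows by the values of the by columns, and an aggregation is then computed per group. The plain-Python sketch below shows that idea with a sum; the group_sum helper is hypothetical and illustrates the semantics only, not the GroupByAggregationMethods interface.

```python
from collections import defaultdict
from typing import Any, Dict, List, Tuple, Union


def group_sum(
    rows: List[Dict[str, Any]], by: Union[str, List[str]], column: str
) -> Dict[Tuple[Any, ...], float]:
    """Sum `column` within each group defined by the `by` column(s)."""
    keys = [by] if isinstance(by, str) else by  # accept a single name or a list, like `by`
    totals: Dict[Tuple[Any, ...], float] = defaultdict(float)
    for row in rows:
        group = tuple(row[key] for key in keys)
        totals[group] += row[column]
    return dict(totals)


rows = [
    {"country": "DE", "amount": 10.0},
    {"country": "US", "amount": 5.0},
    {"country": "DE", "amount": 2.5},
]
print(group_sum(rows, "country", "amount"))  # {('DE',): 12.5, ('US',): 5.0}
```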

var

var()

Return the variance of each column using the n-1 method. Null values are ignored.

Applies the VAR operator to each column.

Returns:

Series –

Variance of column values.

std

std()

Return the standard deviation of each column using the n-1 method. Null values are ignored.

Applies the STDEV operator to each column.

Returns:

Series –

Standard deviation of column values.
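The "n-1 method" is the sample variance: the squared deviations from the mean are divided by n - 1 rather than n, and the standard deviation is its square root. A minimal Python sketch that also ignores nulls (None), as described above; raising for fewer than two values is this sketch's choice, and the PQL behavior for that case is not documented here.

```python
import math
from typing import List, Optional


def sample_variance(values: List[Optional[float]]) -> float:
    """Variance with the n-1 denominator, ignoring null (None) values."""
    data = [v for v in values if v is not None]
    n = len(data)
    if n < 2:
        raise ValueError("at least two non-null values are required")
    mean = sum(data) / n
    return sum((v - mean) ** 2 for v in data) / (n - 1)


values = [2, 4, 4, 4, None, 5, 5, 7, 9]
var = sample_variance(values)
print(var)             # 32 / 7, about 4.5714
print(math.sqrt(var))  # standard deviation, about 2.138
```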

to_int

to_int()

Converts the columns of the data frame to int.

Applies the TO_INT operator to each column.

to_float

to_float()

Converts the columns of the data frame to float.

Applies the TO_FLOAT operator to each column.

to_string

to_string(format_=None)

Converts the columns of the data frame to string.

Applies the TO_STRING operator to each column.

Parameters:

format_ (Optional[str], default: None ) –

Optional format that defines how dates are converted to strings.

Returns:

DataFrame –

DataFrame converted to string.

to_date

to_date(format_)

Converts the columns of the data frame to date.

Applies the TO_DATE operator to each column.

Parameters:

format_ (str) –

Defines how strings are converted to date.

Returns:

DataFrame –

DataFrame converted to date.

astype

astype(type_, **kwargs)

Converts the columns of the data frame to the given type.

Parameters:

type_ (Type[Union[str, int, float]]) –

Type to convert to. Supported types are str, int, float.
**kwargs (Any, default: {} ) –

Passed to conversion function.

Returns:

DataFrame –

Converted DataFrame.

nunique

nunique(dropna=True)

Returns number of unique elements per column of data frame.

Parameters:

dropna (bool, default: True ) –

Whether null values are excluded from the count. If True, null values are not counted.

Returns:

Series –

Number of unique elements per column.
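A minimal sketch of the dropna switch on a single column, assuming pandas-like semantics where null (None) counts as one additional distinct value only when dropna is False.

```python
from typing import Any, List


def nunique(values: List[Any], dropna: bool = True) -> int:
    """Count distinct values; None (null) counts as a value only if dropna is False."""
    distinct = set(values)
    if dropna:
        distinct.discard(None)
    return len(distinct)


column = ["a", "b", "a", None, None]
print(nunique(column))                # 2
print(nunique(column, dropna=False))  # 3
```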

set_filters

set_filters(filters)

Sets filters of data frame.

reset_filters

reset_filters()

Removes all filters.

reindex

reindex(index)

Updates index of data frame.

reset_index

reset_index()

Resets index to default and drops original index.

drop

drop(labels)

Drops the given columns.

Parameters:

labels (Union[str, List[str]]) –

Name or list of names of columns to drop.

Returns:

DataFrame –

DataFrame without given columns.

sort_values

sort_values(by, ascending=True)

Sorts data frame by given columns.

Parameters:

by (Union[str, List[str]]) –

Name or list of names of columns to sort by.
ascending (Union[bool, List[bool]], default: True ) –

Sort ascending or descending. Specify list for multiple sort orders.

Returns:

DataFrame –

DataFrame with OrderByColumns set.
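To see how by and a list of ascending flags combine, the following plain-Python sketch sorts rows by several columns, each with its own direction. It is an illustration of multi-key ordering only (sort_rows is a hypothetical helper, and null handling is not covered), not how the OrderByColumns are evaluated.

```python
from typing import Any, Dict, List, Union


def sort_rows(
    rows: List[Dict[str, Any]],
    by: Union[str, List[str]],
    ascending: Union[bool, List[bool]] = True,
) -> List[Dict[str, Any]]:
    """Sort rows by several columns, each with its own direction."""
    keys = [by] if isinstance(by, str) else list(by)
    orders = [ascending] * len(keys) if isinstance(ascending, bool) else list(ascending)
    if len(orders) != len(keys):
        raise ValueError("ascending must have one entry per sort column")
    result = list(rows)
    # Python's sort is stable, so sorting by the least significant key first
    # and the most significant key last yields the combined ordering.
    for key, asc in reversed(list(zip(keys, orders))):
        result.sort(key=lambda row: row[key], reverse=not asc)
    return result


rows = [
    {"country": "US", "amount": 5},
    {"country": "DE", "amount": 10},
    {"country": "DE", "amount": 20},
]
ordered = sort_rows(rows, ["country", "amount"], ascending=[True, False])
print([(r["country"], r["amount"]) for r in ordered])
# [('DE', 20), ('DE', 10), ('US', 5)]
```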

apply_unary_operator

apply_unary_operator(operator, **kwargs)

Applies given unary operator to data frame.

Parameters:

operator (Type[UnaryPQLOperator]) –

Operator to apply.

Returns:

DataFrame –

DataFrame with operator applied.

apply_binary_operator

apply_binary_operator(other, operator, reverse=False)

Applies given binary operator to data frame and exports result.

Parameters:

other (Union[DataFrame, Series, Series, ScalarValue]) –

Other operand to apply binary operator on.
operator (Type[BinaryPQLOperator]) –

Operator to apply.
reverse (bool, default: False ) –

If True, the order of the operands is reversed.

Returns:

DataFrame –

DataFrame with operator applied.

apply_binary_operator_dunder

apply_binary_operator_dunder(
    other, operator, reverse=False
)

Combines the data frame with other by applying the operator to each column. Used by the dunder (operator overloading) methods.
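The reverse flag matters for non-commutative operators when the data frame is the right-hand operand, as in 10 - df, which Python dispatches to the reflected method __rsub__. The sketch below shows that pattern with a hypothetical toy Column class that works element-wise on a list instead of building PQL; it mirrors the (other, operator, reverse) parameters but is not the library implementation.

```python
import operator
from typing import Callable, List, Union

Number = Union[int, float]


class Column:
    """Toy stand-in for a data frame column; not the library class."""

    def __init__(self, values: List[Number]) -> None:
        self.values = values

    def apply_binary_operator(
        self, other: Number, op: Callable[[Number, Number], Number], reverse: bool = False
    ) -> "Column":
        # With reverse=True the scalar becomes the left operand: op(other, value).
        if reverse:
            return Column([op(other, v) for v in self.values])
        return Column([op(v, other) for v in self.values])

    def __sub__(self, other: Number) -> "Column":
        return self.apply_binary_operator(other, operator.sub)

    def __rsub__(self, other: Number) -> "Column":
        # Called for `other - column`, so the operand order must be reversed.
        return self.apply_binary_operator(other, operator.sub, reverse=True)


col = Column([1, 2, 3])
print((col - 10).values)  # [-9, -8, -7]
print((10 - col).values)  # [9, 8, 7]
```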

apply_aggregation_operator

apply_aggregation_operator(operator, **kwargs)

Applies given aggregation operator to data frame and exports result.

Parameters:

operator (Type[UnaryPQLOperator]) –

Operator to apply.

Returns:

Series –

Series with operator applied.

copy

copy(
    data=None,
    index=None,
    filters=None,
    order_by_columns=None,
    saola_connector=None,
)

Copies the data frame and overrides the properties given as parameters.

verify_columns_contained

verify_columns_contained(columns)

Verifies whether the data frame contains the given columns.

Parameters:

columns (List[str]) –

List of columns to verify.

Returns:

Set[str] –

Set of verified column names.
