data_frame
DataFrame ¶
Bases: ImmutableObject
Two dimensional PQL DataFrame.
Parameters:
-
data(MutableMapping[str, SeriesLike]) –Dictionary with the data for the data frame. Keys are column names, and each value can be a Series, a PQL query string, a PQLColumn, or a PQLOperator.
-
index(Optional[BaseIndex], default:None) –Index to be used. Default is RangeIndex.
-
filters(Optional[FiltersLike], default:None) –Filters to be used. Default is none.
-
order_by_columns(Optional[List[OrderByColumn]], default:None) –OrderByColumns to be used to sort data frame. Default is none.
-
saola_connector(Optional[SaolaConnector], default:None) –Saola connector used to export data.
object_str staticmethod ¶
Returns string representation of object with given class name and properties.
Parameters:
-
class_name(str) –Name of object class.
-
properties(OrderedDict[str, Any]) –Properties to include.
Returns:
-
str–String representation.
shorten_string staticmethod ¶
Shortens a string to at most max_length characters.
from_pql classmethod ¶
head ¶
Returns the first n rows, based on position, as a pandas DataFrame.
Parameters:
-
n(int, default:5) –Number of rows to return.
Returns:
-
DataFrame–First n rows as pandas DataFrame.
add ¶
Return addition of data frame and other.
Applies ADD operator to column.
Parameters:
-
other(Union[DataFrame, Series, NumericValue]) –DataFrame, Series or numeric scalar to be added.
Returns:
-
DataFrame–The result of the operation.
sub ¶
Return subtraction of data frame and other.
Applies SUB operator to column.
Parameters:
-
other(Union[DataFrame, Series, NumericValue]) –DataFrame, Series or numeric scalar to be subtracted.
Returns:
-
DataFrame–The result of the operation.
mul ¶
Return multiplication of data frame and other.
Applies MULT operator to column.
Parameters:
-
other(Union[DataFrame, Series, NumericValue]) –DataFrame, Series or numeric scalar to be multiplied.
Returns:
-
DataFrame–The result of the operation.
div ¶
Return division of data frame and other.
Applies DIV operator to column.
Parameters:
-
other(Union[DataFrame, Series, NumericValue]) –DataFrame, Series or numeric scalar to divide by.
Returns:
-
DataFrame–The result of the operation.
floordiv ¶
Return floor division of data frame and other.
Applies FLOOR operator and DIV operator to column.
Parameters:
-
other(Union[DataFrame, Series, NumericValue]) –DataFrame, Series or numeric scalar to floor-divide by.
Returns:
-
DataFrame–The result of the operation.
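The FLOOR-over-DIV composition described for floordiv can be illustrated in plain Python. This is only a sketch of the arithmetic, not the library's implementation; `floor_div` is a hypothetical helper name introduced for illustration.

```python
import math


def floor_div(left: float, right: float) -> int:
    """Floor division expressed as FLOOR(left / right)."""
    # True division first, then round toward negative infinity.
    return math.floor(left / right)


print(floor_div(7, 2))   # 3
print(floor_div(-7, 2))  # -4 (floor rounds down, not toward zero)
```

For negative operands the result differs from truncation, which is the main reason to spell out the two steps.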
mod ¶
Return modulo of data frame and other.
Applies MODULO operator to column.
Parameters:
-
other(Union[DataFrame, Series, NumericValue]) –DataFrame, Series or numeric scalar to use as the divisor.
Returns:
-
DataFrame–The result of the operation.
pow ¶
Return the data frame raised to the power of other.
Applies POWER operator to column.
Parameters:
-
other(Union[DataFrame, Series, NumericValue]) –DataFrame, Series or numeric scalar to be the exponent.
Returns:
-
DataFrame–The result of the operation.
abs ¶
Return the DataFrame with the absolute value of its elements.
Applies ABS operator to column.
round ¶
Round the data frame to the given number of decimals.
Applies ROUND operator to column.
lt ¶
Return a DataFrame of booleans indicating whether each element is less than the other.
Applies LOWER_THAN operator to column.
Parameters:
-
other(Union[DataFrame, Series, ScalarValue]) –DataFrame, Series or scalar to be compared.
Returns:
-
DataFrame–The result of the operation.
le ¶
Return a DataFrame of booleans indicating whether each element is less than or equal to the other.
Applies LOWER_EQUALS operator to column.
Parameters:
-
other(Union[DataFrame, Series, ScalarValue]) –DataFrame, Series or scalar to be compared.
Returns:
-
DataFrame–The result of the operation.
eq ¶
Return a DataFrame of booleans indicating whether each element is equal to the other.
Applies EQUALS operator to column.
Parameters:
-
other(Union[DataFrame, Series, ScalarValue]) –DataFrame, Series or scalar to be compared.
Returns:
-
DataFrame–The result of the operation.
ne ¶
Return a DataFrame of booleans indicating whether each element is not equal to the other.
Applies NOT_EQUALS operator to column.
Parameters:
-
other(Union[DataFrame, Series, ScalarValue]) –DataFrame, Series or scalar to be compared.
Returns:
-
DataFrame–The result of the operation.
ge ¶
Return a DataFrame of booleans indicating whether each element is greater than or equal to the other.
Applies GREATER_EQUALS operator to column.
Parameters:
-
other(Union[DataFrame, Series, ScalarValue]) –DataFrame, Series or scalar to be compared.
Returns:
-
DataFrame–The result of the operation.
gt ¶
Return a DataFrame of booleans indicating whether each element is greater than the other.
Applies GREATER_THAN operator to column.
Parameters:
-
other(Union[DataFrame, Series, ScalarValue]) –DataFrame, Series or scalar to be compared.
Returns:
-
DataFrame–The result of the operation.
isnull ¶
Return a boolean same-sized DataFrame indicating if the values are null.
Applies IS NULL operator to column.
Returns:
-
DataFrame–A DataFrame of masked bool values for each element that indicates whether an element is a null value.
isin ¶
Returns whether elements of data frame are in values.
Applies IN operator to column.
Parameters:
-
values(List[Union[Series, ScalarValue]]) –List of values to test.
Returns:
-
DataFrame–The result of the operation.
dropna ¶
Return DataFrame with filter for null values. Rows are removed if any column is null.
Returns:
-
DataFrame–A DataFrame with null values filtered out.
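The "any column is null" rule of dropna can be sketched with plain Python lists. Note the difference in mechanism: the method documented here returns a DataFrame carrying a filter, whereas this sketch materializes the surviving rows. `drop_null_rows` and the dict-of-lists layout are assumptions made for illustration only.

```python
from typing import Dict, List, Optional

Column = List[Optional[float]]


def drop_null_rows(data: Dict[str, Column]) -> Dict[str, Column]:
    """Keep only the rows in which every column holds a non-null value."""
    names = list(data)
    # Walk the columns in parallel so each tuple is one row.
    rows = zip(*(data[name] for name in names))
    kept = [row for row in rows if all(value is not None for value in row)]
    # Rebuild the column-oriented layout from the surviving rows.
    return {name: [row[i] for row in kept] for i, name in enumerate(names)}


frame = {"a": [1.0, None, 3.0], "b": [4.0, 5.0, None]}
print(drop_null_rows(frame))  # {'a': [1.0], 'b': [4.0]}
```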
mean ¶
Return the mean of each column.
Applies AVG operator to column.
Returns:
-
Series–Mean of column values.
median ¶
Return the median of each column.
Applies MEDIAN operator to column.
Returns:
-
Series–Median of column values.
quantile ¶
Return the quantile of each column.
Applies QUANTILE operator to column.
Parameters:
-
q(float, default:0.5) –Quantile to compute. 0 <= q <= 1.
Returns:
-
Series–Quantile of column values.
mode ¶
Return the mode of each column.
Applies MODE operator to column.
Returns:
-
DataFrame–Mode of column values.
max ¶
Return the max of each column.
Applies MAX operator to column.
Returns:
-
Series–Max of column values.
min ¶
Return the min of each column.
Applies MIN operator to column.
Returns:
-
Series–Min of column values.
sum ¶
Return the sum of each column.
Applies SUM operator to column.
Returns:
-
Series–Sum of column values.
product ¶
Return the product of each column. Null values are skipped.
Applies PRODUCT operator to column. In case of an overflow the result will be null.
Returns:
-
Series–Product of column values.
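The two rules stated for product (nulls are skipped, an overflow yields null) can be sketched in plain Python. This is an illustration under assumptions, not the PRODUCT operator itself: `column_product` is a hypothetical helper, `None` stands in for null, and overflow is modeled with Python's float infinity.

```python
import math
from typing import List, Optional


def column_product(values: List[Optional[float]]) -> Optional[float]:
    """Multiply the non-null values; return None if the result overflows."""
    result = 1.0
    for value in values:
        if value is None:
            continue  # null values are skipped
        result *= value
    # Float overflow shows up as +/-inf in Python; map it to null (None).
    return None if math.isinf(result) else result


print(column_product([2.0, None, 3.0]))  # 6.0
print(column_product([1e308, 10.0]))     # None
```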
count ¶
Return the number of non-null values per column of data frame.
Applies COUNT operator to column.
Returns:
-
Series–Number of non-null values per column.
groupby ¶
Return a GroupByAggregationMethods object that provides the aggregation methods to apply per group.
Parameters:
-
by(Union[str, List[str]]) –Used to determine the groups the aggregation method is applied on.
Returns:
-
GroupByAggregationMethods–GroupByAggregationMethods object.
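Conceptually, the `by` columns define the groups and an aggregation is then evaluated once per group. The sketch below shows that idea for a sum in plain Python. It does not use or reproduce the GroupByAggregationMethods interface; `group_sum` is a hypothetical helper.

```python
from collections import defaultdict
from typing import Dict, List


def group_sum(keys: List[str], values: List[float]) -> Dict[str, float]:
    """Sum the values separately for each distinct key."""
    totals: Dict[str, float] = defaultdict(float)
    for key, value in zip(keys, values):
        totals[key] += value  # accumulate within the key's group
    return dict(totals)


print(group_sum(["x", "y", "x"], [1.0, 2.0, 3.0]))  # {'x': 4.0, 'y': 2.0}
```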
var ¶
Return the variance of each column using the n-1 method. Null values are ignored.
Applies VAR operator to column.
Returns:
-
Series–Variance of column values.
std ¶
Return the standard deviation of each column using the n-1 method. Null values are ignored.
Applies STDEV operator to column.
Returns:
-
Series–Standard deviation of column values.
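The "n-1 method" used by var and std is the sample (Bessel-corrected) estimator, which divides the sum of squared deviations by n-1 instead of n. Python's standard `statistics` module uses the same convention, so it can serve as a reference point. The helper names and the use of `None` for null are assumptions for illustration.

```python
import statistics
from typing import List, Optional


def sample_variance(values: List[Optional[float]]) -> float:
    """Sample variance (divides by n-1), ignoring null values."""
    non_null = [v for v in values if v is not None]
    return statistics.variance(non_null)


def sample_std(values: List[Optional[float]]) -> float:
    """Sample standard deviation (square root of the n-1 variance)."""
    non_null = [v for v in values if v is not None]
    return statistics.stdev(non_null)


column = [1.0, 2.0, None, 3.0, 4.0]
# Mean is 2.5, squared deviations sum to 5.0, and n-1 is 3.
print(sample_variance(column))
print(sample_std(column))
```

Both functions need at least two non-null values, since n-1 would otherwise be zero.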
to_float ¶
Converts columns of given data frame to float.
Applies TO_FLOAT operator to column.
to_string ¶
Converts columns of given data frame to string.
Applies TO_STRING operator to column.
Parameters:
-
format_(Optional[str], default:None) –Optional, defines how dates are converted to string.
Returns:
-
DataFrame–DataFrame converted to string.
to_date ¶
Converts columns of given data frame to date.
Applies TO_DATE operator to column.
Parameters:
-
format_(str) –Defines how strings are converted to date.
Returns:
-
DataFrame–DataFrame converted to date.
astype ¶
Converts columns of given data frame to type.
Parameters:
-
type_(Type[Union[str, int, float]]) –Type to convert to. Supported types are str, int, float.
-
**kwargs(Any, default:{}) –Passed to conversion function.
Returns:
-
DataFrame–Converted DataFrame.
nunique ¶
Returns number of unique elements per column of data frame.
Parameters:
-
dropna(bool, default:True) –Whether to exclude null values from the count.
Returns:
-
Series–Number of unique elements per column.
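The effect of the dropna flag on a single column can be sketched as follows. The sketch assumes the pandas-style reading, where `dropna=True` leaves null out of the count; that reading is an assumption here and is not confirmed by the text above. `count_unique` is a hypothetical helper and `None` stands in for null.

```python
from typing import Hashable, List, Optional


def count_unique(values: List[Optional[Hashable]], dropna: bool = True) -> int:
    """Count distinct values, optionally treating null as a value of its own."""
    distinct = set(values)
    if dropna:
        distinct.discard(None)  # assumed meaning: nulls are not counted
    return len(distinct)


column = ["a", "b", None, "a"]
print(count_unique(column))                # 2
print(count_unique(column, dropna=False))  # 3
```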
drop ¶
Drop labels from columns.
Parameters:
-
labels(Union[str, List[str]]) –Name of columns to drop.
Returns:
-
DataFrame–DataFrame without given columns.
sort_values ¶
Sorts data frame by given columns.
Parameters:
-
by(Union[str, List[str]]) –Name or list of names of columns to sort by.
-
ascending(Union[bool, List[bool]], default:True) –Sort ascending or descending. Specify list for multiple sort orders.
Returns:
-
DataFrame–DataFrame with OrderByColumns set.
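How a list for `by` pairs up with a list for `ascending` can be shown with a small plain-Python sort. This only illustrates the ordering semantics; the method documented here sets OrderByColumns on the data frame rather than sorting rows in memory. `sort_rows` and the list-of-dicts layout are hypothetical.

```python
from typing import Any, Dict, List, Union

Row = Dict[str, Any]


def sort_rows(
    rows: List[Row],
    by: Union[str, List[str]],
    ascending: Union[bool, List[bool]] = True,
) -> List[Row]:
    """Sort rows by one or more columns, each with its own direction."""
    by_list = [by] if isinstance(by, str) else list(by)
    if isinstance(ascending, bool):
        asc_list = [ascending] * len(by_list)
    else:
        asc_list = list(ascending)
    if len(asc_list) != len(by_list):
        raise ValueError("ascending must match the number of sort columns")
    result = list(rows)
    # Python's sort is stable, so sorting by the least significant
    # column first leaves the most significant column in control.
    for column, asc in reversed(list(zip(by_list, asc_list))):
        result.sort(key=lambda row: row[column], reverse=not asc)
    return result


rows = [
    {"team": "a", "score": 1},
    {"team": "b", "score": 3},
    {"team": "a", "score": 2},
]
# team ascending, then score descending within each team
print(sort_rows(rows, ["team", "score"], [True, False]))
```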
apply_unary_operator ¶
Applies given unary operator to data frame.
Parameters:
-
operator(Type[UnaryPQLOperator]) –Operator to apply.
Returns:
-
DataFrame–DataFrame with operator applied.
apply_binary_operator ¶
Applies given binary operator to data frame and exports result.
Parameters:
-
other(Union[DataFrame, Series, ScalarValue]) –Other operand to apply binary operator on.
-
operator(Type[BinaryPQLOperator]) –Operator to apply.
-
reverse(bool, default:False) –If true order of operands is reversed.
Returns:
-
DataFrame–DataFrame with operator applied.
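The `reverse` flag matters for non-commutative operators such as subtraction or division. A minimal sketch of operand reversal on scalars, using functions from Python's `operator` module in place of PQL operator classes (`apply_binary` is a hypothetical helper, not part of this API):

```python
import operator
from typing import Callable


def apply_binary(
    left: float,
    right: float,
    op: Callable[[float, float], float],
    reverse: bool = False,
) -> float:
    """Apply op to (left, right), or to (right, left) when reverse is set."""
    if reverse:
        left, right = right, left
    return op(left, right)


print(apply_binary(10.0, 3.0, operator.sub))                # 7.0
print(apply_binary(10.0, 3.0, operator.sub, reverse=True))  # -7.0
```

Reversal of this kind is what reflected operations such as `3 - frame` would need, although whether the library uses it that way is not stated here.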
apply_binary_operator_dunder ¶
Combines the data frame with other by applying a function to each column. Used by the dunder (operator) methods.
apply_aggregation_operator ¶
Applies given aggregation operator to data frame and exports result.
Parameters:
-
operator(Type[UnaryPQLOperator]) –Operator to apply.
Returns:
-
Series–Series with operator applied.
copy ¶
Copies given data frame and overrides properties given as parameters.
verify_columns_contained ¶
Verifies whether the data frame contains the given columns.
Parameters:
-
columns(List[str]) –List of columns to verify.
Returns:
-
Set[str]–Set of verified column names.