Utils

Contains functions that are shared across Iguanas modules.

iguanas.utils.utils.concat(objs: List[Union[iguanas.utils.typing.pandas.core.frame.DataFrame, iguanas.utils.typing.pandas.core.series.Series, iguanas.utils.typing.databricks.koalas.frame.DataFrame, iguanas.utils.typing.databricks.koalas.series.Series]], **kwargs) -> Union[iguanas.utils.typing.pandas.core.frame.DataFrame, iguanas.utils.typing.databricks.koalas.frame.DataFrame]

Concatenates a set of Pandas Series/DataFrames or a set of Koalas Series/DataFrames.

Parameters
objs : List[Union[PandasDataFrameType, PandasSeriesType, KoalasDataFrameType, KoalasSeriesType]]

List of Pandas or Koalas Series/DataFrames to concatenate.

Returns
Union[PandasDataFrameType, KoalasDataFrameType]

The concatenated DataFrame.

Raises
Exception

If objs is not a list of either all Pandas objects or all Koalas objects.

iguanas.utils.utils.generate_empty_data_structures() -> Tuple[iguanas.utils.typing.pandas.core.frame.DataFrame, iguanas.utils.typing.pandas.core.frame.DataFrame]

Creates data structures often used in classes in Iguanas.

Returns
Tuple[PandasDataFrameType, PandasDataFrameType]

Contains the rule_descriptions and X_rules dataframes.

iguanas.utils.utils.return_columns_types(X: Union[iguanas.utils.typing.pandas.core.frame.DataFrame, iguanas.utils.typing.databricks.koalas.frame.DataFrame]) -> Tuple[List, List, List]

Returns the integer, float and one-hot encoded (OHE) categorical columns for a given dataset.

Parameters
X : Union[PandasDataFrameType, KoalasDataFrameType]

Dataset.

Returns
Tuple[List, List, List]

List of integer columns, list of float columns, list of OHE categorical columns.

iguanas.utils.utils.sort_rule_dfs_by_opt_metric(rule_descriptions: iguanas.utils.typing.pandas.core.frame.DataFrame, X_rules: iguanas.utils.typing.pandas.core.frame.DataFrame) -> Tuple[iguanas.utils.typing.pandas.core.frame.DataFrame, iguanas.utils.typing.pandas.core.frame.DataFrame]

Sorts and reindexes rule_descriptions and X_rules by opt_metric.

Parameters
rule_descriptions : PandasDataFrameType

The standard rule_descriptions dataframe.

X_rules : PandasDataFrameType

The binary columns of the rules.

Returns
Tuple[PandasDataFrameType, PandasDataFrameType]

The sorted rule_descriptions and X_rules dataframes.

iguanas.utils.utils.combine_rule_dfs(rule_descriptions_1: iguanas.utils.typing.pandas.core.frame.DataFrame, X_rules_1: iguanas.utils.typing.pandas.core.frame.DataFrame, rule_descriptions_2: iguanas.utils.typing.pandas.core.frame.DataFrame, X_rules_2: iguanas.utils.typing.pandas.core.frame.DataFrame) -> Tuple[iguanas.utils.typing.pandas.core.frame.DataFrame, iguanas.utils.typing.pandas.core.frame.DataFrame]

Combines the rule_descriptions and X_rules objects of two rule sets.

Parameters
rule_descriptions_1 : PandasDataFrameType

The first rule_descriptions.

X_rules_1 : PandasDataFrameType

The first X_rules.

rule_descriptions_2 : PandasDataFrameType

The second rule_descriptions.

X_rules_2 : PandasDataFrameType

The second X_rules.

Returns
Tuple[PandasDataFrameType, PandasDataFrameType]

The combined rule_descriptions and X_rules dataframes.

iguanas.utils.utils.create_spark_df(X: iguanas.utils.typing.databricks.koalas.frame.DataFrame, y: iguanas.utils.typing.databricks.koalas.series.Series, sample_weight=None) -> iguanas.utils.typing.pyspark.sql.dataframe.DataFrame

Creates a Spark DataFrame from the features and target given as Koalas objects.

Parameters
X : KoalasDataFrameType

The feature set.

y : KoalasSeriesType

The target.

sample_weight : KoalasSeriesType, optional

Row-wise weights to apply. Defaults to None.

Returns
PySparkDataFrameType

The Spark DataFrame.

iguanas.utils.utils.calc_tps_fps_tns_fns(y_true: Union[iguanas.utils.typing.pandas.core.series.Series, numpy.ndarray, iguanas.utils.typing.databricks.koalas.series.Series], y_preds: Union[iguanas.utils.typing.pandas.core.series.Series, iguanas.utils.typing.pandas.core.frame.DataFrame, numpy.ndarray, iguanas.utils.typing.databricks.koalas.series.Series, iguanas.utils.typing.databricks.koalas.frame.DataFrame], sample_weight=None, tps=False, fps=False, tns=False, fns=False, tps_fps=False, tps_fns=False) -> Tuple[Union[numpy.ndarray, float], Union[numpy.ndarray, float], Union[numpy.ndarray, float], Union[numpy.ndarray, float], Union[numpy.ndarray, float], Union[numpy.ndarray, float]]

Calculates the True Positives, False Positives, True Negatives, False Negatives, True Positives + False Positives and True Positives + False Negatives for a set of binary predictors, given a binary target. The two sums (True Positives + False Positives and True Positives + False Negatives) can each be requested directly, because calculating a sum in one step is faster than calculating the individual metrics separately and adding them.

Parameters
y_true : Union[PandasSeriesType, np.ndarray, KoalasSeriesType]

The binary target.

y_preds : Union[PandasSeriesType, PandasDataFrameType, np.ndarray, KoalasSeriesType, KoalasDataFrameType]

The binary predictors.

sample_weight : Union[np.ndarray, PandasSeriesType, KoalasSeriesType], optional

Row-wise weights to apply. Defaults to None.

tps : bool, optional

If True, the True Positives are calculated. Defaults to False.

fps : bool, optional

If True, the False Positives are calculated. Defaults to False.

tns : bool, optional

If True, the True Negatives are calculated. Defaults to False.

fns : bool, optional

If True, the False Negatives are calculated. Defaults to False.

tps_fps : bool, optional

If True, the True Positives + False Positives are calculated. Defaults to False.

tps_fns : bool, optional

If True, the True Positives + False Negatives are calculated. Defaults to False.

Returns
Tuple[Union[np.ndarray, float], Union[np.ndarray, float], Union[np.ndarray, float], Union[np.ndarray, float], Union[np.ndarray, float], Union[np.ndarray, float]]

The True Positives, False Positives, True Negatives, False Negatives, True Positives + False Positives and True Positives + False Negatives.
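The relationship between the four counts and the two combined sums can be illustrated without Iguanas. The following is a minimal pure-Python sketch, not the library's implementation: `confusion_counts` and `weighted_sum` are hypothetical helper names, and it handles a single predictor column only. It shows why the sums can be obtained in one step: every flagged row is either a True Positive or a False Positive, so True Positives + False Positives is the (weighted) sum of the predictor, and likewise True Positives + False Negatives is the (weighted) sum of the target.

```python
from typing import Optional, Sequence, Tuple


def confusion_counts(
    y_true: Sequence[int],
    y_pred: Sequence[int],
    sample_weight: Optional[Sequence[float]] = None,
) -> Tuple[float, float, float, float]:
    """Return (tps, fps, tns, fns) for one binary predictor (hypothetical helper)."""
    weights = sample_weight if sample_weight is not None else [1] * len(y_true)
    tps = fps = tns = fns = 0
    for actual, pred, w in zip(y_true, y_pred, weights):
        if pred == 1 and actual == 1:
            tps += w
        elif pred == 1 and actual == 0:
            fps += w
        elif pred == 0 and actual == 0:
            tns += w
        else:  # pred == 0 and actual == 1
            fns += w
    return tps, fps, tns, fns


def weighted_sum(
    values: Sequence[int], sample_weight: Optional[Sequence[float]] = None
) -> float:
    """Weighted sum of a binary column (hypothetical helper)."""
    weights = sample_weight if sample_weight is not None else [1] * len(values)
    return sum(v * w for v, w in zip(values, weights))


y_true = [1, 0, 1, 1, 0]
y_pred = [1, 1, 0, 1, 0]

tps, fps, tns, fns = confusion_counts(y_true, y_pred)
tps_fps = weighted_sum(y_pred)  # every flagged row is a TP or an FP
tps_fns = weighted_sum(y_true)  # every positive row is a TP or an FN
```

Here `tps_fps` equals `tps + fps` and `tps_fns` equals `tps + fns`, but each is computed in a single pass over one column.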

iguanas.utils.utils.return_binary_pred_perf_of_set(y_true: Union[iguanas.utils.typing.pandas.core.series.Series, numpy.ndarray, iguanas.utils.typing.databricks.koalas.series.Series], y_preds: Union[iguanas.utils.typing.pandas.core.frame.DataFrame, numpy.ndarray, iguanas.utils.typing.databricks.koalas.frame.DataFrame], y_preds_columns: List[str], sample_weight=None, opt_func=None) -> iguanas.utils.typing.pandas.core.frame.DataFrame

Calculates the performance of a set of binary predictors given a target column.

Parameters
y_true : Union[PandasSeriesType, np.ndarray, KoalasSeriesType]

Binary integer target column.

y_preds : Union[PandasDataFrameType, np.ndarray, KoalasDataFrameType]

Set of binary integer predictors. Can also be a single predictor.

y_preds_columns : List[str]

Column names for the y_preds array.

sample_weight : Union[PandasSeriesType, np.ndarray, KoalasSeriesType], optional

Row-wise sample_weights to apply. Defaults to None.

opt_func : Callable, optional

A function/method which calculates a custom metric (e.g. Fbeta score) for each column. Defaults to None.

Returns
PandasDataFrameType

Dataframe containing the performance metrics for each binary predictor.

iguanas.utils.utils.return_rule_descriptions_from_X_rules(X_rules: Union[iguanas.utils.typing.pandas.core.frame.DataFrame, iguanas.utils.typing.databricks.koalas.frame.DataFrame], X_rules_cols: List[str], y_true: None, sample_weight=None, opt_func=None) -> iguanas.utils.typing.pandas.core.frame.DataFrame

Calculates the performance metrics for the standard rule_descriptions dataframe, given a set of rule binary columns.

Parameters
X_rules : Union[PandasDataFrameType, KoalasDataFrameType]

Set of rule binary columns.

X_rules_cols : List[str]

Columns associated with X_rules.

y_true : Union[PandasSeriesType, np.ndarray, KoalasSeriesType], optional

Binary integer target column. Defaults to None.

sample_weight : Union[PandasSeriesType, np.ndarray, KoalasSeriesType], optional

Row-wise sample_weights to apply. Defaults to None.

opt_func : Callable, optional

A function/method which calculates a custom metric (e.g. Fbeta score) for each rule. Defaults to None.

Returns
PandasDataFrameType

The performance metrics for the standard rule_descriptions dataframe.

iguanas.utils.utils.flatten_stringified_json_column(X_column: iguanas.utils.typing.pandas.core.series.Series) -> iguanas.utils.typing.pandas.core.frame.DataFrame

Flattens JSONs contained in a column to their own columns.

Parameters
X_column : PandasSeriesType

Contains the JSONs to be flattened.

Returns
PandasDataFrameType

Contains a column per key-value pair in the JSONs.
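The flattening idea can be sketched with the standard library alone. This is an illustration under assumptions, not the Iguanas implementation: `flatten_json_strings` is a hypothetical name, plain lists and a dict of columns stand in for the Pandas Series and DataFrame, only top-level keys are expanded, and filling missing keys with `None` is a choice made for this sketch.

```python
import json
from typing import Any, Dict, List


def flatten_json_strings(column: List[str]) -> Dict[str, List[Any]]:
    """Parse each JSON string and return one output column per top-level key.

    Hypothetical helper: keys missing from a row are filled with None.
    """
    rows = [json.loads(item) for item in column]

    # Collect keys in first-seen order so the column order is deterministic.
    keys: List[str] = []
    for row in rows:
        for key in row:
            if key not in keys:
                keys.append(key)

    return {key: [row.get(key) for row in rows] for key in keys}


column = ['{"a": 1, "b": "x"}', '{"a": 2, "c": true}']
flattened = flatten_json_strings(column)
```

Each key in `flattened` corresponds to one column of the flattened output, with one value per input row.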

iguanas.utils.utils.count_rule_conditions(rule_string: str) -> int

Counts the number of conditions in a rule string.

Parameters
rule_string : str

The standard Iguanas string representation of the rule.

Returns
int

Number of conditions in the rule.

iguanas.utils.utils.return_progress_ready_range(verbose: bool, range: Iterable) -> Union[tqdm.std.tqdm, Iterable]

Returns a tqdm object for a given iterable, range, if verbose is True. The tqdm object prints the progress of iteration.

Parameters
verbose : bool

Dictates whether the tqdm object should be returned.

range : Iterable

The iterable.

Returns
Union[tqdm, Iterable]

Either the tqdm-version of the iterable, or the original iterable.
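The behaviour described above amounts to an optional wrapper around an iterable. A minimal sketch, assuming only the documented behaviour (the name `progress_ready` is hypothetical, and tqdm is imported lazily here so the non-verbose path has no third-party dependency):

```python
from typing import Iterable


def progress_ready(verbose: bool, iterable: Iterable) -> Iterable:
    """Wrap `iterable` in a tqdm progress bar only when `verbose` is True."""
    if verbose:
        from tqdm import tqdm  # third-party package, needed only when verbose

        return tqdm(iterable)
    # Non-verbose callers get the original object back, untouched.
    return iterable


items = range(3)
quiet = progress_ready(False, items)
collected = [i for i in quiet]
```

Calling code can then loop over the result in the same way regardless of `verbose`, which keeps progress reporting out of the loop body.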

iguanas.utils.utils.return_conf_matrix(y_true: Union[iguanas.utils.typing.pandas.core.series.Series, numpy.ndarray, iguanas.utils.typing.databricks.koalas.series.Series], y_pred: Union[iguanas.utils.typing.pandas.core.series.Series, numpy.ndarray, iguanas.utils.typing.databricks.koalas.series.Series], sample_weight=None) -> iguanas.utils.typing.pandas.core.frame.DataFrame

Creates a confusion matrix from a binary target and binary predictor.

Parameters
y_true : Union[PandasSeriesType, np.ndarray, KoalasSeriesType]

Binary target.

y_pred : Union[PandasSeriesType, np.ndarray, KoalasSeriesType]

Binary predictor.

sample_weight : Union[PandasSeriesType, np.ndarray, KoalasSeriesType], optional

Row-wise weights to apply. Defaults to None.

Returns
PandasDataFrameType

The confusion matrix (the index shows the predicted class; the column shows the actual class).
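The orientation described above (predicted class on the index, actual class on the columns) can be illustrated with a nested dictionary. This is a sketch only: `conf_matrix_sketch` is a hypothetical name, and the nested dict stands in for the returned dataframe, with the outer key playing the role of the index and the inner key the role of the column.

```python
from typing import Dict, Optional, Sequence


def conf_matrix_sketch(
    y_true: Sequence[int],
    y_pred: Sequence[int],
    sample_weight: Optional[Sequence[float]] = None,
) -> Dict[int, Dict[int, float]]:
    """Return matrix[predicted][actual] as weighted counts (hypothetical helper)."""
    weights = sample_weight if sample_weight is not None else [1] * len(y_true)
    matrix = {1: {1: 0, 0: 0}, 0: {1: 0, 0: 0}}
    for actual, pred, w in zip(y_true, y_pred, weights):
        matrix[pred][actual] += w
    return matrix


y_true = [1, 0, 1, 1, 0]
y_pred = [1, 1, 0, 1, 0]
weights = [2, 1, 1, 1, 1]  # the first row counts double

matrix = conf_matrix_sketch(y_true, y_pred, weights)
# matrix[1][1]: predicted 1, actual 1 (true positives)
# matrix[1][0]: predicted 1, actual 0 (false positives)
# matrix[0][1]: predicted 0, actual 1 (false negatives)
# matrix[0][0]: predicted 0, actual 0 (true negatives)
```

With this layout, each outer key (row) sums to the weighted number of rows predicted as that class.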

iguanas.utils.utils.check_allowed_types(x: object, x_name: str, allowed_types: List[str]) -> None

Checks whether the stringified type of x is in allowed_types, a list of stringified types, and raises a TypeError if it is not.

Parameters
x : object

The object to check the type of.

x_name : str

The object's name (used when raising the error).

allowed_types : List[str]

The list of allowed types (in string format).

Raises
TypeError

If str(type(x)) is not in allowed_types.

iguanas.utils.utils.is_type(x: object, types: List[str]) -> bool

Returns whether the stringified type of x is in types, a list of stringified types.

Parameters
x : object

The object to check the type of.

types : List[str]

The list of allowed types (in string format) to check against.

Returns
bool

True if str(type(x)) is in types, otherwise False.
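Both type-checking helpers compare `str(type(x))` against a list of strings, so the strings must match Python's own representation exactly (for example `"<class 'int'>"`). A minimal sketch of that comparison, assuming only the documented behaviour (the `_sketch` function names and the error message wording are hypothetical):

```python
from typing import List


def is_type_sketch(x: object, types: List[str]) -> bool:
    """Return True if the stringified type of `x` appears in `types`."""
    return str(type(x)) in types


def check_allowed_types_sketch(x: object, x_name: str, allowed_types: List[str]) -> None:
    """Raise a TypeError naming `x_name` if the type of `x` is not allowed."""
    if not is_type_sketch(x, allowed_types):
        raise TypeError(
            f"`{x_name}` must be one of {allowed_types}; got {type(x)}"
        )


INT_TYPE = "<class 'int'>"  # this is what str(type(1)) evaluates to

accepted = is_type_sketch(1, [INT_TYPE])
rejected = is_type_sketch("1", [INT_TYPE])

try:
    check_allowed_types_sketch("1", "n_rules", [INT_TYPE])
    error_message = None
except TypeError as err:
    error_message = str(err)
```

Comparing stringified types lets a module test for, say, a Koalas DataFrame without importing Koalas, at the cost of requiring an exact string match.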