Utils
Contains functions that are shared across Iguanas modules.
- iguanas.utils.utils.concat(objs: List[Union[iguanas.utils.typing.pandas.core.frame.DataFrame, iguanas.utils.typing.pandas.core.series.Series, iguanas.utils.typing.databricks.koalas.frame.DataFrame, iguanas.utils.typing.databricks.koalas.series.Series]], **kwargs) -> Union[iguanas.utils.typing.pandas.core.frame.DataFrame, iguanas.utils.typing.databricks.koalas.frame.DataFrame]
Concatenates a set of Pandas Series/DataFrames or a set of Koalas Series/DataFrames.
- Parameters
- objs : List[Union[PandasDataFrameType, PandasSeriesType, KoalasDataFrameType, KoalasSeriesType]]
List of Pandas Series/DataFrames, or of Koalas Series/DataFrames, to concatenate.
- Returns
- Union[PandasDataFrameType, KoalasDataFrameType]
The concatenated DataFrame.
- Raises
- Exception
If objs is not a list containing only Pandas objects or only Koalas objects.
- iguanas.utils.utils.generate_empty_data_structures() -> Tuple[iguanas.utils.typing.pandas.core.frame.DataFrame, iguanas.utils.typing.pandas.core.frame.DataFrame]
Creates data structures often used in classes in Iguanas.
- Returns
- Tuple[PandasDataFrameType, PandasDataFrameType]
Contains the rule_descriptions and X_rules dataframes.
- iguanas.utils.utils.return_columns_types(X: Union[iguanas.utils.typing.pandas.core.frame.DataFrame, iguanas.utils.typing.databricks.koalas.frame.DataFrame]) -> Tuple[List, List, List]
Returns the integer, float and one-hot encoded (OHE) categorical columns for a given dataset.
- Parameters
- X : Union[PandasDataFrameType, KoalasDataFrameType]
Dataset.
- Returns
- Tuple[List, List, List]
List of integer columns, list of float columns, list of OHE categorical columns.
- iguanas.utils.utils.sort_rule_dfs_by_opt_metric(rule_descriptions: iguanas.utils.typing.pandas.core.frame.DataFrame, X_rules: iguanas.utils.typing.pandas.core.frame.DataFrame) -> Tuple[iguanas.utils.typing.pandas.core.frame.DataFrame, iguanas.utils.typing.pandas.core.frame.DataFrame]
Sorts and reindexes rule_descriptions and X_rules by opt_metric.
- Parameters
- rule_descriptions : PandasDataFrameType
The standard rule_descriptions dataframe.
- X_rules : PandasDataFrameType
The binary columns of the rules.
- Returns
- Tuple[PandasDataFrameType, PandasDataFrameType]
rule_descriptions, X_rules
- iguanas.utils.utils.combine_rule_dfs(rule_descriptions_1: iguanas.utils.typing.pandas.core.frame.DataFrame, X_rules_1: iguanas.utils.typing.pandas.core.frame.DataFrame, rule_descriptions_2: iguanas.utils.typing.pandas.core.frame.DataFrame, X_rules_2: iguanas.utils.typing.pandas.core.frame.DataFrame) -> Tuple[iguanas.utils.typing.pandas.core.frame.DataFrame, iguanas.utils.typing.pandas.core.frame.DataFrame]
Combines the rule_descriptions and X_rules objects of two rule sets.
- Parameters
- rule_descriptions_1 : PandasDataFrameType
The first rule_descriptions.
- X_rules_1 : PandasDataFrameType
The first X_rules.
- rule_descriptions_2 : PandasDataFrameType
The second rule_descriptions.
- X_rules_2 : PandasDataFrameType
The second X_rules.
- Returns
- Tuple[PandasDataFrameType, PandasDataFrameType]
rule_descriptions, X_rules
- iguanas.utils.utils.create_spark_df(X: iguanas.utils.typing.databricks.koalas.frame.DataFrame, y: iguanas.utils.typing.databricks.koalas.series.Series, sample_weight=None) -> iguanas.utils.typing.pyspark.sql.dataframe.DataFrame
Creates a Spark DataFrame from the features and target given as Koalas objects.
- Parameters
- X : KoalasDataFrameType
The feature set.
- y : KoalasSeriesType
The target.
- sample_weight : KoalasSeriesType, optional
Row-wise weights to apply. Defaults to None.
- Returns
- PySparkDataFrameType
The Spark DataFrame.
- iguanas.utils.utils.calc_tps_fps_tns_fns(y_true: Union[iguanas.utils.typing.pandas.core.series.Series, numpy.ndarray, iguanas.utils.typing.databricks.koalas.series.Series], y_preds: Union[iguanas.utils.typing.pandas.core.series.Series, iguanas.utils.typing.pandas.core.frame.DataFrame, numpy.ndarray, iguanas.utils.typing.databricks.koalas.series.Series, iguanas.utils.typing.databricks.koalas.frame.DataFrame], sample_weight=None, tps=False, fps=False, tns=False, fns=False, tps_fps=False, tps_fns=False) -> Tuple[Union[numpy.ndarray, float], Union[numpy.ndarray, float], Union[numpy.ndarray, float], Union[numpy.ndarray, float], Union[numpy.ndarray, float], Union[numpy.ndarray, float]]
Calculates the True Positives, False Positives, True Negatives, False Negatives, True Positives + False Positives and True Positives + False Negatives for a set of binary predictors, given a binary target. The combined sums (True Positives + False Positives, and True Positives + False Negatives) can each be requested directly because computing them in one sum is faster than calculating the individual metrics separately and adding them.
- Parameters
- y_true : Union[PandasSeriesType, np.ndarray, KoalasSeriesType]
The binary target.
- y_preds : Union[PandasSeriesType, PandasDataFrameType, np.ndarray, KoalasSeriesType, KoalasDataFrameType]
The binary predictors.
- sample_weight : Union[np.ndarray, PandasSeriesType, KoalasSeriesType], optional
Row-wise weights to apply. Defaults to None.
- tps : bool, optional
If True, the True Positives are calculated. Defaults to False.
- fps : bool, optional
If True, the False Positives are calculated. Defaults to False.
- tns : bool, optional
If True, the True Negatives are calculated. Defaults to False.
- fns : bool, optional
If True, the False Negatives are calculated. Defaults to False.
- tps_fps : bool, optional
If True, the True Positives + False Positives are calculated. Defaults to False.
- tps_fns : bool, optional
If True, the True Positives + False Negatives are calculated. Defaults to False.
- Returns
- Tuple[Union[np.ndarray, float], Union[np.ndarray, float], Union[np.ndarray, float], Union[np.ndarray, float], Union[np.ndarray, float], Union[np.ndarray, float]]
The True Positives, False Positives, True Negatives, False Negatives, True Positives + False Positives and True Positives + False Negatives.
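The reason the combined sums are cheaper can be seen in a minimal sketch. It uses plain Python lists for a single predictor rather than the Pandas, NumPy or Koalas objects the function accepts, and the helper name `confusion_counts` is hypothetical (it is not part of Iguanas and ignores `sample_weight`). True Positives + False Positives is simply the number of positive predictions, and True Positives + False Negatives is the number of positive targets, so each needs only one sum.

```python
from typing import Dict, Sequence


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, int]:
    """Hypothetical helper (not the Iguanas implementation)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    pairs = list(zip(y_true, y_pred))
    # Individual cells of the confusion matrix: each needs a pairwise comparison.
    tps = sum(1 for t, p in pairs if t == 1 and p == 1)
    fps = sum(1 for t, p in pairs if t == 0 and p == 1)
    tns = sum(1 for t, p in pairs if t == 0 and p == 0)
    fns = sum(1 for t, p in pairs if t == 1 and p == 0)
    # Combined sums: a single pass over one input, no comparison needed.
    tps_fps = sum(y_pred)  # everything the predictor flagged
    tps_fns = sum(y_true)  # every positive case in the target
    return {
        "tps": tps, "fps": fps, "tns": tns, "fns": fns,
        "tps_fps": tps_fps, "tps_fns": tps_fns,
    }


y_true = [1, 0, 1, 1, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0]
counts = confusion_counts(y_true, y_pred)
```

Here `counts["tps_fps"]` equals `counts["tps"] + counts["fps"]`, but it was obtained without computing either term.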
- iguanas.utils.utils.return_binary_pred_perf_of_set(y_true: Union[iguanas.utils.typing.pandas.core.series.Series, numpy.ndarray, iguanas.utils.typing.databricks.koalas.series.Series], y_preds: Union[iguanas.utils.typing.pandas.core.frame.DataFrame, numpy.ndarray, iguanas.utils.typing.databricks.koalas.frame.DataFrame], y_preds_columns: List[str], sample_weight=None, opt_func=None) -> iguanas.utils.typing.pandas.core.frame.DataFrame
Calculates the performance of a set of binary predictors given a target column.
- Parameters
- y_true : Union[PandasSeriesType, np.ndarray, KoalasSeriesType]
Binary integer target column.
- y_preds : Union[PandasDataFrameType, np.ndarray, KoalasDataFrameType]
Set of binary integer predictors. Can also be a single predictor.
- y_preds_columns : List[str]
Column names for the y_preds array.
- sample_weight : Union[PandasSeriesType, np.ndarray, KoalasSeriesType], optional
Row-wise sample_weights to apply. Defaults to None.
- opt_func : Callable, optional
A function/method which calculates a custom metric (e.g. Fbeta score) for each column. Defaults to None.
- Returns
- PandasDataFrameType
Dataframe containing the performance metrics for each binary predictor.
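The idea of scoring each predictor column and optionally applying a custom metric can be sketched as follows. Everything here is an assumption made for illustration: the helper names, the choice of precision and recall as the reported metrics, the `opt_func(y_true, y_pred)` call signature, and the use of a plain dict of lists in place of a DataFrame. None of it is taken from the Iguanas source, and `sample_weight` is ignored.

```python
from typing import Callable, Dict, Optional, Sequence, Tuple


def precision_recall(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float]:
    """Hypothetical helper: precision and recall of one binary predictor."""
    tps = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    flagged = sum(y_pred)
    positives = sum(y_true)
    precision = tps / flagged if flagged else 0.0
    recall = tps / positives if positives else 0.0
    return precision, recall


def f1_score(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Example custom metric (F-beta with beta = 1); signature is an assumption."""
    precision, recall = precision_recall(y_true, y_pred)
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0


def perf_of_set(
    y_true: Sequence[int],
    y_preds: Dict[str, Sequence[int]],
    opt_func: Optional[Callable[[Sequence[int], Sequence[int]], float]] = None,
) -> Dict[str, Dict[str, float]]:
    """Hypothetical stand-in: one row of metrics per predictor column."""
    results: Dict[str, Dict[str, float]] = {}
    for name, y_pred in y_preds.items():
        precision, recall = precision_recall(y_true, y_pred)
        row = {"precision": precision, "recall": recall}
        if opt_func is not None:
            # The custom metric is evaluated once per column.
            row["opt_metric"] = opt_func(y_true, y_pred)
        results[name] = row
    return results


y_true = [1, 0, 1, 0]
y_preds = {"rule_a": [1, 1, 1, 0], "rule_b": [1, 0, 0, 0]}
perf = perf_of_set(y_true, y_preds, opt_func=f1_score)
```

In this toy data, `rule_a` catches every positive case at the cost of one false positive, while `rule_b` is fully precise but catches only half of the positives.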
- iguanas.utils.utils.return_rule_descriptions_from_X_rules(X_rules: Union[iguanas.utils.typing.pandas.core.frame.DataFrame, iguanas.utils.typing.databricks.koalas.frame.DataFrame], X_rules_cols: List[str], y_true: None, sample_weight=None, opt_func=None) -> iguanas.utils.typing.pandas.core.frame.DataFrame
Calculates the performance metrics for the standard rule_descriptions dataframe, given a set of rule binary columns.
- Parameters
- X_rules : Union[PandasDataFrameType, KoalasDataFrameType]
Set of rule binary columns.
- X_rules_cols : List[str]
Columns associated with X_rules.
- y_true : Union[PandasSeriesType, np.ndarray, KoalasSeriesType], optional
Binary integer target column. Defaults to None.
- sample_weight : Union[PandasSeriesType, np.ndarray, KoalasSeriesType], optional
Row-wise sample_weights to apply. Defaults to None.
- opt_func : Callable, optional
A function/method which calculates a custom metric (e.g. Fbeta score) for each rule. Defaults to None.
- Returns
- PandasDataFrameType
The performance metrics for the standard rule_descriptions dataframe.
- iguanas.utils.utils.flatten_stringified_json_column(X_column: iguanas.utils.typing.pandas.core.series.Series) -> iguanas.utils.typing.pandas.core.frame.DataFrame
Flattens JSONs contained in a column to their own columns.
- Parameters
- X_column : PandasSeriesType
Contains the JSONs to be flattened.
- Returns
- PandasDataFrameType
Contains a column per key-value pair in the JSONs.
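The flattening step can be pictured with a minimal standard-library sketch. It is not the Iguanas implementation: the helper name `flatten_json_strings` is hypothetical, a list of strings stands in for the Pandas Series, a dict of lists stands in for the returned DataFrame, and filling absent keys with `None` is an assumption. Each string is assumed to hold a flat JSON object.

```python
import json
from typing import Dict, List, Optional


def flatten_json_strings(column: List[str]) -> Dict[str, List[Optional[object]]]:
    """Hypothetical stand-in: one output column per top-level JSON key."""
    parsed = [json.loads(raw) for raw in column]
    # Collect the keys in first-seen order so the column order is stable.
    keys: List[str] = []
    for record in parsed:
        for key in record:
            if key not in keys:
                keys.append(key)
    # A record that lacks a key contributes None to that column (assumption).
    return {key: [record.get(key) for record in parsed] for key in keys}


raw = ['{"amount": 10, "country": "GB"}', '{"amount": 25}']
flat = flatten_json_strings(raw)
```

The result has one entry per key (`amount`, `country`), each holding one value per input row.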
- iguanas.utils.utils.count_rule_conditions(rule_string: str) -> int
Counts the number of conditions in a rule string.
- Parameters
- rule_string : str
The standard Iguanas string representation of the rule.
- Returns
- int
Number of conditions in the rule.
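One plausible way to count conditions is sketched below. Both the rule-string format (each condition referencing a feature as `X['...']`) and the counting strategy are assumptions made for illustration; this page does not specify either, and `count_conditions_sketch` is a hypothetical name.

```python
def count_conditions_sketch(rule_string: str) -> int:
    """Hypothetical counter: one condition per feature reference X['...']."""
    return rule_string.count("X['")


# Assumed rule-string format, for illustration only.
rule = "((X['amount']>100)&(X['country']=='GB'))|(X['num_orders']<2)"
n_conditions = count_conditions_sketch(rule)
```

Under this assumption, a condition that compares two features with each other would be counted twice, so the sketch is only suitable for rules whose conditions each reference a single feature.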
- iguanas.utils.utils.return_progress_ready_range(verbose: bool, range: Iterable) -> Union[tqdm.std.tqdm, Iterable]
If verbose is True, returns a tqdm object wrapping the given iterable, range; the tqdm object prints the progress of iteration. Otherwise, the iterable is returned unchanged.
- Parameters
- verbose : bool
Dictates whether the tqdm object should be returned.
- range : Iterable
The iterable.
- Returns
- Union[tqdm, Iterable]
Either the tqdm-version of the iterable, or the original iterable.
- iguanas.utils.utils.return_conf_matrix(y_true: Union[iguanas.utils.typing.pandas.core.series.Series, numpy.ndarray, iguanas.utils.typing.databricks.koalas.series.Series], y_pred: Union[iguanas.utils.typing.pandas.core.series.Series, numpy.ndarray, iguanas.utils.typing.databricks.koalas.series.Series], sample_weight=None) -> iguanas.utils.typing.pandas.core.frame.DataFrame
Creates a confusion matrix from a binary target and binary predictor.
- Parameters
- y_true : Union[PandasSeriesType, np.ndarray, KoalasSeriesType]
Binary target.
- y_pred : Union[PandasSeriesType, np.ndarray, KoalasSeriesType]
Binary predictor.
- sample_weight : Union[PandasSeriesType, np.ndarray, KoalasSeriesType], optional
Row-wise weights to apply. Defaults to None.
- Returns
- PandasDataFrameType
The confusion matrix (the index shows the predicted class; the columns show the actual class).
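The orientation described above (predicted class on the index, actual class on the columns) can be illustrated with a small sketch. It uses a nested dict in place of a DataFrame and assumes that row-wise weights are simply summed into each cell; the helper name `weighted_conf_matrix` is hypothetical, and the ordering of classes in the real output is not specified on this page.

```python
from typing import Dict, Optional, Sequence


def weighted_conf_matrix(
    y_true: Sequence[int],
    y_pred: Sequence[int],
    sample_weight: Optional[Sequence[float]] = None,
) -> Dict[int, Dict[int, float]]:
    """Hypothetical stand-in: matrix[predicted][actual] = summed weight."""
    if sample_weight is None:
        # Unweighted case: every row counts once.
        sample_weight = [1.0] * len(y_true)
    matrix = {1: {1: 0.0, 0: 0.0}, 0: {1: 0.0, 0: 0.0}}
    for actual, predicted, weight in zip(y_true, y_pred, sample_weight):
        # Outer key is the predicted class (row), inner key the actual class (column).
        matrix[predicted][actual] += weight
    return matrix


y_true = [1, 0, 1, 1, 0]
y_pred = [1, 1, 0, 1, 0]
weights = [2.0, 1.0, 1.0, 2.0, 1.0]
cm = weighted_conf_matrix(y_true, y_pred, weights)
```

With this layout, `cm[1][1]` holds the weighted True Positives and `cm[1][0]` the weighted False Positives.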
- iguanas.utils.utils.check_allowed_types(x: object, x_name: str, allowed_types: List[str]) -> None
Checks whether the stringified type of x is in allowed_types - a list of stringified types. If not, it raises a TypeError.
- Parameters
- x : object
The object to check the type of.
- x_name : str
The object's name (used when raising the error).
- allowed_types : List[str]
The list of allowed types (in string format).
- Raises
- TypeError
If str(type(x)) is not in allowed_types.
- iguanas.utils.utils.is_type(x: object, types: List[str]) -> bool
Returns whether the stringified type of x is in types - a list of stringified types.
- Parameters
- x : object
The object to check the type of.
- types : List[str]
The list of allowed types (in string format) to check against.
- Returns
- bool
True if str(type(x)) is in types, False otherwise.
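The stringified-type comparison used by check_allowed_types and is_type can be sketched with built-in types. The function names ending in `_sketch` and the error message wording are hypothetical; only the `str(type(x))` membership test and the TypeError come from the descriptions above.

```python
from typing import List


def is_type_sketch(x: object, types: List[str]) -> bool:
    """Hypothetical stand-in: compares the stringified type against a list."""
    return str(type(x)) in types


def check_allowed_types_sketch(x: object, x_name: str, allowed_types: List[str]) -> None:
    """Hypothetical stand-in: raises TypeError when the type is not allowed."""
    if not is_type_sketch(x, allowed_types):
        # Error message wording is an assumption.
        raise TypeError(f"`{x_name}` must be one of {allowed_types}; got {type(x)}")


# Stringified types look like "<class 'int'>".
INT_TYPE = str(int)
STR_TYPE = str(str)

int_is_allowed = is_type_sketch(5, [INT_TYPE, STR_TYPE])
float_is_allowed = is_type_sketch(5.0, [INT_TYPE, STR_TYPE])

try:
    check_allowed_types_sketch(5.0, "threshold", [INT_TYPE])
    raised = False
except TypeError:
    raised = True
```

Because only strings are compared, the caller does not need to import a class in order to test for it.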