iguanas.rule_generation.RuleGeneratorOpt

class iguanas.rule_generation.RuleGeneratorOpt(opt_func: Callable, n_total_conditions: int, num_rules_keep: int, n_points=10, ratio_window=2, one_cond_rule_opt_func=<bound method FScore.fit of FScore with beta=1>, remove_corr_rules=True, target_feat_corr_types=None, verbose=0, rule_name_prefix=None)

Generate rules by optimising the thresholds of single features and combining these one condition rules with AND conditions to create more complex rules.

Parameters
opt_func: Callable

A function/method which calculates the desired optimisation metric (e.g. Fbeta score). Note that the module will assume higher values correspond to better performing rules.
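As an illustration of the kind of callable `opt_func` expects, here is a minimal pure-Python Fbeta scorer. The argument order `(y_preds, y_true, sample_weight)` and the name `fbeta_score` are assumptions made for this sketch, not taken from the Iguanas source; the only property stated above is that higher return values correspond to better rules.

```python
from typing import Optional, Sequence


def fbeta_score(y_preds: Sequence[int], y_true: Sequence[int],
                sample_weight: Optional[Sequence[float]] = None,
                beta: float = 1.0) -> float:
    """Hypothetical metric: weighted Fbeta of binary predictions (higher is better)."""
    if sample_weight is None:
        sample_weight = [1.0] * len(y_true)
    triples = list(zip(y_preds, y_true, sample_weight))
    tp = sum(w for p, t, w in triples if p == 1 and t == 1)  # true positives
    fp = sum(w for p, t, w in triples if p == 1 and t == 0)  # false positives
    fn = sum(w for p, t, w in triples if p == 0 and t == 1)  # false negatives
    denom = (1 + beta ** 2) * tp + beta ** 2 * fn + fp
    return 0.0 if denom == 0 else (1 + beta ** 2) * tp / denom
```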

n_total_conditions: int

The maximum number of conditions per generated rule.

num_rules_keep: int

The number of top-performing rules (by Fbeta score) to keep at the end of each stage of rule combination. Reducing this number will improve the runtime, but may result in useful rules being removed.
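The combination stage can be pictured as follows. This is a simplified sketch, not the Iguanas implementation: rules are represented as lists of 0/1 flags, `combine_and_keep` and `precision` are hypothetical helpers, and the toy metric stands in for whatever optimisation function is supplied.

```python
from itertools import combinations
from typing import Callable, Dict, List


def combine_and_keep(rules: Dict[str, List[int]], y: List[int],
                     score: Callable[[List[int], List[int]], float],
                     num_rules_keep: int) -> Dict[str, List[int]]:
    """AND every pair of rules, then keep the num_rules_keep best by score."""
    candidates = dict(rules)
    for (name_a, flags_a), (name_b, flags_b) in combinations(rules.items(), 2):
        candidates[f"({name_a})&({name_b})"] = [a & b for a, b in zip(flags_a, flags_b)]
    ranked = sorted(candidates, key=lambda name: score(candidates[name], y), reverse=True)
    return {name: candidates[name] for name in ranked[:num_rules_keep]}


def precision(y_preds: List[int], y_true: List[int]) -> float:
    """Toy metric for the example: share of flagged records that are positive."""
    flagged = sum(y_preds)
    return 0.0 if flagged == 0 else sum(p & t for p, t in zip(y_preds, y_true)) / flagged


y = [1, 0, 1, 0]
one_cond = {"A>1": [1, 1, 1, 0], "B<=2": [1, 0, 1, 1]}
best = combine_and_keep(one_cond, y, precision, num_rules_keep=1)
```

Here the combined rule flags only the two positive records, so it outranks both one condition rules and is the single rule kept.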

n_points: int, optional

Number of points to split a numeric feature’s range into when generating the numeric one condition rules. A larger number will result in better optimised one condition rules, but will take longer to calculate. Defaults to 10.
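One way to picture `n_points` is as the number of candidate thresholds tried for a numeric feature. The helper below is a hypothetical illustration of that idea; whether Iguanas spaces its points evenly, and whether the end points are included, are assumptions of this sketch.

```python
from typing import List


def candidate_thresholds(low: float, high: float, n_points: int = 10) -> List[float]:
    """Return n_points evenly spaced values from low to high inclusive."""
    if n_points < 2:
        return [low]
    step = (high - low) / (n_points - 1)
    return [low + i * step for i in range(n_points)]
```

More points give a finer grid of thresholds to score, at the cost of more evaluations of the optimisation function.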

ratio_window: int, optional

Factor which determines the optimisation range for numeric features. For example, if a numeric field has a range of 1 to 11 and ratio_window = 3, the optimisation range for the <= operator will be from 1 to (11-1)/3 = 3.33, and the optimisation range for the >= operator will be from 11-((11-1)/3) = 7.67 to 11. A larger number (greater than 1) results in a smaller range being used for optimisation of one condition rules; set to 1 to optimise the one condition rules across the full range of the numeric feature. Defaults to 2.
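The arithmetic in the `ratio_window` example can be reproduced directly. The snippet only evaluates the two expressions quoted in the description for a feature ranging from 1 to 11 with `ratio_window = 3`; the variable names are illustrative.

```python
feat_min, feat_max, ratio_window = 1, 11, 3

# Width of the optimisation window: (11 - 1) / 3
window = (feat_max - feat_min) / ratio_window

# Bound quoted for the <= operator, and lower bound quoted for the >= operator
le_bound = window
ge_lower = feat_max - window

print(round(le_bound, 2), round(ge_lower, 2))  # 3.33 7.67
```

With `ratio_window = 1` the window equals the full span (10), so the >= range starts at 11 - 10 = 1 and covers the whole feature range.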

one_cond_rule_opt_func: Callable, optional

The optimisation function used for one condition rules. Note that the module will assume higher values correspond to better performing rules. Defaults to the method used for calculating the F1 score.

remove_corr_rules: bool, optional

Dictates whether correlated rules should be removed at the end of each pairwise combination. Defaults to True.

target_feat_corr_types: Union[Dict[str, List[str]], str], optional

Limits the conditions of the rules based on the target-feature correlation (e.g. if a feature is positively correlated with the target, then only greater than operators are used for conditions that use that feature). Can be either a dictionary listing the features positively correlated with the target (under the key PositiveCorr) and the features negatively correlated with the target (under the key NegativeCorr), or ‘Infer’ (where each target-feature correlation type is inferred from the data). Defaults to None.
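For reference, a value for `target_feat_corr_types` in the dictionary shape described might look like the following. The feature names are made up for the example; only the keys `PositiveCorr` and `NegativeCorr` and the string `'Infer'` come from the description.

```python
# Hypothetical feature names; the two keys are those named in the description.
target_feat_corr_types = {
    "PositiveCorr": ["num_failed_logins", "order_total"],
    "NegativeCorr": ["account_age_days"],
}

# Alternative: let each correlation type be inferred from the data.
target_feat_corr_types_inferred = "Infer"
```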

verbose: int, optional

Controls the verbosity: the higher the value, the more messages are shown. A value greater than 0 reports the progress of the training of the rules. Defaults to 0.

rule_name_prefix: str, optional

Prefix to use for each rule name. If None, the standard prefix is used. Defaults to None.

Attributes
rule_strings: Dict[str, str]

The generated rules, defined using the standard Iguanas string format (values) and their names (keys).

rule_descriptions: PandasDataFrameType

A dataframe showing the logic of the rules and their performance metrics on the given dataset.

fit(X: iguanas.utils.typing.pandas.core.frame.DataFrame, y: iguanas.utils.typing.pandas.core.series.Series, sample_weight=None) → iguanas.utils.typing.pandas.core.frame.DataFrame

Generate rules by optimising the thresholds of single features and combining these one condition rules with AND conditions to create more complex rules.

Parameters
X: PandasDataFrameType

The feature set used for training the model.

y: PandasSeriesType

The target column.

sample_weight: PandasSeriesType, optional

Record-wise weights to apply. Defaults to None.

Returns
PandasDataFrameType

The binary columns of the rules on the fitted dataset.

as_rule_dicts() → Dict[str, dict]

Converts rules into the standard Iguanas dictionary format.

Returns
Dict[str, dict]

Rules in the standard Iguanas dictionary format.

as_rule_lambdas(as_numpy: bool, with_kwargs: bool) → Dict[str, Callable[[dict], str]]

Converts rules into the standard Iguanas lambda expression format.

Parameters
as_numpy: bool

If True, the conditions in the string format will use NumPy rather than pandas. These rules are generally evaluated more quickly on larger datasets stored as pandas DataFrames.

with_kwargs: bool

If True, the string in the lambda expression is created such that the inputs are keyword arguments. If False, the inputs are positional arguments.

Returns
Dict[str, Callable[[dict], str]]

Rules in the standard Iguanas lambda expression format.
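To make the `with_kwargs` distinction concrete, the sketch below builds two lambdas that each return a rule string, one taking keyword arguments and one taking positional arguments. The rule template and the placeholder naming are assumptions for illustration; the exact strings produced by Iguanas may differ.

```python
# Assumed rule template; the placeholder style is illustrative only.
with_kwargs_rule = lambda **kwargs: "(X['A']>{A})&(X['B']<={B})".format(**kwargs)
positional_rule = lambda *args: "(X['A']>{})&(X['B']<={})".format(*args)

# Both calls fill the same thresholds into the rule string.
kw_string = with_kwargs_rule(A=1, B=2)
pos_string = positional_rule(1, 2)
```

Keyword arguments make it explicit which threshold belongs to which feature, while positional arguments rely on the order of the conditions.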

as_rule_strings(as_numpy: bool) → Dict[str, str]

Converts rules into the standard Iguanas string format.

Parameters
as_numpy: bool

If True, the conditions in the string format will use NumPy rather than pandas. These rules are generally evaluated more quickly on larger datasets stored as pandas DataFrames.

Returns
Dict[str, str]

Rules in the standard Iguanas string format.

filter_rules(include=None, exclude=None) → None

Filters the rules by their names.

Parameters
include: List[str], optional

The list of rule names to keep. Defaults to None.

exclude: List[str], optional

The list of rule names to drop. Defaults to None.

Raises
Exception

include and exclude cannot contain the same rule names.
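The filtering behaviour described for `filter_rules` can be sketched on a plain dictionary of rule strings. This standalone function is a hypothetical stand-in, not the library method: it returns a new dictionary rather than modifying the object in place, and it treats any overlap between `include` and `exclude` as the error condition.

```python
from typing import Dict, List, Optional


def filter_rules(rule_strings: Dict[str, str],
                 include: Optional[List[str]] = None,
                 exclude: Optional[List[str]] = None) -> Dict[str, str]:
    """Keep the rules named in include, then drop the rules named in exclude."""
    if include is not None and exclude is not None and set(include) & set(exclude):
        raise Exception("include and exclude cannot contain the same rule names")
    kept = dict(rule_strings)
    if include is not None:
        kept = {name: rule for name, rule in kept.items() if name in include}
    if exclude is not None:
        kept = {name: rule for name, rule in kept.items() if name not in exclude}
    return kept


rules = {"Rule1": "(X['A']>1)", "Rule2": "(X['B']<=2)", "Rule3": "(X['C']>=5)"}
only_first = filter_rules(rules, include=["Rule1"])
without_second = filter_rules(rules, exclude=["Rule2"])
```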

get_rule_features() → Dict[str, set]

Returns the set of unique features present in each rule.

Returns
Dict[str, set]

Set of unique features (values) in each rule (keys).
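A rough picture of what `get_rule_features` returns can be obtained by scanning rule strings for feature references. The sketch assumes rule strings refer to features as `X['name']`; that format, and the standalone function itself, are assumptions for illustration rather than the library's implementation.

```python
import re
from typing import Dict, Set


def get_rule_features(rule_strings: Dict[str, str]) -> Dict[str, Set[str]]:
    """Collect the feature names referenced as X['name'] in each rule string."""
    pattern = re.compile(r"X\['([^']+)'\]")
    return {name: set(pattern.findall(rule)) for name, rule in rule_strings.items()}


features = get_rule_features({"Rule1": "(X['A']>1)&(X['B']<=2)&(X['A']<10)"})
```

Because the values are sets, a feature used in several conditions of the same rule appears only once.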

transform(X: Union[iguanas.utils.typing.pandas.core.frame.DataFrame, iguanas.utils.typing.databricks.koalas.frame.DataFrame], y=None, sample_weight=None) → Union[iguanas.utils.typing.pandas.core.frame.DataFrame, iguanas.utils.typing.databricks.koalas.frame.DataFrame]

Applies the set of rules to a dataset, X. If y is provided, the performance metrics for each rule will also be calculated.

Parameters
X: Union[PandasDataFrameType, KoalasDataFrameType]

The feature set on which the rules should be applied.

y: Union[PandasSeriesType, KoalasSeriesType], optional

The target column. Defaults to None.

sample_weight: Union[PandasSeriesType, KoalasSeriesType], optional

Record-wise weights to apply. Defaults to None.

Returns
Union[PandasDataFrameType, KoalasDataFrameType]

The binary columns of the rules.
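The idea of "binary columns of the rules" can be shown without any dataframe library. In this sketch each record is a dictionary and each rule is a hypothetical Python predicate; the real method works on Pandas or Koalas DataFrames, so `apply_rules` is only an analogy for the shape of the output.

```python
from typing import Callable, Dict, List

Row = Dict[str, float]


def apply_rules(rules: Dict[str, Callable[[Row], bool]],
                rows: List[Row]) -> Dict[str, List[int]]:
    """Return one binary column (a list of 0/1 flags) per rule."""
    return {name: [int(rule(row)) for row in rows] for name, rule in rules.items()}


rows = [{"A": 3, "B": 1}, {"A": 0, "B": 5}, {"A": 2, "B": 2}]
rules = {
    "Rule1": lambda r: r["A"] > 1,
    "Rule2": lambda r: r["A"] > 1 and r["B"] <= 1,
}
X_rules = apply_rules(rules, rows)
```

Each key of `X_rules` corresponds to one rule, and each value has one flag per input record, set to 1 where the rule triggers.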