iguanas.rule_generation.RuleGeneratorOpt
- class iguanas.rule_generation.RuleGeneratorOpt(opt_func: Callable, n_total_conditions: int, num_rules_keep: int, n_points=10, ratio_window=2, one_cond_rule_opt_func=<bound method FScore.fit of FScore with beta=1>, remove_corr_rules=True, target_feat_corr_types=None, verbose=0, rule_name_prefix=None)
Generate rules by optimising the thresholds of single features and combining these one-condition rules with AND conditions to create more complex rules.
- Parameters
- opt_func : Callable
A function/method that calculates the desired optimisation metric (e.g. Fbeta score). Note that the module will assume higher values correspond to better-performing rules.
- n_total_conditions : int
The maximum number of conditions per generated rule.
- num_rules_keep : int
The number of top-performing rules (by Fbeta score) to keep at the end of each stage of rule combination. Reducing this number will improve the runtime, but may result in useful rules being removed.
- n_points : int, optional
Number of points to split a numeric feature’s range into when generating the numeric one-condition rules. A larger number will result in better-optimised one-condition rules, but will take longer to calculate. Defaults to 10.
- ratio_window : int, optional
Factor that determines the optimisation range for numeric features. For example, if a numeric field has a range of 1 to 11 and ratio_window = 3, the optimisation range for the <= operator will be from 1 to (11-1)/3 = 3.33, and the optimisation range for the >= operator will be from 11-((11-1)/3) = 7.67 to 11. A larger number (greater than 1) will result in a smaller range being used to optimise the one-condition rules; set it to 1 to optimise the one-condition rules across the full range of the numeric feature. Defaults to 2.
- one_cond_rule_opt_func : Callable, optional
The optimisation function used for one-condition rules. Note that the module will assume higher values correspond to better-performing rules. Defaults to the method used for calculating the F1 score.
- remove_corr_rules : bool, optional
Dictates whether correlated rules should be removed at the end of each pairwise combination. Defaults to True.
- target_feat_corr_types : Union[Dict[str, List[str]], str], optional
Limits the conditions of the rules based on the target-feature correlation (e.g. if a feature is positively correlated with the target, only greater-than operators are used for conditions that utilise that feature). Can be either a dictionary specifying the list of features positively correlated with the target (under the key PositiveCorr) and the list of features negatively correlated with the target (under the key NegativeCorr), or ‘Infer’ (where each target-feature correlation type is inferred from the data). Defaults to None.
- verbose : int, optional
Controls the verbosity: the higher the value, the more messages. Values greater than 0 report the progress of rule training. Defaults to 0.
- rule_name_prefix : str, optional
Prefix to use for each rule name. If None, the standard prefix is used. Defaults to None.
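To illustrate the kind of callable that opt_func and one_cond_rule_opt_func describe, the sketch below computes a weighted Fbeta score in plain Python. The function name fbeta_score_sketch and its argument order (y_preds, y_true, sample_weight) are assumptions made for this example rather than a confirmed Iguanas signature; the only property taken from the parameter descriptions is that higher values indicate better rules.

```python
from typing import Optional, Sequence


def fbeta_score_sketch(
    y_preds: Sequence[int],
    y_true: Sequence[int],
    sample_weight: Optional[Sequence[float]] = None,
    beta: float = 1.0,
) -> float:
    """Hypothetical optimisation metric: weighted Fbeta of a binary rule column."""
    if sample_weight is None:
        sample_weight = [1.0] * len(y_true)
    rows = list(zip(y_preds, y_true, sample_weight))
    # Weighted counts of true positives, false positives and false negatives.
    tp = sum(w for p, t, w in rows if p == 1 and t == 1)
    fp = sum(w for p, t, w in rows if p == 1 and t == 0)
    fn = sum(w for p, t, w in rows if p == 0 and t == 1)
    denom = (1 + beta**2) * tp + beta**2 * fn + fp
    # Return 0.0 when the rule flags nothing and there are no positives.
    return (1 + beta**2) * tp / denom if denom else 0.0
```

A callable of this shape returns a single float per rule, which is what allows rules to be ranked against one another.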
- Attributes
- rule_strings : Dict[str, str]
The generated rules, defined using the standard Iguanas string format (values) and their names (keys).
- rule_descriptions : PandasDataFrameType
A dataframe showing the logic of the rules and their performance metrics on the given dataset.
- fit(X: iguanas.utils.typing.pandas.core.frame.DataFrame, y: iguanas.utils.typing.pandas.core.series.Series, sample_weight=None) -> iguanas.utils.typing.pandas.core.frame.DataFrame
Generate rules by optimising the thresholds of single features and combining these one-condition rules with AND conditions to create more complex rules.
- Parameters
- X : PandasDataFrameType
The feature set used for training the model.
- y : PandasSeriesType
The target column.
- sample_weight : PandasSeriesType, optional
Record-wise weights to apply. Defaults to None.
- Returns
- PandasDataFrameType
The binary columns of the rules on the fitted dataset.
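The two-stage idea described for fit (optimise a threshold per feature, then combine the resulting one-condition rules with AND) can be sketched in plain Python. This is a toy illustration under stated assumptions, not the Iguanas implementation: it uses dicts of lists instead of DataFrames, only the >= operator, only pairwise combinations, F1 as the metric, and a single ranking step at the end. All function names here are hypothetical.

```python
from itertools import combinations
from typing import Dict, List, Sequence


def f1_sketch(flags: Sequence[int], y: Sequence[int]) -> float:
    """F1 score of a binary rule column against a binary target."""
    tp = sum(1 for f, t in zip(flags, y) if f and t)
    fp = sum(1 for f, t in zip(flags, y) if f and not t)
    fn = sum(1 for f, t in zip(flags, y) if not f and t)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def best_threshold(values: Sequence[float], y: Sequence[int], n_points: int = 10) -> float:
    """Pick the '>=' threshold with the highest F1 from n_points evenly spaced candidates."""
    lo, hi = min(values), max(values)
    step = (hi - lo) / max(n_points - 1, 1)
    candidates = [lo + i * step for i in range(n_points)]
    return max(candidates, key=lambda c: f1_sketch([int(v >= c) for v in values], y))


def generate_rules_sketch(
    X: Dict[str, List[float]], y: Sequence[int], n_points: int = 10, num_rules_keep: int = 3
) -> Dict[str, List[int]]:
    # Stage 1: one optimised one-condition rule per feature.
    one_cond: Dict[str, List[int]] = {}
    for feature, values in X.items():
        threshold = best_threshold(values, y, n_points)
        # The name is for display only; flags use the unrounded threshold.
        one_cond[f"{feature}>={threshold:g}"] = [int(v >= threshold) for v in values]
    # Stage 2: combine pairs of one-condition rules with AND.
    candidates = dict(one_cond)
    for (name_a, flags_a), (name_b, flags_b) in combinations(one_cond.items(), 2):
        candidates[f"({name_a})&({name_b})"] = [a & b for a, b in zip(flags_a, flags_b)]
    # Keep the top rules by F1 (higher is better).
    ranked = sorted(candidates.items(), key=lambda item: f1_sketch(item[1], y), reverse=True)
    return dict(ranked[:num_rules_keep])
```

The returned dict maps each rule name to its binary column, mirroring the binary columns that fit is documented to return.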
- as_rule_dicts() -> Dict[str, dict]
Converts rules into the standard Iguanas dictionary format.
- Returns
- Dict[str, dict]
Rules in the standard Iguanas dictionary format.
- as_rule_lambdas(as_numpy: bool, with_kwargs: bool) -> Dict[str, Callable[[dict], str]]
Converts rules into the standard Iguanas lambda expression format.
- Parameters
- as_numpy : bool
If True, the conditions in the string format will use NumPy rather than pandas. These rules are generally evaluated more quickly on larger datasets stored as pandas DataFrames.
- with_kwargs : bool
If True, the string in the lambda expression is created such that the inputs are keyword arguments. If False, the inputs are positional arguments.
- Returns
- Dict[str, Callable[[dict], str]]
Rules in the standard Iguanas lambda expression format.
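The difference between keyword and positional inputs can be pictured with a hand-written pair of lambdas. The rule text, the X['...'] column syntax, the placeholder names and the rule names below are all assumptions for illustration; the only detail taken from the signature above is that each lambda returns a string.

```python
# Hypothetical lambdas in the spirit of with_kwargs=True and with_kwargs=False.
rule_lambdas_kwargs = {
    "Rule1": lambda **kwargs: "(X['amount']>={amount})&(X['n_txn']>{n_txn})".format(**kwargs),
}
rule_lambdas_args = {
    "Rule1": lambda *args: "(X['amount']>={})&(X['n_txn']>{})".format(*args),
}

# Keyword inputs are matched to placeholders by name...
from_kwargs = rule_lambdas_kwargs["Rule1"](amount=8, n_txn=2)
# ...whereas positional inputs are matched by order.
from_args = rule_lambdas_args["Rule1"](8, 2)
```

Under these assumptions both calls yield the same rule string; the keyword form is less error-prone when a rule has many conditions.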
- as_rule_strings(as_numpy: bool) -> Dict[str, str]
Converts rules into the standard Iguanas string format.
- Parameters
- as_numpy : bool
If True, the conditions in the string format will use NumPy rather than pandas. These rules are generally evaluated more quickly on larger datasets stored as pandas DataFrames.
- Returns
- Dict[str, str]
Rules in the standard Iguanas string format.
- filter_rules(include=None, exclude=None) -> None
Filters the rules by their names.
- Parameters
- include : List[str], optional
The list of rule names to keep. Defaults to None.
- exclude : List[str], optional
The list of rule names to drop. Defaults to None.
- Raises
- Exception
include and exclude cannot contain similar values.
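The include/exclude behaviour can be sketched on a plain dictionary of rules. The helper name filter_rule_strings_sketch is hypothetical, and reading "similar values" as "rule names present in both lists" is an assumption. Unlike the documented method, which returns None, this sketch returns a new dict so the result is easy to inspect.

```python
from typing import Dict, List, Optional


def filter_rule_strings_sketch(
    rule_strings: Dict[str, str],
    include: Optional[List[str]] = None,
    exclude: Optional[List[str]] = None,
) -> Dict[str, str]:
    """Hypothetical helper: keep `include` names and drop `exclude` names."""
    if include is not None and exclude is not None:
        overlap = set(include) & set(exclude)
        if overlap:
            # Assumed interpretation of the documented Exception.
            raise Exception(f"include and exclude share rule names: {sorted(overlap)}")
    kept = dict(rule_strings)
    if include is not None:
        kept = {name: rule for name, rule in kept.items() if name in include}
    if exclude is not None:
        kept = {name: rule for name, rule in kept.items() if name not in exclude}
    return kept
```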
- get_rule_features() -> Dict[str, set]
Returns the set of unique features present in each rule.
- Returns
- Dict[str, set]
Set of unique features (values) in each rule (keys).
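As a rough picture of the returned structure, the sketch below pulls feature names out of rule strings with a regular expression. It assumes features appear as X['feature_name'] inside each rule string, which is an assumption about the string format; the helper name and example rules are hypothetical.

```python
import re
from typing import Dict, Set

# Assumed column reference syntax: X['feature_name']
FEATURE_PATTERN = re.compile(r"X\['([^']+)'\]")


def rule_features_sketch(rule_strings: Dict[str, str]) -> Dict[str, Set[str]]:
    """Hypothetical helper: map each rule name to the set of features it references."""
    return {
        name: set(FEATURE_PATTERN.findall(rule))
        for name, rule in rule_strings.items()
    }


example_rules = {
    "Rule1": "(X['amount']>=8)&(X['n_txn']>2)",
    "Rule2": "(X['amount']<3)|(X['amount']>9)",
}
example_features = rule_features_sketch(example_rules)
```

Because the values are sets, a feature used in several conditions of the same rule appears only once.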
- transform(X: Union[iguanas.utils.typing.pandas.core.frame.DataFrame, iguanas.utils.typing.databricks.koalas.frame.DataFrame], y=None, sample_weight=None) -> Union[iguanas.utils.typing.pandas.core.frame.DataFrame, iguanas.utils.typing.databricks.koalas.frame.DataFrame]
Applies the set of rules to a dataset, X. If y is provided, the performance metrics for each rule will also be calculated.
- Parameters
- X : Union[PandasDataFrameType, KoalasDataFrameType]
The feature set on which the rules should be applied.
- y : Union[PandasSeriesType, KoalasSeriesType], optional
The target column. Defaults to None.
- sample_weight : Union[PandasSeriesType, KoalasSeriesType], optional
Record-wise weights to apply. Defaults to None.
- Returns
- Union[PandasDataFrameType, KoalasDataFrameType]
The binary columns of the rules.
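The shape of what transform produces can be sketched without any DataFrame library. Here rules are hypothetical row-level predicates, the data is a list of dicts, and the metrics computed when y is supplied (precision and recall) are an assumption, since the text does not say which performance metrics are calculated.

```python
from typing import Callable, Dict, List, Optional, Sequence, Tuple

Row = Dict[str, float]


def apply_rules_sketch(
    rules: Dict[str, Callable[[Row], bool]],
    rows: Sequence[Row],
    y: Optional[Sequence[int]] = None,
) -> Tuple[Dict[str, List[int]], Dict[str, Dict[str, float]]]:
    """Hypothetical helper: return binary rule columns and, if y is given, metrics."""
    # One binary column per rule: 1 where the rule fires, 0 otherwise.
    flags = {name: [int(pred(row)) for row in rows] for name, pred in rules.items()}
    metrics: Dict[str, Dict[str, float]] = {}
    if y is not None:
        positives = sum(y)
        for name, column in flags.items():
            tp = sum(1 for f, t in zip(column, y) if f and t)
            flagged = sum(column)
            metrics[name] = {
                "precision": tp / flagged if flagged else 0.0,
                "recall": tp / positives if positives else 0.0,
            }
    return flags, metrics
```

Calling the helper without y returns the binary columns and an empty metrics dict, which parallels y being optional in transform.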