iguanas.rbs.RBSPipeline

class iguanas.rbs.RBSPipeline(config: List[dict], final_decision: int, opt_func: Callable)

A pipeline in which each stage gives a decision, either 0 or 1 (corresponding to the binary target). Each stage is configured with a set of rules; if any of them trigger, the relevant records are marked with that stage's decision.

Parameters
config : List[dict]

The pipeline configuration, where each element corresponds to a stage in the pipeline. Each element is a dictionary whose key is the decision made at that stage (either 0 or 1) and whose value is a list of rules; if any rule in the list triggers, the stage gives that decision.

final_decision : int

The final decision to apply if no rules are triggered. Must be either 0 or 1.

opt_func : Callable

The optimisation function used to calculate the performance metric of the pipeline (e.g. F1 score).
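The stage logic described above can be sketched in plain Python for a single record. This is an illustration only, not the Iguanas implementation: `apply_pipeline_sketch` and the rule names are hypothetical, and the sketch assumes that the first stage to fire decides the record, so later stages cannot override it.

```python
from typing import Dict, List


def apply_pipeline_sketch(
    config: List[Dict[int, List[str]]],
    final_decision: int,
    record: Dict[str, int],
) -> int:
    """Return the decision for one record (hypothetical helper, not part of Iguanas).

    `record` maps a rule name to 1 if the rule triggered on the record, else 0.
    """
    if final_decision not in (0, 1):
        raise ValueError("`final_decision` must be either 0 or 1")
    for stage in config:
        for decision, rules in stage.items():
            # A stage fires if ANY of its rules triggered on the record.
            if any(record.get(rule, 0) == 1 for rule in rules):
                # Assumption: the first stage that fires decides the record.
                return decision
    # No stage fired, so fall back to the final decision.
    return final_decision


# Stage 1 gives decision 1 if RuleA or RuleB triggers;
# stage 2 gives decision 0 if RuleC triggers.
config = [{1: ["RuleA", "RuleB"]}, {0: ["RuleC"]}]

records = [
    {"RuleA": 0, "RuleB": 1, "RuleC": 1},  # stage 1 fires
    {"RuleA": 0, "RuleB": 0, "RuleC": 1},  # only stage 2 fires
    {"RuleA": 0, "RuleB": 0, "RuleC": 0},  # nothing fires
]
decisions = [apply_pipeline_sketch(config, 1, r) for r in records]
```

Here the three records receive the decisions 1, 0 and 1 respectively; the last one comes from `final_decision`.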

Raises
ValueError

config must be a list.

ValueError

final_decision must be either 0 or 1.

Attributes
pipeline_opt_metric : float

The result of the opt_func function when the pipeline is applied.

conf_matrix : PandasDataFrameType

The confusion matrix for the applied pipeline. Only generated after running calc_performance.

conf_matrix_weighted : PandasDataFrameType

The weighted confusion matrix for the applied pipeline. Only generated after running calc_performance and when sample_weight is provided.

pipeline_perf : PandasDataFrameType

The performance (precision, recall, percentage of data flagged) of each decision made by the pipeline. Only generated after running calc_performance.
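As a rough illustration of the three metrics held in pipeline_perf, the sketch below computes precision, recall and the share of data flagged for each decision using plain lists. The helper name, the dictionary keys and the exact metric definitions are assumptions made for this example; the library's own column names and calculations may differ.

```python
from typing import Dict, List


def decision_performance_sketch(
    y_true: List[int], y_pred: List[int], decision: int
) -> Dict[str, float]:
    """Metrics for one decision value (hypothetical helper, not part of Iguanas)."""
    # Targets of the records that the pipeline assigned to this decision.
    flagged = [t for t, p in zip(y_true, y_pred) if p == decision]
    # Flagged records whose target matches the decision.
    correct = sum(1 for t in flagged if t == decision)
    # All records whose target equals the decision.
    actual = sum(1 for t in y_true if t == decision)
    return {
        "Precision": correct / len(flagged) if flagged else 0.0,
        "Recall": correct / actual if actual else 0.0,
        "PercDataFlagged": len(flagged) / len(y_true) if y_true else 0.0,
    }


y_true = [1, 1, 0, 0, 1]
y_pred = [1, 0, 0, 1, 1]
perf = {d: decision_performance_sketch(y_true, y_pred, d) for d in (0, 1)}
```

With this toy data, decision 1 covers three of the five records and two of those are true positives, so both its precision and its recall are 2/3.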

predict(X_rules: iguanas.utils.typing.pandas.core.frame.DataFrame, y: iguanas.utils.typing.pandas.core.series.Series, sample_weight=None) -> iguanas.utils.typing.pandas.core.series.Series

Applies the pipeline to the given dataset.

Parameters
X_rules : PandasDataFrameType

The dataset of applied rules, with one column per rule indicating whether that rule triggered on each record.

y : PandasSeriesType

The target.

sample_weight : PandasSeriesType, optional

Record-wise weights to apply. Defaults to None.

Returns
PandasSeriesType

The prediction of the pipeline.
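Since predict takes the target y as well as the rule outputs, it is presumably where opt_func is evaluated to give pipeline_opt_metric. The sketch below shows one possible opt_func, an F1 score with optional record weights. The function name and the (y_true, y_pred, sample_weight) call signature are assumptions for illustration; check the signature that your version of Iguanas expects before passing a custom callable.

```python
from typing import List, Optional


def f1_score_sketch(
    y_true: List[int],
    y_pred: List[int],
    sample_weight: Optional[List[float]] = None,
) -> float:
    """Weighted F1 score for the positive class (hypothetical opt_func)."""
    if sample_weight is None:
        # Without weights, every record counts once.
        sample_weight = [1.0] * len(y_true)
    rows = list(zip(y_true, y_pred, sample_weight))
    tp = sum(w for t, p, w in rows if t == 1 and p == 1)
    fp = sum(w for t, p, w in rows if t == 0 and p == 1)
    fn = sum(w for t, p, w in rows if t == 1 and p == 0)
    denominator = 2 * tp + fp + fn
    # F1 = 2TP / (2TP + FP + FN); defined as 0 when there are no positives at all.
    return 2 * tp / denominator if denominator else 0.0


y_true = [1, 1, 0, 0, 1]
y_pred = [1, 0, 0, 1, 1]
unweighted_f1 = f1_score_sketch(y_true, y_pred)
weighted_f1 = f1_score_sketch(y_true, y_pred, [2, 1, 1, 1, 3])
```

Weighting the two true positives more heavily raises the score from 2/3 to 5/6 in this toy example, which is the kind of effect sample_weight has on the metric.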

calc_performance(y_true: iguanas.utils.typing.pandas.core.series.Series, y_pred: iguanas.utils.typing.pandas.core.series.Series, sample_weight=None) -> None

Calculates the confusion matrices (unweighted and, if sample_weight is provided, weighted) and the overall performance of the pipeline.

Note that for the confusion matrices, the index shows the predicted class; the column shows the actual class.
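The orientation described in the note above (predicted class on the index, actual class on the columns) can be sketched with nested lists. This is a plain-Python illustration rather than the library's code: the helper name is hypothetical, and ordering the classes as 0 then 1 is an assumption.

```python
from typing import List, Optional


def confusion_matrix_sketch(
    y_true: List[int],
    y_pred: List[int],
    sample_weight: Optional[List[float]] = None,
) -> List[List[float]]:
    """Return matrix[predicted][actual] (hypothetical helper, not part of Iguanas)."""
    if sample_weight is None:
        # Without weights, every record counts once.
        sample_weight = [1] * len(y_true)
    # Rows are the predicted class, columns the actual class.
    # Assumption: classes are ordered 0 then 1 on both axes.
    matrix = [[0, 0], [0, 0]]
    for actual, predicted, weight in zip(y_true, y_pred, sample_weight):
        matrix[predicted][actual] += weight
    return matrix


y_true = [1, 1, 0, 0, 1]
y_pred = [1, 0, 0, 1, 1]
conf_matrix = confusion_matrix_sketch(y_true, y_pred)
conf_matrix_weighted = confusion_matrix_sketch(y_true, y_pred, [2, 1, 1, 1, 3])
```

In this layout `conf_matrix[1][0]` counts records predicted as 1 whose actual class is 0, i.e. the false positives.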

Parameters
y_true : PandasSeriesType

The target.

y_pred : PandasSeriesType

The RBS pipeline prediction.

sample_weight : PandasSeriesType, optional

Record-wise weights to apply. Defaults to None.