iguanas.rbs.RBSPipeline
- class iguanas.rbs.RBSPipeline(config: List[dict], final_decision: int, opt_func: Callable)
A pipeline in which each stage gives a decision, either 0 or 1 (corresponding to the binary target). Each stage is configured with a set of rules; if any of them trigger on a record, that record is marked with the stage's decision.
- Parameters
- config : List[dict]
The pipeline configuration, where each element corresponds to a stage in the pipeline. Each element is a dictionary, where the key is the decision made at that stage (either 0 or 1) and the value is a list of the rules which, if any of them trigger, give that decision.
- final_decision : int
The final decision to apply if no rules are triggered. Must be either 0 or 1.
- opt_func : Callable
The optimisation function used to calculate the performance metric of the pipeline (e.g. F1 score).
- Raises
- ValueError
If config is not a list.
- ValueError
If final_decision is not 0 or 1.
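To make the shape of config concrete, here is a minimal sketch of a two-stage configuration together with the two argument checks listed under Raises. The helper validate_pipeline_args and the rule names are hypothetical, introduced only for illustration; they are not part of iguanas.

```python
from typing import Dict, List


def validate_pipeline_args(
    config: List[Dict[int, List[str]]], final_decision: int
) -> None:
    """Hypothetical helper mirroring the documented ValueError conditions."""
    if not isinstance(config, list):
        raise ValueError("`config` must be a list")
    if final_decision not in (0, 1):
        raise ValueError("`final_decision` must be either 0 or 1")


# One dictionary per stage: key = decision (0 or 1), value = list of rule names.
# The rule names below are made up for this example.
example_config = [
    {1: ["HighValueNewAccount", "ManyFailedLogins"]},  # stage 1 decides 1
    {0: ["TrustedDevice"]},                            # stage 2 decides 0
]

# Passes silently: config is a list and final_decision is 0 or 1.
validate_pipeline_args(example_config, final_decision=0)
```

Passing a bare dictionary instead of a list, or a final_decision such as 2, would raise ValueError in this sketch.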
- Attributes
- pipeline_opt_metric : float
The result of opt_func when the pipeline is applied.
- conf_matrix : PandasDataFrameType
The confusion matrix for the applied pipeline. Only generated after running calc_performance.
- conf_matrix_weighted : PandasDataFrameType
The weighted confusion matrix for the applied pipeline. Only generated after running calc_performance and when sample_weight is provided.
- pipeline_perf : PandasDataFrameType
The performance (precision, recall, percentage of data flagged) of each decision made by the pipeline. Only generated after running calc_performance.
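The per-decision figures described for pipeline_perf can be illustrated with a small pure-Python sketch. It assumes that, for each decision, that decision is treated as the positive class when computing precision and recall; this page does not spell out the exact calculation, and decision_performance is a hypothetical name.

```python
from typing import Dict, List


def decision_performance(
    y_true: List[int], y_pred: List[int], decision: int
) -> Dict[str, float]:
    """Precision, recall and share of records flagged for one decision.

    ASSUMPTION: `decision` is treated as the positive class.
    """
    flagged = [p == decision for p in y_pred]
    relevant = [t == decision for t in y_true]
    true_pos = sum(f and r for f, r in zip(flagged, relevant))
    n_flagged = sum(flagged)
    n_relevant = sum(relevant)
    return {
        "precision": true_pos / n_flagged if n_flagged else 0.0,
        "recall": true_pos / n_relevant if n_relevant else 0.0,
        "pct_flagged": n_flagged / len(y_pred),
    }


y_true = [1, 0, 1, 1, 0]
y_pred = [1, 0, 0, 1, 1]

perf_1 = decision_performance(y_true, y_pred, decision=1)
perf_0 = decision_performance(y_true, y_pred, decision=0)
# perf_1: precision 2/3, recall 2/3, pct_flagged 0.6
# perf_0: precision 1/2, recall 1/2, pct_flagged 0.4
```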
- predict(X_rules: iguanas.utils.typing.pandas.core.frame.DataFrame, y: iguanas.utils.typing.pandas.core.series.Series, sample_weight=None) → iguanas.utils.typing.pandas.core.series.Series
Applies the pipeline to the given dataset.
- Parameters
- X_rules : PandasDataFrameType
Dataset containing the output of each applied rule.
- y : PandasSeriesType
The target.
- sample_weight : PandasSeriesType, optional
Record-wise weights to apply. Defaults to None.
- Returns
- PandasSeriesType
The prediction of the pipeline.
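The staged behaviour described for the pipeline can be sketched in plain Python. This is an illustration under stated assumptions, not the library's implementation: it assumes a record keeps the decision of the first stage whose rules trigger, and it represents the rule outputs as a dictionary of 0/1 lists rather than a DataFrame. The function apply_pipeline and the rule names are hypothetical.

```python
from typing import Dict, List, Optional


def apply_pipeline(
    config: List[Dict[int, List[str]]],
    final_decision: int,
    rule_flags: Dict[str, List[int]],
) -> List[int]:
    """Return one decision (0 or 1) per record."""
    n_rows = len(next(iter(rule_flags.values())))
    # Every record starts undecided.
    decisions: List[Optional[int]] = [None] * n_rows
    for stage in config:
        for decision, rules in stage.items():
            for i in range(n_rows):
                # ASSUMPTION: earlier stages take precedence, so a record
                # that already has a decision is left untouched.
                if decisions[i] is None and any(rule_flags[r][i] for r in rules):
                    decisions[i] = decision
    # Records that no rule triggered on receive the final decision.
    return [final_decision if d is None else d for d in decisions]


rule_flags = {
    "RuleA": [1, 0, 0, 1, 0],
    "RuleB": [0, 0, 1, 1, 0],
    "RuleC": [0, 1, 0, 1, 0],
}
config = [{1: ["RuleA"]}, {0: ["RuleB", "RuleC"]}]

y_pred = apply_pipeline(config, final_decision=0, rule_flags=rule_flags)
# y_pred == [1, 0, 0, 1, 0]
```

In this example the fourth record triggers rules in both stages and keeps the decision of the first stage, while the last record triggers nothing and falls through to final_decision.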
- calc_performance(y_true: iguanas.utils.typing.pandas.core.series.Series, y_pred: iguanas.utils.typing.pandas.core.series.Series, sample_weight=None) → None
Calculates the confusion matrix (and the weighted confusion matrix, if sample_weight is provided) and the overall performance of the pipeline.
Note that for the confusion matrices, the index shows the predicted class and the columns show the actual class.
- Parameters
- y_true : PandasSeriesType
The target.
- y_pred : PandasSeriesType
The RBS pipeline prediction.
- sample_weight : PandasSeriesType, optional
Record-wise weights to apply. Defaults to None.
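The confusion-matrix orientation noted above (predicted class on the index, actual class on the columns) can be sketched as follows, along with an F1 function of the kind that could serve as opt_func. Both functions are hypothetical and use nested dictionaries in place of a DataFrame; the exact signature that opt_func must have is not stated on this page, so the one shown is an assumption.

```python
from typing import Dict, List, Optional


def confusion_matrix(
    y_true: List[int],
    y_pred: List[int],
    sample_weight: Optional[List[float]] = None,
) -> Dict[int, Dict[int, float]]:
    """Return matrix[predicted][actual], i.e. rows = predicted, columns = actual."""
    weights = sample_weight if sample_weight is not None else [1] * len(y_true)
    matrix: Dict[int, Dict[int, float]] = {1: {1: 0, 0: 0}, 0: {1: 0, 0: 0}}
    for actual, predicted, weight in zip(y_true, y_pred, weights):
        matrix[predicted][actual] += weight
    return matrix


def f1_score(
    y_true: List[int],
    y_pred: List[int],
    sample_weight: Optional[List[float]] = None,
) -> float:
    """F1 for class 1. ASSUMPTION: this signature is illustrative only."""
    cm = confusion_matrix(y_true, y_pred, sample_weight)
    tp, fp, fn = cm[1][1], cm[1][0], cm[0][1]
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


y_true = [1, 0, 1, 1, 0]
y_pred = [1, 0, 0, 1, 1]

cm = confusion_matrix(y_true, y_pred)
# cm == {1: {1: 2, 0: 1}, 0: {1: 1, 0: 1}}

# Up-weighting the positive records changes the weighted matrix only.
cm_weighted = confusion_matrix(y_true, y_pred, sample_weight=[2, 1, 2, 2, 1])
# cm_weighted == {1: {1: 4, 0: 1}, 0: {1: 2, 0: 1}}
```

Note that this orientation is the transpose of the layout some other libraries use, so the off-diagonal cells should be read with care: here cm[1][0] counts records predicted 1 whose actual class is 0.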