TrainTestLabelDrift

class TrainTestLabelDrift

Calculate label drift between the train dataset and the test dataset, using statistical measures.

The check calculates a drift score for the label in the test dataset by comparing its distribution to that of the train dataset. For numerical columns, we use the Earth Mover's Distance. See https://en.wikipedia.org/wiki/Wasserstein_metric. For categorical columns, we use the Population Stability Index (PSI). See https://www.lexjansen.com/wuss/2017/47_Final_Paper_PDF.pdf.
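To make the two measures concrete, here is a minimal standard-library sketch of each. It is not the check's own implementation: the function names `psi` and `earth_movers_distance`, the `eps` floor used to avoid taking the log of zero, and the lack of any normalization are all assumptions made for illustration. The check itself may bin, smooth, or scale the scores differently.

```python
import bisect
import math
from collections import Counter


def psi(train_labels, test_labels, eps=1e-4):
    """Population Stability Index between two categorical samples.

    eps is a hypothetical floor on category shares so the log stays defined.
    """
    train_counts = Counter(train_labels)
    test_counts = Counter(test_labels)
    score = 0.0
    for cat in set(train_counts) | set(test_counts):
        # Share of this category in each sample, floored at eps.
        p_train = max(train_counts[cat] / len(train_labels), eps)
        p_test = max(test_counts[cat] / len(test_labels), eps)
        score += (p_test - p_train) * math.log(p_test / p_train)
    return score


def earth_movers_distance(train_values, test_values):
    """Unnormalized 1-D Earth Mover's Distance between two numeric samples.

    Computed as the area between the two empirical CDFs.
    """
    a = sorted(train_values)
    b = sorted(test_values)
    points = sorted(set(a) | set(b))
    distance = 0.0
    for left, right in zip(points, points[1:]):
        # Both empirical CDFs are constant on [left, right).
        cdf_a = bisect.bisect_right(a, left) / len(a)
        cdf_b = bisect.bisect_right(b, left) / len(b)
        distance += abs(cdf_a - cdf_b) * (right - left)
    return distance
```

With these definitions, two samples with identical category shares give a PSI of zero, and shifting every numeric value by a constant gives an Earth Mover's Distance equal to the size of that shift.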

Parameters

max_num_categories : int, default: 10
    Only for categorical columns. Maximum number of allowed categories. If there are more, the excess categories are binned into an “Other” category. If max_num_categories=None, there is no limit. This limit applies to both the drift calculation and the distribution plots.
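The binning described for max_num_categories can be sketched as follows. This is an illustration only: the helper name `bin_rare_categories` is hypothetical, and keeping the most frequent categories of a single sample is an assumption, since the text above does not say how the retained categories are chosen.

```python
from collections import Counter


def bin_rare_categories(labels, max_num_categories=10, other_name="Other"):
    """Collapse all but the most frequent categories into a single bucket.

    Assumption: categories are ranked by frequency within `labels`.
    """
    labels = list(labels)
    if max_num_categories is None:
        # No limit: leave every category as it is.
        return labels
    counts = Counter(labels)
    if len(counts) <= max_num_categories:
        return labels
    keep = {cat for cat, _ in counts.most_common(max_num_categories)}
    return [label if label in keep else other_name for label in labels]


labels = ["a", "a", "a", "b", "b", "c", "d"]
binned = bin_rare_categories(labels, max_num_categories=2)
print(binned)  # ['a', 'a', 'a', 'b', 'b', 'Other', 'Other']
```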

Methods

`TrainTestLabelDrift.add_condition`(name, ...)	Add new condition function to the check.
`TrainTestLabelDrift.add_condition_drift_score_not_greater_than`([...])	Add condition - require drift score to not be more than a certain threshold.
`TrainTestLabelDrift.clean_conditions`()	Remove all conditions from this check instance.
`TrainTestLabelDrift.conditions_decision`(result)	Run conditions on given result.
`TrainTestLabelDrift.name`()	Name of class in split camel case.
`TrainTestLabelDrift.params`([show_defaults])	Return parameters to show when printing the check.
`TrainTestLabelDrift.remove_condition`(index)	Remove given condition by index.
`TrainTestLabelDrift.run`(train_dataset, ...)	Run check.
`TrainTestLabelDrift.run_logic`(context)	Calculate drift for all columns.

Example
