Index Leakage
[1]:
from deepchecks.base import Dataset
from deepchecks.checks import IndexTrainTestLeakage
import pandas as pd
[2]:
def dataset_from_dict(d: dict, index_name: str = None) -> Dataset:
    """Build a Dataset from a dict of columns, using the column `index_name` as the index."""
    dataframe = pd.DataFrame(data=d)
    return Dataset(dataframe, index_name=index_name)
Synthetic example with index leakage
[3]:
train_ds = dataset_from_dict({'col1': [1, 2, 3, 4, 10, 11]}, 'col1')
test_ds = dataset_from_dict({'col1': [4, 3, 5, 6, 7]}, 'col1')
check_obj = IndexTrainTestLeakage()
check_obj.run(train_ds, test_ds)
Index Train-Test Leakage
Check if test indexes are present in train data.
Additional Outputs
40.0% of test data indexes appear in training data
| | 0 |
|---|---|
| Sample of test indexes in train: | [3, 4] |
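The 40.0% figure above follows from the data: two of the five test index values (3 and 4) also occur in the train index. As a rough illustration of that arithmetic, here is a minimal sketch in plain pandas. It is not the deepchecks implementation; the helper `index_overlap` and its `n_to_show` parameter are hypothetical names, and dividing the number of distinct shared values by the number of test rows is an assumption that happens to match this example.

```python
import pandas as pd


def index_overlap(train_index, test_index, n_to_show=5):
    """Hypothetical helper: share of test index values also found in train,
    plus a sorted sample of the shared values (at most n_to_show of them)."""
    # Distinct index values present in both splits.
    shared = sorted(set(train_index) & set(test_index))
    # Assumption: ratio = distinct shared values / number of test rows.
    ratio = len(shared) / len(test_index)
    return ratio, shared[:n_to_show]


train_df = pd.DataFrame({'col1': [1, 2, 3, 4, 10, 11]})
test_df = pd.DataFrame({'col1': [4, 3, 5, 6, 7]})

# .tolist() converts the column to plain Python ints before comparing.
ratio, sample = index_overlap(train_df['col1'].tolist(), test_df['col1'].tolist())
print(f'{ratio:.1%} of test index values appear in train; sample: {sample}')
```

Passing `n_to_show=1` to this sketch truncates the sample to a single value, which is loosely analogous to limiting how many leaked indexes the check displays.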
[4]:
train_ds = dataset_from_dict({'col1': [1, 2, 3, 4, 10, 11]}, 'col1')
test_ds = dataset_from_dict({'col1': [4, 3, 5, 6, 7]}, 'col1')
check_obj = IndexTrainTestLeakage(n_index_to_show=1)
check_obj.run(train_ds, test_ds)
Index Train-Test Leakage
Check if test indexes are present in train data.
Additional Outputs
40.0% of test data indexes appear in training data
| | 0 |
|---|---|
| Sample of test indexes in train: | [3] |
Synthetic example without index leakage
[5]:
train_ds = dataset_from_dict({'col1': [1, 2, 3, 4, 10, 11]}, 'col1')
test_ds = dataset_from_dict({'col1': [20, 21, 5, 6, 7]}, 'col1')
check_obj = IndexTrainTestLeakage()
check_obj.run(train_ds, test_ds)
Index Train-Test Leakage
Check if test indexes are present in train data.
Additional Outputs
✓ Nothing found