
Index Leakage

[1]:
from deepchecks.base import Dataset
from deepchecks.checks import IndexTrainTestLeakage
import pandas as pd
[2]:
def dataset_from_dict(d: dict, index_name: str = None) -> Dataset:
    """Build a deepchecks Dataset from a dict of columns, using `index_name` as the index column."""
    dataframe = pd.DataFrame(data=d)
    return Dataset(dataframe, index_name=index_name)

Synthetic example with index leakage

[3]:
# Index values 3 and 4 appear in both the train and test sets
train_ds = dataset_from_dict({'col1': [1, 2, 3, 4, 10, 11]}, 'col1')
test_ds = dataset_from_dict({'col1': [4, 3, 5, 6, 7]}, 'col1')
check_obj = IndexTrainTestLeakage()
check_obj.run(train_ds, test_ds)

Index Train-Test Leakage

Check if test indexes are present in train data.

Additional Outputs
40.0% of test data indexes appear in training data
Sample of test indexes in train: [3, 4]
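
The 40.0% figure can be reproduced by hand: two of the five test index values (3 and 4) also occur in the train index. The following is a minimal sketch of that calculation in plain Python. It is not the deepchecks implementation, and the helper name `index_leakage_ratio` is hypothetical; it assumes the ratio is the share of test rows whose index value also appears in train.

```python
def index_leakage_ratio(train_index, test_index):
    """Return (ratio, leaked): the share of test index values found in train,
    and the sorted distinct leaked values. Hypothetical helper, not a deepchecks API."""
    train_values = set(train_index)
    # Collect every test index value that also occurs in the train index
    leaked = [value for value in test_index if value in train_values]
    ratio = len(leaked) / len(test_index) if test_index else 0.0
    return ratio, sorted(set(leaked))


ratio, leaked = index_leakage_ratio([1, 2, 3, 4, 10, 11], [4, 3, 5, 6, 7])
print(f"{ratio:.1%} of test data indexes appear in training data")
print(f"Sample of test indexes in train: {leaked}")
```

With the data from the cell above, this prints the same percentage and sample that the check reports.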
[4]:
train_ds = dataset_from_dict({'col1': [1, 2, 3, 4, 10, 11]}, 'col1')
test_ds = dataset_from_dict({'col1': [4, 3, 5, 6, 7]}, 'col1')
# n_index_to_show=1 limits the displayed sample of leaked indexes to one value
check_obj = IndexTrainTestLeakage(n_index_to_show=1)
check_obj.run(train_ds, test_ds)

Index Train-Test Leakage

Check if test indexes are present in train data.

Additional Outputs
40.0% of test data indexes appear in training data
Sample of test indexes in train: [3]
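
Judging by the two outputs, `n_index_to_show` changes only how many leaked index values are displayed, while the reported percentage stays at 40.0%. A rough sketch of that behaviour in plain Python follows. It assumes the sample is a simple truncation of the sorted overlap, which matches the output above but is not confirmed against the library's code.

```python
train_index = [1, 2, 3, 4, 10, 11]
test_index = [4, 3, 5, 6, 7]
n_index_to_show = 1  # mirrors the parameter passed to the check above

# Distinct index values shared by train and test
leaked = sorted(set(train_index) & set(test_index))

# The ratio is computed from the full overlap, not from the truncated sample
ratio = len(leaked) / len(test_index)

# Only the first n_index_to_show values are kept for display (assumed truncation)
sample = leaked[:n_index_to_show]

print(f"{ratio:.1%} of test data indexes appear in training data")
print(f"Sample of test indexes in train: {sample}")
```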

Synthetic example without index leakage

[5]:
# No index value is shared between the train and test sets
train_ds = dataset_from_dict({'col1': [1, 2, 3, 4, 10, 11]}, 'col1')
test_ds = dataset_from_dict({'col1': [20, 21, 5, 6, 7]}, 'col1')
check_obj = IndexTrainTestLeakage()
check_obj.run(train_ds, test_ds)

Index Train-Test Leakage

Check if test indexes are present in train data.

Additional Outputs

Nothing found
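
The "Nothing found" result corresponds to the train and test index values being disjoint. As a quick sanity check outside deepchecks, this can be confirmed with Python's built-in `set.isdisjoint`; the variable names below are illustrative only.

```python
train_index = [1, 2, 3, 4, 10, 11]
test_index = [20, 21, 5, 6, 7]

# True when no test index value also occurs in the train index
is_disjoint = set(train_index).isdisjoint(test_index)
print(is_disjoint)
```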