API Reference - TrainTestSamplesMix

Train Test Samples Mix

Imports

[1]:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from deepchecks.checks.methodology import TrainTestSamplesMix
from deepchecks.base import Dataset

Generating data:

[2]:

iris = load_iris(return_X_y=False, as_frame=True)
X = iris.data
y = iris.target
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=55)
train_dataset = Dataset(pd.concat([X_train, y_train], axis=1),
            features=iris.feature_names,
            label='target')

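test_df = pd.concat([X_test, y_test], axis=1)
# Deliberately leak six training rows (row 1 twice) into the test set.
# pd.concat is used because DataFrame.append was removed in pandas 2.0.
bad_test = pd.concat([test_df, train_dataset.data.iloc[[0, 1, 1, 2, 3, 4]]], ignore_index=True)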

test_dataset = Dataset(bad_test,
            features=iris.feature_names,
            label='target')

Running the TrainTestSamplesMix check:

[3]:

check = TrainTestSamplesMix()

[4]:

check.run(test_dataset=test_dataset, train_dataset=train_dataset)

Train Test Samples Mix

Detect samples in the test data that appear also in training data.

Additional Outputs

11.76% (6 / 51) of test data samples appear in train data

                                          sepal length (cm)  sepal width (cm)  petal length (cm)  petal width (cm)  target
Train indices: 131 Test indices: 46, 47                7.90              3.80               6.40              2.00       2
Train indices: 23 Test indices: 49                     5.10              3.30               1.70              0.50       0
Train indices: 101, 142 Test indices: 45               5.80              2.70               5.10              1.90       2
Train indices: 115 Test indices: 50                    6.40              3.20               5.30              2.30       2
Train indices: 110 Test indices: 48                    6.50              3.20               5.10              2.00       2
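
To make the output above easier to interpret, here is a minimal plain-Python sketch of the general idea: group identical rows and report where each shared row sits in train and in test. It is not the deepchecks implementation. The helper `find_shared_rows` and the toy rows are hypothetical, and matching uses exact equality on the full row, which is an assumption about how "same sample" is defined.

```python
from collections import defaultdict


def find_shared_rows(train_rows, test_rows):
    """Map each row found in both sets to (train positions, test positions)."""
    # Record every position at which each distinct row occurs in train.
    train_positions = defaultdict(list)
    for i, row in enumerate(train_rows):
        train_positions[tuple(row)].append(i)

    # Collect the test positions of rows that also occur in train.
    shared = {}
    for j, row in enumerate(test_rows):
        key = tuple(row)
        if key in train_positions:
            _, test_idx = shared.setdefault(key, (train_positions[key], []))
            test_idx.append(j)
    return shared


# Toy data (hypothetical values, not the iris rows used above).
train_rows = [(5.1, 3.5, 0), (4.9, 3.0, 0), (6.4, 3.2, 2), (4.9, 3.0, 0)]
test_rows = [(7.0, 3.2, 1), (4.9, 3.0, 0), (6.4, 3.2, 2), (6.4, 3.2, 2)]

shared = find_shared_rows(train_rows, test_rows)
# Count test rows (not distinct values) that have a match in train.
leaked = sum(len(test_idx) for _, test_idx in shared.values())
ratio = leaked / len(test_rows)
print(f"{ratio:.2%} ({leaked} / {len(test_rows)}) of test data samples appear in train data")
```

As with the duplicated row in the example above, a value that occurs more than once on either side is reported once with all of its positions. To try the sketch on DataFrames, `list(df.itertuples(index=False, name=None))` yields plain row tuples; the reported numbers are then row positions rather than index labels.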
