Dominant Frequency Change¶
Imports¶
[1]:
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from deepchecks.checks.integrity import DominantFrequencyChange
from deepchecks.base import Dataset
Generating data:¶
[2]:
iris = load_iris(return_X_y=False, as_frame=True)
X = iris.data
y = iris.target
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=55)
train_dataset = Dataset(pd.concat([X_train, y_train], axis=1),
features=iris.feature_names,
label='target')
test_df = pd.concat([X_test, y_test], axis=1)
# make duplicates in the test data
test_df.loc[test_df.index % 2 == 0, 'petal length (cm)'] = 5.1
test_df.loc[test_df.index / 3 > 8, 'sepal width (cm)'] = 2.7
validation_dataset = Dataset(test_df,
features=iris.feature_names,
label='target')
Running dominant_frequency_change check:¶
[3]:
check = DominantFrequencyChange()
[4]:
check.run(validation_dataset, train_dataset)
Dominant Frequency Change
Check if dominant values have increased significantly between test and reference data. Read More...
Additional Outputs
* showing only the top 10 columns, you can change it using n_top_columns param
| Value | Train data % | Test data % | Train data # | Test data # | P value | |
|---|---|---|---|---|---|---|
| Column | ||||||
| sepal width (cm) | 2.70 | 0.82 | 0.07 | 37 | 7 | 0.00 |
| petal length (cm) | 5.10 | 0.56 | 0.06 | 25 | 6 | 0.00 |
[ ]: