
New Category

[1]:
from deepchecks.checks.integrity.new_category import CategoryMismatchTrainTest
from deepchecks.base import Dataset
import pandas as pd
[2]:
train_data = {"col1": ["somebody", "once", "told", "me"] * 10}
test_data = {"col1": ["the","world","is", "gonna", "role", "me","I", "I"] * 10}
train = Dataset(pd.DataFrame(data=train_data), cat_features=["col1"])
test = Dataset(pd.DataFrame(data=test_data), cat_features=["col1"])
[3]:
CategoryMismatchTrainTest().run(train, test)

Category Mismatch Train Test

Find new categories in the test set.

Additional Outputs
Column  Number of new categories  Percent of new categories in sample  New categories examples
col1    6                         87.5%                                ['I', 'gonna', 'is', 'role', 'the']
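
The figures in this table can be reproduced by hand. The following is a minimal sketch in plain Python, not the library's implementation: it assumes a "new category" is any value that occurs in the test column but never in the train column, and that the percentage is the share of test rows holding such a value. The variable names are illustrative only.

```python
# Same values as the train/test columns defined above.
train_values = ["somebody", "once", "told", "me"] * 10
test_values = ["the", "world", "is", "gonna", "role", "me", "I", "I"] * 10

# Categories seen in test but never in train (assumed definition of "new").
known = set(train_values)
new_categories = sorted(set(test_values) - known)

# Share of test rows whose value is a new category.
n_new = len(new_categories)
share_new = sum(value not in known for value in test_values) / len(test_values)

print(n_new, f"{share_new:.1%}", new_categories)
```

Under these assumptions the count is 6 and the share is 70 of 80 rows, or 87.5%, matching the table. The sixth category is 'world', which the examples column above does not show.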
[4]:
train_data = {"col1": ["a", "b", "a", "c"] * 10, "col2": ['a','b','b','q']*10}
test_data = {"col1": ["a","b","d"] * 10, "col2": ['a', '2', '1']*10}
train = Dataset(pd.DataFrame(data=train_data), cat_features=["col1","col2"])
test = Dataset(pd.DataFrame(data=test_data), cat_features=["col1", "col2"])
[5]:
CategoryMismatchTrainTest().run(train, test)

Category Mismatch Train Test

Find new categories in the test set.

Additional Outputs
Column  Number of new categories  Percent of new categories in sample  New categories examples
col1    1                         33.33%                               ['d']
col2    2                         66.67%                               ['1', '2']
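
To check the per-column results, the comparison can be repeated for each categorical column. This is a rough sketch under the assumption that each column is compared only against the train column of the same name. `new_category_summary` is a hypothetical helper written for this illustration and is not part of deepchecks.

```python
def new_category_summary(train_data, test_data):
    """Summarise, per column, the test values that never occur in train."""
    summary = {}
    for column, test_values in test_data.items():
        known = set(train_data[column])
        new = sorted(set(test_values) - known)
        # Count test rows whose value is absent from the train column.
        n_new_rows = sum(value not in known for value in test_values)
        summary[column] = {
            "n_new": len(new),
            "share_new": n_new_rows / len(test_values),
            "examples": new,
        }
    return summary


# Same dictionaries as in the cell above.
train_data = {"col1": ["a", "b", "a", "c"] * 10, "col2": ["a", "b", "b", "q"] * 10}
test_data = {"col1": ["a", "b", "d"] * 10, "col2": ["a", "2", "1"] * 10}

summary = new_category_summary(train_data, test_data)
for column, row in summary.items():
    print(column, row["n_new"], f"{row['share_new']:.2%}", row["examples"])
```

For `col1` only 'd' is new and covers 10 of 30 test rows. For `col2` both '1' and '2' are new and cover 20 of 30 rows, which agrees with the 33.33% and 66.67% reported above.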