New Category
[1]:
from deepchecks.checks.integrity.new_category import CategoryMismatchTrainTest
from deepchecks.base import Dataset
import pandas as pd
[2]:
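# Train and test share only the category "me"; every other test value is unseen in train.
train_data = {"col1": ["somebody", "once", "told", "me"] * 10}
test_data = {"col1": ["the","world","is", "gonna", "role", "me","I", "I"] * 10}
# Mark col1 as categorical so the check inspects it.
train = Dataset(pd.DataFrame(data=train_data), cat_features=["col1"])
test = Dataset(pd.DataFrame(data=test_data), cat_features=["col1"])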
[3]:
CategoryMismatchTrainTest().run(train, test)
Category Mismatch Train Test
Find new categories in the test set.
Additional Outputs
| Column | Number of new categories | Percent of new categories in sample | New categories examples |
|---|---|---|---|
| col1 | 6 | 87.5% | ['I', 'gonna', 'is', 'role', 'the'] |
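The figures in the table above can be reproduced by hand. The following is a minimal sketch in plain Python, not the library's implementation: it assumes that "new" means a value present in the test column but absent from the train column, and that the percentage is the share of test rows holding such a value. The variable names are illustrative only.

```python
# Same values as the train/test columns in the cell above.
train_values = ["somebody", "once", "told", "me"] * 10
test_values = ["the", "world", "is", "gonna", "role", "me", "I", "I"] * 10

# Categories that occur in the test column but never in the train column.
new_categories = set(test_values) - set(train_values)

# Share of test rows whose value is one of those new categories.
n_new_rows = sum(value in new_categories for value in test_values)
pct_new = n_new_rows / len(test_values)

print(len(new_categories))          # 6
print(f"{pct_new:.1%}")             # 87.5%
print(sorted(new_categories)[:5])   # ['I', 'gonna', 'is', 'role', 'the']
```

Seven of every eight test rows hold an unseen value (only "me" also appears in train), which gives 87.5%. Sorting and keeping the first five values happens to match the examples column here; whether the check selects its examples this way is an assumption.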
[4]:
# Two categorical columns: col1 gains "d" in test; col2 gains "1" and "2".
train_data = {"col1": ["a", "b", "a", "c"] * 10, "col2": ['a','b','b','q']*10}
test_data = {"col1": ["a","b","d"] * 10, "col2": ['a', '2', '1']*10}
train = Dataset(pd.DataFrame(data=train_data), cat_features=["col1","col2"])
test = Dataset(pd.DataFrame(data=test_data), cat_features=["col1", "col2"])
[5]:
CategoryMismatchTrainTest().run(train, test)
Category Mismatch Train Test
Find new categories in the test set.
Additional Outputs
| Column | Number of new categories | Percent of new categories in sample | New categories examples |
|---|---|---|---|
| col1 | 1 | 33.33% | ['d'] |
| col2 | 2 | 66.67% | ['1', '2'] |
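For several columns at once, the same per-column summary can be approximated with pandas alone. This is a hedged sketch rather than the check's actual code: `new_category_summary` and its `max_examples` parameter are hypothetical names introduced here, and the definition of "new category" (in test, not in train) is an assumption based on the output above.

```python
import pandas as pd


def new_category_summary(train_df, test_df, columns, max_examples=5):
    """Hypothetical helper: summarise unseen test categories per column."""
    records = []
    for col in columns:
        # Values present in test but absent from train, ignoring missing values.
        new_cats = sorted(set(test_df[col].dropna()) - set(train_df[col].dropna()))
        records.append({
            "Column": col,
            "Number of new categories": len(new_cats),
            # Fraction of test rows holding a new category.
            "Percent of new categories in sample": test_df[col].isin(new_cats).mean(),
            "New categories examples": new_cats[:max_examples],
        })
    return pd.DataFrame(records).set_index("Column")


train_df = pd.DataFrame({"col1": ["a", "b", "a", "c"] * 10,
                         "col2": ["a", "b", "b", "q"] * 10})
test_df = pd.DataFrame({"col1": ["a", "b", "d"] * 10,
                        "col2": ["a", "2", "1"] * 10})

summary = new_category_summary(train_df, test_df, ["col1", "col2"])
print(summary)
```

With this data the sketch reports one new category for col1 (one third of test rows) and two for col2 (two thirds), in line with the table above. The percentage is returned as a fraction rather than a formatted string.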