String Mismatch Comparison
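from deepchecks.checks import StringMismatchComparison
import pandas as pd

# First dataset: reported as "train" / "baseline" in the output below
data = {'col1': ['Deep', 'deep', 'deep!!!', 'earth', 'foo', 'bar', 'foo?']}
# Second dataset: reported as "test" in the output below
compared_data = {'col1': ['Deep', 'deep', '$deeP$', 'earth', 'foo', 'bar', 'foo?', '?deep']}
# Compare the string variants of each shared column across the two datasets
StringMismatchComparison().run(pd.DataFrame(data=data), pd.DataFrame(data=compared_data))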
String Mismatch Comparison
Detect different variants of string categories between the same categorical column in two datasets.
Additional Outputs
* Showing only the top 10 columns; change this with the n_top_columns parameter.
| Column name | col1 |
|---|---|
| Base form | deep |
| Common variants | ['deep', 'Deep'] |
| Variants only in test | ['$deeP$', '?deep'] |
| % Unique variants out of all dataset samples (count) | 25% (2) |
| Variants only in train | ['deep!!!'] |
| % Unique variants out of all baseline samples (count) | 14.29% (1) |
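
To make the result easier to read, here is a minimal sketch of how such a comparison could be computed by hand. It is not the deepchecks implementation: the helper names (`base_form`, `group_by_base_form`, `compare_variants`) are hypothetical, and the normalisation rule (lowercase, then drop every character other than a-z and 0-9) is an assumption chosen because it reproduces the base form `deep` shown above.

```python
import re
from collections import defaultdict


def base_form(value):
    # Assumed normalisation: lowercase, then keep only a-z and 0-9.
    return re.sub(r"[^a-z0-9]", "", value.lower())


def group_by_base_form(values):
    # Map each base form to the set of raw variants that reduce to it.
    groups = defaultdict(set)
    for value in values:
        groups[base_form(value)].add(value)
    return groups


def compare_variants(train, test):
    train_groups = group_by_base_form(train)
    test_groups = group_by_base_form(test)
    report = {}
    # Assumption: only base forms present in both datasets are compared.
    for base in sorted(train_groups.keys() & test_groups.keys()):
        only_train = train_groups[base] - test_groups[base]
        only_test = test_groups[base] - train_groups[base]
        if not only_train and not only_test:
            continue  # identical variant sets, nothing to report
        report[base] = {
            "common": sorted(train_groups[base] & test_groups[base]),
            "only_in_train": sorted(only_train),
            "only_in_test": sorted(only_test),
            # Share of samples (not of distinct values) that are unique variants.
            "pct_train": round(100 * sum(v in only_train for v in train) / len(train), 2),
            "pct_test": round(100 * sum(v in only_test for v in test) / len(test), 2),
        }
    return report


train = ['Deep', 'deep', 'deep!!!', 'earth', 'foo', 'bar', 'foo?']
test = ['Deep', 'deep', '$deeP$', 'earth', 'foo', 'bar', 'foo?', '?deep']
report = compare_variants(train, test)
print(report)
```

Under these assumptions only `deep` is reported: `foo` and `foo?` occur in both datasets, so their variant sets are identical and are skipped. The percentages follow from the sample counts, 1 of 7 train samples (14.29%) and 2 of 8 test samples (25%).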