
String Mismatch Comparison

[1]:
from deepchecks.checks import StringMismatchComparison
import pandas as pd

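# Baseline (train) data: 'deep!!!' is a variant that appears only here.
data = {'col1': ['Deep', 'deep', 'deep!!!', 'earth', 'foo', 'bar', 'foo?']}
# Compared (test) data: '$deeP$' and '?deep' are variants that appear only here.
compared_data = {'col1': ['Deep', 'deep', '$deeP$', 'earth', 'foo', 'bar', 'foo?', '?deep']}

# Run the check with the baseline data first and the compared data second.
StringMismatchComparison().run(pd.DataFrame(data=data), pd.DataFrame(data=compared_data))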

String Mismatch Comparison

Detect different variants of string categories between the same categorical column in two datasets.

Additional Outputs
* showing only the top 10 columns, you can change it using n_top_columns param
Column name: col1
Base form: deep
Common variants: ['deep', 'Deep']
Variants only in test: ['$deeP$', '?deep']
% Unique variants out of all dataset samples (count): 25% (2)
Variants only in train: ['deep!!!']
% Unique variants out of all baseline samples (count): 14.29% (1)
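
To make the result above easier to interpret, here is a minimal sketch of how such a comparison could be computed by hand. It is not the deepchecks implementation: it assumes the base form is the lowercased string with all non-alphanumeric characters removed, and the helper names (base_form, variants_by_base_form, compare_variants) are hypothetical. As in the output above, a base form is reported only when the two datasets disagree on its variants, which is why 'foo' and 'foo?' do not appear.

```python
import re
from collections import defaultdict


def base_form(value: str) -> str:
    """Hypothetical normalizer (an assumption, not the library's code):
    lowercase the string and drop every non-alphanumeric character."""
    return re.sub(r'[^a-z0-9]', '', value.lower())


def variants_by_base_form(values):
    """Group the distinct raw strings by their shared base form."""
    groups = defaultdict(set)
    for value in values:
        groups[base_form(value)].add(value)
    return groups


def compare_variants(train_values, test_values):
    """Report base forms whose sets of variants differ between the datasets."""
    train_groups = variants_by_base_form(train_values)
    test_groups = variants_by_base_form(test_values)
    result = {}
    # Only base forms present in both datasets can be compared.
    for base in train_groups.keys() & test_groups.keys():
        train_set, test_set = train_groups[base], test_groups[base]
        only_train = train_set - test_set
        only_test = test_set - train_set
        if only_train or only_test:
            result[base] = {
                'common': sorted(train_set & test_set),
                'only_in_train': sorted(only_train),
                'only_in_test': sorted(only_test),
            }
    return result


train = ['Deep', 'deep', 'deep!!!', 'earth', 'foo', 'bar', 'foo?']
test = ['Deep', 'deep', '$deeP$', 'earth', 'foo', 'bar', 'foo?', '?deep']
mismatches = compare_variants(train, test)
print(mismatches)
```

The percentages in the output are the number of dataset-specific variants divided by the number of samples in that dataset: 2 of 8 samples (25%) for the test data and 1 of 7 samples (14.29%) for the train data.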