
Mixed Data Types

[1]:
from deepchecks.checks import MixedDataTypes
import pandas as pd

Benign condition

[2]:
data = {'col1': ['foo', 'bar', 'cat']}
dataframe = pd.DataFrame(data=data)
MixedDataTypes().add_condition_rare_type_ratio_not_in_range().run(dataframe)

Mixed Data Types

Detect a small amount of a rare data type within a column, such as a few string samples in a mostly numeric column.

Conditions Summary
Condition: Rare data types in column are either more than 10% or less than 1% of the data
Additional Outputs

Nothing found
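
The condition text above can be read as a simple range test on the share of the rarer data type in a column. The following is a minimal sketch of that reading, not the deepchecks implementation: the helper name `rare_type_ratio_in_range`, the default bounds (taken from the 1% and 10% in the condition text), and the choice of strict inequalities at the boundaries are all assumptions made for illustration.

```python
def rare_type_ratio_in_range(strings_ratio, numbers_ratio, lower=0.01, upper=0.10):
    """Hypothetical helper: True when the rarer type's share lies inside (lower, upper).

    A True result corresponds to the case the condition is meant to flag.
    Boundary handling (strict inequalities) is an assumption.
    """
    rare = min(strings_ratio, numbers_ratio)
    return lower < rare < upper


# A column holding only strings has no rare type, so nothing is flagged.
all_strings = rare_type_ratio_in_range(strings_ratio=1.0, numbers_ratio=0.0)

# A column where 5% of the samples are strings falls inside the flagged range.
five_percent = rare_type_ratio_in_range(strings_ratio=0.05, numbers_ratio=0.95)

print(all_strings, five_percent)
```

Under this reading, a column made up entirely of strings, like `col1` in the benign example, is not flagged because its rarer type has a share of zero.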

Issue detected

[3]:
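data = {
    'col1': ['str', '1.0', 1, 2, 2.61, 't', 1, 1, 1, 1, 1],  # mostly numbers, two plain strings
    'col2': ['', '', '1.0', 'a', 'b', 'c', 'a', 'a', 'a', 'a', 'a'],  # mostly strings, one numeric-looking value
    'col3': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11],  # numbers only
    'col4': [1, 2, 3, 4, 5, 6, 7, 8, 'a', 10, 12],  # numbers with a single string
}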
dataframe = pd.DataFrame(data=data)
MixedDataTypes().add_condition_rare_type_ratio_not_in_range().run(dataframe)

Mixed Data Types

Detect a small amount of a rare data type within a column, such as a few string samples in a mostly numeric column.

Conditions Summary
Status: !
Condition: Rare data types in column are either more than 10% or less than 1% of the data
More Info: Found columns with non-negligible quantities of samples with a different data type from the majority of samples: ['col2', 'col4']
Additional Outputs
* Showing only the top 10 columns; you can change this with the n_top_columns parameter.
         col1    col2    col4
strings  18.18%  90.91%  9.09%
numbers  81.82%  9.09%   90.91%
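
The percentages in the table can be reproduced by hand with plain pandas. The sketch below assumes that a value counts as a number whenever `float()` accepts it, which matches the output above (the string '1.0' is counted as a number, the empty string as a string); whether deepchecks classifies values exactly this way is not confirmed here. The helpers `is_number` and `type_ratios` are hypothetical names used only for illustration.

```python
import pandas as pd


def is_number(value):
    """Hypothetical rule: a value is numeric if float() can parse it."""
    try:
        float(value)
        return True
    except (TypeError, ValueError):
        return False


def type_ratios(series):
    """Return the share of string-like and number-like samples in a Series."""
    numbers = float(series.map(is_number).mean())
    return {'strings': 1 - numbers, 'numbers': numbers}


col1 = pd.Series(['str', '1.0', 1, 2, 2.61, 't', 1, 1, 1, 1, 1])
ratios = type_ratios(col1)
print({name: f"{share:.2%}" for name, share in ratios.items()})
# → {'strings': '18.18%', 'numbers': '81.82%'}
```

With 2 of 11 samples being plain strings, the rare type in col1 makes up 18.18% of the data, which is above 10%, so col1 appears in the table but is not listed among the failing columns.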