Mixed Data Types

from deepchecks.checks import MixedDataTypes
import pandas as pd

Benign condition

# Every value in col1 is a string, so the column has no rare second type.
data = {'col1': ['foo', 'bar', 'cat']}
dataframe = pd.DataFrame(data=data)
MixedDataTypes().add_condition_rare_type_ratio_not_in_range().run(dataframe)

Mixed Data Types

Detect a small amount of a rare data type within a column, such as a few string samples in a mostly numeric column.

Conditions Summary

Status	Condition	More Info
✓	Rare data types in column are either more than 10% or less than 1% of the data

Additional Outputs

✓ Nothing found
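
The condition passes here because every value in col1 is a string, so no second, rarer type exists. As a rough sketch of the range test that the condition text describes (not the deepchecks implementation), the helper below flags a column when any type's share falls between 1% and 10%. The function name `has_rare_type_in_range`, its parameters, and the inclusive bounds are assumptions made for illustration.

```python
def has_rare_type_in_range(type_ratios, low=0.01, high=0.10):
    """Return True if any present type's share lies within [low, high].

    type_ratios maps a type label to its share of the column (0.0 to 1.0).
    Hypothetical helper; treating both bounds as inclusive is an assumption.
    """
    return any(
        low <= ratio <= high
        for ratio in type_ratios.values()
        if ratio > 0  # a type that never occurs cannot be "rare"
    )


# col1 in the benign example holds only strings, so nothing is flagged.
benign = {"strings": 1.0}
print(has_rare_type_in_range(benign))  # False

# A 5% share of strings falls inside the 1%-10% range, so it is flagged.
suspicious = {"strings": 0.05, "numbers": 0.95}
print(has_rare_type_in_range(suspicious))  # True
```

A share above 10% is read as an intentional mix of types and a share below 1% as negligible, which is why only the band in between is treated as a likely data-quality problem.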

Issue detected

data = {
    'col1': ['str', '1.0', 1, 2, 2.61, 't', 1, 1, 1, 1, 1],
    'col2': ['', '', '1.0', 'a', 'b', 'c', 'a', 'a', 'a', 'a', 'a'],
    'col3': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11],
    'col4': [1, 2, 3, 4, 5, 6, 7, 8, 'a', 10, 12],
}
dataframe = pd.DataFrame(data=data)
MixedDataTypes().add_condition_rare_type_ratio_not_in_range().run(dataframe)

Mixed Data Types

Detect a small amount of a rare data type within a column, such as a few string samples in a mostly numeric column.

Conditions Summary

Status	Condition	More Info
!	Rare data types in column are either more than 10% or less than 1% of the data	Found columns with non-negligible quantities of samples with a different data type from the majority of samples: ['col2', 'col4']

Additional Outputs

* Showing only the top 10 columns; you can change this using the n_top_columns param

	col1	col2	col4
strings	18.18%	90.91%	9.09%
numbers	81.82%	9.09%	90.91%
