Mixed Data Types
```python
# The check under demonstration, plus pandas to build the example data
from deepchecks.checks import MixedDataTypes
import pandas as pd
```
Benign condition
```python
# A single column holding only strings, so there is no rare data type
data = {'col1': ['foo', 'bar', 'cat']}
dataframe = pd.DataFrame(data=data)
MixedDataTypes().add_condition_rare_type_ratio_not_in_range().run(dataframe)
```
Mixed Data Types
Detect a small amount of a rare data type within a column, such as a few string samples in a mostly numeric column.
Conditions Summary
| Status | Condition | More Info |
|---|---|---|
| ✓ | Rare data types in column are either more than 10% or less than 1% of the data | |
Additional Outputs
✓ Nothing found
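The condition in the summary passes when a column's rare data type makes up either more than 10% or less than 1% of its samples, and fails in between. The plain-Python sketch below restates that rule from per-type sample counts. The function name `rare_type_condition_passes` and the choice of inclusive 1% and 10% bounds are assumptions for illustration, not deepchecks' implementation.

```python
def rare_type_condition_passes(type_counts, low=0.01, high=0.1):
    """Hypothetical restatement of the 'rare type ratio not in range' rule.

    type_counts maps a type label (e.g. 'strings', 'numbers') to the number
    of samples of that type in one column. The column passes when it has a
    single type, or when its rarest type is below `low` or above `high`
    of the data. Treating both bounds as inclusive is an assumption.
    """
    if len(type_counts) < 2:
        return True  # only one type present, so there is no rare type
    total = sum(type_counts.values())
    rare_ratio = min(type_counts.values()) / total
    return not (low <= rare_ratio <= high)


# Benign example: 'col1' holds three strings and nothing else.
print(rare_type_condition_passes({"strings": 3}))                # True
# One number among 11 samples (about 9.09%) falls inside 1%-10%.
print(rare_type_condition_passes({"strings": 10, "numbers": 1}))  # False
# Two strings among 11 samples (about 18.18%) is above 10%.
print(rare_type_condition_passes({"strings": 2, "numbers": 9}))   # True
```

Working from counts keeps the rule independent of how values are classified into types, which is the part most likely to differ from the real check.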
Issue detected
```python
# col1: 2 of 11 values are strings; col2 and col4: 1 of 11 values has the minority type
data = {'col1': ['str', '1.0', 1, 2, 2.61, 't', 1, 1, 1, 1, 1],
        'col2': ['', '', '1.0', 'a', 'b', 'c', 'a', 'a', 'a', 'a', 'a'],
        'col3': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11],
        'col4': [1, 2, 3, 4, 5, 6, 7, 8, 'a', 10, 12]}
dataframe = pd.DataFrame(data=data)
MixedDataTypes().add_condition_rare_type_ratio_not_in_range().run(dataframe)
```
Conditions Summary
| Status | Condition | More Info |
|---|---|---|
| ! | Rare data types in column are either more than 10% or less than 1% of the data | Found columns with non-negligible quantities of samples with a different data type from the majority of samples: ['col2', 'col4'] |
Additional Outputs
* Showing only the top 10 columns; you can change this with the `n_top_columns` parameter.
| | col1 | col2 | col4 |
|---|---|---|---|
| strings | 18.18% | 90.91% | 9.09% |
| numbers | 81.82% | 9.09% | 90.91% |
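The percentages in this table can be reproduced by hand. The sketch below recounts the issue-detected data in plain Python under a typing rule inferred from the table itself, not taken from deepchecks' source: any value that `float()` accepts (including the string `'1.0'`) counts as a number, and anything else (including the empty string) counts as a string. The helpers `type_of` and `type_shares` are hypothetical, and the inclusive 1% to 10% flagging range is an assumption.

```python
from collections import Counter

data = {
    "col1": ["str", "1.0", 1, 2, 2.61, "t", 1, 1, 1, 1, 1],
    "col2": ["", "", "1.0", "a", "b", "c", "a", "a", "a", "a", "a"],
    "col3": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11],
    "col4": [1, 2, 3, 4, 5, 6, 7, 8, "a", 10, 12],
}


def type_of(value):
    """Assumed rule: anything float() accepts counts as a number."""
    try:
        float(value)
        return "numbers"
    except (TypeError, ValueError):
        return "strings"


def type_shares(column):
    """Percentage of samples of each type, rounded to two decimals."""
    counts = Counter(type_of(v) for v in column)
    return {t: round(100 * n / len(column), 2) for t, n in counts.items()}


shares = {name: type_shares(values) for name, values in data.items()}
for name, s in shares.items():
    print(name, s)

# Flag columns with more than one type whose rarest type lies in 1%-10%.
flagged = [name for name, s in shares.items()
           if len(s) > 1 and 1 <= min(s.values()) <= 10]
print(flagged)  # ['col2', 'col4']
```

`col3` contains a single type and is skipped, which matches its absence from the table; `col1` has a rare type at 18.18%, above the 10% bound, so it is listed in the table but not flagged.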