Integrity

This module contains all data integrity checks.

Classes

MixedNulls

Search for various types of null values in string column(s), including string representations of null.
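A minimal pure-Python sketch of the idea behind MixedNulls. It is not deepchecks' implementation: the function `count_null_variants` and the `NULL_STRINGS` set are hypothetical, and the real check may recognize other spellings. More than one key in the result suggests that the column mixes null representations.

```python
import math
from collections import Counter

# Hypothetical set of string spellings treated as null; the real check's list may differ.
NULL_STRINGS = {"", "null", "none", "nan", "n/a", "na"}


def count_null_variants(values):
    """Count each distinct null-like value: real None/NaN and null-looking strings."""
    counts = Counter()
    for v in values:
        if v is None:
            counts["None"] += 1
        elif isinstance(v, float) and math.isnan(v):
            counts["nan (float)"] += 1
        elif isinstance(v, str) and v.strip().lower() in NULL_STRINGS:
            # Keep the original spelling so that "NULL" and "null" stay distinct.
            counts[repr(v)] += 1
    return counts


column = ["a", None, "NULL", "null", float("nan"), "b", "N/A"]
result = count_null_variants(column)
print(result)
```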

StringMismatch

Detect different variants of string categories (e.g. "mislabeled" vs "mis-labeled") within a categorical column.
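One way to picture StringMismatch is to reduce each string to a normalized "base form" and report base forms that are written in more than one way. This is a sketch under assumptions: the normalization (lowercase, drop non-alphanumeric ASCII characters) and the names `base_form` and `find_string_variants` are hypothetical, not the library's API.

```python
import re
from collections import defaultdict


def base_form(s):
    # Hypothetical normalization: lowercase, then drop every non-alphanumeric ASCII character.
    return re.sub(r"[^a-z0-9]", "", s.lower())


def find_string_variants(values):
    """Return {base_form: set of spellings} for base forms with two or more spellings."""
    groups = defaultdict(set)
    for v in values:
        groups[base_form(v)].add(v)
    return {base: spellings for base, spellings in groups.items() if len(spellings) > 1}


column = ["mislabeled", "mis-labeled", "Mislabeled", "correct", "correct"]
variants = find_string_variants(column)
print(variants)
```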

MixedDataTypes

Detect a small amount of a rare data type within a column, such as a few string samples in a mostly numeric column.
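The following sketch shows the basic idea of MixedDataTypes for a column holding numbers and strings: count each type and flag the minority type when its share falls below a threshold. The function `detect_rare_type` and the default `threshold=0.1` are hypothetical; the actual check may classify types and choose thresholds differently.

```python
from collections import Counter


def detect_rare_type(values, threshold=0.1):
    """Return (rare_type, share) if the minority type is below `threshold`, else None.

    Hypothetical helper: only distinguishes "numeric" (int/float) from "string".
    """
    kinds = Counter()
    for v in values:
        if isinstance(v, bool):
            continue  # bool is a subclass of int; ignore it rather than count it as numeric
        if isinstance(v, (int, float)):
            kinds["numeric"] += 1
        elif isinstance(v, str):
            kinds["string"] += 1
    if len(kinds) < 2:
        return None  # only one type present: nothing is "mixed"
    total = sum(kinds.values())
    rare, count = min(kinds.items(), key=lambda kv: kv[1])
    share = count / total
    return (rare, share) if share < threshold else None


column = [1, 2.5, 3, 4, 5, 6, 7, 8, 9, 10, 11, "twelve"]
print(detect_rare_type(column))
```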

IsSingleValue

Check if there are columns which have only a single unique value in all rows.
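The IsSingleValue condition reduces to "the column has exactly one distinct value". A minimal sketch, assuming a table represented as a plain dict of column name to list of hashable values; `single_value_columns` is a hypothetical name, not part of deepchecks.

```python
def single_value_columns(table):
    """Return the names of columns whose values are all identical.

    Hypothetical helper; values must be hashable. Note that distinct float('nan')
    objects are not treated as equal by set(), so an all-NaN column may be missed.
    """
    return [name for name, values in table.items() if len(set(values)) == 1]


table = {
    "id": [1, 2, 3],
    "country": ["US", "US", "US"],
    "flag": [True, True, True],
}
print(single_value_columns(table))
```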

SpecialCharacters

Search in column(s) for values that contain only special characters.
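A small sketch of what "only special characters" can mean in practice: a non-empty string (after stripping surrounding whitespace) that contains no letter or digit. This definition and the name `special_only_values` are assumptions for illustration; deepchecks may define special characters differently.

```python
def special_only_values(values):
    """Return strings that contain no letters or digits.

    Hypothetical helper: empty and whitespace-only strings are skipped,
    and non-string values are ignored.
    """
    found = []
    for v in values:
        if not isinstance(v, str):
            continue
        stripped = v.strip()
        if stripped and not any(ch.isalnum() for ch in stripped):
            found.append(v)
    return found


column = ["hello", "#$%", "  ?!  ", "a-b", "", "42", "@"]
print(special_only_values(column))
```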

StringLengthOutOfBounds

Detect strings whose length is much longer or shorter than the identified "normal" string lengths.
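To make "normal" string lengths concrete, the sketch below uses Tukey's fences on the length distribution: lengths outside `[Q1 - k*IQR, Q3 + k*IQR]` are flagged. This is a swapped-in outlier rule chosen for illustration, not necessarily the one deepchecks uses; `length_outliers` and `k=1.5` are hypothetical.

```python
import statistics


def length_outliers(values, k=1.5):
    """Return strings whose length falls outside Tukey's fences (hypothetical rule).

    Requires at least two values (statistics.quantiles needs two data points).
    """
    lengths = [len(v) for v in values]
    q1, _, q3 = statistics.quantiles(lengths, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if not low <= len(v) <= high]


column = ["hello", "world", "apple", "banana", "orange", "cherry", "grape", "x" * 60, "a"]
print(length_outliers(column))
```

Here the typical lengths are 5 and 6, so both the 60-character string and the 1-character string fall outside the fences.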

StringMismatchComparison

Detect different variants of string categories between the same categorical column in two datasets.
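For the two-dataset case, one possible reading of StringMismatchComparison is: for each base form seen in the training column, report test spellings that never appeared in training. The normalization and the names `base_form` and `new_variants` are hypothetical and only illustrate the comparison.

```python
import re


def base_form(s):
    # Hypothetical normalization: lowercase, then drop every non-alphanumeric ASCII character.
    return re.sub(r"[^a-z0-9]", "", s.lower())


def new_variants(train_values, test_values):
    """For base forms present in train, return test spellings never seen in train."""
    train_by_base = {}
    for v in train_values:
        train_by_base.setdefault(base_form(v), set()).add(v)
    result = {}
    for v in test_values:
        base = base_form(v)
        # Only compare categories the training column already knows about.
        if base in train_by_base and v not in train_by_base[base]:
            result.setdefault(base, set()).add(v)
    return result


train = ["New York", "Boston", "boston"]
test = ["new-york", "Boston", "Chicago", "BOSTON"]
print(new_variants(train, test))
```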

DominantFrequencyChange

Check if dominant values have increased significantly between test and reference data.
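A simplified sketch of the comparison, assuming "dominant" means the most common value in the test data and "significant" means a relative increase above a fixed threshold. Both assumptions, the function name `dominant_frequency_change`, and `min_relative_increase=0.5` are hypothetical; the real check may use a different definition or a statistical test.

```python
from collections import Counter


def dominant_frequency_change(reference, test, min_relative_increase=0.5):
    """Return (value, reference_share, test_share, increased) for the test set's top value."""
    value, test_count = Counter(test).most_common(1)[0]
    test_share = test_count / len(test)
    ref_share = reference.count(value) / len(reference)
    if ref_share == 0:
        increased = True  # value absent from reference: treat as a large increase
    else:
        increased = (test_share - ref_share) / ref_share > min_relative_increase
    return value, ref_share, test_share, increased


reference = ["a", "b", "c", "a", "b", "c", "a", "b", "c", "a"]
test = ["a", "a", "a", "a", "a", "a", "a", "b", "c", "a"]
print(dominant_frequency_change(reference, test))
```

In this example "a" grows from 40% of the reference data to 80% of the test data, a relative increase of 100%.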

DataDuplicates

Check for duplicate samples in the dataset.
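A minimal sketch of measuring duplication: count identical rows and report the share of rows that are extra copies. Whether deepchecks counts only the extra copies or every member of a duplicate group is not stated here, so treat the metric and the name `duplicate_ratio` as assumptions.

```python
from collections import Counter


def duplicate_ratio(rows):
    """Share of rows that are extra copies of another identical row (hypothetical metric)."""
    counts = Counter(tuple(row) for row in rows)
    extra_copies = sum(c - 1 for c in counts.values())
    return extra_copies / len(rows) if rows else 0.0


rows = [(1, "a"), (2, "b"), (1, "a"), (3, "c"), (1, "a")]
print(duplicate_ratio(rows))
```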

CategoryMismatchTrainTest

Find new categories in the test set.
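At its core, finding new categories is a per-column set difference between test and train. The sketch assumes datasets given as dicts of column name to values plus an explicit list of categorical columns; `new_categories` is a hypothetical name.

```python
def new_categories(train, test, categorical_columns):
    """Map each categorical column to the categories seen in test but never in train."""
    result = {}
    for col in categorical_columns:
        unseen = set(test[col]) - set(train[col])
        if unseen:
            result[col] = unseen
    return result


train = {"color": ["red", "green", "blue"], "size": ["S", "M", "L"]}
test = {"color": ["red", "purple"], "size": ["M", "L"]}
print(new_categories(train, test, ["color", "size"]))
```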

NewLabelTrainTest

Find new labels in the test set.
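For labels, besides listing the unseen values it is useful to know how much of the test set they affect. The sketch below reports both; `new_label_report` is hypothetical, and reporting the affected share is an assumption about what is useful, not a statement about the check's output.

```python
def new_label_report(train_labels, test_labels):
    """Return (sorted new labels, share of test samples carrying a new label)."""
    known = set(train_labels)
    new = sorted(set(test_labels) - known)  # sorting assumes comparable label types
    affected = sum(1 for y in test_labels if y not in known)
    share = affected / len(test_labels) if test_labels else 0.0
    return new, share


train_labels = ["cat", "dog", "cat"]
test_labels = ["cat", "bird", "dog", "bird", "fish"]
print(new_label_report(train_labels, test_labels))
```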

LabelAmbiguity

Find samples with multiple labels.
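Label ambiguity can be sketched by grouping samples with identical feature values and keeping the groups that carry more than one distinct label. The helper `ambiguous_samples` and the tuple-of-features representation are hypothetical and serve only to illustrate the grouping.

```python
from collections import defaultdict


def ambiguous_samples(features, labels):
    """Map each feature row that appears with 2+ distinct labels to its set of labels."""
    labels_by_row = defaultdict(set)
    for row, label in zip(features, labels):
        labels_by_row[tuple(row)].add(label)
    return {row: lbls for row, lbls in labels_by_row.items() if len(lbls) > 1}


features = [(1, 0), (1, 0), (2, 5), (3, 3), (3, 3)]
labels = ["yes", "no", "yes", "no", "no"]
print(ambiguous_samples(features, labels))
```

The rows `(3, 3)` repeat with the same label, so only `(1, 0)` is reported as ambiguous.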