String Length Out Of Bounds

[1]:

from deepchecks.checks.integrity.string_length_out_of_bounds import StringLengthOutOfBounds
import pandas as pd

[2]:

col1 = ["aaaaa33", "aaaaaaa33"]*40
col1.append("a")
col1.append("aaaaaadsfasdfasdf")

col2 = ["b", "abc"]*41

col3 = ["a"]*80
col3.append("a"*100)
col3.append("a"*200)
# col1 and col3 contain outliers, col2 does not
df = pd.DataFrame({"col1":col1, "col2": col2, "col3": col3 })

[3]:

StringLengthOutOfBounds(min_unique_value_ratio=0.01).run(df)

String Length Out Of Bounds

Detect strings with length that is much longer/shorter than the identified "normal" string lengths.

Additional Outputs

* showing only the top 10 columns, you can change it using n_top_columns param

Column Name	Range of Detected Normal String Lengths	Range of Detected Outlier String Lengths	Number of Outlier Samples	Example Samples
col1	7 - 9	1 - 1	1	['a']
col1	7 - 9	17 - 17	1	['aaaaaadsfasdfasdf']
col3	1 - 1	100 - 200	2	['aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa...', 'aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa...']
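
The table above can be roughly cross-checked without deepchecks. The sketch below is not the algorithm StringLengthOutOfBounds uses; it swaps in a plain Tukey IQR rule on string lengths, and the helper name `length_outliers` and the factor `k=1.5` are assumptions made for illustration. On this particular data it happens to flag the same strings as the check.

```python
from statistics import quantiles


def length_outliers(values, k=1.5):
    """Return the strings whose length falls outside the Tukey IQR fences.

    Hypothetical helper for illustration only; not part of deepchecks.
    """
    lengths = [len(v) for v in values]
    # Quartiles of the length distribution (inclusive method, as for a full sample).
    q1, _, q3 = quantiles(lengths, n=4, method="inclusive")
    spread = q3 - q1
    low, high = q1 - k * spread, q3 + k * spread
    return [v for v in values if not low <= len(v) <= high]


# Same data as in cell [2], built without pandas.
col1 = ["aaaaa33", "aaaaaaa33"] * 40 + ["a", "aaaaaadsfasdfasdf"]
col2 = ["b", "abc"] * 41
col3 = ["a"] * 80 + ["a" * 100, "a" * 200]

for name, col in {"col1": col1, "col2": col2, "col3": col3}.items():
    outlier_lengths = sorted({len(s) for s in length_outliers(col)})
    print(name, outlier_lengths)
```

A fixed IQR rule is only a sanity check: it has no equivalent of the check's parameters such as `min_unique_value_ratio`, so the two approaches can disagree on other data.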

[4]:

col = ["a","a","a","a","a","a","a","a","a","a","a","a","a","ab","ab","ab","ab","ab","ab", "ab"]*1000
col.append("basdbadsbaaaaaaaaaa")
col.append("basdbadsbaaaaaaaaaaa")
df = pd.DataFrame({"col1":col})
StringLengthOutOfBounds(num_percentiles=1000, min_unique_values=3).run(df)

String Length Out Of Bounds

Detect strings with length that is much longer/shorter than the identified "normal" string lengths.

Additional Outputs

* showing only the top 10 columns, you can change it using n_top_columns param

Column Name	Range of Detected Normal String Lengths	Range of Detected Outlier String Lengths	Number of Outlier Samples	Example Samples
col1	1 - 2	19 - 20	2	['basdbadsbaaaaaaaaaa', 'basdbadsbaaaaaaaaaaa']
