String Length Out Of Bounds
[1]:
from deepchecks.checks.integrity.string_length_out_of_bounds import StringLengthOutOfBounds
import pandas as pd
[2]:
col1 = ["aaaaa33", "aaaaaaa33"]*40
col1.append("a")
col1.append("aaaaaadsfasdfasdf")
col2 = ["b", "abc"]*41
col3 = ["a"]*80
col3.append("a"*100)
col3.append("a"*200)
# col1 and col3 contain outliers, col2 does not
df = pd.DataFrame({"col1":col1, "col2": col2, "col3": col3 })
[3]:
StringLengthOutOfBounds(min_unique_value_ratio=0.01).run(df)
String Length Out Of Bounds
Detect strings with length that is much longer/shorter than the identified "normal" string lengths.
Additional Outputs
* showing only the top 10 columns, you can change it using n_top_columns param
| Column Name | Range of Detected Normal String Lengths | Range of Detected Outlier String Lengths | Number of Outlier Samples | Example Samples |
|---|---|---|---|---|
| col1 | 7 - 9 | 1 - 1 | 1 | ['a'] |
| col1 | 7 - 9 | 17 - 17 | 1 | ['aaaaaadsfasdfasdf'] |
| col3 | 1 - 1 | 100 - 200 | 2 | ['aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa...', 'aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa...'] |
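To build intuition for what "much longer/shorter than normal" can mean, here is a minimal sketch that flags strings by length using interquartile-range fences. This is a stand-in technique chosen for illustration, not necessarily the algorithm deepchecks uses; the helper name `find_length_outliers` and the multiplier `k` are hypothetical and not part of the library.

```python
from statistics import quantiles


def find_length_outliers(values, k=1.5):
    """Return the strings whose length falls outside the IQR fences.

    Hypothetical helper for illustration only; not part of deepchecks.
    """
    lengths = [len(v) for v in values]
    # Quartiles of the length distribution (default "exclusive" method).
    q1, _, q3 = quantiles(lengths, n=4)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    # Keep the original order of the flagged strings.
    return [v for v in values if not lower <= len(v) <= upper]


# Same data as col1 and col2 in the cell above.
col1 = ["aaaaa33", "aaaaaaa33"] * 40 + ["a", "aaaaaadsfasdfasdf"]
col2 = ["b", "abc"] * 41

print(find_length_outliers(col1))  # ['a', 'aaaaaadsfasdfasdf']
print(find_length_outliers(col2))  # []
```

On this toy data the sketch happens to flag the same strings that the check reports for col1, and nothing for col2. The check itself is configured through its own parameters (such as `min_unique_value_ratio` above), so results on real data may differ.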
[4]:
col = ["a","a","a","a","a","a","a","a","a","a","a","a","a","ab","ab","ab","ab","ab","ab", "ab"]*1000
col.append("basdbadsbaaaaaaaaaa")
col.append("basdbadsbaaaaaaaaaaa")
df = pd.DataFrame({"col1":col})
StringLengthOutOfBounds(num_percentiles=1000, min_unique_values=3).run(df)
String Length Out Of Bounds
Detect strings with length that is much longer/shorter than the identified "normal" string lengths.
Additional Outputs
* showing only the top 10 columns, you can change it using n_top_columns param
| Column Name | Range of Detected Normal String Lengths | Range of Detected Outlier String Lengths | Number of Outlier Samples | Example Samples |
|---|---|---|---|---|
| col1 | 1 - 2 | 19 - 20 | 2 | ['basdbadsbaaaaaaaaaa', 'basdbadsbaaaaaaaaaaa'] |