Model Error Analysis

Load Data

This example uses the Adult dataset, which can be downloaded from the UCI Machine Learning Repository.

import pandas as pd
from urllib.request import urlopen
from sklearn.preprocessing import LabelEncoder

name_data = urlopen('')
lines = [l.decode("utf-8") for l in name_data if ':' in l.decode("utf-8") and '|' not in l.decode("utf-8")]

features = [l.split(':')[0] for l in lines]
label_name = 'income'

cat_features = [l.split(':')[0] for l in lines if 'continuous' not in l]

train_df = pd.read_csv('',
                       names=features + [label_name])
test_df = pd.read_csv('',
                      names=features + [label_name], skiprows=1)

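# Labels in the test file carry one extra trailing character; drop it so they match the training labels.
test_df[label_name] = test_df[label_name].str[:-1]

# Fit the encoder on the training labels, then encode both sets with the same mapping.
encoder = LabelEncoder().fit(train_df[label_name])
train_df[label_name] = encoder.transform(train_df[label_name])
test_df[label_name] = encoder.transform(test_df[label_name])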
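
The feature names above are parsed from the dataset's names file: lines containing ':' describe a feature, lines containing '|' are comments, and any feature not marked 'continuous' is treated as categorical. A minimal sketch of that filtering, run on a few made-up lines written in the style of the names file (the sample lines are illustrative assumptions, not the file's actual contents):

```python
# Hypothetical sample lines in the style of the names file (abbreviated).
sample_lines = [
    "| This line is a comment and is skipped.",
    ">50K, <=50K.",
    "age: continuous.",
    "workclass: Private, Self-emp-not-inc, Local-gov.",
    "education-num: continuous.",
    "sex: Female, Male.",
]

# Keep only feature description lines: they contain ':' and are not comments.
lines = [l for l in sample_lines if ":" in l and "|" not in l]

# The feature name is everything before the first ':'.
features = [l.split(":")[0] for l in lines]

# Features whose description does not mention 'continuous' are categorical.
cat_features = [l.split(":")[0] for l in lines if "continuous" not in l]

print(features)
print(cat_features)
```

The class line (">50K, <=50K.") has no ':' and is dropped, which is why the label column name is appended separately when reading the CSV files.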

Create Dataset

from deepchecks import Dataset

cat_features = ['workclass', 'education', 'marital-status', 'occupation', 'relationship',
                'race', 'sex', 'native-country']
train_ds = Dataset(train_df, label=label_name, cat_features=cat_features)
test_ds = Dataset(test_df, label=label_name, cat_features=cat_features)

numeric_features = [feat_name for feat_name in train_ds.features if feat_name not in train_ds.cat_features]

Classification Model

from sklearn.pipeline import Pipeline
from sklearn.impute import SimpleImputer
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OrdinalEncoder
from sklearn.ensemble import RandomForestClassifier

numeric_transformer = SimpleImputer()
categorical_transformer = Pipeline(
    steps=[("imputer", SimpleImputer(strategy="most_frequent")), ("encoder", OrdinalEncoder())]
)

# Impute numeric columns; impute and ordinal-encode categorical columns.
preprocessor = ColumnTransformer(
    transformers=[
        ("num", numeric_transformer, numeric_features),
        ("cat", categorical_transformer, cat_features),
    ]
)

model = Pipeline(steps=[("preprocessing", preprocessor), ("model", RandomForestClassifier(max_depth=5, n_jobs=-1, random_state=0))])
model.fit(train_ds.data[train_ds.features], train_ds.data[train_ds.label_name])

Run Check

from deepchecks.checks import ModelErrorAnalysis
check = ModelErrorAnalysis(min_error_model_score=0.3)
check = check.add_condition_segments_performance_relative_difference_not_greater_than()
res = check.run(train_ds, test_ds, model)

Model Error Analysis

Find features that best split the data into segments of high and low model error.
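
To make the idea of "segments of high and low model error" concrete, here is a deliberately simplified sketch. It is not the deepchecks implementation: it looks at a single numeric feature and searches for the threshold that maximises the gap in mean per-sample error between the two sides. The function name `best_error_split`, the `min_segment` parameter, and the toy data are all assumptions made for illustration.

```python
from statistics import mean


def best_error_split(values, errors, min_segment=2):
    """Return (threshold, mean_error_below, mean_error_above) for the cut on
    one numeric feature that maximises the gap in mean error, or None."""
    pairs = sorted(zip(values, errors))
    best = None
    for i in range(min_segment, len(pairs) - min_segment + 1):
        # Only cut between two distinct feature values.
        if pairs[i - 1][0] == pairs[i][0]:
            continue
        below = mean(e for _, e in pairs[:i])
        above = mean(e for _, e in pairs[i:])
        gap = abs(below - above)
        if best is None or gap > best[0]:
            threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
            best = (gap, threshold, below, above)
    return None if best is None else best[1:]


# Toy data (made up): error is 1 where the model is wrong, 0 where it is right.
feature = [1, 2, 3, 4, 5, 6, 7, 8]
error = [0, 0, 0, 0, 0, 1, 1, 1]

threshold, low_err, high_err = best_error_split(feature, error)
print(threshold, low_err, high_err)
```

On this toy data the best cut falls between 5 and 6, separating a segment with no errors from a segment where every prediction is wrong. The actual check works across all features and reports a score per segment, as in the output that follows.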

Conditions Summary

Condition: The performance difference of the detected segments must not be greater than 5%
More Info: Found change in Accuracy in features above threshold: {'capital-gain': '10.57%', 'relationship': '23%'}

Additional Outputs
The following graphs show the distribution of error for top features that are most useful for distinguishing high error samples from low error samples. Top features are calculated using `feature_importances_`.
{'scorer_name': 'Accuracy',
 'feature_segments': {'capital-gain': {'segment1': {'score': 0.9442231075697212,
    'n_samples': 251,
    'frac_samples': 0.0502},
   'segment2': {'score': 0.8443882922720573,
    'n_samples': 4749,
    'frac_samples': 0.9498}},
  'relationship': {'segment1': {'score': 0.8595824204150019,
    'n_samples': 15518,
    'frac_samples': 0.9531355567839813},
   'segment2': {'score': 0.6618610747051114,
    'n_samples': 763,
    'frac_samples': 0.046864443216018674}}}}
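
The percentages reported in the condition can be reproduced from the segment scores above. The sketch below assumes the relative difference is computed as (higher score - lower score) / higher score; that formula is an inference from the numbers, not taken from the deepchecks source. The helper name `relative_difference` and the `max_allowed` variable are illustrative.

```python
# Segment accuracy scores copied from the check output above.
feature_segments = {
    "capital-gain": {"segment1": 0.9442231075697212, "segment2": 0.8443882922720573},
    "relationship": {"segment1": 0.8595824204150019, "segment2": 0.6618610747051114},
}


def relative_difference(score_high, score_low):
    """Drop in score from the better segment to the worse one, as a fraction."""
    return (score_high - score_low) / score_high


max_allowed = 0.05  # the 5% threshold named in the condition
diffs = {}
for feature, scores in feature_segments.items():
    high, low = max(scores.values()), min(scores.values())
    diffs[feature] = relative_difference(high, low)
    status = "above" if diffs[feature] > max_allowed else "within"
    print(f"{feature}: {diffs[feature]:.2%} ({status} the {max_allowed:.0%} threshold)")
```

Both features come out above the 5% threshold (about 10.57% for capital-gain and about 23% for relationship), which matches the condition message.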