
Model Error Analysis

Load Data

The dataset is the Adult dataset, which can be downloaded from the UCI Machine Learning Repository.

Dua, D. and Graff, C. (2019). UCI Machine Learning Repository [http://archive.ics.uci.edu/ml]. Irvine, CA: University of California, School of Information and Computer Science.

[1]:
import pandas as pd
from urllib.request import urlopen
from sklearn.preprocessing import LabelEncoder

name_data = urlopen('http://archive.ics.uci.edu/ml/machine-learning-databases/adult/adult.names')
lines = [line for line in (raw.decode("utf-8") for raw in name_data)
         if ':' in line and '|' not in line]

features = [l.split(':')[0] for l in lines]
label_name = 'income'

cat_features = [l.split(':')[0] for l in lines if 'continuous' not in l]

train_df = pd.read_csv('http://archive.ics.uci.edu/ml/machine-learning-databases/adult/adult.data',
                       names=features + [label_name])
test_df = pd.read_csv('http://archive.ics.uci.edu/ml/machine-learning-databases/adult/adult.test',
                      names=features + [label_name], skiprows=1)

# Labels in adult.test carry a trailing '.' (e.g. ' <=50K.'); strip it so they match adult.data
test_df[label_name] = test_df[label_name].str[:-1]

encoder = LabelEncoder()
encoder.fit(train_df[label_name])
train_df[label_name] = encoder.transform(train_df[label_name])
test_df[label_name] = encoder.transform(test_df[label_name])
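LabelEncoder maps the class strings to integers in sorted order, which is why the same fitted encoder can be reused on the test labels once the trailing dot is stripped. A minimal sketch (the toy labels below mimic the adult income column, including its leading spaces, and are not taken from the real data):

```python
from sklearn.preprocessing import LabelEncoder

# Toy labels mimicking the adult dataset's income column (illustrative values)
labels = [' <=50K', ' >50K', ' <=50K', ' <=50K']

encoder = LabelEncoder()
encoded = encoder.fit_transform(labels)

print(list(encoder.classes_))  # classes are sorted lexicographically: [' <=50K', ' >50K']
print(list(encoded))           # [0, 1, 0, 0]
```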

Create Dataset

[2]:
from deepchecks import Dataset

cat_features = ['workclass', 'education', 'marital-status', 'occupation', 'relationship',
                'race', 'sex', 'native-country']
train_ds = Dataset(train_df, label=label_name, cat_features=cat_features)
test_ds = Dataset(test_df, label=label_name, cat_features=cat_features)

numeric_features = [feat_name for feat_name in train_ds.features if feat_name not in train_ds.cat_features]

Classification Model

[3]:
from sklearn.pipeline import Pipeline
from sklearn.impute import SimpleImputer
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OrdinalEncoder
from sklearn.ensemble import RandomForestClassifier

numeric_transformer = SimpleImputer()
categorical_transformer = Pipeline(
    steps=[("imputer", SimpleImputer(strategy="most_frequent")), ("encoder", OrdinalEncoder())]
)

preprocessor = ColumnTransformer(
    transformers=[
        ("num", numeric_transformer, numeric_features),
        ("cat", categorical_transformer, cat_features),
    ]
)

model = Pipeline(steps=[("preprocessing", preprocessor), ("model", RandomForestClassifier(max_depth=5, n_jobs=-1, random_state=0))])
model.fit(train_ds.data[train_ds.features], train_ds.data[train_ds.label_name]);
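The same impute-then-encode preprocessing pattern can be exercised end to end on a tiny synthetic frame; all column names and values below are made up for illustration and stand in for the adult data:

```python
import numpy as np
import pandas as pd
from sklearn.pipeline import Pipeline
from sklearn.impute import SimpleImputer
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OrdinalEncoder
from sklearn.ensemble import RandomForestClassifier

# Tiny synthetic frame standing in for the adult data (illustrative values)
df = pd.DataFrame({
    'age': [25, 38, np.nan, 52, 46, 29],
    'workclass': ['Private', 'State-gov', 'Private', np.nan, 'Private', 'Private'],
    'income': [0, 0, 0, 1, 1, 0],
})

preprocessor = ColumnTransformer(transformers=[
    # mean-impute the numeric column
    ('num', SimpleImputer(), ['age']),
    # mode-impute, then ordinal-encode the categorical column
    ('cat', Pipeline(steps=[('imputer', SimpleImputer(strategy='most_frequent')),
                            ('encoder', OrdinalEncoder())]), ['workclass']),
])
model = Pipeline(steps=[('preprocessing', preprocessor),
                        ('model', RandomForestClassifier(max_depth=5, random_state=0))])
model.fit(df[['age', 'workclass']], df['income'])

preds = model.predict(df[['age', 'workclass']])
print(preds.shape)  # (6,)
```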

Run Check

[4]:
from deepchecks.checks import ModelErrorAnalysis
[5]:
check = ModelErrorAnalysis(min_error_model_score=0.3)
check = check.add_condition_segments_performance_relative_difference_not_greater_than()
res = check.run(train_ds, test_ds, model)
res

Model Error Analysis

Find features that best split the data into segments of high and low model error.

Conditions Summary
Status: Warning (!)
Condition: The performance difference of the detected segments must not be greater than 5%
More Info: Found change in Accuracy in features above threshold: {'capital-gain': '10.57%', 'relationship': '23%'}
Additional Outputs
The following graphs show the distribution of error for top features that are most useful for distinguishing high error samples from low error samples. Top features are calculated using `feature_importances_`.
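The percentages in the condition message appear to be the relative difference between each feature's two segment scores, measured against the better segment; this formula is an assumption inferred from the reported numbers, not taken from the deepchecks source. For 'capital-gain':

```python
# Segment accuracies for 'capital-gain' as reported by the check
seg1, seg2 = 0.9442231075697212, 0.8443882922720573

# Relative difference against the better segment (assumed formula)
rel_diff = (seg1 - seg2) / seg1
print(f'{rel_diff:.2%}')  # 10.57%, matching the condition message
```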
[6]:
res.value
[6]:
{'scorer_name': 'Accuracy',
 'feature_segments': {'capital-gain': {'segment1': {'score': 0.9442231075697212,
    'n_samples': 251,
    'frac_samples': 0.0502},
   'segment2': {'score': 0.8443882922720573,
    'n_samples': 4749,
    'frac_samples': 0.9498}},
  'relationship': {'segment1': {'score': 0.8595824204150019,
    'n_samples': 15518,
    'frac_samples': 0.9531355567839813},
   'segment2': {'score': 0.6618610747051114,
    'n_samples': 763,
    'frac_samples': 0.046864443216018674}}}}