Regression Error Distribution
Imports
from deepchecks.base import Dataset
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.datasets import load_diabetes
from sklearn.model_selection import train_test_split
from deepchecks.checks.performance import RegressionErrorDistribution
Generating data:
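# Load the diabetes regression dataset as one DataFrame that includes the 'target' column
diabetes_df = load_diabetes(return_X_y=False, as_frame=True).frame
# Hold out a third of the rows for testing
train_df, test_df = train_test_split(diabetes_df, test_size=0.33, random_state=42)
# Wrap both splits as deepchecks Datasets, marking 'sex' as categorical
train = Dataset(train_df, label='target', cat_features=['sex'])
test = Dataset(test_df, label='target', cat_features=['sex'])
# Fit the regressor on the training features and labels
clf = GradientBoostingRegressor(random_state=0)
_ = clf.fit(train.data[train.features], train.data[train.label_name])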
Running RegressionErrorDistribution check (normal distribution):
check = RegressionErrorDistribution()
check.run(test, clf)
Largest over estimation errors:
| | age | sex | bmi | bp | s1 | s2 | s3 | s4 | s5 | s6 | target | predicted target | target Prediction Difference |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 364 | 0.00 | 0.05 | -0.01 | -0.02 | -0.01 | 0.00 | -0.04 | 0.03 | 0.01 | 0.10 | 262.00 | 120.59 | 141.41 |
| 9 | -0.07 | -0.04 | 0.04 | -0.03 | -0.01 | -0.03 | -0.02 | -0.00 | 0.07 | -0.01 | 310.00 | 183.63 | 126.37 |
| 77 | -0.10 | -0.04 | -0.04 | -0.07 | -0.04 | -0.03 | 0.02 | -0.04 | -0.07 | -0.00 | 200.00 | 85.48 | 114.52 |
Largest under estimation errors:
| | age | sex | bmi | bp | s1 | s2 | s3 | s4 | s5 | s6 | target | predicted target | target Prediction Difference |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 380 | 0.02 | -0.04 | 0.03 | 0.06 | -0.06 | -0.04 | -0.01 | -0.03 | -0.05 | -0.03 | 52.00 | 223.72 | -171.72 |
| 56 | -0.04 | -0.04 | 0.04 | -0.03 | -0.03 | -0.03 | -0.04 | 0.00 | 0.03 | -0.02 | 52.00 | 199.97 | -147.97 |
| 7 | 0.06 | 0.05 | -0.00 | 0.07 | 0.09 | 0.11 | 0.02 | 0.02 | -0.04 | 0.00 | 63.00 | 183.45 | -120.45 |
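The last column of both tables appears to be the label minus the prediction (for row 364, 262.00 - 120.59 = 141.41). The sketch below reproduces that column and the two rankings in plain Python from the six rows shown above. The `rows` list and the `prediction_difference` helper are illustrative names, not part of deepchecks, and the check itself may compute these values differently.

```python
# (index, target, predicted target) copied from the six rows in the tables above.
rows = [
    (364, 262.0, 120.59),
    (9, 310.0, 183.63),
    (77, 200.0, 85.48),
    (380, 52.0, 223.72),
    (56, 52.0, 199.97),
    (7, 63.0, 183.45),
]


def prediction_difference(target, predicted):
    """Label minus prediction, rounded to two decimals like the displayed tables."""
    return round(target - predicted, 2)


# Map each row index to its signed difference.
diffs = {idx: prediction_difference(t, p) for idx, t, p in rows}

# Rank by signed difference: most positive first, then most negative first.
largest_positive = sorted(diffs, key=diffs.get, reverse=True)[:3]
largest_negative = sorted(diffs, key=diffs.get)[:3]

print(largest_positive)
print(largest_negative)
```

The two rankings come back in the same row order as the two tables, which supports reading the column as a signed difference rather than an absolute error.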
Skewing the data:
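# Overwrite every test label with the constant 150 so the errors no longer reflect the real targets
test.data[test.label_name] = 150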
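Once every label is the same constant, each difference reduces to 150 minus the prediction, so the error distribution is a mirrored, shifted copy of the prediction distribution. The following is a minimal sketch of that arithmetic in plain Python, using the six predictions that appear in the output tables of the second run. The variable names are illustrative only.

```python
import math
import statistics

CONSTANT_LABEL = 150  # value assigned to every test label in the cell above

# Predictions copied from the six rows of the second run's output tables.
predictions = [59.07, 61.05, 61.54, 302.13, 295.71, 269.18]

# With a constant label, each difference is the constant minus the prediction.
differences = [round(CONSTANT_LABEL - p, 2) for p in predictions]
print(differences)

# A shift and a sign flip leave the spread unchanged, so the errors are
# as spread out as the predictions themselves.
same_spread = math.isclose(
    statistics.pstdev(differences), statistics.pstdev(predictions), abs_tol=0.01
)
print(same_spread)
```

The computed differences match the "target Prediction Difference" column of the second run, which is why that run says more about the spread of the model's predictions than about its accuracy.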
Running RegressionErrorDistribution check (abnormal distribution):
check = RegressionErrorDistribution()
check.run(test, clf)
Largest over estimation errors:
| | age | sex | bmi | bp | s1 | s2 | s3 | s4 | s5 | s6 | target | predicted target | target Prediction Difference |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 237 | 0.06 | -0.04 | -0.07 | -0.07 | -0.00 | -0.00 | 0.04 | -0.04 | -0.05 | -0.00 | 150 | 59.07 | 90.93 |
| 436 | -0.06 | -0.04 | -0.07 | -0.05 | -0.02 | -0.05 | 0.09 | -0.08 | -0.06 | -0.05 | 150 | 61.05 | 88.95 |
| 55 | -0.04 | -0.04 | -0.05 | -0.04 | -0.01 | -0.02 | 0.09 | -0.04 | -0.07 | 0.01 | 150 | 61.54 | 88.46 |
Largest under estimation errors:
| | age | sex | bmi | bp | s1 | s2 | s3 | s4 | s5 | s6 | target | predicted target | target Prediction Difference |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 114 | 0.02 | -0.04 | 0.11 | 0.06 | 0.01 | -0.03 | -0.02 | 0.02 | 0.10 | 0.02 | 150 | 302.13 | -152.13 |
| 332 | 0.03 | -0.04 | 0.10 | 0.08 | -0.01 | -0.01 | -0.06 | 0.03 | 0.06 | 0.04 | 150 | 295.71 | -145.71 |
| 321 | 0.10 | -0.04 | 0.05 | 0.08 | 0.05 | 0.04 | -0.08 | 0.14 | 0.10 | 0.06 | 150 | 269.18 | -119.18 |
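The two runs are labelled "normal" and "abnormal", but the tables alone do not quantify the shape of the error distribution. One common shape statistic is excess kurtosis, which is 0 for a Gaussian and negative for flatter or two-cluster shapes. The sketch below computes it in plain Python on made-up error samples. Whether the check relies on this exact statistic is an assumption here, and `excess_kurtosis` is an illustrative helper, not a deepchecks function.

```python
def excess_kurtosis(values):
    """Population excess kurtosis: fourth central moment / variance**2 - 3."""
    n = len(values)
    if n == 0:
        raise ValueError("need at least one value")
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n  # variance
    m4 = sum((v - mean) ** 4 for v in values) / n  # fourth central moment
    if m2 == 0:
        raise ValueError("kurtosis is undefined for constant data")
    return m4 / m2 ** 2 - 3.0


# Hypothetical error samples, not taken from the notebook output.
bell_like = [-2, -1, -1, 0, 0, 0, 1, 1, 2]  # errors concentrated around zero
two_cluster = [-1, -1, -1, 1, 1, 1]         # errors split into two groups

bell_kurtosis = excess_kurtosis(bell_like)
cluster_kurtosis = excess_kurtosis(two_cluster)

# The two-cluster sample has the lower (more negative) excess kurtosis.
print(bell_kurtosis, cluster_kurtosis)
```

A single number like this is easier to threshold than a pair of tables, though on small samples it is noisy and should be read alongside the listed extreme errors.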