API Reference - RegressionErrorDistribution

Regression Error Distribution

Imports

[1]:

from deepchecks.base import Dataset
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.datasets import load_diabetes
from sklearn.model_selection import train_test_split
from deepchecks.checks.performance import RegressionErrorDistribution

Generating data:

[2]:

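# Load the diabetes data as one DataFrame (features plus the 'target' column)
diabetes_df = load_diabetes(return_X_y=False, as_frame=True).frame
train_df, test_df = train_test_split(diabetes_df, test_size=0.33, random_state=42)

# Wrap both splits as deepchecks Datasets, marking the label and categorical columns
train = Dataset(train_df, label='target', cat_features=['sex'])
test = Dataset(test_df, label='target', cat_features=['sex'])

# Fit the regressor on the training features and labels
clf = GradientBoostingRegressor(random_state=0)
_ = clf.fit(train.data[train.features], train.data[train.label_name])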

Running RegressionErrorDistribution check (normal distribution):

[3]:

check = RegressionErrorDistribution()

[4]:

check.run(test, clf)

Regression Error Distribution

Check regression error distribution.

Additional Outputs

Largest over estimation errors:

	age	sex	bmi	bp	s1	s2	s3	s4	s5	s6	target	predicted target	target Prediction Difference
364	0.00	0.05	-0.01	-0.02	-0.01	0.00	-0.04	0.03	0.01	0.10	262.00	120.59	141.41
9	-0.07	-0.04	0.04	-0.03	-0.01	-0.03	-0.02	-0.00	0.07	-0.01	310.00	183.63	126.37
77	-0.10	-0.04	-0.04	-0.07	-0.04	-0.03	0.02	-0.04	-0.07	-0.00	200.00	85.48	114.52

Largest under estimation errors:

	age	sex	bmi	bp	s1	s2	s3	s4	s5	s6	target	predicted target	target Prediction Difference
380	0.02	-0.04	0.03	0.06	-0.06	-0.04	-0.01	-0.03	-0.05	-0.03	52.00	223.72	-171.72
56	-0.04	-0.04	0.04	-0.03	-0.03	-0.03	-0.04	0.00	0.03	-0.02	52.00	199.97	-147.97
7	0.06	0.05	-0.00	0.07	0.09	0.11	0.02	0.02	-0.04	0.00	63.00	183.45	-120.45
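
The last column in both tables is consistent with target minus predicted target. The sketch below is a plain-Python illustration, not the check's actual implementation: it recomputes that column from the six rows shown above and ranks them. The names `rows` and `prediction_differences` are hypothetical and exist only for this example.

```python
# (index, target, predicted target), copied from the two output tables above
rows = [
    (364, 262.00, 120.59),
    (9, 310.00, 183.63),
    (77, 200.00, 85.48),
    (380, 52.00, 223.72),
    (56, 52.00, 199.97),
    (7, 63.00, 183.45),
]


def prediction_differences(rows):
    """Return {index: target - predicted}, rounded to 2 decimals like the table."""
    return {idx: round(target - predicted, 2) for idx, target, predicted in rows}


diffs = prediction_differences(rows)

# Most positive differences first: the rows of the first table
largest_positive = sorted(diffs, key=diffs.get, reverse=True)[:3]
# Most negative differences first: the rows of the second table
largest_negative = sorted(diffs, key=diffs.get)[:3]

print(largest_positive, largest_negative)
```

With this sign convention, a positive difference means the prediction fell below the true target, and a negative one means the prediction exceeded it.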

Skewing the data:

[5]:

# Overwrite every test label with the constant 150 to distort the error distribution
test.data[test.label_name] = 150

Running RegressionErrorDistribution check (abnormal distribution):

[6]:

check = RegressionErrorDistribution()

[7]:

check.run(test, clf)

Regression Error Distribution

Check regression error distribution.

Additional Outputs

Largest over estimation errors:

	age	sex	bmi	bp	s1	s2	s3	s4	s5	s6	target	predicted target	target Prediction Difference
237	0.06	-0.04	-0.07	-0.07	-0.00	-0.00	0.04	-0.04	-0.05	-0.00	150	59.07	90.93
436	-0.06	-0.04	-0.07	-0.05	-0.02	-0.05	0.09	-0.08	-0.06	-0.05	150	61.05	88.95
55	-0.04	-0.04	-0.05	-0.04	-0.01	-0.02	0.09	-0.04	-0.07	0.01	150	61.54	88.46

Largest under estimation errors:

	age	sex	bmi	bp	s1	s2	s3	s4	s5	s6	target	predicted target	target Prediction Difference
114	0.02	-0.04	0.11	0.06	0.01	-0.03	-0.02	0.02	0.10	0.02	150	302.13	-152.13
332	0.03	-0.04	0.10	0.08	-0.01	-0.01	-0.06	0.03	0.06	0.04	150	295.71	-145.71
321	0.10	-0.04	0.05	0.08	0.05	0.04	-0.08	0.14	0.10	0.06	150	269.18	-119.18
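
Because every label is now 150, each difference in these tables reduces to 150 minus the prediction, so the error distribution only mirrors the spread of the model's predictions. The sketch below checks this against the six rows shown above using plain Python. It is an illustration under that assumption, not the check's own code, and the names `CONSTANT_LABEL`, `predicted` and `differences` are hypothetical.

```python
CONSTANT_LABEL = 150  # the value assigned to every test label in cell [5]

# index -> predicted target, copied from the two output tables above
predicted = {
    237: 59.07,
    436: 61.05,
    55: 61.54,
    114: 302.13,
    332: 295.71,
    321: 269.18,
}

# With a constant label, the difference depends on the prediction alone
differences = {idx: round(CONSTANT_LABEL - p, 2) for idx, p in predicted.items()}

# Ranking by largest difference equals ranking by smallest prediction
by_difference = sorted(differences, key=differences.get, reverse=True)
by_prediction = sorted(predicted, key=predicted.get)

print(differences)
print(by_difference == by_prediction)
```

The recomputed values match the "target Prediction Difference" column, which suggests the rows surfaced here are simply the model's lowest and highest predictions rather than its genuinely worst errors.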
