01-Gradient-Boosting
Data Science · May 2026
Driptanil Datta, Software Developer
Notebook converted from Jupyter for blog publishing.
Gradient Boosting and GridSearch
The Data
Mushroom Hunting: Edible or Poisonous?
Data Source: https://archive.ics.uci.edu/ml/datasets/Mushroom
This data set includes descriptions of hypothetical samples corresponding to 23 species of gilled mushrooms in the Agaricus and Lepiota family (pp. 500-525). Each species is identified as definitely edible, definitely poisonous, or of unknown edibility and not recommended; this latter class was combined with the poisonous one. The Guide clearly states that there is no simple rule for determining the edibility of a mushroom: no rule like "leaflets three, let it be" for poisonous Oak and Ivy.
Attribute Information:
- cap-shape: bell=b,conical=c,convex=x,flat=f, knobbed=k,sunken=s
- cap-surface: fibrous=f,grooves=g,scaly=y,smooth=s
- cap-color: brown=n,buff=b,cinnamon=c,gray=g,green=r, pink=p,purple=u,red=e,white=w,yellow=y
- bruises?: bruises=t,no=f
- odor: almond=a,anise=l,creosote=c,fishy=y,foul=f, musty=m,none=n,pungent=p,spicy=s
- gill-attachment: attached=a,descending=d,free=f,notched=n
- gill-spacing: close=c,crowded=w,distant=d
- gill-size: broad=b,narrow=n
- gill-color: black=k,brown=n,buff=b,chocolate=h,gray=g, green=r,orange=o,pink=p,purple=u,red=e, white=w,yellow=y
- stalk-shape: enlarging=e,tapering=t
- stalk-root: bulbous=b,club=c,cup=u,equal=e, rhizomorphs=z,rooted=r,missing=?
- stalk-surface-above-ring: fibrous=f,scaly=y,silky=k,smooth=s
- stalk-surface-below-ring: fibrous=f,scaly=y,silky=k,smooth=s
- stalk-color-above-ring: brown=n,buff=b,cinnamon=c,gray=g,orange=o, pink=p,red=e,white=w,yellow=y
- stalk-color-below-ring: brown=n,buff=b,cinnamon=c,gray=g,orange=o, pink=p,red=e,white=w,yellow=y
- veil-type: partial=p,universal=u
- veil-color: brown=n,orange=o,white=w,yellow=y
- ring-number: none=n,one=o,two=t
- ring-type: cobwebby=c,evanescent=e,flaring=f,large=l, none=n,pendant=p,sheathing=s,zone=z
- spore-print-color: black=k,brown=n,buff=b,chocolate=h,green=r, orange=o,purple=u,white=w,yellow=y
- population: abundant=a,clustered=c,numerous=n, scattered=s,several=v,solitary=y
- habitat: grasses=g,leaves=l,meadows=m,paths=p, urban=u,waste=w,woods=d
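These single-letter codes are how the values appear in the CSV; decoding them to readable labels is a one-line pandas `map`. A sketch for the odor column (the mapping dict is transcribed from the attribute list above):

```python
import pandas as pd

# Decode map for the odor column, taken from the attribute list above
odor_map = {"a": "almond", "l": "anise", "c": "creosote", "y": "fishy",
            "f": "foul", "m": "musty", "n": "none", "p": "pungent", "s": "spicy"}

codes = pd.Series(["n", "f", "a"])
print(codes.map(odor_map).tolist())  # → ['none', 'foul', 'almond']
```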
Imports
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

df = pd.read_csv("../DATA/mushrooms.csv")
df.head()

(output truncated: first columns are class, cap-shape, cap-surface, cap-color, bruises, …)

Data Prep
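Every column in this dataset is a categorical string, so it has to be one-hot encoded before scikit-learn's tree ensembles can use it. A minimal sketch of what `pd.get_dummies` with `drop_first=True` does, on a toy frame whose column name simply mirrors the real data:

```python
import pandas as pd

toy = pd.DataFrame({"cap-shape": ["x", "b", "x"]})
encoded = pd.get_dummies(toy, drop_first=True)

# Categories are sorted ('b', 'x') and the first one is dropped as the baseline,
# so only the 'cap-shape_x' indicator column remains.
print(list(encoded.columns))  # → ['cap-shape_x']
```

Dropping the first level avoids a fully redundant column: for tree models this is optional, but it keeps the feature matrix smaller.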
X = df.drop('class', axis=1)
y = df['class']
X = pd.get_dummies(X, drop_first=True)
X.head()

(output truncated: first columns are cap-shape_c, cap-shape_f, cap-shape_k, cap-shape_s, cap-shape_x, …)

y.head()
0    p
1    e
2    e
3    p
4    e

Train Test Split
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.15, random_state=101)

Gradient Boosting and Grid Search with CV
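Before reaching for scikit-learn's implementation, the boosting idea itself can be sketched in a few lines: fit a weak learner to the residuals of the current ensemble, add a damped version of its predictions, and repeat. An illustrative toy on synthetic 1-D regression data (not the mushroom dataset):

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X_toy = rng.uniform(-3, 3, size=(200, 1))
y_toy = np.sin(X_toy[:, 0])

# Start from the mean prediction, then fit each stump to the current residuals
pred = np.full_like(y_toy, y_toy.mean())
learning_rate = 0.1
for _ in range(100):
    residual = y_toy - pred
    stump = DecisionTreeRegressor(max_depth=1).fit(X_toy, residual)
    pred += learning_rate * stump.predict(X_toy)

# The boosted ensemble should beat the constant-mean baseline
print(np.abs(y_toy - pred).mean() < np.abs(y_toy - y_toy.mean()).mean())  # → True
```

The `learning_rate` and the number of rounds trade off against each other, which is exactly why the grid search below tunes `n_estimators` and tree depth.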
from sklearn.ensemble import GradientBoostingClassifier

help(GradientBoostingClassifier)

Help on class GradientBoostingClassifier in module sklearn.ensemble._gb:

class GradientBoostingClassifier(sklearn.base.ClassifierMixin, BaseGradientBoosting)
 |  GradientBoostingClassifier(*, loss='deviance', learning_rate=0.1, n_estimators=100, subsample=1.0, criterion='friedman_mse', min_samples_split=2, min_samples_leaf=1, min_weight_fraction_leaf=0.0, max_depth=3, min_impurity_decrease=0.0, min_impurity_split=None, init=None, random_state=None, max_features=None, verbose=0, max_leaf_nodes=None, warm_start=False, presort='deprecated', validation_fraction=0.1, n_iter_no_change=None, tol=0.0001, ccp_alpha=0.0)
from sklearn.model_selection import GridSearchCV

param_grid = {"n_estimators": [1, 5, 10, 20, 40, 100], 'max_depth': [3, 4, 5, 6]}
gb_model = GradientBoostingClassifier()
grid = GridSearchCV(gb_model, param_grid)

Fit to Training Data with CV Search
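Grid search trains one model per parameter combination per cross-validation fold, so cost grows multiplicatively. A quick count for the grid above, assuming `GridSearchCV`'s default of 5 folds:

```python
n_estimators_options = [1, 5, 10, 20, 40, 100]
max_depth_options = [3, 4, 5, 6]
cv_folds = 5  # GridSearchCV default

# 6 x 4 parameter combinations, each fitted once per fold
total_fits = len(n_estimators_options) * len(max_depth_options) * cv_folds
print(total_fits)  # → 120, plus one final refit on the full training set
```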
grid.fit(X_train, y_train)
GridSearchCV(estimator=GradientBoostingClassifier(),
             param_grid={'max_depth': [3, 4, 5, 6],
                         'n_estimators': [1, 5, 10, 20, 40, 100]})

grid.best_params_

{'max_depth': 3, 'n_estimators': 100}

Performance
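Beyond `best_params_`, the fitted search records a cross-validated score for every parameter combination in `cv_results_`, which is worth inspecting before touching the test set. A self-contained sketch on synthetic data (the `demo_*` names are placeholders, not the notebook's objects):

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV

X_demo, y_demo = make_classification(n_samples=200, random_state=0)
demo_grid = GridSearchCV(GradientBoostingClassifier(),
                         {"n_estimators": [5, 20], "max_depth": [2, 3]})
demo_grid.fit(X_demo, y_demo)

# One row per parameter combination, with its mean CV score
results = pd.DataFrame(demo_grid.cv_results_)[
    ["param_max_depth", "param_n_estimators", "mean_test_score"]]
print(results.sort_values("mean_test_score", ascending=False))
```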
from sklearn.metrics import classification_report, plot_confusion_matrix, accuracy_score

predictions = grid.predict(X_test)
predictions

array(['p', 'e', 'p', ..., 'p', 'p', 'e'], dtype=object)

print(classification_report(y_test, predictions))
              precision    recall  f1-score   support

           e       1.00      1.00      1.00       655
           p       1.00      1.00      1.00       564
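`accuracy_score` was imported above but never called; for completeness, a tiny sketch of what it computes on hand-made labels:

```python
from sklearn.metrics import accuracy_score

y_true = ["p", "e", "p", "e"]
y_pred = ["p", "e", "e", "e"]  # one of four labels wrong
print(accuracy_score(y_true, y_pred))  # → 0.75
```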
grid.best_estimator_.feature_importances_

array([2.91150176e-04, 1.55427847e-17, 2.67658844e-21, 0.00000000e+00,
       1.11459235e-16, 1.05030313e-03, 3.26837862e-18, 9.23288948e-17,
       3.33934930e-18, 0.00000000e+00, 1.27133255e-17, 0.00000000e+00,
       3.56629935e-17, 2.46527883e-21, 0.00000000e+00, 5.60405971e-07,
       2.31055039e-03, 5.13955090e-02, 1.84253604e-04, 1.40371481e-02, …])

feat_import = grid.best_estimator_.feature_importances_
imp_feats = pd.DataFrame(index=X.columns, data=feat_import, columns=['Importance'])
imp_feats
(output truncated)
               Importance
cap-shape_c  2.911502e-04
cap-shape_f  1.554278e-17

imp_feats.sort_values("Importance", ascending=False)
(output truncated)
              Importance
odor_n          0.614744
stalk-root_c    0.135977

imp_feats.describe().transpose()
(output truncated: count, mean, std, min, 25%, …)

imp_feats = imp_feats[imp_feats['Importance'] > 0.000527]
imp_feats.sort_values('Importance')
(output truncated)
                          Importance
population_y                0.000550
stalk-color-above-ring_w    0.000575

plt.figure(figsize=(14, 6), dpi=200)
sns.barplot(data=imp_feats.sort_values('Importance'), x=imp_feats.sort_values('Importance').index, y='Importance')
plt.xticks(rotation=90);

(bar plot: one bar per retained feature, sorted by importance)
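The importances plotted above are impurity-based, which can overstate features with many dummy columns; permutation importance is a common cross-check. A self-contained sketch on synthetic data (not the mushroom dataset):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.inspection import permutation_importance

X_demo, y_demo = make_classification(n_samples=300, n_informative=3, random_state=42)
model = GradientBoostingClassifier(n_estimators=50).fit(X_demo, y_demo)

# Shuffle each feature column and measure how much the score drops
perm = permutation_importance(model, X_demo, y_demo, n_repeats=5, random_state=42)
print(perm.importances_mean.shape)  # one mean importance per feature
```

Features whose shuffling barely changes the score would be candidates for the same kind of thresholding applied to `imp_feats` above.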
