Boosted Trees: 01 Gradient Boosting

Data Science · May 2026 · Notebook lesson
Notebook converted from Jupyter for blog publishing.

Driptanil Datta, Software Developer
Gradient Boosting and GridSearch

The Data

Mushroom Hunting: Edible or Poisonous?

Data Source: https://archive.ics.uci.edu/ml/datasets/Mushroom (opens in a new tab)

This data set includes descriptions of hypothetical samples corresponding to 23 species of gilled mushrooms in the Agaricus and Lepiota family (pp. 500-525). Each species is identified as definitely edible, definitely poisonous, or of unknown edibility and not recommended; this latter class was combined with the poisonous one. The Guide clearly states that there is no simple rule for determining the edibility of a mushroom; no rule like "leaflets three, let it be" for Poisonous Oak and Ivy.

Attribute Information:

  1. cap-shape: bell=b,conical=c,convex=x,flat=f, knobbed=k,sunken=s
  2. cap-surface: fibrous=f,grooves=g,scaly=y,smooth=s
  3. cap-color: brown=n,buff=b,cinnamon=c,gray=g,green=r, pink=p,purple=u,red=e,white=w,yellow=y
  4. bruises?: bruises=t,no=f
  5. odor: almond=a,anise=l,creosote=c,fishy=y,foul=f, musty=m,none=n,pungent=p,spicy=s
  6. gill-attachment: attached=a,descending=d,free=f,notched=n
  7. gill-spacing: close=c,crowded=w,distant=d
  8. gill-size: broad=b,narrow=n
  9. gill-color: black=k,brown=n,buff=b,chocolate=h,gray=g, green=r,orange=o,pink=p,purple=u,red=e, white=w,yellow=y
  10. stalk-shape: enlarging=e,tapering=t
  11. stalk-root: bulbous=b,club=c,cup=u,equal=e, rhizomorphs=z,rooted=r,missing=?
  12. stalk-surface-above-ring: fibrous=f,scaly=y,silky=k,smooth=s
  13. stalk-surface-below-ring: fibrous=f,scaly=y,silky=k,smooth=s
  14. stalk-color-above-ring: brown=n,buff=b,cinnamon=c,gray=g,orange=o, pink=p,red=e,white=w,yellow=y
  15. stalk-color-below-ring: brown=n,buff=b,cinnamon=c,gray=g,orange=o, pink=p,red=e,white=w,yellow=y
  16. veil-type: partial=p,universal=u
  17. veil-color: brown=n,orange=o,white=w,yellow=y
  18. ring-number: none=n,one=o,two=t
  19. ring-type: cobwebby=c,evanescent=e,flaring=f,large=l, none=n,pendant=p,sheathing=s,zone=z
  20. spore-print-color: black=k,brown=n,buff=b,chocolate=h,green=r, orange=o,purple=u,white=w,yellow=y
  21. population: abundant=a,clustered=c,numerous=n, scattered=s,several=v,solitary=y
  22. habitat: grasses=g,leaves=l,meadows=m,paths=p, urban=u,waste=w,woods=d

Imports

import numpy as np
import pandas as pd
 
import matplotlib.pyplot as plt
import seaborn as sns
df = pd.read_csv("../DATA/mushrooms.csv")
df.head()
(Output: the first five rows of the raw data; columns include class, cap-shape, cap-surface, cap-color, bruises, and so on.)
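All 22 attributes, and the `class` target itself, are single-letter categorical codes, so checking the class balance is a sensible first step. A minimal sketch on a hypothetical mini-sample (not the real file):

```python
import pandas as pd

# Hypothetical five-row sample in the same layout as mushrooms.csv:
# every column, including the target, is a single-letter string code.
sample = pd.DataFrame({
    "class":     ["p", "e", "e", "p", "e"],
    "cap-shape": ["x", "x", "b", "x", "x"],
    "odor":      ["p", "a", "l", "p", "n"],
})

# Check the edible/poisonous balance before modelling.
counts = sample["class"].value_counts()
print(counts["e"], counts["p"])  # 3 2
```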

Data Prep

X = df.drop('class', axis=1)
y = df['class']
X = pd.get_dummies(X, drop_first=True)
X.head()
(Output: the first five rows of the encoded matrix; columns include cap-shape_c, cap-shape_f, cap-shape_k, cap-shape_s, cap-shape_x, and so on.)
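Because every feature is categorical, `pd.get_dummies` one-hot encodes them; `drop_first=True` drops one level per feature to avoid perfectly collinear dummy columns. A small sketch with made-up values:

```python
import pandas as pd

# One categorical column with three levels: b, f, x.
X = pd.DataFrame({"cap-shape": ["x", "b", "f", "x"]})

# drop_first=True drops the alphabetically first level ('b'),
# so that level is encoded implicitly as all-zeros.
X_enc = pd.get_dummies(X, drop_first=True)
print(list(X_enc.columns))  # ['cap-shape_f', 'cap-shape_x']
```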
y.head()
0    p
1    e
2    e
3    p
4    e

Train Test Split

from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.15, random_state=101)
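With a roughly balanced target this plain split is fine; on imbalanced data, passing `stratify=y` (an optional refinement, not used above) preserves the class ratio in both splits:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Imbalanced toy target: 80 'e' vs 20 'p'.
X = np.arange(100).reshape(-1, 1)
y = np.array(["e"] * 80 + ["p"] * 20)

# stratify=y keeps 20% 'p' in both the train and test portions.
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.25, random_state=101, stratify=y
)
print((y_te == "p").sum())  # 5 of the 25 test rows
```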

Gradient Boosting and Grid Search with CV

from sklearn.ensemble import GradientBoostingClassifier
help(GradientBoostingClassifier)
Help on class GradientBoostingClassifier in module sklearn.ensemble._gb:

class GradientBoostingClassifier(sklearn.base.ClassifierMixin, BaseGradientBoosting)
 |  GradientBoostingClassifier(*, loss='deviance', learning_rate=0.1, n_estimators=100, subsample=1.0, criterion='friedman_mse', min_samples_split=2, min_samples_leaf=1, min_weight_fraction_leaf=0.0, max_depth=3, min_impurity_decrease=0.0, min_impurity_split=None, init=None, random_state=None, max_features=None, verbose=0, max_leaf_nodes=None, warm_start=False, presort='deprecated', validation_fraction=0.1, n_iter_no_change=None, tol=0.0001, ccp_alpha=0.0)
 |  
from sklearn.model_selection import GridSearchCV

param_grid = {"n_estimators": [1, 5, 10, 20, 40, 100],
              "max_depth": [3, 4, 5, 6]}
gb_model = GradientBoostingClassifier()
grid = GridSearchCV(gb_model, param_grid)
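By default `GridSearchCV` runs 5-fold cross-validation and scores with the estimator's own `.score` method (accuracy for classifiers); both can be overridden via `cv` and `scoring`. A quick sketch on synthetic data, with a smaller grid so it runs fast:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic binary problem standing in for the mushroom matrix.
X, y = make_classification(n_samples=200, random_state=0)

param_grid = {"n_estimators": [5, 20], "max_depth": [2, 3]}
grid = GridSearchCV(GradientBoostingClassifier(random_state=0),
                    param_grid, cv=3, scoring="accuracy")
grid.fit(X, y)
print(sorted(grid.best_params_))  # ['max_depth', 'n_estimators']
```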

Fit to Training Data with CV Search

grid.fit(X_train,y_train)
GridSearchCV(estimator=GradientBoostingClassifier(),
             param_grid={'max_depth': [3, 4, 5, 6],
                         'n_estimators': [1, 5, 10, 20, 40, 100]})
grid.best_params_
{'max_depth': 3, 'n_estimators': 100}
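`best_params_` only reports the winner; `cv_results_` holds the mean cross-validated score for every combination in the grid, which shows how close the runner-up settings were. A sketch, again on synthetic data:

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV

X, y = make_classification(n_samples=150, random_state=0)
grid = GridSearchCV(GradientBoostingClassifier(random_state=0),
                    {"n_estimators": [5, 10]}, cv=3)
grid.fit(X, y)

# One row per grid point, with its cross-validated mean score.
results = pd.DataFrame(grid.cv_results_)[["param_n_estimators", "mean_test_score"]]
print(len(results))  # 2
```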

Performance

from sklearn.metrics import classification_report, plot_confusion_matrix, accuracy_score
predictions = grid.predict(X_test)
predictions
array(['p', 'e', 'p', ..., 'p', 'p', 'e'], dtype=object)
print(classification_report(y_test,predictions))
              precision    recall  f1-score   support

           e       1.00      1.00      1.00       655
           p       1.00      1.00      1.00       564
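A perfect report like this is worth sanity-checking with a confusion matrix. Note that `plot_confusion_matrix` (imported above) was removed in scikit-learn 1.2; `confusion_matrix` or `ConfusionMatrixDisplay` is the portable route. A sketch with made-up labels:

```python
from sklearn.metrics import confusion_matrix

# Hypothetical labels in the same 'e'/'p' coding as the notebook.
y_true = ["e", "e", "p", "p", "p"]
y_pred = ["e", "p", "p", "p", "e"]

# Rows are true classes, columns are predictions, ordered by `labels`.
cm = confusion_matrix(y_true, y_pred, labels=["e", "p"])
print(cm.tolist())  # [[1, 1], [1, 2]]
```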
grid.best_estimator_.feature_importances_
array([2.91150176e-04, 1.55427847e-17, 2.67658844e-21, 0.00000000e+00,
       1.11459235e-16, 1.05030313e-03, 3.26837862e-18, 9.23288948e-17,
       3.33934930e-18, 0.00000000e+00, 1.27133255e-17, 0.00000000e+00,
       3.56629935e-17, 2.46527883e-21, 0.00000000e+00, 5.60405971e-07,
       2.31055039e-03, 5.13955090e-02, 1.84253604e-04, 1.40371481e-02,
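The importances sum to 1 across the one-hot columns, and most are numerically zero; `np.argsort` picks out the few that matter. A sketch with hypothetical values (not the notebook's actual array):

```python
import numpy as np

# Hypothetical importances over five features; they sum to 1.
importances = np.array([0.61, 0.14, 0.05, 0.20, 0.0])

# Indices of the two most important features, largest first.
top2 = np.argsort(importances)[::-1][:2]
print(top2.tolist())  # [0, 3]
```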
feat_import = grid.best_estimator_.feature_importances_
imp_feats = pd.DataFrame(index=X.columns,data=feat_import,columns=['Importance'])
imp_feats
(Output, first rows:)

              Importance
cap-shape_c   2.911502e-04
cap-shape_f   1.554278e-17
imp_feats.sort_values("Importance",ascending=False)
(Output, first rows:)

              Importance
odor_n          0.614744
stalk-root_c    0.135977
imp_feats.describe().transpose()
(Output: summary statistics for the Importance column: count, mean, std, min, quartiles.)
imp_feats = imp_feats[imp_feats['Importance'] > 0.000527]
imp_feats.sort_values('Importance')
(Output, first rows:)

                          Importance
population_y                0.000550
stalk-color-above-ring_w    0.000575
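The 0.000527 cutoff above was read off the `describe()` summary; one defensible alternative is to keep only features above the mean importance. A sketch with hypothetical values:

```python
import pandas as pd

# Hypothetical importances indexed by feature name.
imp = pd.DataFrame(
    {"Importance": [0.61, 0.14, 0.0005, 0.0]},
    index=["odor_n", "stalk-root_c", "population_y", "veil-type_u"],
)

# Keep only features above the mean importance (0.187625 here).
kept = imp[imp["Importance"] > imp["Importance"].mean()]
print(list(kept.index))  # ['odor_n']
```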
plt.figure(figsize=(14,6), dpi=200)
imp_sorted = imp_feats.sort_values('Importance')
sns.barplot(data=imp_sorted, x=imp_sorted.index, y='Importance')
plt.xticks(rotation=90);
(Output: bar plot of the remaining feature importances, sorted ascending.)
© 2026 Driptanil Datta. All rights reserved.