May 2026 · Notebook lesson

Notebook converted from Jupyter for blog publishing.

Driptanil Datta, Software Developer

Model Persistence

Imports

import pandas as pd
from sklearn.ensemble import RandomForestRegressor

Data

df = pd.read_csv('../DATA/Advertising.csv')
df
RESULT
(DataFrame table output not preserved; columns: TV, radio, newspaper, sales)
df.describe()
RESULT
(Summary statistics table not preserved; columns: TV, radio, newspaper, sales)

Data Preparation

X = df.drop('sales',axis=1)
y = df['sales']
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.30, random_state=101)
# HOLD-OUT SET
# Split the 30% test portion in half: a validation set and a hold-out set (15% of the data each)
X_validation, X_holdout_test, y_validation, y_holdout_test = train_test_split(X_test, y_test, test_size=0.5, random_state=101)
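
As a quick sanity check on the two-stage split above, the sketch below repeats it on a synthetic stand-in for the advertising data and counts the rows in each part. The 200-row size and the random values are assumptions for illustration, not the real dataset.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Synthetic stand-in for Advertising.csv (assumption: 200 rows, same column names)
rng = np.random.default_rng(101)
df_demo = pd.DataFrame(rng.uniform(0, 300, size=(200, 4)),
                       columns=['TV', 'radio', 'newspaper', 'sales'])

X_demo = df_demo.drop('sales', axis=1)
y_demo = df_demo['sales']

# First split: 70% train, 30% temporary test pool
X_tr, X_tmp, y_tr, y_tmp = train_test_split(
    X_demo, y_demo, test_size=0.30, random_state=101)

# Second split: halve the pool into validation and hold-out sets
X_val, X_hold, y_val, y_hold = train_test_split(
    X_tmp, y_tmp, test_size=0.5, random_state=101)

split_sizes = {'train': len(X_tr), 'validation': len(X_val), 'holdout': len(X_hold)}
print(split_sizes)
```

With 200 rows this gives 140 training rows and 30 rows each for validation and hold-out, so metrics on either small set can move noticeably from one random split to another.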

Model Training

model = RandomForestRegressor(n_estimators=10,random_state=101)
model.fit(X_train,y_train)
RESULT
RandomForestRegressor(n_estimators=10, random_state=101)

Model Evaluation

validation_predictions = model.predict(X_validation)
from sklearn.metrics import mean_absolute_error,mean_squared_error
mean_absolute_error(y_validation,validation_predictions)
RESULT
0.6636666666666673
mean_squared_error(y_validation,validation_predictions)**0.5 #RMSE
RESULT
0.7831368547918899
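
To make the two metrics above concrete, here is a minimal hand computation of MAE and RMSE in plain Python. The four true values and predictions are hypothetical numbers chosen for easy arithmetic, not results from the model.

```python
import math

# Hypothetical true values and predictions, purely for illustration
y_true = [10.0, 12.0, 15.0, 20.0]
y_pred = [11.0, 12.0, 13.0, 24.0]

# Per-sample errors: -1, 0, 2, -4
errors = [t - p for t, p in zip(y_true, y_pred)]

mae = sum(abs(e) for e in errors) / len(errors)   # mean of absolute errors
mse = sum(e ** 2 for e in errors) / len(errors)   # mean of squared errors
rmse = math.sqrt(mse)                             # same as mse ** 0.5

print(mae, mse, rmse)
```

RMSE is never smaller than MAE, and the gap widens when a few errors are much larger than the rest, because squaring weights large errors more heavily.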

Hyperparameter Tuning

model = RandomForestRegressor(n_estimators=35,random_state=101)
model.fit(X_train,y_train)
RESULT
RandomForestRegressor(n_estimators=35, random_state=101)
validation_predictions = model.predict(X_validation)
mean_absolute_error(y_validation,validation_predictions)
RESULT
0.6759047619047621
mean_squared_error(y_validation,validation_predictions)**0.5 #RMSE
RESULT
0.8585352183157281
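
In the run above, n_estimators=35 scored slightly worse on the validation set than n_estimators=10 (RMSE of about 0.859 versus 0.783), so it can help to compare candidates side by side rather than one at a time. The sketch below shows one way to do that. The synthetic data, the candidate list, and the names `val_rmse` and `best_n` are assumptions for illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Synthetic regression data (assumption: stands in for the advertising features)
rng = np.random.default_rng(101)
X_demo = rng.uniform(0, 100, size=(200, 3))
y_demo = 0.05 * X_demo[:, 0] + 0.2 * X_demo[:, 1] + rng.normal(0, 1, size=200)

X_tr, X_val, y_tr, y_val = train_test_split(
    X_demo, y_demo, test_size=0.30, random_state=101)

# Candidate values are illustrative; 10 and 35 mirror the ones tried above
candidates = [10, 35, 100]
val_rmse = {}
for n in candidates:
    rf = RandomForestRegressor(n_estimators=n, random_state=101)
    rf.fit(X_tr, y_tr)
    preds = rf.predict(X_val)
    val_rmse[n] = mean_squared_error(y_val, preds) ** 0.5  # RMSE on validation data

# Pick the candidate with the lowest validation RMSE
best_n = min(val_rmse, key=val_rmse.get)
print(val_rmse, best_n)
```

The hold-out set is left untouched during this comparison so that the final reported score is not influenced by the tuning choices.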

Final Hold Out Test Performance for Reporting

model = RandomForestRegressor(n_estimators=35,random_state=101)
model.fit(X_train,y_train)
RESULT
RandomForestRegressor(n_estimators=35, random_state=101)
test_predictions = model.predict(X_holdout_test)
mean_absolute_error(y_holdout_test,test_predictions)
RESULT
0.5817142857142852
mean_squared_error(y_holdout_test,test_predictions)**0.5
RESULT
0.730550812603694

Full Training

final_model = RandomForestRegressor(n_estimators=35,random_state=101)
final_model.fit(X,y)
RESULT
RandomForestRegressor()

Saving the Model (and Anything Else as a Pickle File)

import joblib
joblib.dump(final_model,'final_model.pkl')
RESULT
['final_model.pkl']
X.columns
RESULT
Index(['TV', 'radio', 'newspaper'], dtype='object')
list(X.columns)
RESULT
['TV', 'radio', 'newspaper']
joblib.dump(list(X.columns),'column_names.pkl')
RESULT
['column_names.pkl']
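
The model and its column names are saved as two separate files above. One possible alternative, sketched here, is to store both in a single dictionary so they cannot drift apart. The dictionary keys, the file name `model_bundle.pkl`, and the synthetic training data are hypothetical choices, not a joblib convention.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Small synthetic dataset (assumption: only here to have something to fit)
rng = np.random.default_rng(101)
X_demo = rng.uniform(0, 100, size=(50, 3))
y_demo = X_demo.sum(axis=1)

demo_model = RandomForestRegressor(n_estimators=10, random_state=101)
demo_model.fit(X_demo, y_demo)

# Hypothetical bundle layout: key names are illustrative
bundle = {'model': demo_model, 'columns': ['TV', 'radio', 'newspaper']}

# Write to a temporary directory so the sketch leaves no files behind
with tempfile.TemporaryDirectory() as tmp_dir:
    bundle_path = os.path.join(tmp_dir, 'model_bundle.pkl')
    joblib.dump(bundle, bundle_path)
    restored = joblib.load(bundle_path)

# The restored model should reproduce the original predictions exactly
same_predictions = np.array_equal(
    demo_model.predict(X_demo), restored['model'].predict(X_demo))
print(restored['columns'], same_predictions)
```

Pickle-based files are generally tied to the library versions that wrote them, so it is safest to load them with the same scikit-learn version and only from sources you trust.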

Loading Model (Model Persistence)

col_names = joblib.load('column_names.pkl')
col_names
RESULT
['TV', 'radio', 'newspaper']
loaded_model = joblib.load('final_model.pkl')
loaded_model.predict([[230.1,37.8,69.2]])
RESULT
array([21.998])
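
The saved column names become useful at prediction time. Recent scikit-learn versions warn when a model fitted on a DataFrame receives a plain list without feature names, as in the call above. A minimal sketch of one way around this is to wrap the new observation in a one-row DataFrame that uses the saved columns. The synthetic training data and the name `demo_model` are assumptions for illustration.

```python
import warnings

import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

col_names = ['TV', 'radio', 'newspaper']

# Synthetic training frame (assumption: stands in for the real advertising data)
rng = np.random.default_rng(101)
X_demo = pd.DataFrame(rng.uniform(0, 300, size=(50, 3)), columns=col_names)
y_demo = 0.05 * X_demo['TV'] + 0.2 * X_demo['radio']

demo_model = RandomForestRegressor(n_estimators=10, random_state=101)
demo_model.fit(X_demo, y_demo)

# Build the new observation as a one-row DataFrame in the saved column order
new_row = pd.DataFrame([[230.1, 37.8, 69.2]], columns=col_names)

# Record any warnings raised during prediction so they can be inspected
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter('always')
    prediction = demo_model.predict(new_row)

feature_name_warnings = [w for w in caught if 'feature names' in str(w.message)]
print(prediction.shape, len(feature_name_warnings))
```

Using the saved names also guards against passing the values in the wrong column order, which would otherwise go unnoticed.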

