Data Science
May 2026 · Notebook lesson
Notebook converted from Jupyter for blog publishing.
00-Model-Persistence
Driptanil Datta, Software Developer
Model Persistence
Imports
import pandas as pd
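from sklearn.ensemble import RandomForestRegressor
Data
df = pd.read_csv('../DATA/Advertising.csv')
df
RESULT
[Table: preview of df with columns TV, radio, newspaper, sales; row values not preserved]
df.describe()
RESULT
[Table: summary statistics for TV, radio, newspaper, sales; values not preserved]
Data Preparation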
X = df.drop('sales', axis=1)
y = df['sales']
from sklearn.model_selection import train_test_split
# First split: 70% for training, 30% held back
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.30, random_state=101)
# HOLD OUT SET
# Split the held-back 30% in half: 15% validation and 15% hold-out test
X_validation, X_holdout_test, y_validation, y_holdout_test = train_test_split(X_test, y_test, test_size=0.5, random_state=101)
Model Training
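Before fitting, it can be worth confirming that the two-stage split from the previous section produces the intended 70/15/15 proportions. The sketch below is an illustration only: it uses synthetic arrays with 200 rows and 3 features as a stand-in (an assumption, since `Advertising.csv` is not bundled here), and `split_sizes` is a hypothetical name.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the advertising data (assumed shape: 200 rows, 3 features).
rng = np.random.default_rng(101)
X = rng.uniform(0, 300, size=(200, 3))
y = rng.uniform(0, 30, size=200)

# Stage 1: 70% training, 30% held back.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, random_state=101
)
# Stage 2: split the held-back 30% in half (15% validation, 15% hold-out).
X_validation, X_holdout_test, y_validation, y_holdout_test = train_test_split(
    X_test, y_test, test_size=0.5, random_state=101
)

split_sizes = {
    "train": len(X_train),
    "validation": len(X_validation),
    "holdout": len(X_holdout_test),
}
print(split_sizes)  # {'train': 140, 'validation': 30, 'holdout': 30}
```

With 200 rows this gives 140 training rows and 30 rows each for validation and hold-out. The hold-out rows stay untouched until the final reporting step.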
model = RandomForestRegressor(n_estimators=10, random_state=101)
model.fit(X_train, y_train)
RESULT
RandomForestRegressor(n_estimators=10, random_state=101)
Model Evaluation
validation_predictions = model.predict(X_validation)
from sklearn.metrics import mean_absolute_error, mean_squared_error
mean_absolute_error(y_validation, validation_predictions)
RESULT
0.6636666666666673
mean_squared_error(y_validation, validation_predictions)**0.5  # RMSE
RESULT
0.7831368547918899
Hyperparameter Tuning
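Tuning repeats the same two metrics after every refit, so it can help to wrap them in a small helper. This is a minimal sketch; `report_errors` is a hypothetical name and is not part of scikit-learn or the original notebook.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error


def report_errors(y_true, y_pred):
    """Return (MAE, RMSE) for a set of predictions. Hypothetical helper."""
    mae = mean_absolute_error(y_true, y_pred)
    # RMSE is the square root of the mean squared error.
    rmse = np.sqrt(mean_squared_error(y_true, y_pred))
    return mae, rmse


# Tiny hand-checkable example: the absolute errors are 1, 0 and 2.
mae, rmse = report_errors([3.0, 5.0, 7.0], [2.0, 5.0, 9.0])
print(f"MAE={mae:.3f}  RMSE={rmse:.3f}")  # MAE=1.000  RMSE=1.291
```

In the notebook this would be called as `report_errors(y_validation, validation_predictions)` after each fit.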
model = RandomForestRegressor(n_estimators=35, random_state=101)
model.fit(X_train, y_train)
RESULT
RandomForestRegressor(n_estimators=35, random_state=101)
validation_predictions = model.predict(X_validation)
mean_absolute_error(y_validation, validation_predictions)
RESULT
0.6759047619047621
mean_squared_error(y_validation, validation_predictions)**0.5  # RMSE
RESULT
0.8585352183157281
Final Hold Out Test Performance for Reporting
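Before the hold-out set is touched, the manual comparison from the tuning section can be made systematic. In the recorded run, the 35-tree model scored slightly worse on validation than the 10-tree model (MAE 0.676 vs 0.664, RMSE 0.859 vs 0.783), so a loop over candidate values makes that kind of trade-off easier to see. The sketch below uses synthetic data and an arbitrary candidate list (10, 35, 100), both of which are assumptions for illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error, mean_squared_error
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (assumption): 200 rows, 3 features, noisy linear target.
rng = np.random.default_rng(101)
X = rng.uniform(0, 300, size=(200, 3))
y = 0.05 * X[:, 0] + 0.2 * X[:, 1] + rng.normal(0, 1, size=200)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, random_state=101
)
X_validation, X_holdout_test, y_validation, y_holdout_test = train_test_split(
    X_test, y_test, test_size=0.5, random_state=101
)

# Fit one model per candidate and score it on the validation set only.
validation_scores = {}
for n_estimators in (10, 35, 100):
    candidate = RandomForestRegressor(n_estimators=n_estimators, random_state=101)
    candidate.fit(X_train, y_train)
    predictions = candidate.predict(X_validation)
    mae = mean_absolute_error(y_validation, predictions)
    rmse = mean_squared_error(y_validation, predictions) ** 0.5
    validation_scores[n_estimators] = (mae, rmse)

# Pick the candidate with the lowest validation RMSE.
best_n = min(validation_scores, key=lambda n: validation_scores[n][1])
for n, (mae, rmse) in validation_scores.items():
    print(f"n_estimators={n:>3}  MAE={mae:.3f}  RMSE={rmse:.3f}")
print("best by validation RMSE:", best_n)
```

Only the validation set is used for the choice; the hold-out set is scored once, after the setting is fixed.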
model = RandomForestRegressor(n_estimators=35, random_state=101)
model.fit(X_train, y_train)
RESULT
RandomForestRegressor(n_estimators=35, random_state=101)
test_predictions = model.predict(X_holdout_test)
mean_absolute_error(y_holdout_test, test_predictions)
RESULT
0.5817142857142852
mean_squared_error(y_holdout_test, test_predictions)**0.5  # RMSE
RESULT
0.730550812603694
Full Training
final_model = RandomForestRegressor(n_estimators=35, random_state=101)
final_model.fit(X, y)
RESULT
RandomForestRegressor()
Saving Model (and anything else as pickle file)
import joblib
joblib.dump(final_model, 'final_model.pkl')
RESULT
['final_model.pkl']
X.columns
RESULT
Index(['TV', 'radio', 'newspaper'], dtype='object')
list(X.columns)
RESULT
['TV', 'radio', 'newspaper']
joblib.dump(list(X.columns), 'column_names.pkl')
RESULT
['column_names.pkl']
Loading Model (Model Persistence)
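A quick way to check that persistence worked is to compare predictions before and after a save/load cycle. The sketch below is illustrative: it trains on synthetic data (an assumption) and writes to a temporary directory instead of the working directory used in the notebook.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Synthetic stand-in data (assumption): 200 rows, 3 features.
rng = np.random.default_rng(101)
X = rng.uniform(0, 300, size=(200, 3))
y = 0.05 * X[:, 0] + 0.2 * X[:, 1] + rng.normal(0, 1, size=200)

final_model = RandomForestRegressor(n_estimators=35, random_state=101)
final_model.fit(X, y)

# Save and reload inside a temporary directory so nothing is left behind.
with tempfile.TemporaryDirectory() as tmp_dir:
    model_path = os.path.join(tmp_dir, "final_model.pkl")
    joblib.dump(final_model, model_path)
    loaded_model = joblib.load(model_path)

# The reloaded model should reproduce the original predictions exactly.
round_trip_ok = np.array_equal(final_model.predict(X), loaded_model.predict(X))
print(round_trip_ok)  # True
```

Pickle-based files should only be loaded from trusted sources, and a model is safest to reload with the same scikit-learn version that saved it.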
col_names = joblib.load('column_names.pkl')
col_names
RESULT
['TV', 'radio', 'newspaper']
loaded_model = joblib.load('final_model.pkl')
loaded_model.predict([[230.1, 37.8, 69.2]])
RESULT
array([21.998])
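The saved column names become useful at prediction time. A model fitted on a DataFrame remembers its feature names, and scikit-learn 1.0 and later warns when such a model receives a plain nested list. One option, sketched here on synthetic data (an assumption) with hypothetical names `loaded_columns` and `new_campaign`, is to rebuild a one-row DataFrame from the stored names so the values cannot be passed in the wrong order unnoticed.

```python
import os
import tempfile

import joblib
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

# Synthetic stand-in for the advertising features (assumption).
col_names = ["TV", "radio", "newspaper"]
rng = np.random.default_rng(101)
X = pd.DataFrame(rng.uniform(0, 300, size=(200, 3)), columns=col_names)
y = 0.05 * X["TV"] + 0.2 * X["radio"] + rng.normal(0, 1, size=200)

final_model = RandomForestRegressor(n_estimators=35, random_state=101).fit(X, y)

# Persist the model and its column names, then load both back.
with tempfile.TemporaryDirectory() as tmp_dir:
    model_path = os.path.join(tmp_dir, "final_model.pkl")
    columns_path = os.path.join(tmp_dir, "column_names.pkl")
    joblib.dump(final_model, model_path)
    joblib.dump(list(X.columns), columns_path)
    loaded_model = joblib.load(model_path)
    loaded_columns = joblib.load(columns_path)

# Build the input with the stored column names instead of a bare nested list.
new_campaign = pd.DataFrame([[230.1, 37.8, 69.2]], columns=loaded_columns)
prediction = loaded_model.predict(new_campaign)
print(prediction.shape)  # (1,)
```

The predicted value depends on the training data, so none is shown for this synthetic example.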