Logistic Regression Models: 00 Logistic Regression
Data Science · May 2026 · Notebook lesson
Notebook converted from Jupyter for blog publishing.
Driptanil Datta, Software Developer
Logistic Regression

Imports

import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

Data

An experiment was conducted on 5000 participants to study the effects of age and physical health on hearing loss, specifically the ability to hear high-pitched tones. The data records the results of the study: each participant was evaluated and scored on physical ability, then took an audio test (pass/no pass) measuring their ability to hear high frequencies; the participant's age was also noted. Is it possible to build a model that predicts someone's likelihood of hearing the high-frequency sound based solely on their features (age and physical score)?

  • Features

    • age - Age of participant in years
    • physical_score - Score achieved during physical exam
  • Label/Target

    • test_result - 0 if no pass, 1 if test passed
df = pd.read_csv('../DATA/hearing_test.csv')
df.head()
(HTML table output: first five rows of age, physical_score, test_result)
Exploratory Data Analysis and Visualization

Feel free to explore the data further on your own.

df.info()
STDOUT
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5000 entries, 0 to 4999
Data columns (total 3 columns):
 #   Column          Non-Null Count  Dtype  
---  ------          --------------  -----  
(output truncated)
df.describe()
(HTML table output: summary statistics for age, physical_score and test_result; count 5000.000000 for each column)
df['test_result'].value_counts()
RESULT
1    3000
0    2000
Name: test_result, dtype: int64
sns.countplot(data=df,x='test_result')
RESULT
<AxesSubplot:xlabel='test_result', ylabel='count'>
PLOT
Output 1
sns.boxplot(x='test_result',y='age',data=df)
RESULT
<AxesSubplot:xlabel='test_result', ylabel='age'>
PLOT
Output 2
sns.boxplot(x='test_result',y='physical_score',data=df)
RESULT
<AxesSubplot:xlabel='test_result', ylabel='physical_score'>
PLOT
Output 3
sns.scatterplot(x='age',y='physical_score',data=df,hue='test_result')
RESULT
<AxesSubplot:xlabel='age', ylabel='physical_score'>
PLOT
Output 4
sns.pairplot(df,hue='test_result')
RESULT
<seaborn.axisgrid.PairGrid at 0x19ceae2fd08>
PLOT
Output 5
sns.heatmap(df.corr(),annot=True)
RESULT
<AxesSubplot:>
PLOT
Output 6
sns.scatterplot(x='physical_score',y='test_result',data=df)
RESULT
<AxesSubplot:xlabel='physical_score', ylabel='test_result'>
PLOT
Output 7
sns.scatterplot(x='age',y='test_result',data=df)
RESULT
<AxesSubplot:xlabel='age', ylabel='test_result'>
PLOT
Output 8

Easily discover new plot types with a Google search! Searching for "3d matplotlib scatter plot" quickly takes you to: https://matplotlib.org/3.1.1/gallery/mplot3d/scatter3d.html

from mpl_toolkits.mplot3d import Axes3D 
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
ax.scatter(df['age'],df['physical_score'],df['test_result'],c=df['test_result'])
RESULT
<mpl_toolkits.mplot3d.art3d.Path3DCollection at 0x19ceaf878c8>
PLOT
Output 9

Train | Test Split and Scaling

X = df.drop('test_result',axis=1)
y = df['test_result']
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.1, random_state=101)
scaler = StandardScaler()
scaled_X_train = scaler.fit_transform(X_train)
scaled_X_test = scaler.transform(X_test)
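As a quick sanity check on this fit-on-train-only pattern: the scaler should leave the training features with exactly zero mean and unit variance, while the test features come out only approximately standardized, because they reuse the training statistics. A minimal sketch on synthetic stand-in data (the hearing_test.csv file is not bundled here):

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the two features (age, physical_score)
rng = np.random.default_rng(101)
X = rng.normal(loc=[45, 30], scale=[15, 8], size=(5000, 2))

X_train, X_test = train_test_split(X, test_size=0.1, random_state=101)

scaler = StandardScaler()
scaled_X_train = scaler.fit_transform(X_train)  # fit ONLY on the training data
scaled_X_test = scaler.transform(X_test)        # reuse the training mean/std

# Training features are exactly standardized ...
print(scaled_X_train.mean(axis=0).round(6))
print(scaled_X_train.std(axis=0).round(6))
# ... while the test features are only approximately so (no data leakage)
print(scaled_X_test.mean(axis=0).round(2))
```

Fitting the scaler on the full dataset instead would leak test-set statistics into training, which is why `transform` (not `fit_transform`) is used on the test split.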

Logistic Regression Model

from sklearn.linear_model import LogisticRegression
# help(LogisticRegression)
# help(LogisticRegressionCV)
log_model = LogisticRegression()
log_model.fit(scaled_X_train,y_train)
RESULT
LogisticRegression()
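Under the hood, the fitted model turns a linear score into a probability with the sigmoid (logistic) function: p(y=1 | x) = 1 / (1 + e^-(w·x + b)). A small sketch on synthetic data confirming that `predict_proba` is exactly the sigmoid of `decision_function`:

```python
import numpy as np
from scipy.special import expit  # the logistic (sigmoid) function
from sklearn.linear_model import LogisticRegression

# Tiny synthetic binary problem standing in for the hearing-test data
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = (X[:, 0] + 2 * X[:, 1] + rng.normal(scale=0.5, size=200) > 0).astype(int)

model = LogisticRegression().fit(X, y)

# Class-1 probability is the sigmoid of the linear decision function
p_manual = expit(model.decision_function(X))
p_sklearn = model.predict_proba(X)[:, 1]
print(np.allclose(p_manual, p_sklearn))  # True
```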

Coefficient Interpretation

Things to remember:

  • These coefficients relate to the odds and cannot be directly interpreted as in linear regression.
  • We trained on a scaled version of the data.
  • It is much easier to interpret the relationships between the coefficients themselves than to interpret each coefficient's relationship with the probability of the target/label class.

Make sure to watch the video explanation, and check out the links below:

The odds ratio

For a continuous independent variable the odds ratio can be defined as:

$$\mathrm{OR} = \frac{\mathrm{odds}(x + 1)}{\mathrm{odds}(x)} = e^{\beta_1}$$

This exponential relationship provides an interpretation for $\beta_1$: the odds multiply by $e^{\beta_1}$ for every 1-unit increase in $x$.

log_model.coef_
RESULT
array([[-0.94953524,  3.45991194]])

This means:

  • We can expect the odds of passing the test to decrease (the original coeff was negative) per unit increase of the age.
  • We can expect the odds of passing the test to increase (the original coeff was positive) per unit increase of the physical score.
  • Based on the relative magnitudes of the coefficients, physical_score is a stronger predictor than age.
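To make the odds-ratio reading concrete, we can exponentiate the coefficients printed above. Note the model was trained on standardized features, so a one-unit increase here means one standard deviation:

```python
import numpy as np

# Coefficients from log_model.coef_ above, in feature order [age, physical_score]
coefs = np.array([-0.94953524, 3.45991194])

# exp(beta) gives the multiplicative change in the odds per 1-unit increase
odds_ratios = np.exp(coefs)
print(odds_ratios.round(3))
```

A one-standard-deviation increase in age multiplies the odds of passing by roughly 0.39 (a decrease), while the same increase in physical_score multiplies them by roughly 31.8.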

Model Performance on Classification Tasks

from sklearn.metrics import accuracy_score,confusion_matrix,classification_report,plot_confusion_matrix
y_pred = log_model.predict(scaled_X_test)
accuracy_score(y_test,y_pred)
RESULT
0.93
confusion_matrix(y_test,y_pred)
RESULT
array([[172,  21],
       [ 14, 293]], dtype=int64)
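These counts can be turned into the headline metrics by hand. In scikit-learn's convention rows are the true class and columns the predicted class, so with class 1 as positive the matrix reads [[TN, FP], [FN, TP]]:

```python
import numpy as np

cm = np.array([[172, 21],
               [14, 293]])  # [[TN, FP], [FN, TP]] with class 1 as positive

TN, FP = cm[0]
FN, TP = cm[1]

accuracy = (TP + TN) / cm.sum()   # (293 + 172) / 500 = 0.93
precision = TP / (TP + FP)        # 293 / 314 ≈ 0.93
recall = TP / (TP + FN)           # 293 / 307 ≈ 0.95
f1 = 2 * precision * recall / (precision + recall)
print(accuracy, round(precision, 2), round(recall, 2), round(f1, 2))
```

These hand-computed values match the accuracy_score above and the class-1 row of the classification report below.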
plot_confusion_matrix(log_model,scaled_X_test,y_test)
RESULT
<sklearn.metrics._plot.confusion_matrix.ConfusionMatrixDisplay at 0x19ceb65e588>
PLOT
Output 10
# normalize='true' divides each row by its true-class count, so each row sums to 1
plot_confusion_matrix(log_model,scaled_X_test,y_test,normalize='true')
RESULT
<sklearn.metrics._plot.confusion_matrix.ConfusionMatrixDisplay at 0x19ceb691b88>
PLOT
Output 11
print(classification_report(y_test,y_pred))
STDOUT
              precision    recall  f1-score   support

           0       0.92      0.89      0.91       193
           1       0.93      0.95      0.94       307
X_train.iloc[0]
RESULT
age               32.0
physical_score    43.0
Name: 141, dtype: float64
y_train.iloc[0]
RESULT
1
# The model was trained on SCALED features, so a raw sample must be passed
# through the same fitted scaler before prediction
sample = scaler.transform(X_train.iloc[0].values.reshape(1, -1))
log_model.predict_proba(sample)
log_model.predict(sample)
RESULT
array([1], dtype=int64)
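`predict` is simply `predict_proba` with a 0.5 cut-off on the class-1 probability. A sketch on synthetic data showing the equivalence, and how a stricter, hypothetical custom threshold of 0.8 could be applied:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(42)
X = rng.normal(size=(300, 2))
y = (X.sum(axis=1) > 0).astype(int)

model = LogisticRegression().fit(X, y)
proba_1 = model.predict_proba(X)[:, 1]

# Default behaviour: predict class 1 whenever P(y=1|x) >= 0.5
manual_default = (proba_1 >= 0.5).astype(int)
print((manual_default == model.predict(X)).all())  # True

# A stricter custom threshold trades recall for precision:
# fewer samples are called positive, but with higher confidence
manual_strict = (proba_1 >= 0.8).astype(int)
print(manual_strict.sum() <= manual_default.sum())  # True
```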

Evaluating Curves and AUC

Make sure to watch the video on this!

from sklearn.metrics import precision_recall_curve,plot_precision_recall_curve,plot_roc_curve
plot_precision_recall_curve(log_model,scaled_X_test,y_test)
RESULT
<sklearn.metrics._plot.precision_recall_curve.PrecisionRecallDisplay at 0x19cec76dac8>
PLOT
Output 12
plot_roc_curve(log_model,scaled_X_test,y_test)
RESULT
<sklearn.metrics._plot.roc_curve.RocCurveDisplay at 0x19ceb5c4288>
PLOT
Output 13
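The AUC shown in the ROC plot's legend can also be computed directly with roc_auc_score, which expects the class-1 probabilities rather than hard predict() labels. A sketch on synthetic data:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score, roc_curve

rng = np.random.default_rng(7)
X = rng.normal(size=(500, 2))
y = (X[:, 0] - X[:, 1] + rng.normal(scale=0.8, size=500) > 0).astype(int)

model = LogisticRegression().fit(X, y)
scores = model.predict_proba(X)[:, 1]  # probabilities, not predict() labels

auc = roc_auc_score(y, scores)
print(round(auc, 3))  # close to 1 for this fairly separable data

# roc_curve returns the FPR/TPR points the ROC plot is drawn from
fpr, tpr, thresholds = roc_curve(y, scores)
```

An AUC of 0.5 corresponds to random guessing and 1.0 to a perfect ranking; passing `predict()` output instead of probabilities would collapse the curve to a single threshold and understate the score.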

