Logistic Regression using Scikit-Learn

Logistic regression is a fundamental algorithm in machine learning, often used for binary classification problems. It's a linear model, meaning it finds a linear decision boundary (a straight line when there are two features) to separate data points belonging to different classes. While the math behind logistic regression is important, visualizing the decision boundary can provide valuable insights into how the model works. This blog post demonstrates how to train a logistic regression model and visualize its decision boundary using Python, NumPy, scikit-learn, and Matplotlib.
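
As a rough illustration of what "linear model" means here, the sketch below computes a linear score for a few points and passes it through the sigmoid function to get a probability for class 1. The weights w and bias b are hypothetical values picked only for this example; they are not the ones scikit-learn learns later in the post.

```python
import numpy as np

def sigmoid(z):
    """Map a real-valued score to a probability in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical weights and bias, chosen only for illustration
w = np.array([1.0, 1.0])
b = -3.0

# Three points: one on each side of the line x1 + x2 = 3, and one on it
points = np.array([[1.0, 1.0], [1.5, 1.5], [2.0, 2.0]])

scores = points @ w + b          # linear part: w1*x1 + w2*x2 + b
probs = sigmoid(scores)          # probability of class 1
labels = (probs > 0.5).astype(int)

print("Scores:", scores)
print("P(class 1):", probs)
print("Labels:", labels)
```

The point that lies exactly on the line gets a score of 0 and a probability of 0.5, which is why the set of points with score 0 is the decision boundary.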

The Dataset:

We'll work with a simple 2D dataset consisting of two classes. Here's the data, represented using NumPy arrays:

import numpy as np

X = np.array([[0.5, 1.5], [1, 1], [1.5, 0.5], [3, 0.5], [2, 2], [1, 2.5]])
y = np.array([0, 0, 0, 1, 1, 1])

Each row of X is one sample with two features, and y contains the corresponding class labels (0 or 1). Let's visualize this data:

import matplotlib.pyplot as plt

plt.scatter(X[y == 0, 0], X[y == 0, 1], label='Class 0')
plt.scatter(X[y == 1, 0], X[y == 1, 1], label='Class 1')
plt.xlabel('X1')  # First feature
plt.ylabel('X2')
plt.legend()
plt.xlim(0, 3.5)
plt.ylim(0, 3.5)
plt.title('Data Points')
plt.show()

[Figure: scatter plot of the six data points, colored by class]

Training the Logistic Regression Model:

Now, let's train a logistic regression model using scikit-learn:

from sklearn.linear_model import LogisticRegression

model = LogisticRegression()
model.fit(X, y)

y_pred = model.predict(X)
print("Predictions on training set:", y_pred)
print("Accuracy on training set:", model.score(X, y))

Predictions on training set: [0 0 0 1 1 1]
Accuracy on training set: 1.0

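This code snippet creates a LogisticRegression object with default settings, trains it on our data using the fit method, and then makes predictions on the training set. The score method reports accuracy on the same six points the model was fit to, so a value of 1.0 tells us the classes can be separated by a line; it does not tell us how well the model would do on new data.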
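
To look inside the fitted model, the following sketch reads the learned weights and bias from the coef_ and intercept_ attributes and reproduces the predictions by hand. It refits the same model on the same data so it can run on its own; the variable names w, b, scores, and manual_pred are just illustrative.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

X = np.array([[0.5, 1.5], [1, 1], [1.5, 0.5], [3, 0.5], [2, 2], [1, 2.5]])
y = np.array([0, 0, 0, 1, 1, 1])

model = LogisticRegression()
model.fit(X, y)

w = model.coef_[0]        # one weight per feature
b = model.intercept_[0]   # bias term

# Linear score for each training point; positive means class 1
scores = X @ w + b
manual_pred = (scores > 0).astype(int)

# Class probabilities: column 0 is P(class 0), column 1 is P(class 1)
proba = model.predict_proba(X)

print("Weights:", w)
print("Bias:", b)
print("Manual predictions:", manual_pred)
print("P(class 1):", proba[:, 1])
```

The manual predictions should agree with model.predict(X), since predict is based on the sign of the same linear score.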

Visualizing the Decision Boundary:

The key part is visualizing the decision boundary. We'll create a meshgrid of points and use the trained model to predict the class for each point in the grid. This will allow us to create a contour plot that represents the decision boundary.
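
To see what the grid looks like before scaling it up, here is a tiny sketch with a 3 by 2 grid. The rule used to label the points is a hypothetical stand-in for model.predict, so the example needs only NumPy.

```python
import numpy as np

xs = np.arange(0, 3, 1.0)   # x-coordinates: 0, 1, 2
ys = np.arange(0, 2, 1.0)   # y-coordinates: 0, 1

# xx and yy hold the x and y coordinate of every grid cell
xx, yy = np.meshgrid(xs, ys)          # both have shape (2, 3)

# Flatten and pair them up: one (x, y) row per grid point
grid_points = np.c_[xx.ravel(), yy.ravel()]   # shape (6, 2)

# Hypothetical stand-in for model.predict: class 1 if x + y > 1.5
flat_labels = (grid_points.sum(axis=1) > 1.5).astype(int)

# Reshape back to the grid so each cell has a label
Z = flat_labels.reshape(xx.shape)

print(grid_points)
print(Z)
```

The full-resolution version used for the plot follows the same pattern, with a step of 0.01 and the trained model in place of the stand-in rule.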

# Create a meshgrid to plot the decision boundary
x_min, x_max = X[:, 0].min() - 1, X[:, 0].max() + 1
y_min, y_max = X[:, 1].min() - 1, X[:, 1].max() + 1
xx, yy = np.meshgrid(np.arange(x_min, x_max, 0.01),
                     np.arange(y_min, y_max, 0.01))

# Get predictions for the meshgrid points
Z = model.predict(np.c_[xx.ravel(), yy.ravel()])
Z = Z.reshape(xx.shape)

# Plot the decision boundary and data points
plt.contourf(xx, yy, Z, alpha=0.4)  # Filled contours for decision regions
plt.scatter(X[y == 0, 0], X[y == 0, 1], label='Class 0')
plt.scatter(X[y == 1, 0], X[y == 1, 1], label='Class 1')
plt.xlabel('X1')
plt.ylabel('X2')
plt.legend()
plt.xlim(0, 3.5)
plt.ylim(0, 3.5)
plt.title('Logistic Regression Decision Boundary')
plt.show()

[Figure: the two shaded decision regions with the data points overlaid]
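
Because the model is linear, the boundary between the two regions can also be computed directly from the learned parameters instead of from a grid. The sketch below solves w1*x1 + w2*x2 + b = 0 for x2. It refits the model so it can run on its own, and it assumes w2 is nonzero; the names x1_line and x2_line are illustrative.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

X = np.array([[0.5, 1.5], [1, 1], [1.5, 0.5], [3, 0.5], [2, 2], [1, 2.5]])
y = np.array([0, 0, 0, 1, 1, 1])

model = LogisticRegression().fit(X, y)

w1, w2 = model.coef_[0]
b = model.intercept_[0]

# Boundary: w1*x1 + w2*x2 + b = 0  ->  x2 = -(w1*x1 + b) / w2
# (assumes w2 is nonzero)
x1_line = np.linspace(0, 3.5, 50)
x2_line = -(w1 * x1_line + b) / w2

# Sanity check: the model's score should be about zero along the line
boundary_scores = model.decision_function(np.c_[x1_line, x2_line])
print("Largest |score| on the line:", np.abs(boundary_scores).max())
```

Passing x1_line and x2_line to plt.plot before calling plt.show() in the plot above would draw this line along the edge where the two shaded regions meet.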
