We will study in great detail a class of computations that makes decisions by weighing evidence. The perceptron converges on a solution if one exists, a “solution” being a hyperplane that linearly separates the data into two groups. The algorithm is guaranteed to find such a separating hyperplane in finite time, provided one exists.
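The classical convergence result (Novikoff's theorem) makes "finite time" precise. If every input satisfies ||x|| ≤ R and some unit-norm weight vector separates the data with margin γ (folding the bias in as an extra input fixed at 1), then

number of weight updates ≤ (R / γ)²

so the bound depends only on the geometry of the data, not on the number of training passes.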

Consider a perceptron that seeks to classify someone as obese, y = +1, or not-obese, y = -1. The inputs are a person's body weight, x1, and height, x2. Say the dataset contains a hundred entries, each comprising a person's body weight and height plus a label saying whether a doctor considers the person obese according to guidelines set by the National Heart, Lung, and Blood Institute. The perceptron's task is to learn values for the weights w1 and w2 (which multiply x1 and x2, respectively) and for the bias term b, such that it correctly classifies each person in the dataset as "obese" or "not-obese."
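Concretely, the decision rule is just the sign of a weighted sum. Here's a minimal sketch of that rule (the parameter values are placeholders the perceptron will learn):

# Perceptron decision rule: the sign of the weighted sum plus bias
def classify(x1, x2, w1, w2, b):
    activation = w1 * x1 + w2 * x2 + b
    return 1 if activation >= 0 else -1  # +1: obese, -1: not-obese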

Once the perceptron has learned the correct values for w1, w2, and the bias term, it's ready to make predictions. Given the body weight and height of another person, one who was not in the original dataset, it can predict a label. The perceptron makes one basic assumption: that there exists a clear, linear divide between the category of people classified as obese and the category of people classified as not-obese.

In the context of this simple example, if you were to plot people's body weights and heights on an x-y graph, with weights on the x-axis and heights on the y-axis, so that each person is a point on the graph, the "clear divide" assumption says that there exists a straight line separating the points representing the obese from the points representing the not-obese. If so, the dataset is said to be linearly separable.
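One way to make "linearly separable" concrete in code: a candidate line w1*x1 + w2*x2 + b = 0 separates the dataset if every point's label agrees with the side of the line the point falls on. A minimal sketch (the helper name separates is my own, not part of the implementation below):

# True if the line w1*x1 + w2*x2 + b = 0 puts every +1 point strictly on
# one side and every -1 point strictly on the other
def separates(points, labels, w1, w2, b):
    return all(label * (w1 * x1 + w2 * x2 + b) > 0
               for (x1, x2), label in zip(points, labels))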

Here's a graphical look at what happens as the perceptron learns. We start with two sets of data points, one characterized by red circles (y = +1, obese) and another by blue triangles (y = -1, not obese). Each data point is characterized by a pair of values (x1, x2), where x1 is the body weight of the person in kilograms, plotted along the x-axis, and x2 is the height in centimeters, plotted along the y-axis.

The perceptron starts with its weights, w1 and w2, and its bias initialized to zero. The weights and bias represent a line in the x-y plane. The perceptron then searches for a separating line, defined by some set of values of its weights and bias, that correctly classifies the points. It learns from its mistakes, adjusting its weights and bias after each misclassification. After numerous passes through the data, the perceptron eventually discovers at least one set of correct values for its weights and bias term. It finds a line that delineates the clusters: the circles and the triangles lie on opposite sides. This is shown below as a black line separating the coordinate space into two regions. The weights learned by the perceptron dictate the slope of the line; the bias determines the distance, or offset, of the line from the origin.

Essentially, the weights control how the line tilts, and the bias moves the line closer to or farther from the origin so as to better classify the data.
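To preview the learning rule used in the code below: whenever a point (x1, x2) with label y is misclassified, the perceptron nudges its parameters toward that point's correct side:

w1 ← w1 + η·y·x1,   w2 ← w2 + η·y·x2,   b ← b + η·y

where η is the learning rate. A misclassified obese point (y = +1) pushes the weighted sum up for similar inputs; a misclassified not-obese point (y = -1) pushes it down.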

[Figure: scatter plot of the two classes (red circles, blue triangles) with the separating line learned by the perceptron]

import numpy as np
import matplotlib.pyplot as plt

# Step 1: Generate synthetic data for weight and height
np.random.seed(42)
n_samples = 100

# Obese class (+1) and Not-obese class (-1)
weights_obese = np.random.uniform(85, 120, n_samples // 2)
heights_obese = np.random.uniform(150, 170, n_samples // 2)
weights_not_obese = np.random.uniform(45, 84, n_samples // 2)
heights_not_obese = np.random.uniform(170, 200, n_samples // 2)

# Combine data into X and labels into y
X_obese = np.column_stack((weights_obese, heights_obese))
X_not_obese = np.column_stack((weights_not_obese, heights_not_obese))
X = np.vstack((X_obese, X_not_obese))
y = np.hstack((np.ones(n_samples // 2), -np.ones(n_samples // 2)))  # Labels: +1 for obese, -1 for not-obese

# Step 2: Initialize perceptron parameters
w = np.zeros(2)  # Weights for x1 (weight) and x2 (height)
b = 0            # Bias term
learning_rate = 0.1
epochs = 1000    # Maximum number of passes over the dataset

# Step 3: Perceptron training loop
for epoch in range(epochs):
    errors = 0
    for i in range(n_samples):
        # Calculate the perceptron output
        activation = np.dot(w, X[i]) + b
        prediction = 1 if activation >= 0 else -1
        
        # Update weights and bias if there's a misclassification
        if prediction != y[i]:
            w += learning_rate * y[i] * X[i]
            b += learning_rate * y[i]
            errors += 1
            
            # Print the updated weights and bias after each change
            print(f"Epoch {epoch + 1}, Sample {i + 1}: Weights = {w}, Bias = {b}")
    
    # Stop early if no errors (i.e., linearly separable)
    if errors == 0:
        print(f"Training completed after {epoch + 1} epochs with final Weights = {w}, Bias = {b}")
        break

# Step 4: Print final perceptron model equation
print(w, b)
print(f"Final Perceptron Model Equation: {w[0]:.4f} * x1 + {w[1]:.4f} * x2 + {b:.4f} = 0")

# Step 5: Plot data points and decision boundary
plt.figure(figsize=(10, 6))
plt.scatter(X_obese[:, 0], X_obese[:, 1], color="red", marker="o", label="Obese (+1)")
plt.scatter(X_not_obese[:, 0], X_not_obese[:, 1], color="blue", marker="^", label="Not-Obese (-1)")

# Define the decision boundary line: w1*x1 + w2*x2 + b = 0  =>  x2 = -(w1/w2)*x1 - b/w2
x_vals = np.linspace(40, 130, 200)
y_vals = -(w[0] / w[1]) * x_vals - (b / w[1])  # assumes w[1] != 0
plt.plot(x_vals, y_vals, color="black", linestyle="--", label="Decision Boundary")

# Add labels and legend
plt.xlabel("Body Weight (kg)")
plt.ylabel("Height (cm)")
plt.legend()
plt.title("Perceptron Classification of Obesity Based on Weight and Height")
plt.show()
Epoch 1, Sample 51: Weights = [ -4.62257382 -19.72479766], Bias = -0.1
Epoch 2, Sample 1: Weights = [ 5.18831659 -2.7856284 ], Bias = 0.0
Epoch 2, Sample 43: Weights = [13.80867642 13.73594169], Bias = 0.1
Epoch 2, Sample 51: Weights = [ 9.18610259 -5.98885596], Bias = 0.0
Epoch 3, Sample 1: Weights = [18.99699301 10.95031329], Bias = 0.1
Epoch 3, Sample 51: Weights = [14.37441918 -8.77448437], Bias = 0.0
Epoch 4, Sample 1: Weights = [24.1853096   8.16468489], Bias = 0.1
Epoch 4, Sample 51: Weights = [ 19.56273578 -11.56011277], Bias = 0.0
Epoch 5, Sample 1: Weights = [29.37362619  5.37905649], Bias = 0.1
Epoch 5, Sample 51: Weights = [ 24.75105237 -14.34574117], Bias = 0.0
...
...
...
Epoch 102, Sample 91: Weights = [ 202.48166368 -117.41360821], Bias = 0.0
Epoch 103, Sample 1: Weights = [ 212.29255409 -100.47443895], Bias = 0.1
Epoch 103, Sample 91: Weights = [ 204.03900964 -117.75374726], Bias = 0.0
Epoch 104, Sample 5: Weights = [ 213.08507488 -101.5579473 ], Bias = 0.1
Epoch 104, Sample 91: Weights = [ 204.83153043 -118.8372556 ], Bias = 0.0
Epoch 105, Sample 1: Weights = [ 214.64242085 -101.89808635], Bias = 0.1
Epoch 105, Sample 91: Weights = [ 206.3888764  -119.17739465], Bias = 0.0
Epoch 106, Sample 5: Weights = [ 215.43494164 -102.98159469], Bias = 0.1
Training completed after 107 epochs with final Weights = [ 215.43494164 -102.98159469], Bias = 0.1

Final Perceptron Model Equation: 215.4349 * x1 + -102.9816 * x2 + 0.1000 = 0

Adding a bias term to the equation is the same as moving the hyperplane away from the origin without changing its orientation.
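To check this numerically: for a boundary w·x + b = 0, the orientation is fixed by w alone, and the perpendicular distance of the boundary from the origin is |b| / ||w||. A small sketch, reusing the w and b from the training run above:

# Orientation comes from w; the bias only offsets the line from the origin.
# Perpendicular distance of w.x + b = 0 from the origin: |b| / ||w||
distance = abs(b) / np.linalg.norm(w)
print(f"Distance of the decision boundary from the origin: {distance:.6f}")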

# Step 5: Prediction function
def predict(weight, height):
    """Predicts obesity status given weight (x1) and height (x2) using the trained perceptron model."""
    activation = w[0] * weight + w[1] * height + b
    return 1 if activation >= 0 else -1

# Example prediction
weight = 70  # Replace with desired weight in kg
height = 172  # Replace with desired height in cm
prediction = predict(weight, height)
status = "Obese" if prediction == 1 else "Not-Obese"
print(f"Given weight: {weight} kg, height: {height} cm -> Prediction: {status} ({'+' if prediction == 1 else '-'}1)")
Given weight: 70 kg, height: 172 cm -> Prediction: Not-Obese (-1)
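As a sanity check, you can reproduce this prediction by hand from the final model equation printed earlier: 215.4349 * 70 - 102.9816 * 172 + 0.1 ≈ -2632.29, which is negative, so the sign rule outputs -1, Not-Obese.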

From a machine learning perspective, the task of a perceptron is to learn a weight vector, given a set of input data vectors, such that the weight vector represents a hyperplane that separates the data into two clusters. Once it has learned the weight vector and is given a new data point to classify (obese or not-obese), the perceptron simply computes w.T @ x + b for the new instance, checks which side of the hyperplane the point falls on, and classifies it accordingly.
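That check vectorizes naturally. A brief sketch, assuming the w, b, X, and y arrays from the training code above are still in scope:

# Classify every row of X at once: sign of X @ w + b
predictions = np.where(X @ w + b >= 0, 1, -1)  # same >= 0 rule as in training
accuracy = np.mean(predictions == y)
print(f"Training accuracy: {accuracy:.2%}")  # 100% once training has converged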

This exercise demonstrates the perceptron's ability to classify linearly separable data, with a straightforward mathematical model and interpretable decision boundary. While effective for linearly separable cases, the perceptron would require modifications or more advanced algorithms (e.g., neural networks) for non-linearly separable data. This perceptron implementation provides a foundational understanding of linear classifiers and can be extended to other applications with linearly separable features.

The perceptron is one of the cornerstones of our eventual forays into other ML techniques, including modern deep neural networks.