Multiple Linear Regression (MLR) is a statistical technique for modeling the relationship between one dependent variable and two or more independent variables, and for predicting the dependent variable from them. It extends simple linear regression, which relates a single independent variable to the dependent variable.
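Concretely, MLR models the dependent variable as y = β0 + β1·x1 + … + βp·xp + ε, and the β values are usually estimated by ordinary least squares, which minimizes the sum of squared residuals. The NumPy sketch below illustrates that estimation on small simulated data; the data, the "true" coefficients, and the variable names are illustrative assumptions and are not part of the house-price example that follows.

```python
import numpy as np

# Illustrative data: n observations of p independent variables (assumed values)
rng = np.random.default_rng(0)
n, p = 200, 3
X = rng.normal(size=(n, p))
true_beta = np.array([2.0, -1.0, 0.5])  # hypothetical slopes
true_intercept = 4.0                    # hypothetical intercept
y = true_intercept + X @ true_beta + rng.normal(scale=0.1, size=n)

# Prepend a column of ones so the intercept is estimated like any other coefficient
X_design = np.column_stack([np.ones(n), X])

# Ordinary least squares: minimize ||X_design @ beta - y||^2.
# lstsq is numerically safer than explicitly inverting X^T X.
beta_hat, residuals, rank, singular_values = np.linalg.lstsq(X_design, y, rcond=None)

intercept_hat, slopes_hat = beta_hat[0], beta_hat[1:]
print("Estimated intercept:", round(float(intercept_hat), 3))
print("Estimated slopes:", np.round(slopes_hat, 3))
```

With little noise, the estimates land close to the assumed true values; scikit-learn's `LinearRegression`, used later in this example, solves the same least-squares problem.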

Example: Predicting House Prices

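Let’s consider a dataset with the following features:

- Size in feet square (living area in square feet)
- Number of bedrooms
- Number of floors
- Age of home in years

The target variable is the price of the house.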

Using this data, we aim to predict the price of a house using the MLR model.

Step 1: Data Preparation

The following Python code simulates a synthetic dataset and splits it into training and test sets:

import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split

# Simulating a synthetic dataset
np.random.seed(42)
n_samples = 1000
size_in_feet_square = np.random.randint(500, 5000, n_samples)
number_of_bedrooms = np.random.randint(1, 7, n_samples)
number_of_floors = np.random.randint(1, 4, n_samples)
age_of_home = np.random.randint(0, 100, n_samples)

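# Calculating price: a linear function of the features plus Gaussian noise
# (true effects: +150 per square foot, +5,000 per bedroom, +20,000 per floor, -200 per year of age)
base_price = (size_in_feet_square * 150) + (number_of_bedrooms * 5000) + (number_of_floors * 20000)
age_discount = age_of_home * 200
price_of_house = base_price - age_discount + np.random.normal(0, 20000, n_samples)  # noise: mean 0, std 20,000
# Enforce a minimum price of 5,000
price_of_house = np.clip(price_of_house, 5000, None)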

# Creating a DataFrame
data = pd.DataFrame({
    "Size in feet square": size_in_feet_square,
    "Number of bedrooms": number_of_bedrooms,
    "Number of floors": number_of_floors,
    "Age of home in years": age_of_home,
    "Price of the house": price_of_house.round(2)
})

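# Separating the features (X) from the target (y)
X = data[["Size in feet square", "Number of bedrooms", "Number of floors", "Age of home in years"]]
y = data["Price of the house"]
# Holding out 20% of the rows as a test set
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Previewing the first 15 rows (rendered as a table in a notebook)
data.head(15)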

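Written as an equation, the simulation code in Step 1 generates each price as below, where ε is Gaussian noise with mean 0 and standard deviation 20,000 and the result is clipped from below at 5,000. These are the values a well-fitted model should approximately recover; note that the true intercept is 0.

$$
\text{Price} = \max\bigl(5000,\; 150\,\text{Size} + 5000\,\text{Bedrooms} + 20000\,\text{Floors} - 200\,\text{Age} + \varepsilon\bigr), \qquad \varepsilon \sim \mathcal{N}(0,\, 20000^2)
$$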

Step 2: Fitting the Model

Train the Multiple Linear Regression model as follows:

from sklearn.linear_model import LinearRegression

# Training the model
model = LinearRegression()
model.fit(X_train, y_train)

# Extracting coefficients and intercept
intercept = model.intercept_
coefficients = model.coef_

print("Intercept:", intercept)
print("Coefficients:", coefficients)
Output:

Intercept: 4615.359586907318
Coefficients: [  148.97954899  5531.03099493 18492.58738415  -212.33829062]
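Because the data are simulated, the fitted coefficients can be checked against the values used to generate them (150, 5,000, 20,000 and -200). The sketch below regenerates the Step 1 data so it runs on its own, puts true and estimated coefficients side by side, and scores the model on the held-out 20% with RMSE and R² from `sklearn.metrics`. The test-set evaluation is an addition of this sketch, not a step from the original walkthrough; with noise of standard deviation 20,000, estimates are expected near, but not exactly on, the true values.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Regenerate the synthetic dataset as in Step 1 (same seed, same call order)
np.random.seed(42)
n_samples = 1000
size_in_feet_square = np.random.randint(500, 5000, n_samples)
number_of_bedrooms = np.random.randint(1, 7, n_samples)
number_of_floors = np.random.randint(1, 4, n_samples)
age_of_home = np.random.randint(0, 100, n_samples)
price_of_house = (size_in_feet_square * 150 + number_of_bedrooms * 5000
                  + number_of_floors * 20000 - age_of_home * 200
                  + np.random.normal(0, 20000, n_samples))
price_of_house = np.clip(price_of_house, 5000, None)

data = pd.DataFrame({
    "Size in feet square": size_in_feet_square,
    "Number of bedrooms": number_of_bedrooms,
    "Number of floors": number_of_floors,
    "Age of home in years": age_of_home,
    "Price of the house": price_of_house.round(2),
})
features = data.columns[:-1]  # every column except the target
X_train, X_test, y_train, y_test = train_test_split(
    data[features], data["Price of the house"], test_size=0.2, random_state=42)

model = LinearRegression().fit(X_train, y_train)

# Coefficients used to simulate the data, next to the fitted ones
true_coefficients = pd.Series([150, 5000, 20000, -200], index=features)
comparison = pd.DataFrame({"true": true_coefficients, "estimated": model.coef_})
print(comparison)

# Evaluate on the held-out test set
y_pred = model.predict(X_test)
rmse = np.sqrt(mean_squared_error(y_test, y_pred))
r2 = r2_score(y_test, y_pred)
print(f"Test RMSE: {rmse:,.0f}")
print(f"Test R^2: {r2:.3f}")
```

An RMSE close to the noise level of 20,000 indicates that the model has captured the linear structure and what remains is mostly the simulated noise.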

Step 3: Model Equation

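Plugging in the fitted intercept and coefficients, the resulting equation is approximately:

$$
\widehat{\text{Price}} = 4615.36 + 148.98\,\text{Size} + 5531.03\,\text{Bedrooms} + 18492.59\,\text{Floors} - 212.34\,\text{Age}
$$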
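To use the fitted equation by hand, multiply each feature by its coefficient, sum the terms, and add the intercept. The sketch below does this for a hypothetical house (2,000 square feet, 3 bedrooms, 2 floors, 10 years old; values chosen only for illustration) using the intercept and coefficients printed in Step 2.

```python
import numpy as np

# Intercept and coefficients as printed in Step 2
intercept = 4615.359586907318
coefficients = np.array([148.97954899, 5531.03099493, 18492.58738415, -212.33829062])

# Hypothetical house; feature order: size, bedrooms, floors, age
house = np.array([2000, 3, 2, 10])

# Fitted equation: intercept + sum of (coefficient * feature value)
predicted_price = intercept + coefficients @ house
print(f"Predicted price: {predicted_price:,.2f}")
```

`LinearRegression.predict` computes this same linear combination, so calling `model.predict` on the same row gives the same value up to the rounding of the printed coefficients.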

Interpretation: