Multiple Linear Regression (MLR) is a statistical technique for modeling the relationship between one dependent variable and two or more independent variables, and for predicting the dependent variable from them. It extends simple linear regression, which relates a single independent variable to the dependent variable.
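For reference, the general form of the model can be written as below. The symbols are standard textbook notation and are an assumption here, not taken from this article: y is the dependent variable, x_1 through x_p are the independent variables, the β terms are the intercept and coefficients to be estimated, and ε is the error term.

```latex
y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_p x_p + \varepsilon
```

Simple linear regression is the special case with p = 1.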
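Let’s consider a dataset with the following features: the size of the home in square feet, the number of bedrooms, the number of floors, and the age of the home in years. The target variable is the price of the house.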
Using this data, we aim to predict the price of a house using the MLR model.
Here is the Python code to prepare the data:
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
# Simulating a synthetic dataset
np.random.seed(42)
n_samples = 1000
size_in_feet_square = np.random.randint(500, 5000, n_samples)
number_of_bedrooms = np.random.randint(1, 7, n_samples)
number_of_floors = np.random.randint(1, 4, n_samples)
age_of_home = np.random.randint(0, 100, n_samples)
# Calculating price with noise
base_price = (size_in_feet_square * 150) + (number_of_bedrooms * 5000) + (number_of_floors * 20000)
age_discount = age_of_home * 200
price_of_house = base_price - age_discount + np.random.normal(0, 20000, n_samples)
price_of_house = np.clip(price_of_house, 5000, None)
# Creating a DataFrame
data = pd.DataFrame({
    "Size in feet square": size_in_feet_square,
    "Number of bedrooms": number_of_bedrooms,
    "Number of floors": number_of_floors,
    "Age of home in years": age_of_home,
    "Price of the house": price_of_house.round(2)
})
# Splitting the dataset
X = data[["Size in feet square", "Number of bedrooms", "Number of floors", "Age of home in years"]]
y = data["Price of the house"]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
data.head(15)
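As a quick sanity check on the preparation step, the sketch below rebuilds the same synthetic data and confirms that an 80/20 split of 1000 rows yields 800 training rows and 200 test rows, with features and target still aligned row by row. The shortened variable names (`size`, `bedrooms`, `floors`, `age`, `price`) are chosen for this illustration only.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Rebuild the synthetic dataset with the same seed and call order as above
np.random.seed(42)
n_samples = 1000
size = np.random.randint(500, 5000, n_samples)
bedrooms = np.random.randint(1, 7, n_samples)
floors = np.random.randint(1, 4, n_samples)
age = np.random.randint(0, 100, n_samples)
price = size * 150 + bedrooms * 5000 + floors * 20000 - age * 200
price = np.clip(price + np.random.normal(0, 20000, n_samples), 5000, None)

X = pd.DataFrame({
    "Size in feet square": size,
    "Number of bedrooms": bedrooms,
    "Number of floors": floors,
    "Age of home in years": age,
})
y = pd.Series(price.round(2), name="Price of the house")

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# 20% of 1000 rows go to the test set, the rest to training
print(X_train.shape, X_test.shape)

# Features and target must refer to the same rows after the split
print(X_train.index.equals(y_train.index))
```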
Train the Multiple Linear Regression model as follows:
from sklearn.linear_model import LinearRegression
# Training the model
model = LinearRegression()
model.fit(X_train, y_train)
# Extracting coefficients and intercept
intercept = model.intercept_
coefficients = model.coef_
print("Intercept:", intercept)
print("Coefficients:", coefficients)
Intercept: 4615.359586907318
Coefficients: [ 148.97954899 5531.03099493 18492.58738415 -212.33829062]
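Because the data is synthetic, the fitted coefficients can be compared with the values used to generate the prices (150 per square foot, 5000 per bedroom, 20000 per floor, and -200 per year of age). The sketch below is one possible way to do that and to score the model on the held-out test set. The choice of R² and RMSE as metrics, and the shortened variable names, are assumptions made for this illustration rather than part of the original walkthrough.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Rebuild the synthetic dataset with the same seed and call order as above
np.random.seed(42)
n_samples = 1000
size = np.random.randint(500, 5000, n_samples)
bedrooms = np.random.randint(1, 7, n_samples)
floors = np.random.randint(1, 4, n_samples)
age = np.random.randint(0, 100, n_samples)
price = size * 150 + bedrooms * 5000 + floors * 20000 - age * 200
price = np.clip(price + np.random.normal(0, 20000, n_samples), 5000, None)

X = pd.DataFrame({
    "Size in feet square": size,
    "Number of bedrooms": bedrooms,
    "Number of floors": floors,
    "Age of home in years": age,
})
y = pd.Series(price.round(2), name="Price of the house")

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

model = LinearRegression()
model.fit(X_train, y_train)

# Coefficients used in the data-generating formula, in column order
true_coefficients = np.array([150, 5000, 20000, -200])
for name, true, fitted in zip(X.columns, true_coefficients, model.coef_):
    print(f"{name}: true = {true}, fitted = {fitted:.2f}")

# Score the model on rows it did not see during training
y_pred = model.predict(X_test)
r2 = r2_score(y_test, y_pred)
rmse = np.sqrt(mean_squared_error(y_test, y_pred))
print(f"Test R^2: {r2:.4f}")
print(f"Test RMSE: {rmse:.2f}")
```

The fitted values should land near the generating values but not match them exactly, since each price includes random noise with a standard deviation of 20000.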
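The resulting equation, with the values above rounded to two decimals, looks like this:
Price of the house ≈ 4615.36 + 148.98 × (Size in feet square) + 5531.03 × (Number of bedrooms) + 18492.59 × (Number of floors) − 212.34 × (Age of home in years)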
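To make the fitted equation concrete, the sketch below plugs the printed intercept and coefficients into it by hand for a single house. The house itself (2000 square feet, 3 bedrooms, 2 floors, 10 years old) is a hypothetical example, not a row from the dataset.

```python
import numpy as np

# Values copied from the printed output above
intercept = 4615.359586907318
coefficients = np.array([148.97954899, 5531.03099493, 18492.58738415, -212.33829062])

# Hypothetical house, in column order: size, bedrooms, floors, age
house = np.array([2000, 3, 2, 10])

# Intercept plus the sum of each coefficient times its feature value
predicted_price = intercept + coefficients @ house
print(f"Predicted price: {predicted_price:,.2f}")  # about 354,029.34
```

This is the same arithmetic that `model.predict` performs for each row of input.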
Interpretation: