When working with large datasets, the efficiency of your computations becomes crucial. This post explores the performance difference between a Python for loop and NumPy's vectorized operations when computing a simple linear function, y = np.dot(w, x) + b. We'll walk through an example that measures the running time of both approaches and shows why vectorization is a key optimization technique.
Below is a Python script that demonstrates the difference in running time between a for loop and vectorized operations in NumPy:
import numpy as np
import time

# Define the linear function
def linear_function(w, x, b):
    return w * x + b

# Generate a synthetic dataset
np.random.seed(42)
n_samples = 10_000_000  # number of samples
x = np.random.rand(n_samples)
w = 2.5  # weight
b = 1.0  # bias

# Perform the computation using a for loop
start_time = time.time()
y_loop = []
for i in range(n_samples):
    y_loop.append(linear_function(w, x[i], b))
end_time = time.time()
loop_time = end_time - start_time
print(f"Time taken using for loop: {loop_time:.6f} seconds")

# Perform the computation using vectorization in NumPy
start_time = time.time()
y_vectorized = np.dot(w, x) + b
end_time = time.time()
vectorized_time = end_time - start_time
print(f"Time taken using vectorization: {vectorized_time:.6f} seconds")

# Verify that both approaches produce the same result
assert np.allclose(y_loop, y_vectorized), "Results do not match!"
print("Results match!")
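A single time.time() measurement includes one-off overhead and can be noisy. As a rough sketch (the 1,000,000-element size here is an arbitrary choice to keep the loop version fast), the standard-library timeit module gives more stable numbers by repeating each measurement and taking the best run:

```python
import timeit
import numpy as np

np.random.seed(42)
x = np.random.rand(1_000_000)  # smaller than the main script so the loop finishes quickly
w, b = 2.5, 1.0

# Best-of-three timing smooths out scheduler and warm-up noise
loop_time = min(timeit.repeat(lambda: [w * xi + b for xi in x], number=1, repeat=3))
vec_time = min(timeit.repeat(lambda: np.dot(w, x) + b, number=1, repeat=3))
print(f"loop: {loop_time:.4f}s  vectorized: {vec_time:.4f}s  "
      f"speedup: {loop_time / vec_time:.0f}x")
```

Even at this reduced size, the vectorized version should come out well ahead; the exact speedup depends on your hardware and NumPy build.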
Here's what the script does, step by step:

- A synthetic dataset x is generated using np.random.rand().
- A weight w and a bias b are defined for the linear function.
- Each element of x is passed through linear_function using a loop, with the elapsed time measured by time.time().
- The vectorized computation y = np.dot(w, x) + b is performed in a single line using NumPy.
- Finally, the two results are checked for agreement with np.allclose.
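One detail worth noting: because w is a scalar here, np.dot(w, x) reduces to an elementwise multiply, so it is equivalent to the broadcast expression w * x. A quick sanity check (the 5-element array is just an illustrative size):

```python
import numpy as np

x = np.random.rand(5)
w, b = 2.5, 1.0

y_dot = np.dot(w, x) + b  # np.dot with a scalar operand multiplies elementwise
y_mul = w * x + b         # broadcasting: the scalar is applied to every element

assert np.allclose(y_dot, y_mul)
print(y_dot.shape)  # (5,)
```

Both forms run at vectorized speed; np.dot only behaves differently when both operands are arrays, where it performs a true dot product.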