Linear Regression using Gradient Descent
By Bindeshwar Singh Kushwaha
General Linear Regression Model
We have a collection of labeled examples:
$$
\{(\mathbf{x}_i, y_i)\}_{i=1}^{N}
$$
- \( \mathbf{x}_i \) is a \( D \)-dimensional feature vector
- \( y_i \) is a real-valued target
- Each feature \( x_i^{(j)} \in \mathbb{R} \), where \( j = 1, …, D \)
- The model is: $$ f_{\mathbf{w}, b}(\mathbf{x}) = \mathbf{w} \cdot \mathbf{x} + b $$
- \( \mathbf{w} \): weights, \( b \): bias
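The general model above can be sketched in a few lines of NumPy; the weights and input here are illustrative values, not taken from the dataset below.

```python
import numpy as np

# Sketch of the general model f_{w,b}(x) = w . x + b for a D-dimensional input.
# These numbers are illustrative only.
w = np.array([2.0, -1.0, 0.5])   # D = 3 weights
b = 1.0                          # bias
x = np.array([1.0, 2.0, 4.0])    # one feature vector

y_hat = np.dot(w, x) + b         # 2*1 - 1*2 + 0.5*4 + 1 = 3.0
print(y_hat)
```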
Sample Dataset
| Company | Spending (M$) | Sales (Units) |
|---|---|---|
| 1 | 37.8 | 22.1 |
| 2 | 39.3 | 10.4 |
| 3 | 45.9 | 9.3 |
| 4 | 41.3 | 18.5 |
| 5 | 50.0 | 25.0 |
| 6 | 38.5 | 15.0 |
| 7 | 42.2 | 19.3 |
| 8 | 48.1 | 23.7 |
| 9 | 36.4 | 13.2 |
| 10 | 40.7 | 17.4 |
| 11 | 43.5 | 20.2 |
| 12 | 47.3 | 24.5 |
| 13 | 49.0 | 26.1 |
| 14 | 35.2 | 11.3 |
| 15 | 38.0 | 14.7 |
Goal: Predict Sales based on Spending.
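For the one-feature case used from here on, the table above can be loaded as two NumPy arrays:

```python
import numpy as np

# The 15 (Spending, Sales) pairs from the table above.
x = np.array([37.8, 39.3, 45.9, 41.3, 50.0, 38.5, 42.2, 48.1,
              36.4, 40.7, 43.5, 47.3, 49.0, 35.2, 38.0])  # Spending (M$)
y = np.array([22.1, 10.4, 9.3, 18.5, 25.0, 15.0, 19.3, 23.7,
              13.2, 17.4, 20.2, 24.5, 26.1, 11.3, 14.7])  # Sales (Units)
print(x.shape, y.shape)
```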
Linear Regression Model
Model: \( f(x) = wx + b \)
Objective: Minimize MSE:
$$
l = \frac{1}{N} \sum_{i=1}^{N} \left(y_i - (wx_i + b)\right)^2
$$
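The MSE above translates directly into code. A minimal sketch, using the dataset from the table; with \( w = b = 0 \) the loss reduces to the mean of \( y_i^2 \).

```python
import numpy as np

x = np.array([37.8, 39.3, 45.9, 41.3, 50.0, 38.5, 42.2, 48.1,
              36.4, 40.7, 43.5, 47.3, 49.0, 35.2, 38.0])
y = np.array([22.1, 10.4, 9.3, 18.5, 25.0, 15.0, 19.3, 23.7,
              13.2, 17.4, 20.2, 24.5, 26.1, 11.3, 14.7])

def mse(w, b):
    """Mean squared error of the line y = w*x + b on the dataset."""
    residuals = y - (w * x + b)
    return np.mean(residuals ** 2)

print(mse(0.0, 0.0))  # loss before any training
```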
Gradient Descent Derivatives
Gradients:
\[ \frac{\partial l}{\partial w} = \frac{1}{N} \sum_{i=1}^{N} -2x_i(y_i - (wx_i + b)) \]
\[ \frac{\partial l}{\partial b} = \frac{1}{N} \sum_{i=1}^{N} -2(y_i - (wx_i + b)) \]
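These two partial derivatives can be computed in vectorized form. A small sketch on the dataset; at \( w = b = 0 \) both gradients are negative, since all targets are positive.

```python
import numpy as np

x = np.array([37.8, 39.3, 45.9, 41.3, 50.0, 38.5, 42.2, 48.1,
              36.4, 40.7, 43.5, 47.3, 49.0, 35.2, 38.0])
y = np.array([22.1, 10.4, 9.3, 18.5, 25.0, 15.0, 19.3, 23.7,
              13.2, 17.4, 20.2, 24.5, 26.1, 11.3, 14.7])

def gradients(w, b):
    """Return (dl/dw, dl/db) for the MSE loss, matching the formulas above."""
    residuals = y - (w * x + b)
    dw = np.mean(-2 * x * residuals)
    db = np.mean(-2 * residuals)
    return dw, db

dw, db = gradients(0.0, 0.0)
print(dw, db)
```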
Gradient Descent Update Rule
Update equations:
\[
w \leftarrow w + \frac{2\alpha}{N} \sum_{i=1}^{N} x_i(y_i - (wx_i + b))
\]
\[
b \leftarrow b + \frac{2\alpha}{N} \sum_{i=1}^{N} (y_i - (wx_i + b))
\]
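One application of the update rule, written exactly in the form above, already reduces the loss on this dataset (with the learning rate \( \alpha = 0.0005 \) used later):

```python
import numpy as np

x = np.array([37.8, 39.3, 45.9, 41.3, 50.0, 38.5, 42.2, 48.1,
              36.4, 40.7, 43.5, 47.3, 49.0, 35.2, 38.0])
y = np.array([22.1, 10.4, 9.3, 18.5, 25.0, 15.0, 19.3, 23.7,
              13.2, 17.4, 20.2, 24.5, 26.1, 11.3, 14.7])
alpha = 0.0005

w, b = 0.0, 0.0
residuals = y - (w * x + b)
loss_before = np.mean(residuals ** 2)

# Update rule: step in the negative-gradient direction.
w = w + (2 * alpha / len(x)) * np.sum(x * residuals)
b = b + (2 * alpha / len(x)) * np.sum(residuals)

residuals = y - (w * x + b)
loss_after = np.mean(residuals ** 2)
print(loss_before, loss_after)
```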
Step-by-Step Python Implementation
- Import Libraries: `numpy`, `matplotlib.pyplot`, `matplotlib.animation`
- Define Dataset: 15 (x, y) pairs
- Initialize Parameters: \( w = 0.0, b = 0.0, \alpha = 0.0005, \text{epochs} = 100 \)
- Training Loop:
- Predict \( \hat{y} = wx + b \)
- Compute Loss: $$ \text{MSE} = \frac{1}{N} \sum (y - \hat{y})^2 $$
- Compute Gradients:
- \( \frac{\partial L}{\partial w} = -\frac{2}{N} \sum x(y - \hat{y}) \)
- \( \frac{\partial L}{\partial b} = -\frac{2}{N} \sum (y - \hat{y}) \)
- Update \( w, b \)
- Set Up Plots: Left: scatter + line, Right: loss curve
- Define Animation: Update line and loss with frame
- Run Animation: `FuncAnimation()`
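The training steps above can be sketched as a complete loop. The plotting and animation parts are omitted here; the recorded `losses` history and the final `(w, b)` are what the left and right panels would animate via `matplotlib.animation.FuncAnimation`.

```python
import numpy as np

# Dataset from the table above.
x = np.array([37.8, 39.3, 45.9, 41.3, 50.0, 38.5, 42.2, 48.1,
              36.4, 40.7, 43.5, 47.3, 49.0, 35.2, 38.0])
y = np.array([22.1, 10.4, 9.3, 18.5, 25.0, 15.0, 19.3, 23.7,
              13.2, 17.4, 20.2, 24.5, 26.1, 11.3, 14.7])

# Initialize parameters as in the steps above.
w, b = 0.0, 0.0
alpha, epochs = 0.0005, 100
losses = []

for _ in range(epochs):
    y_hat = w * x + b                       # predict
    residuals = y - y_hat
    losses.append(np.mean(residuals ** 2))  # MSE
    dw = np.mean(-2 * x * residuals)        # dL/dw
    db = np.mean(-2 * residuals)            # dL/db
    w -= alpha * dw                         # equivalent to the update rule above
    b -= alpha * db

print(w, b, losses[-1])
```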
Key Takeaways
- Gradient descent iteratively minimizes error
- Helps learn optimal parameters from data
- Animations provide intuition into how training progresses
📣 Reach PostNetwork Academy
- Website: www.postnetwork.co
- YouTube: @postnetworkacademy
- Facebook: /postnetworkacademy
- LinkedIn: PostNetwork Academy
- GitHub: /postnetworkacademy
