Feature Scaling in Machine Learning
Author: Bindeshwar Singh Kushwaha
Postnetwork Academy
Why Feature Scaling?
- Machine learning algorithms often struggle when input features have different scales.
- Example: Total number of rooms might range from 6 to 39,320, while median incomes range from 0 to 15.
- Two common methods to scale features:
- Min-Max Scaling (Normalization)
- Standardization
Min-Max Scaling (Normalization)
- Rescales values to a fixed range, typically [0, 1].
- Formula:
\\[
x_{\text{scaled}} = \frac{x – x_{\text{min}}}{x_{\text{max}} – x_{\text{min}}}
\\]
- Implemented in Scikit-Learn using
MinMaxScaler. - You can customize the range using
feature_range.
Standardization
- Subtracts the mean and divides by the standard deviation.
- Produces features with zero mean and unit variance.
- Formula:
\\[
x_{\text{standard}} = \frac{x – \mu}{\sigma}
\\]
- Does not bound values to a fixed range.
- Less sensitive to outliers compared to Min-Max Scaling.
- Scikit-Learn provides
StandardScalerfor this method.
Comparison Example
- A district had a median income of 100 (likely an error).
- Min-Max Scaling: All values shift from [0–15] to [0–0.15], affected heavily by the outlier.
- Standardization: Less influenced by the outlier due to use of mean and standard deviation.
Impact of Scaling on Sample Data
| Feature Vector | Rooms | Income |
|---|---|---|
| Original | 1000 | 4.5 |
| Original | 2000 | 6.0 |
| Original | 3000 | 8.5 |
| Min-Max Scaled | 0.00 | 0.00 |
| Min-Max Scaled | 0.50 | 0.43 |
| Min-Max Scaled | 1.00 | 1.00 |
| Standardized | -1.22 | -1.13 |
| Standardized | 0.00 | -0.23 |
| Standardized | 1.22 | 1.36 |
- Min-Max bounds values between 0 and 1 but is affected by scale difference.
- Standardization centers around 0 and better handles variance and outliers.
Python Code Example: Feature Scaling with Scikit-Learn
from sklearn.preprocessing import MinMaxScaler, StandardScaler
import numpy as np
# Sample dataset
X = np.array([[1000, 4.5],
[2000, 6.0],
[3000, 8.5]])
# Min-Max Scaling
minmax_scaler = MinMaxScaler()
X_minmax = minmax_scaler.fit_transform(X)
# Standardization
standard_scaler = StandardScaler()
X_standard = standard_scaler.fit_transform(X)
print("Min-Max Scaled:\n", X_minmax)
print("Standardized:\n", X_standard)
Video
Connect with PostNetwork Academy
- Website: www.postnetwork.co
- YouTube: youtube.com/@postnetworkacademy
- Facebook: facebook.com/postnetworkacademy
- LinkedIn: linkedin.com/company/postnetworkacademy
- GitHub: github.com/postnetworkacademy
