Feature Scaling in Machine Learning : Preprocessing Technique

Feature Scaling in Machine Learning

Author: Bindeshwar Singh Kushwaha
Postnetwork Academy

Why Feature Scaling?

  • Machine learning algorithms often struggle when input features have different scales.
  • Example: Total number of rooms might range from 6 to 39,320, while median incomes range from 0 to 15.
  • Two common methods to scale features:
    • Min-Max Scaling (Normalization)
    • Standardization

Min-Max Scaling (Normalization)

  • Rescales values to a fixed range, typically [0, 1].
  • Formula:

\\[
x_{\text{scaled}} = \frac{x – x_{\text{min}}}{x_{\text{max}} – x_{\text{min}}}
\\]

  • Implemented in Scikit-Learn using MinMaxScaler.
  • You can customize the range using feature_range.

Standardization

  • Subtracts the mean and divides by the standard deviation.
  • Produces features with zero mean and unit variance.
  • Formula:

\\[
x_{\text{standard}} = \frac{x – \mu}{\sigma}
\\]

  • Does not bound values to a fixed range.
  • Less sensitive to outliers compared to Min-Max Scaling.
  • Scikit-Learn provides StandardScaler for this method.

Comparison Example

  • A district had a median income of 100 (likely an error).
  • Min-Max Scaling: All values shift from [0–15] to [0–0.15], affected heavily by the outlier.
  • Standardization: Less influenced by the outlier due to use of mean and standard deviation.

Impact of Scaling on Sample Data

Feature Vector Rooms Income
Original 1000 4.5
Original 2000 6.0
Original 3000 8.5
Min-Max Scaled 0.00 0.00
Min-Max Scaled 0.50 0.43
Min-Max Scaled 1.00 1.00
Standardized -1.22 -1.13
Standardized 0.00 -0.23
Standardized 1.22 1.36
  • Min-Max bounds values between 0 and 1 but is affected by scale difference.
  • Standardization centers around 0 and better handles variance and outliers.

Python Code Example: Feature Scaling with Scikit-Learn

from sklearn.preprocessing import MinMaxScaler, StandardScaler
import numpy as np

# Sample dataset
X = np.array([[1000, 4.5],
              [2000, 6.0],
              [3000, 8.5]])

# Min-Max Scaling
minmax_scaler = MinMaxScaler()
X_minmax = minmax_scaler.fit_transform(X)

# Standardization
standard_scaler = StandardScaler()
X_standard = standard_scaler.fit_transform(X)

print("Min-Max Scaled:\n", X_minmax)
print("Standardized:\n", X_standard)

PDF

featurscaling

Video

Connect with PostNetwork Academy

Thank You!

©Postnetwork-All rights reserved.