Label Encoding and One-Hot Encoding
Author: Bindeshwar Singh Kushwaha
Encoding Categorical Features
Label Encoding
- Assigns each category an integer value
- Suitable for ordinal data (e.g., size: small, medium, large), provided the integers follow the intended order
- Tool: LabelEncoder from sklearn.preprocessing (assigns codes in sorted order of the labels)
- Example (Titanic): Encoding Sex as 0 (female), 1 (male)
One-Hot Encoding
- Converts categories into binary columns (one per category)
- Suitable for nominal data (no natural order)
- Tools: OneHotEncoder, pd.get_dummies()
- Example (Mushroom dataset): Encoding cap-shape into binary columns
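As a minimal sketch of the idea, the snippet below one-hot encodes a small nominal column with pd.get_dummies(). The color values and the variable names are illustrative assumptions, not taken from either dataset discussed here.

```python
import pandas as pd

# Toy nominal column; the values are illustrative only.
colors = pd.Series(["red", "green", "blue", "green"], name="color")

# One binary column per category; dtype=int yields 0/1 instead of True/False.
one_hot = pd.get_dummies(colors, prefix="color", dtype=int)

# Columns are created in sorted order: color_blue, color_green, color_red.
print(one_hot)
```

Each row contains exactly one 1, in the column that matches its original category.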
Titanic Dataset Overview
- Source: Kaggle's "Titanic: Machine Learning from Disaster"
- Objective: Predict survival of passengers
- Target: Survived (0 = Did not survive, 1 = Survived)
- Key Features: Pclass, Sex, Age, Fare, Embarked, etc.
Label Encoding in Titanic
- Assigns each category an integer
- Example (Ordinal): small → 0, medium → 1, large → 2
- Tool: LabelEncoder from sklearn
- Example on Titanic:
  - Original: ['male', 'female', 'male']
  - Encoded: [1, 0, 1]
  - Mapping: female → 0, male → 1 (LabelEncoder sorts the classes alphabetically)
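Because LabelEncoder assigns codes in sorted order of the labels, it would not reproduce the ordinal mapping small → 0, medium → 1, large → 2 by itself. The sketch below contrasts it with one possible alternative, scikit-learn's OrdinalEncoder with an explicit category order. The toy size values and the choice of OrdinalEncoder are assumptions made for illustration, not part of the original material.

```python
from sklearn.preprocessing import LabelEncoder, OrdinalEncoder

# Toy ordinal feature; the values are illustrative only.
sizes = ["small", "large", "medium", "small"]

# LabelEncoder sorts the classes alphabetically: large=0, medium=1, small=2.
alphabetical = LabelEncoder().fit_transform(sizes)

# OrdinalEncoder accepts an explicit order, so small=0, medium=1, large=2.
# It expects a 2D input (one inner list per sample).
encoder = OrdinalEncoder(categories=[["small", "medium", "large"]])
ordered = encoder.fit_transform([[s] for s in sizes]).ravel().astype(int)

print(alphabetical.tolist())  # [2, 0, 1, 2]
print(ordered.tolist())       # [0, 2, 1, 0]
```

For a binary column such as Sex the order carries no meaning, so the alphabetical assignment is harmless there.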
Implementation (Titanic)
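import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Load the Titanic data (assumes titanic.csv is in the working directory)
df = pd.read_csv("titanic.csv")

# Fit on the Sex column and store the integer codes in a new column
le = LabelEncoder()
df['Sex_encoded'] = le.fit_transform(df['Sex'])

print(df['Sex'].unique())          # ['male', 'female']
print(df['Sex_encoded'].unique())  # [1, 0]
print(le.classes_)                 # ['female', 'male'] (position = assigned code)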
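To make the mapping explicit, a fitted encoder's classes_ attribute can be turned into a dictionary, and inverse_transform recovers the original labels. The sketch below uses a three-value toy list in place of df['Sex'] so that it runs without the CSV file; the variable names are illustrative.

```python
from sklearn.preprocessing import LabelEncoder

# Toy values standing in for df['Sex'] (assumption: no CSV file needed).
sex = ["male", "female", "male"]

le = LabelEncoder()
codes = le.fit_transform(sex)

# classes_ is sorted, and each class's position is its integer code.
mapping = {str(label): code for code, label in enumerate(le.classes_)}

# inverse_transform maps the codes back to the original labels.
decoded = le.inverse_transform(codes)

print(codes.tolist())    # [1, 0, 1]
print(mapping)           # {'female': 0, 'male': 1}
print(decoded.tolist())  # ['male', 'female', 'male']
```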
Mushroom Dataset Overview
- Source: UCI ML Repository
- Target: class (e = edible, p = poisonous)
- All Features: Categorical (e.g., cap-shape, odor, etc.)
Example: One-Hot Encoding cap-shape
- Categories: b, c, x, f, k, s
- After Encoding: binary vectors (positions follow the category order listed above) like:
  - x → [0, 0, 1, 0, 0, 0]
  - f → [0, 0, 0, 1, 0, 0]
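import pandas as pd

# Load the Mushroom data (assumes mushrooms.csv is in the working directory)
df = pd.read_csv("mushrooms.csv")

# One binary column per cap-shape value; columns come out in sorted order
cap_shape_dummies = pd.get_dummies(df['cap-shape'], prefix='cap-shape')
print(cap_shape_dummies.head())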
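The same encoding can be sketched with scikit-learn's OneHotEncoder, which remembers the categories seen during fitting and can therefore be reused on new data. The example uses a handful of toy cap-shape codes instead of the full CSV, and the handle_unknown="ignore" setting is an illustrative choice rather than something the original specifies.

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder

# Toy cap-shape codes (assumption: a small sample, not the real file).
cap_shape = np.array([["x"], ["f"], ["b"], ["x"]])

# handle_unknown="ignore" encodes categories unseen during fit as all zeros.
encoder = OneHotEncoder(handle_unknown="ignore")
encoded = encoder.fit_transform(cap_shape).toarray().astype(int)

# Learned categories are sorted, so the columns are b, f, x.
print(encoder.categories_[0].tolist())  # ['b', 'f', 'x']
print(encoded.tolist())  # [[0, 0, 1], [0, 1, 0], [1, 0, 0], [0, 0, 1]]

# 'k' was not seen during fit, so its row has no active column.
unseen = encoder.transform([["k"]]).toarray().astype(int)
print(unseen.tolist())  # [[0, 0, 0]]
```

pd.get_dummies() is convenient for quick exploration, while a fitted OneHotEncoder keeps the column layout fixed between training and prediction data.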
