Label Encoding and One-Hot Encoding
Author: Bindeshwar Singh Kushwaha
Encoding Categorical Features
Label Encoding
- Assigns each category an integer value
- Suitable for ordinal data (e.g., size: small, medium, large), provided the integers follow the intended order
- Tool: LabelEncoder from sklearn.preprocessing (assigns integers in sorted order of the category values)
- Example (Titanic): Encoding Sex as 0 (female), 1 (male)
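Because LabelEncoder assigns integers in sorted order of the category values, it will not by itself produce small → 0, medium → 1, large → 2. The following is a minimal sketch of one way to keep the intended order, assuming scikit-learn's OrdinalEncoder with an explicit category list. The toy `sizes` frame and the `size_encoded` column are illustrative and do not come from either dataset.

```python
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder

# Hypothetical toy data (not from the Titanic or Mushroom datasets)
sizes = pd.DataFrame({"size": ["small", "large", "medium", "small"]})

# Pass the categories in the intended order so the integers respect it
encoder = OrdinalEncoder(categories=[["small", "medium", "large"]])

# fit_transform returns a 2D float array; flatten it and cast to int
sizes["size_encoded"] = encoder.fit_transform(sizes[["size"]]).ravel().astype(int)

print(sizes)
```

With the order stated explicitly, the encoded integers can be compared meaningfully (small < medium < large), which is the property an ordinal feature needs.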
One-Hot Encoding
- Converts categories into binary columns (one per category)
- Suitable for nominal data (no natural order)
- Tools: OneHotEncoder, pd.get_dummies()
- Example (Mushroom dataset): Encoding cap-shape into binary columns
Titanic Dataset Overview
- Source: Kaggle's "Titanic: Machine Learning from Disaster"
- Objective: Predict survival of passengers
- Target: Survived (0 = Did not survive, 1 = Survived)
- Key Features: Pclass, Sex, Age, Fare, Embarked, etc.
Label Encoding in Titanic
- Assigns each category an integer
- Example (Ordinal): small → 0, medium → 1, large → 2 (this order has to be specified explicitly, since LabelEncoder sorts categories alphabetically)
- Tool: LabelEncoder from sklearn
- Example on Titanic:
  - Original: ['male', 'female', 'male']
  - Encoded: [1, 0, 1]
  - Mapping: female → 0, male → 1
Implementation (Titanic)
import pandas as pd
from sklearn.preprocessing import LabelEncoder

df = pd.read_csv("titanic.csv")

# Fit on the Sex column and store the integer codes in a new column
le = LabelEncoder()
df['Sex_encoded'] = le.fit_transform(df['Sex'])

print(df['Sex'].unique())          # ['male', 'female']
print(df['Sex_encoded'].unique())  # [1, 0]
print(le.classes_)                 # ['female', 'male'] (index = assigned code)
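The fitted encoder also stores the mapping, so it can be read back rather than remembered. Here is a small sketch on an in-memory list instead of titanic.csv; the `sex_values` list and the `mapping` name are illustrative assumptions.

```python
from sklearn.preprocessing import LabelEncoder

# Illustrative values standing in for the Sex column
sex_values = ["male", "female", "male"]

le = LabelEncoder()
encoded = le.fit_transform(sex_values)

# classes_ is sorted, and each class's position is its integer code
mapping = {label: int(code) for code, label in enumerate(le.classes_)}
print(mapping)

# inverse_transform recovers the original labels from the codes
decoded = le.inverse_transform(encoded)
print(list(decoded))
```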
Mushroom Dataset Overview
- Source: UCI ML Repository
- Target: class (e = edible, p = poisonous)
- All Features: Categorical (e.g., cap-shape, odor, etc.)
Example: One-Hot Encoding cap-shape
- Categories: b, c, x, f, k, s
- After Encoding: binary vectors (columns in the order listed above), for example:
  - x → [0, 0, 1, 0, 0, 0]
  - f → [0, 0, 0, 1, 0, 0]
import pandas as pd

df = pd.read_csv("mushrooms.csv")

# One binary column per cap-shape category (cap-shape_b, cap-shape_c, ...)
cap_shape_dummies = pd.get_dummies(df['cap-shape'], prefix='cap-shape')
print(cap_shape_dummies.head())
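pd.get_dummies is convenient for a one-off DataFrame, while OneHotEncoder can be fitted once and reused on new data. The following is a hedged sketch, assuming the six cap-shape codes listed above and a toy frame in place of mushrooms.csv. The explicit `categories` order is chosen only to match the example vectors; left to its default, scikit-learn would sort the categories.

```python
import pandas as pd
from sklearn.preprocessing import OneHotEncoder

# Toy stand-in for the cap-shape column (assumption: not read from mushrooms.csv)
toy = pd.DataFrame({"cap-shape": ["x", "f", "b"]})

# Fix the column order to b, c, x, f, k, s
encoder = OneHotEncoder(categories=[["b", "c", "x", "f", "k", "s"]])

# The result is a sparse matrix by default; toarray() makes it dense
one_hot = encoder.fit_transform(toy[["cap-shape"]]).toarray().astype(int)
print(one_hot)
```

Each row contains a single 1 in the column of its category, so x lands in the third position and f in the fourth, as in the example vectors.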
Reach PostNetwork Academy
- Website: www.postnetwork.co
- YouTube: PostNetwork Academy
- Facebook: PostNetwork Academy
- LinkedIn: PostNetwork Academy
- GitHub: PostNetwork Academy
