Handling Missing Data and Categorical Features

By: Bindeshwar Singh Kushwaha

Data Preprocessing Flow

Raw Data → Handle Missing Values → Encode Categorical Variables → Feature Scaling → Preprocessed Data
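The flow above names Feature Scaling, although none of the numbered steps performs it. As a rough sketch only, min-max scaling with plain pandas could look like the following. The two-column sample and its values are hypothetical, and min-max is just one of several possible scaling choices.

```python
import pandas as pd

# Hypothetical numeric sample; the values are made up for illustration.
sample = pd.DataFrame({"Age": [20.0, 30.0, 40.0], "Fare": [10.0, 60.0, 110.0]})

# Min-max scaling: rescale each column to the range [0, 1].
scaled = (sample - sample.min()) / (sample.max() - sample.min())

print(scaled)
```

Scaling is usually applied after missing values are handled, since NaN entries would otherwise pass through unchanged.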

Overview of Data Preprocessing

Load the Titanic dataset from a CSV file
Handle missing values by filling (imputation) or by dropping rows
Encode categorical data as numbers for machine learning
Save the cleaned dataset to a new CSV file

Step 1: Load the Titanic Dataset

import pandas as pd
df = pd.read_csv('titanic.csv')
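To see how read_csv treats gaps without needing the file on disk, here is a small sketch that reads a hypothetical three-row CSV from memory. The column subset and values are assumptions standing in for titanic.csv.

```python
import io
import pandas as pd

# Hypothetical three-row sample standing in for titanic.csv.
sample_csv = io.StringIO(
    "PassengerId,Survived,Sex,Age,Embarked\n"
    "1,0,male,22,S\n"
    "2,1,female,,C\n"
    "3,1,female,26,\n"
)

sample = pd.read_csv(sample_csv)

# Empty fields are parsed as NaN, so Age becomes a float column.
print(sample.shape)
print(sample.dtypes)
```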

Step 2: View the First Few Rows

print(df.head())

Use df.head() to preview the first five rows and the column layout.

Step 3: Checking for Missing Values

print(df.isnull().sum())

df.isnull().sum() counts the missing values in each column, which shows where cleaning is needed.
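Counts are easier to judge next to the share of rows they represent. The sketch below, on a hypothetical four-row sample, reports both and keeps only the columns that have gaps.

```python
import numpy as np
import pandas as pd

# Hypothetical sample with gaps in Age and Embarked.
sample = pd.DataFrame({
    "Age": [22.0, np.nan, 26.0, np.nan],
    "Embarked": ["S", "C", None, "S"],
    "Sex": ["male", "female", "female", "male"],
})

missing_count = sample.isnull().sum()   # number of missing values per column
missing_share = sample.isnull().mean()  # fraction of rows missing per column

report = pd.DataFrame({"missing": missing_count, "share": missing_share})
print(report[report["missing"] > 0])
```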

Step 4: Handle Missing Values

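# Assign the result back: calling fillna(inplace=True) on a selected column
# is deprecated in recent pandas and may leave df unchanged.
df['Age'] = df['Age'].fillna(df['Age'].median())
df.dropna(subset=['Embarked'], inplace=True)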

Fill missing Age values with the column median, and drop the rows where Embarked is missing.
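The median is less sensitive to outliers than the mean, which is one reason to prefer it for Age. The sketch below shows this on a hypothetical sample. It also shows one alternative to dropping rows, filling Embarked with its most frequent value. That alternative is not the approach used in this walkthrough.

```python
import numpy as np
import pandas as pd

# Hypothetical sample; 80 acts as an outlier in Age.
sample = pd.DataFrame({
    "Age": [22.0, np.nan, 26.0, 80.0],
    "Embarked": ["S", "C", None, "S"],
})

# Median of [22, 26, 80] is 26; the mean (about 42.7) is pulled up by 80.
sample["Age"] = sample["Age"].fillna(sample["Age"].median())

# Alternative to dropna: fill the gap with the most frequent port, keeping the row.
most_common_port = sample["Embarked"].mode()[0]
sample["Embarked"] = sample["Embarked"].fillna(most_common_port)

print(sample)
```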

Step 5: Verify Missing Values

print(df.isnull().sum())

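Confirm that Age and Embarked now show zero missing values. Columns that Step 4 did not touch (for example Cabin, if your copy of the dataset includes it) can still report missing entries.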
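The check can also be made explicit in code. This is a minimal sketch on a hypothetical three-row sample. It asserts that the two columns handled in Step 4 are complete and lists any other columns that still have gaps. The Cabin column is an assumption modeled on the common Kaggle version of the dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical sample; Cabin is assumed to exist and is left untreated.
sample = pd.DataFrame({
    "Age": [22.0, np.nan, 26.0],
    "Embarked": ["S", None, "C"],
    "Cabin": [None, "C85", None],
})
sample["Age"] = sample["Age"].fillna(sample["Age"].median())
sample = sample.dropna(subset=["Embarked"])

# The columns treated in Step 4 should now be complete.
handled = ["Age", "Embarked"]
remaining = sample[handled].isnull().sum()
assert remaining.sum() == 0, f"unexpected missing values:\n{remaining}"

# Columns that were not handled can still contain gaps.
still_missing = sample.columns[sample.isnull().any()].tolist()
print(still_missing)
```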

Step 6: Data Overview

print(df.describe())

df.describe() prints summary statistics (count, mean, standard deviation, minimum, quartiles, maximum) for the numerical columns.
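By default describe() skips text columns. Calling it on the text columns alone gives a different summary (count, unique, top, freq), which helps when planning the encoding step. The sample below is hypothetical.

```python
import pandas as pd

# Hypothetical sample with one numeric and two text columns.
sample = pd.DataFrame({
    "Age": [22.0, 38.0, 26.0, 35.0],
    "Sex": ["male", "female", "female", "male"],
    "Embarked": ["S", "C", "S", "S"],
})

numeric_summary = sample.describe()                   # numeric columns only
text_summary = sample[["Sex", "Embarked"]].describe() # count, unique, top, freq

print(numeric_summary)
print(text_summary)
```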

Step 7: Encoding Categorical Data

df['Sex'] = df['Sex'].map({'male': 0, 'female': 1})

Map the Sex column to integers (male to 0, female to 1) so that models requiring numeric input can use it.
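Two details are worth checking at this step. map() returns NaN for any label missing from the dictionary, and Embarked is also categorical but has no natural order. The sketch below guards against the first and, as one possible option not shown in the original steps, one-hot encodes Embarked with pd.get_dummies. The sample rows are hypothetical.

```python
import pandas as pd

# Hypothetical sample with two categorical columns.
sample = pd.DataFrame({
    "Sex": ["male", "female", "female"],
    "Embarked": ["S", "C", "Q"],
})

sample["Sex"] = sample["Sex"].map({"male": 0, "female": 1})

# map() produces NaN for labels not in the dictionary, so verify none slipped through.
assert sample["Sex"].notnull().all(), "unmapped label found in Sex"

# One-hot encode Embarked: one 0/1 column per port.
encoded = pd.get_dummies(sample, columns=["Embarked"], prefix="Embarked", dtype=int)

print(encoded)
```

One-hot encoding avoids implying an order between ports, which an integer mapping such as S=0, C=1, Q=2 would do.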

Step 8: Save the Cleaned Dataset

df.to_csv('cleaned_titanic.csv', index=False)

Store the cleaned data for future use. index=False keeps the row index out of the file.
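A quick way to confirm the file was written as intended is to read it back. The sketch below does this with a hypothetical cleaned sample and a temporary directory, so it leaves no files behind.

```python
import os
import tempfile
import pandas as pd

# Hypothetical cleaned sample standing in for the full dataset.
cleaned = pd.DataFrame({"Sex": [0, 1, 1], "Age": [22.0, 24.0, 26.0]})

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "cleaned_sample.csv")
    cleaned.to_csv(path, index=False)  # index=False: no extra index column
    reloaded = pd.read_csv(path)

print(reloaded.columns.tolist())
print(reloaded.equals(cleaned))
```

Without index=False, the reloaded frame would gain an extra unnamed column holding the old row index.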

Python Libraries Used

pandas – Data manipulation
numpy – Numerical operations
scikit-learn – Machine learning preprocessing tools

Reach PostNetwork Academy

Website: www.postnetwork.co
YouTube: youtube.com/@postnetworkacademy
Facebook: facebook.com/postnetworkacademy
LinkedIn: linkedin.com/company/postnetworkacademy
GitHub: github.com/postnetworkacademy

©Postnetwork-All rights reserved.