Text Classification with Bag of Words and Naive Bayes
Author: Bindeshwar Singh Kushwaha | PostNetwork Academy
Understanding Text with Machine Learning
- Processing and understanding text enables the extraction of meaningful information from raw data.
- Text data can be structured into features that machine learning algorithms can analyze.
- Machine learning approaches include supervised, unsupervised, and deep learning techniques.
- AI models combine data and algorithms to identify patterns and generate actionable insights.
Text Classification and Categorization
- Text classification organizes documents into predefined categories automatically.
- Each document is represented by features derived from its words or phrases.
- Applications include spam filtering, sentiment analysis, and news/topic categorization.
- Effective classification relies on sufficient labeled data to train predictive models.
Supervised vs. Unsupervised
- Supervised: documents have labels (e.g., spam, non-spam).
- Unsupervised: documents grouped by similarity without labels.
- Document classification is a general problem applicable to many use cases.
Practical Example
- Preprocessing text
- Extracting features
- Training a classification model
- Evaluating performance
Concept of Text Classification
Represent documents as features (words, phrases, embeddings).
Process: Preprocessing → Feature Extraction → Classification.
Simple Flow of Text Classification
- Input Documents
- Preprocessing (Cleaning, Tokenization)
- Feature Extraction (BoW, TF-IDF, Embeddings)
- Classification Model (SVM, Naive Bayes, Neural Net)
- Predicted Category (Spam/Not Spam, etc.)
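The flow above can be sketched end to end with scikit-learn; the four-document spam/ham corpus below is a hypothetical toy example, not data from this article.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Hypothetical toy corpus: four labeled documents
docs = ["free prize click now", "meeting at noon tomorrow",
        "win a free prize today", "lunch meeting with the team"]
labels = ["Spam", "Not Spam", "Spam", "Not Spam"]

# Feature extraction (BoW via CountVectorizer) + classifier in one pipeline;
# CountVectorizer also handles basic preprocessing (lowercasing, tokenization)
model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(docs, labels)

# Predicted category for a new document
print(model.predict(["claim your free prize"])[0])  # Spam
```

Swapping `CountVectorizer` for `TfidfVectorizer`, or `MultinomialNB` for an SVM, changes only one pipeline stage, which is the point of the modular flow.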
Mathematical Formulation
Suppose there are $n$ distinct words across all documents.
Each document $D$ can be represented as:
\[
D = (w_{1D}, w_{2D}, \dots, w_{nD})
\]
where $w_{iD}$ is the weight of word $i$ in document $D$ (raw frequency, TF-IDF weight, etc.).
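A minimal sketch of this representation with raw frequencies as weights, using a hypothetical three-word vocabulary:

```python
from collections import Counter

# Toy vocabulary of n = 3 distinct words (hypothetical)
vocab = ["loved", "movie", "story"]

doc = "I loved this movie and loved the story"
counts = Counter(doc.lower().split())

# D = (w_1D, ..., w_nD) with raw term frequency as the weight w_iD
D = [counts[w] for w in vocab]
print(D)  # [2, 1, 1]
```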
Common Feature Extraction Models
- Bag of Words (BoW)
- TF-IDF (Term Frequency–Inverse Document Frequency)
- Word2Vec, GloVe, BERT embeddings
Example Movie Reviews
Positive:
- I absolutely loved this movie. I loved the story and the characters.
- The acting was amazing and the story was touching.
- What a great experience. The visuals were memorable.
- This movie was thrilling and full of suspense.
- The cinematography and direction were excellent.
Negative:
- I hated this film, it was boring and too long.
- The plot was weak and the acting was terrible.
- Such a disappointing movie, I regret watching it.
- This was the worst movie I have seen.
- Poor storyline and bad performances throughout.
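The ten reviews above are enough to train a small classifier. The sketch below uses scikit-learn's `CountVectorizer` and `MultinomialNB`; the test sentence at the end is a made-up example, not one of the reviews.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

positive = [
    "I absolutely loved this movie. I loved the story and the characters.",
    "The acting was amazing and the story was touching.",
    "What a great experience. The visuals were memorable.",
    "This movie was thrilling and full of suspense.",
    "The cinematography and direction were excellent.",
]
negative = [
    "I hated this film, it was boring and too long.",
    "The plot was weak and the acting was terrible.",
    "Such a disappointing movie, I regret watching it.",
    "This was the worst movie I have seen.",
    "Poor storyline and bad performances throughout.",
]

# Build BoW features over the full corpus, then fit multinomial Naive Bayes
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(positive + negative)
y = ["Positive"] * 5 + ["Negative"] * 5
clf = MultinomialNB().fit(X, y)

# Classify an unseen (hypothetical) review
print(clf.predict(vectorizer.transform(["I loved the amazing story"]))[0])
```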
BoW Representation (rows for Reviews 1, 6, and 10 over a 10-word vocabulary)
| Review | loved (x1) | movie (x2) | fantastic (x3) | boring (x4) | terrible (x5) | great (x6) | excellent (x7) | worst (x8) | acting (x9) | story (x10) | Class |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 2 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | Positive |
| 6 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | Negative |
| 10 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | Negative |
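The first row of the table can be reproduced by fixing `CountVectorizer` to the same 10-word vocabulary:

```python
from sklearn.feature_extraction.text import CountVectorizer

# The 10-word vocabulary from the table (x1 .. x10)
vocab = ["loved", "movie", "fantastic", "boring", "terrible",
         "great", "excellent", "worst", "acting", "story"]
vectorizer = CountVectorizer(vocabulary=vocab)

review_1 = "I absolutely loved this movie. I loved the story and the characters."
row = vectorizer.transform([review_1]).toarray()[0].tolist()
print(row)  # [2, 1, 0, 0, 0, 0, 0, 0, 0, 1]
```

Words outside the fixed vocabulary ("absolutely", "characters", ...) are simply dropped, which is why the row contains mostly zeros.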
Naive Bayes Classification
Naive Bayes uses Bayes’ Theorem:
\[
P(y \mid x) = \frac{P(y)\, P(x \mid y)}{P(x)}
\]
Since $P(x)$ is the same for every class, it can be dropped; with the naive independence assumption, the multinomial decision rule becomes:
\[
\hat{y} = \arg\max_y P(y) \prod_{i=1}^{n} P(x_i \mid y)^{x_i}
\]
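In practice the product is computed in log space to avoid underflow. A sketch of the argmax with hypothetical smoothed likelihoods $P(x_i \mid y)$ over a three-word vocabulary:

```python
import math

# Hypothetical priors P(y) and smoothed likelihoods P(x_i | y)
priors = {"Positive": 0.5, "Negative": 0.5}
likelihoods = {
    "Positive": [0.5, 0.3, 0.2],
    "Negative": [0.1, 0.3, 0.6],
}
x = [2, 1, 0]  # word counts x_i for the document

def log_score(y):
    # log P(y) + sum_i x_i * log P(x_i | y): the log of the argmax expression
    return math.log(priors[y]) + sum(
        c * math.log(p) for c, p in zip(x, likelihoods[y])
    )

y_hat = max(priors, key=log_score)
print(y_hat)
```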
Worked Example (Review 1)
Review: “I absolutely loved this movie. I loved the story and the characters.”
BoW vector: $x = [2,1,0,0,0,0,0,0,0,1]$
Positive class probability (approx):
\[
P(Positive \mid x) \propto 0.5 \cdot (0.182^2 \cdot 0.273 \cdot 0.182) \approx 0.00083
\]
Negative class probability: the word "loved" never occurs in a negative review, so its unsmoothed likelihood, and hence the whole product, is zero:
\[
P(Negative \mid x) = 0
\]
Conclusion: Classified as Positive.
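The arithmetic for the positive score can be checked directly (using the rounded likelihood values quoted above):

```python
# Hand computation for Review 1 with the rounded values from the text
p_positive = 0.5 * (0.182 ** 2) * 0.273 * 0.182  # roughly 8.2e-4
p_negative = 0.0  # "loved" never occurs in a negative review (no smoothing)

# The larger (unnormalized) score decides the class
prediction = "Positive" if p_positive > p_negative else "Negative"
print(prediction)
```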
Reach PostNetwork Academy
- Website: www.postnetwork.co
- YouTube: www.youtube.com/@postnetworkacademy
- Facebook: www.facebook.com/postnetworkacademy
- LinkedIn: www.linkedin.com/company/postnetworkacademy
- GitHub: www.github.com/postnetworkacademy