📝 NLP Text Summarization System
An end-to-end NLP web application that generates concise summaries from long documents using Extractive and Abstractive techniques.

| Feature | Description |
|---|---|
| ✅ Extractive Summarization | Selects key sentences using TF-based frequency scoring |
| ✅ Abstractive Summarization | Generates new sentences using DistilBART transformer |
| ✅ Bullet Points Format | Clean bullet-point output |
| ✅ ROUGE Evaluation | ROUGE-1, ROUGE-2, ROUGE-L metrics |
| ✅ Preprocessing Pipeline | Tokenization, cleaning, stop-word removal |
| ✅ File Upload | PDF, DOCX, TXT support |
| ✅ Download Summary | Export as TXT or PDF |
| ✅ Domain Context | General, News, Document, Meeting |
| ✅ Modern UI | Clean light theme with real-time feedback |

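The extractive mode scores sentences by term frequency and keeps the highest-scoring ones. A minimal sketch of that idea is shown here using only the standard library. The function names (`split_sentences`, `content_words`, `extractive_summary`), the regex-based tokenization, and the tiny stop-word set are illustrative assumptions; this is not the code in `utils/summarizer.py`, and the project lists NLTK for tokenization and stop words.

```python
import re
from collections import Counter

# Tiny illustrative stop-word set (assumption); NLTK's stopwords corpus is much larger.
STOP_WORDS = {"a", "an", "and", "are", "as", "at", "by", "for", "in", "is", "it",
              "of", "on", "or", "that", "the", "this", "to", "was", "with"}


def split_sentences(text):
    """Naive sentence splitter: break after ., ! or ? followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def content_words(sentence):
    """Lowercase alphabetic tokens with stop words removed."""
    return [w for w in re.findall(r"[a-z']+", sentence.lower()) if w not in STOP_WORDS]


def extractive_summary(text, num_sentences=2):
    """Return the top-scoring sentences, kept in their original order."""
    sentences = split_sentences(text)
    freq = Counter(w for s in sentences for w in content_words(s))
    if not freq:
        return []
    top = max(freq.values())

    def score(sentence):
        words = content_words(sentence)
        # Average normalized frequency, so sentence length alone does not win.
        return sum(freq[w] / top for w in words) / len(words) if words else 0.0

    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    keep = sorted(ranked[:num_sentences])  # restore document order
    return [sentences[i] for i in keep]


if __name__ == "__main__":
    sample = ("Transformers changed natural language processing. "
              "Many summarization systems now use transformers. "
              "The weather was pleasant yesterday.")
    for line in extractive_summary(sample, num_sentences=2):
        print("-", line)
```

Averaging the normalized frequencies, rather than summing them, is one possible design choice: it keeps long sentences from being selected simply because they contain more words.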

| Layer | Technology |
|---|---|
| Backend | Python, Flask |
| NLP | NLTK, Transformers (DistilBART) |
| Evaluation | ROUGE Score |
| File Processing | PyPDF2, python-docx |
| Frontend | HTML, CSS, JavaScript |
| Export | FPDF |

Text-Summarization/
│
├── app.py # Flask app and API routes
├── requirements.txt # Python dependencies
├── README.md
│
├── utils/
│   ├── __init__.py
│   ├── summarizer.py # Extractive & Abstractive logic
│   └── text_processing.py # Preprocessing pipeline
│
├── static/
│   ├── style.css
│   └── script.js
│
└── templates/
    └── index.html
1. Clone the repository
git clone https://github.com/Purushothamreddy6749/Text-Summarization.git
cd Text-Summarization
2. Create and activate a virtual environment
# Create
python -m venv venv
# Windows
venv\Scripts\activate
# Mac/Linux
source venv/bin/activate
3. Install dependencies
pip install -r requirements.txt
4. Download NLTK data
python -c "import nltk; nltk.download('punkt'); nltk.download('stopwords'); nltk.download('punkt_tab')"
Input Text
│
▼
┌─────────────────────────┐
│ Preprocessing Pipeline │
│ • Text Cleaning │
│ • Sentence Tokenization │
│ • Word Tokenization │
│ • Stop-word Removal │
│ • Vocabulary Analysis │
└─────────────────────────┘
│
▼
┌─────────────────────────┐
│ Summarization │
│ • Extractive → TF Score │
│ • Abstractive → BART │
└─────────────────────────┘
│
▼
┌─────────────────────────┐
│ ROUGE Evaluation │
│ • ROUGE-1 (Content) │
│ • ROUGE-2 (Fluency) │
│ • ROUGE-L (Structure) │
└─────────────────────────┘
│
▼
Output (Paragraph / Bullet Points)
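
The five preprocessing steps in the diagram can be sketched as one function that returns each intermediate result. This is a rough standard-library approximation: the names `clean_text` and `preprocess`, the returned dictionary keys, the URL-stripping rule, and the small stop-word set are all assumptions made for illustration, and the regex tokenizers stand in for the NLTK tokenizers the project installs.

```python
import re

# Small illustrative stop-word set (assumption); NLTK's stopwords corpus is far larger.
STOP_WORDS = {"a", "an", "and", "are", "as", "at", "for", "in", "is", "it",
              "of", "on", "or", "the", "to", "was", "with"}


def clean_text(text):
    """Strip URLs and collapse runs of whitespace."""
    text = re.sub(r"https?://\S+", " ", text)
    return re.sub(r"\s+", " ", text).strip()


def preprocess(text):
    """Run the five diagram steps and return the intermediate results."""
    # 1. Text cleaning
    cleaned = clean_text(text)
    # 2. Sentence tokenization (naive split after ., ! or ?)
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", cleaned) if s]
    # 3. Word tokenization
    tokens = re.findall(r"[a-z0-9']+", cleaned.lower())
    # 4. Stop-word removal
    filtered = [t for t in tokens if t not in STOP_WORDS]
    # 5. Vocabulary analysis
    vocabulary = set(filtered)
    return {
        "sentences": sentences,
        "tokens": tokens,
        "filtered_tokens": filtered,
        "vocabulary_size": len(vocabulary),
        "lexical_diversity": len(vocabulary) / len(filtered) if filtered else 0.0,
    }


if __name__ == "__main__":
    stats = preprocess("The cat sat.   The cat ran to https://example.com now!")
    print(stats["sentences"])
    print(stats["vocabulary_size"], stats["lexical_diversity"])
```

Returning every stage, instead of only the final tokens, makes it easy to show statistics such as sentence count and vocabulary size alongside the summary.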
📈 ROUGE Metrics Explained

| Metric | Measures | Description |
|---|---|---|
| ROUGE-1 | Unigram overlap | Content coverage |
| ROUGE-2 | Bigram overlap | Fluency |
| ROUGE-L | Longest common subsequence | Structural accuracy |

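
To make the three metrics concrete, the sketch below computes them in plain Python from whitespace-split, lowercased tokens. The function names (`rouge_n`, `rouge_l`, `lcs_length`) are hypothetical and this is not the scoring code the app uses; dedicated ROUGE packages typically add steps such as stemming and more careful tokenization, so their numbers can differ.

```python
from collections import Counter


def _ngrams(tokens, n):
    """Multiset of n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def _prf(overlap, cand_total, ref_total):
    """Precision, recall and F1 from an overlap count."""
    precision = overlap / cand_total if cand_total else 0.0
    recall = overlap / ref_total if ref_total else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


def rouge_n(reference, candidate, n):
    """ROUGE-N: clipped n-gram overlap (n=1 unigrams, n=2 bigrams)."""
    ref = _ngrams(reference.lower().split(), n)
    cand = _ngrams(candidate.lower().split(), n)
    overlap = sum((ref & cand).values())  # Counter & keeps the minimum count per n-gram
    return _prf(overlap, sum(cand.values()), sum(ref.values()))


def lcs_length(a, b):
    """Longest common subsequence length via dynamic programming."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l(reference, candidate):
    """ROUGE-L: LCS length relative to candidate and reference lengths."""
    ref, cand = reference.lower().split(), candidate.lower().split()
    return _prf(lcs_length(ref, cand), len(cand), len(ref))


if __name__ == "__main__":
    reference = "the cat sat on the mat"
    candidate = "the cat lay on the mat"
    print("ROUGE-1:", rouge_n(reference, candidate, 1))
    print("ROUGE-2:", rouge_n(reference, candidate, 2))
    print("ROUGE-L:", rouge_l(reference, candidate))
```

For the example pair, five of six unigrams match, three of five bigrams match, and the longest common subsequence ("the cat on the mat") has length five.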
Author: R. Purushotham Reddy
This project is licensed under the MIT License.