An intelligent chatbot system that extracts YouTube video transcripts and answers questions about them using RAG (Retrieval-Augmented Generation) and embeddings.
✨ Core Features:
- 🎬 Extract transcripts from YouTube videos
- 🔍 Semantic search using embeddings
- 🤖 AI-powered question answering
- 💾 Session-based video storage
- 🌐 RESTful FastAPI backend
- 🔌 Chrome extension frontend
- ⚡ Real-time processing
```
youtube_chatbot/
├── backend/
│   ├── app.py               # FastAPI server & endpoints
│   ├── video_processor.py   # YouTube transcript extraction
│   ├── rag_pipeline.py      # RAG pipeline & embeddings
│   ├── models.py            # Pydantic request/response models
│   ├── config.py            # Configuration & settings
│   ├── requirements.txt     # Python dependencies
│   └── __pycache__/
├── chrome-extension/
│   ├── manifest.json        # Extension metadata
│   ├── popup.html           # UI
│   ├── popup.js             # JavaScript logic
│   └── styles.css           # Styling
└── README.md                # This file
```
- Python 3.10+
- pip or conda
- Virtual environment (recommended)
- Navigate to the project directory:

```shell
cd e:\project\youtube_chatbot
```

- Create and activate a virtual environment:

```shell
# Windows
python -m venv .venv
.venv\Scripts\activate

# macOS/Linux
python3 -m venv .venv
source .venv/bin/activate
```

- Install dependencies:

```shell
pip install -r backend/requirements.txt
```

- Run the backend server:

```shell
python backend/app.py
```

The server will start on http://localhost:8000
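Once the server is running, you can verify it from Python using only the standard library. This is a small convenience sketch, not part of the project code; it assumes the default port:

```python
import json
from urllib.error import URLError
from urllib.request import urlopen

def backend_ready(base_url="http://localhost:8000", timeout=2):
    """Return True only if the /health endpoint reports a healthy status."""
    try:
        with urlopen(f"{base_url}/health", timeout=timeout) as resp:
            return json.load(resp).get("status") == "healthy"
    except (URLError, OSError, ValueError):
        # Connection refused, timeout, or malformed response all mean "not ready".
        return False

ready = backend_ready()  # True once the server and models are up
```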
- Open Chrome and go to `chrome://extensions/`
- Enable "Developer mode" (top-right toggle)
- Click "Load unpacked"
- Select the `chrome-extension/` folder
- The extension will appear in your toolbar
- Start the backend:

```shell
cd e:\project\youtube_chatbot
.venv\Scripts\activate
python backend/app.py
```

- Open the interactive API docs:
  - Navigate to http://localhost:8000/docs
  - This provides an interactive Swagger UI for testing

- Process a video:
  - Click the `/process-video` endpoint
  - Enter a YouTube URL: `https://www.youtube.com/watch?v=VIDEO_ID`
  - Execute and save the `session_id`

- Ask questions:
  - Click the `/query` endpoint
  - Paste the `session_id`
  - Enter your question
  - Get instant answers!
The extension popup displaying the video processing and Q&A interface:

*(screenshot: Chatbot UI)*

Successful video processing with chunks created:

*(screenshot: Process Video)*

Sending a question to the chatbot:

*(screenshot: Ask Question)*

AI-generated answer with sources and confidence score:

*(screenshot: Get Answer)*

Formatted response showing the generated answer:

*(screenshot: Response Detail)*
GET /health
Check if the system is ready.
Response:

```json
{
  "status": "healthy",
  "timestamp": "2026-01-17T12:00:00",
  "models_loaded": true,
  "active_sessions": 1
}
```

POST /process-video
Extract and index a YouTube video.
Request:

```json
{
  "video_url": "https://www.youtube.com/watch?v=dQw4w9WgXcQ"
}
```

Response:

```json
{
  "success": true,
  "session_id": "session_abc123xyz",
  "message": "Video processed successfully",
  "video_title": "Video dQw4w9WgXcQ",
  "transcript_length": 2089,
  "chunks_created": 5,
  "processing_time": 3.45
}
```

POST /query
Ask a question about a processed video.
Request:

```json
{
  "session_id": "session_abc123xyz",
  "query": "What is the main topic?"
}
```

Response:

```json
{
  "success": true,
  "answer": "Based on the video, the main topic is...",
  "sources": ["chunk_0", "chunk_2"],
  "confidence": 0.92,
  "message": "Answer generated"
}
```

```shell
# Health check
curl http://localhost:8000/health

# Process video
curl -X POST http://localhost:8000/process-video \
  -H "Content-Type: application/json" \
  -d '{"video_url":"https://www.youtube.com/watch?v=jNQXAC9IVRw"}'

# Query
curl -X POST http://localhost:8000/query \
  -H "Content-Type: application/json" \
  -d '{"session_id":"your_session_id","query":"Your question here"}'
```

```python
import requests

BASE_URL = "http://localhost:8000"

# Process video
response = requests.post(
    f"{BASE_URL}/process-video",
    json={"video_url": "https://www.youtube.com/watch?v=jNQXAC9IVRw"},
)
session_id = response.json()["session_id"]

# Query
response = requests.post(
    f"{BASE_URL}/query",
    json={"session_id": session_id, "query": "What's the main topic?"},
)
answer = response.json()["answer"]
print(answer)
```

```
User Input (Video URL)
        ↓
VideoProcessor (Extract Transcript)
        ↓
RAG Pipeline:
 ├─ Text Splitter (Create chunks)
 ├─ Embedding Model (sentence-transformers)
 ├─ Vector Store (FAISS)
 └─ LLM (Generate answers)
        ↓
AI-Generated Answer
```
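The Text Splitter stage can be illustrated with a naive fixed-window splitter. This is only a sketch: the real pipeline uses LangChain's recursive character splitter, which respects sentence and paragraph boundaries, but the CHUNK_SIZE/CHUNK_OVERLAP idea is the same:

```python
def split_text(text, chunk_size=1000, chunk_overlap=200):
    """Naive fixed-window splitter: consecutive chunks share
    `chunk_overlap` characters so context isn't cut mid-thought."""
    step = chunk_size - chunk_overlap
    return [text[i:i + chunk_size]
            for i in range(0, max(len(text) - chunk_overlap, 1), step)]

chunks = split_text("x" * 2500)
# With size 1000 / overlap 200, windows start at 0, 800, 1600 -> 3 chunks
```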
- Video Processing:
  - Extract the video ID from the YouTube URL
  - Download the transcript using `youtube-transcript-api`
  - Split it into meaningful chunks (recursive character splitter)
- Indexing:
  - Create embeddings for each chunk using sentence-transformers
  - Store the embeddings in a FAISS vector database
  - Create a session for future queries
- Query Processing:
  - Embed the user's question
  - Search FAISS for the most relevant chunks
  - Send the chunks and question to the LLM
  - Generate a contextual answer
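The query-processing steps can be sketched end to end with a toy bag-of-words embedding and cosine similarity. This is only a stand-in for sentence-transformers and FAISS, meant to show the retrieval logic, not the real models:

```python
import math
from collections import Counter

def embed(text):
    """Toy embedding: a bag-of-words count vector
    (stand-in for sentence-transformers/all-MiniLM-L6-v2)."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def top_chunks(question, chunks, k=2):
    """Rank chunks by similarity to the question (stand-in for FAISS search)."""
    q = embed(question)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]

chunks = [
    "the video explains retrieval augmented generation",
    "the host thanks the sponsors",
    "embeddings map text to vectors for semantic search",
]
print(top_chunks("how does retrieval work", chunks, k=1))
# -> ['the video explains retrieval augmented generation']
```

The real pipeline does the same thing with dense vectors: the most similar chunks are then passed, together with the question, to the LLM as context.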
- FastAPI - Web framework
- Uvicorn - ASGI server
- Pydantic - Data validation
- LangChain - RAG orchestration
- Sentence-Transformers - Embeddings
- FAISS - Vector database
- youtube-transcript-api - Transcript extraction
- Chrome Extension API - Browser integration
- HTML/CSS/JavaScript - UI
- PyTorch - Deep learning
- Transformers - Hugging Face models
- sentence-transformers/all-MiniLM-L6-v2 - Embedding model
Edit `backend/config.py` to customize:

```python
# Model settings
EMBEDDING_MODEL = "sentence-transformers/all-MiniLM-L6-v2"
CHUNK_SIZE = 1000
CHUNK_OVERLAP = 200
```

Port 8000 already in use:

```shell
# Kill the process using port 8000
lsof -i :8000
kill -9 <PID>

# Or use a different port
python backend/app.py --port 8001
```

No transcript found:
- Video must have captions enabled
- Try a different video
- Check YouTube's captioning settings

Import errors:

```shell
# Reinstall dependencies
pip install -r backend/requirements.txt --upgrade
```

Slow first run:
- The first run downloads models (~500MB)
- Subsequent runs use cached models
- Consider running on GPU for faster processing
- First request: Slow (models load)
- Subsequent requests: Fast (cached models)
- Large videos: May take longer to process
- Specific questions: More accurate answers than vague ones
- Multi-language support
- Video summarization
- Chat history storage
- User authentication
- Database persistence
- GPU optimization
- Docker containerization
- Deployment to cloud (AWS/GCP/Azure)
- Support for other video platforms
Interactive API documentation available at:
- Swagger UI: http://localhost:8000/docs
- ReDoc: http://localhost:8000/redoc
- OpenAPI Schema: http://localhost:8000/openapi.json

YouTube videos with transcripts that work well:
- https://www.youtube.com/watch?v=jNQXAC9IVRw (the first YouTube video)
- https://www.youtube.com/watch?v=dQw4w9WgXcQ (a popular music video)
- Most TED Talks
- Most educational content
- Most podcasts on YouTube
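If you want to check which video ID a URL resolves to before processing it, the extraction can be sketched with the standard library alone (a simplified illustration; the actual logic in `video_processor.py` may differ):

```python
from urllib.parse import parse_qs, urlparse

def extract_video_id(url):
    """Pull the video ID from common YouTube URL shapes."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    if host == "youtu.be":                # short links: youtu.be/<id>
        return parsed.path.lstrip("/") or None
    if host.endswith("youtube.com"):      # watch links: ?v=<id>
        return parse_qs(parsed.query).get("v", [None])[0]
    return None

print(extract_video_id("https://www.youtube.com/watch?v=jNQXAC9IVRw"))
# -> jNQXAC9IVRw
```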
See `backend/requirements.txt` for the complete list:

```
fastapi==0.128.0
uvicorn[standard]==0.40.0
pydantic==2.12.5
langchain==0.1.11
langchain-community==0.0.25
langchain-core==0.1.29
faiss-cpu==1.13.2
sentence-transformers==3.0.1
youtube-transcript-api==1.2.3
python-dotenv==1.0.0
torch==2.2.0
transformers==4.35.2
```
This project is open source and available under the MIT License.
Contributions are welcome! To contribute:
- Fork the repository
- Create a feature branch (`git checkout -b feature/AmazingFeature`)
- Commit changes (`git commit -m 'Add AmazingFeature'`)
- Push to the branch (`git push origin feature/AmazingFeature`)
- Open a Pull Request
For issues or questions:
- Check the Troubleshooting section
- Review the API documentation at `/docs`
- Check terminal logs for error messages
- Test with example videos first
- LangChain for RAG framework
- Hugging Face for sentence-transformers
- Facebook for FAISS library
- OpenAI for LLM inspiration
- YouTube for transcript API
- Initial release
- Core RAG functionality
- YouTube transcript extraction
- FastAPI backend
- Chrome extension
Happy Chatting! 🚀