An intelligent chatbot system that extracts YouTube video transcripts and answers questions about them using RAG (Retrieval-Augmented Generation) and embeddings.
✨ Core Features:
- 🎬 Extract transcripts from YouTube videos
- 🔍 Semantic search using embeddings
- 🤖 AI-powered question answering
- 💾 Session-based video storage
- 🌐 RESTful FastAPI backend
- 🔌 Chrome extension frontend
- ⚡ Real-time processing
```
youtube_chatbot/
├── backend/
│   ├── app.py               # FastAPI server & endpoints
│   ├── video_processor.py   # YouTube transcript extraction
│   ├── rag_pipeline.py      # RAG pipeline & embeddings
│   ├── models.py            # Pydantic request/response models
│   ├── config.py            # Configuration & settings
│   ├── requirements.txt     # Python dependencies
│   └── __pycache__/
├── chrome-extension/
│   ├── manifest.json        # Extension metadata
│   ├── popup.html           # UI
│   ├── popup.js             # JavaScript logic
│   └── styles.css           # Styling
└── README.md                # This file
```
- Python 3.10+
- pip or conda
- Virtual environment (recommended)
- Navigate to the project directory:

```shell
cd e:\project\youtube_chatbot
```

- Create and activate a virtual environment:

```shell
# Windows
python -m venv .venv
.venv\Scripts\activate

# macOS/Linux
python3 -m venv .venv
source .venv/bin/activate
```

- Install dependencies:

```shell
pip install -r backend/requirements.txt
```

- Run the backend server:

```shell
python backend/app.py
```

The server will start on http://localhost:8000
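Once the server is running, you can verify it from Python using only the standard library. This is a small convenience sketch, not part of the project code; it assumes the default port:

```python
import json
from urllib.error import URLError
from urllib.request import urlopen

def backend_ready(base_url="http://localhost:8000", timeout=2):
    """Return True only if the /health endpoint reports a healthy status."""
    try:
        with urlopen(f"{base_url}/health", timeout=timeout) as resp:
            return json.load(resp).get("status") == "healthy"
    except (URLError, OSError, ValueError):
        # Connection refused, timeout, or malformed response all mean "not ready".
        return False

ready = backend_ready()  # True once the server and models are up
```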
- Open Chrome and go to `chrome://extensions/`
- Enable "Developer mode" (top-right toggle)
- Click "Load unpacked"
- Select the `chrome-extension/` folder
- The extension will appear in your toolbar
- Start the backend:

```shell
cd e:\project\youtube_chatbot
.venv\Scripts\activate
python backend/app.py
```

- Open the interactive API docs:
  - Navigate to http://localhost:8000/docs
  - This provides an interactive Swagger UI for testing

- Process a video:
  - Click the `/process-video` endpoint
  - Enter a YouTube URL: `https://www.youtube.com/watch?v=VIDEO_ID`
  - Execute and save the `session_id`

- Ask questions:
  - Click the `/query` endpoint
  - Paste the `session_id`
  - Enter your question
  - Get instant answers!
The extension popup displaying the video processing and Q&A interface:

*(screenshot: Chatbot UI)*

Successful video processing with chunks created:

*(screenshot: Process Video)*

Sending a question to the chatbot:

*(screenshot: Ask Question)*

AI-generated answer with sources and confidence score:

*(screenshot: Get Answer)*

Formatted response showing the generated answer:

*(screenshot: Response Detail)*
GET /health
Check if the system is ready.
Response:

```json
{
  "status": "healthy",
  "timestamp": "2026-01-17T12:00:00",
  "models_loaded": true,
  "active_sessions": 1
}
```

POST /process-video
Extract and index a YouTube video.
Request:

```json
{
  "video_url": "https://www.youtube.com/watch?v=dQw4w9WgXcQ"
}
```

Response:

```json
{
  "success": true,
  "session_id": "session_abc123xyz",
  "message": "Video processed successfully",
  "video_title": "Video dQw4w9WgXcQ",
  "transcript_length": 2089,
  "chunks_created": 5,
  "processing_time": 3.45
}
```

POST /query
Ask a question about a processed video.
Request:

```json
{
  "session_id": "session_abc123xyz",
  "query": "What is the main topic?"
}
```

Response:

```json
{
  "success": true,
  "answer": "Based on the video, the main topic is...",
  "sources": ["chunk_0", "chunk_2"],
  "confidence": 0.92,
  "message": "Answer generated"
}
```

```shell
# Health check
curl http://localhost:8000/health

# Process video
curl -X POST http://localhost:8000/process-video \
  -H "Content-Type: application/json" \
  -d '{"video_url":"https://www.youtube.com/watch?v=jNQXAC9IVRw"}'

# Query
curl -X POST http://localhost:8000/query \
  -H "Content-Type: application/json" \
  -d '{"session_id":"your_session_id","query":"Your question here"}'
```

```python
import requests

BASE_URL = "http://localhost:8000"

# Process video
response = requests.post(
    f"{BASE_URL}/process-video",
    json={"video_url": "https://www.youtube.com/watch?v=jNQXAC9IVRw"},
)
session_id = response.json()["session_id"]

# Query
response = requests.post(
    f"{BASE_URL}/query",
    json={"session_id": session_id, "query": "What's the main topic?"},
)
answer = response.json()["answer"]
print(answer)
```

```
User Input (Video URL)
        ↓
VideoProcessor (Extract Transcript)
        ↓
RAG Pipeline:
 ├─ Text Splitter (Create chunks)
 ├─ Embedding Model (sentence-transformers)
 ├─ Vector Store (FAISS)
 └─ LLM (Generate answers)
        ↓
AI-Generated Answer
```
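The Text Splitter stage can be illustrated with a naive fixed-window splitter. This is only a sketch: the real pipeline uses LangChain's recursive character splitter, which respects sentence and paragraph boundaries, but the CHUNK_SIZE/CHUNK_OVERLAP idea is the same:

```python
def split_text(text, chunk_size=1000, chunk_overlap=200):
    """Naive fixed-window splitter: consecutive chunks share
    `chunk_overlap` characters so context isn't cut mid-thought."""
    step = chunk_size - chunk_overlap
    return [text[i:i + chunk_size]
            for i in range(0, max(len(text) - chunk_overlap, 1), step)]

chunks = split_text("x" * 2500)
# With size 1000 / overlap 200, windows start at 0, 800, 1600 -> 3 chunks
```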
- Video Processing:
  - Extract the video ID from the YouTube URL
  - Download the transcript using `youtube-transcript-api`
  - Split it into meaningful chunks (recursive character splitter)
- Indexing:
  - Create embeddings for each chunk using sentence-transformers
  - Store the embeddings in a FAISS vector database
  - Create a session for future queries
- Query Processing:
  - Embed the user's question
  - Search FAISS for the most relevant chunks
  - Send the chunks and question to the LLM
  - Generate a contextual answer
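The query-processing steps can be sketched end to end with a toy bag-of-words embedding and cosine similarity. This is only a stand-in for sentence-transformers and FAISS, meant to show the retrieval logic, not the real models:

```python
import math
from collections import Counter

def embed(text):
    """Toy embedding: a bag-of-words count vector
    (stand-in for sentence-transformers/all-MiniLM-L6-v2)."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def top_chunks(question, chunks, k=2):
    """Rank chunks by similarity to the question (stand-in for FAISS search)."""
    q = embed(question)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]

chunks = [
    "the video explains retrieval augmented generation",
    "the host thanks the sponsors",
    "embeddings map text to vectors for semantic search",
]
print(top_chunks("how does retrieval work", chunks, k=1))
# -> ['the video explains retrieval augmented generation']
```

The real pipeline does the same thing with dense vectors: the most similar chunks are then passed, together with the question, to the LLM as context.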
- FastAPI - Web framework
- Uvicorn - ASGI server
- Pydantic - Data validation
- LangChain - RAG orchestration
- Sentence-Transformers - Embeddings
- FAISS - Vector database
- youtube-transcript-api - Transcript extraction
- Chrome Extension API - Browser integration
- HTML/CSS/JavaScript - UI
- PyTorch - Deep learning
- Transformers - Hugging Face models
- sentence-transformers/all-MiniLM-L6-v2 - Embedding model
Edit `backend/config.py` to customize:

```python
# Model settings
EMBEDDING_MODEL = "sentence-transformers/all-MiniLM-L6-v2"
CHUNK_SIZE = 1000
CHUNK_OVERLAP = 200
```

Port 8000 already in use:

```shell
# Kill the process using port 8000
lsof -i :8000
kill -9 <PID>

# Or use a different port
python backend/app.py --port 8001
```

No transcript found:
- Video must have captions enabled
- Try a different video
- Check YouTube's captioning settings

Import errors:

```shell
# Reinstall dependencies
pip install -r backend/requirements.txt --upgrade
```

Slow first run:
- The first run downloads models (~500MB)
- Subsequent runs use cached models
- Consider running on GPU for faster processing
- First request: Slow (models load)
- Subsequent requests: Fast (cached models)
- Large videos: May take longer to process
- Specific questions: More accurate answers than vague ones
- Multi-language support
- Video summarization
- Chat history storage
- User authentication
- Database persistence
- GPU optimization
- Docker containerization
- Deployment to cloud (AWS/GCP/Azure)
- Support for other video platforms
Interactive API documentation available at:
- Swagger UI: http://localhost:8000/docs
- ReDoc: http://localhost:8000/redoc
- OpenAPI Schema: http://localhost:8000/openapi.json

YouTube videos with transcripts that work well:
- https://www.youtube.com/watch?v=jNQXAC9IVRw (the first YouTube video)
- https://www.youtube.com/watch?v=dQw4w9WgXcQ (a popular music video)
- Most TED Talks
- Most educational content
- Most podcasts on YouTube
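If you want to check which video ID a URL resolves to before processing it, the extraction can be sketched with the standard library alone (a simplified illustration; the actual logic in `video_processor.py` may differ):

```python
from urllib.parse import parse_qs, urlparse

def extract_video_id(url):
    """Pull the video ID from common YouTube URL shapes."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    if host == "youtu.be":                # short links: youtu.be/<id>
        return parsed.path.lstrip("/") or None
    if host.endswith("youtube.com"):      # watch links: ?v=<id>
        return parse_qs(parsed.query).get("v", [None])[0]
    return None

print(extract_video_id("https://www.youtube.com/watch?v=jNQXAC9IVRw"))
# -> jNQXAC9IVRw
```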
See `backend/requirements.txt` for the complete list:

```
fastapi==0.128.0
uvicorn[standard]==0.40.0
pydantic==2.12.5
langchain==0.1.11
langchain-community==0.0.25
langchain-core==0.1.29
faiss-cpu==1.13.2
sentence-transformers==3.0.1
youtube-transcript-api==1.2.3
python-dotenv==1.0.0
torch==2.2.0
transformers==4.35.2
```
This project is open source and available under the MIT License.
Contributions are welcome! To contribute:
- Fork the repository
- Create a feature branch (`git checkout -b feature/AmazingFeature`)
- Commit changes (`git commit -m 'Add AmazingFeature'`)
- Push to the branch (`git push origin feature/AmazingFeature`)
- Open a Pull Request
For issues or questions:
- Check the Troubleshooting section
- Review the API documentation at `/docs`
- Check terminal logs for error messages
- Test with example videos first
- LangChain for RAG framework
- Hugging Face for sentence-transformers
- Facebook for FAISS library
- OpenAI for LLM inspiration
- YouTube for transcript API
- Initial release
- Core RAG functionality
- YouTube transcript extraction
- FastAPI backend
- Chrome extension
Happy Chatting! 🚀