This guide transforms the AskMyDocs project into a modular blueprint for building your own Retrieval-Augmented Generation (RAG) applications using LangChain.
- Goal: To create an intelligent Q&A assistant that can converse with your private or specialized documents (PDFs).
- Core Technology: Built using Streamlit (UI) and the LangChain framework (pipeline).
- Benefit: Allows you to upload documents and ask questions in natural language, eliminating the need for manual searching.
- LLM Limitation: Large Language Models (LLMs) like GPT-4o-mini have a knowledge cutoff and can't answer questions about current, private, or specialized data.
- RAG Solution: Retrieval-Augmented Generation (RAG) connects the LLM to an external knowledge base (your documents).
- Result: The LLM's answers are grounded in your specific source material, making them verifiable and highly relevant.
RAG is a two-step process: Ingestion (data prep) and Retrieval & Generation (answering the question).
- The pipeline reads raw data (PDFs).
- Text is cleaned to remove noise like page numbers, headers, and footers.
- Documents are broken down into small, semantically meaningful chunks because LLMs have input size limits (context window).
- Each text chunk is converted into a high-dimensional vector (numerical representation) that captures its meaning.
- These vectors are stored in a specialized database (the Vector Store) for ultra-fast similarity searching.
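The ingestion steps above can be sketched end to end in plain Python. This is a conceptual, dependency-free sketch: the hash-based `embed` function is a stand-in for a real embedding model (the project uses OpenAI's), and `VectorStore` is a toy in-memory replacement for FAISS; all names here are illustrative, not the project's actual code.

```python
import hashlib
import math
import re


def clean_text(raw: str) -> str:
    """Strip common PDF artifacts such as standalone page numbers."""
    lines = [ln.strip() for ln in raw.splitlines()]
    kept = [ln for ln in lines if ln and not re.fullmatch(r"(Page\s+)?\d+", ln)]
    return " ".join(kept)


def chunk_text(text: str, chunk_size: int = 3000, overlap: int = 200) -> list[str]:
    """Fixed-size chunks with overlap, so content spanning a boundary survives."""
    chunks, start = [], 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += chunk_size - overlap
    return chunks


def embed(text: str, dims: int = 64) -> list[float]:
    """Toy embedding: hash character trigrams into a normalized vector."""
    vec = [0.0] * dims
    for i in range(len(text) - 2):
        h = int(hashlib.md5(text[i:i + 3].encode()).hexdigest(), 16)
        vec[h % dims] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


class VectorStore:
    """In-memory store with brute-force cosine-similarity search."""

    def __init__(self):
        self.items: list[tuple[list[float], str]] = []

    def add(self, chunks: list[str]) -> None:
        for c in chunks:
            self.items.append((embed(c), c))

    def search(self, query: str, k: int = 3) -> list[str]:
        q = embed(query)
        scored = sorted(self.items, key=lambda it: -sum(a * b for a, b in zip(q, it[0])))
        return [text for _, text in scored[:k]]
```

A real pipeline swaps `embed` for a model call and `VectorStore` for FAISS, but the clean → chunk → embed → store flow is the same.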
- A user asks a question (e.g., "What were the Q3 results?").
- The question is converted into a vector using the same embedding model.
- The query vector is matched against the Vector Store to find the top K (e.g., top 3) most relevant document chunks.
- The LLM receives a prompt containing the original question, the conversation history, and the retrieved document chunks.
- The LLM synthesizes a final answer based only on the provided context.
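The augmented prompt described in the last two steps can be assembled with plain string formatting. The sketch below is illustrative (the function name and prompt wording are assumptions, not the project's code), but it shows exactly what the LLM receives: the instruction to stay grounded, the retrieved chunks, the history, and the question.

```python
def build_prompt(question: str,
                 history: list[tuple[str, str]],
                 retrieved_chunks: list[str]) -> str:
    """Combine the question, chat history, and top-K chunks into one prompt."""
    context = "\n\n".join(f"[{i + 1}] {c}" for i, c in enumerate(retrieved_chunks))
    turns = "\n".join(f"User: {q}\nAssistant: {a}" for q, a in history)
    return (
        "Answer using ONLY the context below. If the answer is not in the "
        "context, say you don't know.\n\n"
        f"Context:\n{context}\n\n"
        f"Conversation so far:\n{turns}\n\n"
        f"Question: {question}\nAnswer:"
    )
```

Frameworks like LangChain hide this assembly behind prompt templates, but the structure is the same.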
The project maps each RAG step to a specific function and LangChain component:
| RAG Step | Function | Logic |
|---|---|---|
| Load & Clean | `extract_pdf_text` | Uses PyPDF2 for extraction, followed by RegEx cleaning to strip document artifacts. |
| Chunking | `create_text_chunks` | Uses `RecursiveCharacterTextSplitter` with a chunk size of 3000 and an overlap of 200, preserving context across chunk boundaries. |
| Embeddings & Vector Store | `create_vector_db` | Generates embeddings via OpenAI's model and stores them in an in-memory FAISS index. |
| Orchestration & Memory | `converse_using_history` | Sets up `ConversationalRetrievalChain` and `ConversationBufferMemory` to maintain chat history. |
| Retrieval & Generation | `process_user_input` | Passes the augmented prompt to the LLM (GPT-4o-mini) and displays the grounded response. |
The project is modular, allowing you to easily swap components to build a custom RAG pipeline.
| RAG Component | Current Choice | Plug-and-Play Alternatives (Examples) |
|---|---|---|
| LLM (Brain) | gpt-4o-mini | GPT-4o (higher performance/cost), Llama 3, Mistral (open-source via HuggingFace) |
| Embeddings (Meaning) | text-embedding-ada-002 | text-embedding-3-small (newer/cheaper), all-mpnet-base-v2 (open-source) |
| Vector Store (Memory) | FAISS | Chroma (lightweight), Qdrant, Milvus (scalable production databases) |
| Chunking | RecursiveCharacterTextSplitter | CharacterTextSplitter (simple), NLTKTextSplitter (sentence-based) |
| Chain (Logic) | ConversationalRetrievalChain | Map-Reduce or Refine chains (for summarizing large contexts) |