Real-Time RAG Pipeline with OpenSearch and Gradio

Overview

This project builds a real-time Retrieval Augmented Generation (RAG) pipeline by:

Watching a folder for file creations, updates, and deletions.
Auto-indexing documents into an OpenSearch vector index with embeddings.
Querying via a Gradio UI where users ask questions and retrieve top-matching documents.

Inspired by real-world observability and GenAI use cases!
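
The watch-then-index step above can be sketched with the standard library alone. The project itself uses Watchdog for event delivery; the snapshot comparison below is a stand-in for illustration, and the names `snapshot`, `diff_snapshots`, and `plan_index_actions` are hypothetical, not taken from `watcher.py`.

```python
import os


def snapshot(folder):
    """Map each file path under `folder` to its last-modified time (ns)."""
    state = {}
    for root, _dirs, files in os.walk(folder):
        for name in files:
            path = os.path.join(root, name)
            state[path] = os.stat(path).st_mtime_ns
    return state


def diff_snapshots(old, new):
    """Return (created, updated, deleted) path lists between two snapshots."""
    created = sorted(p for p in new if p not in old)
    deleted = sorted(p for p in old if p not in new)
    updated = sorted(p for p in new if p in old and new[p] != old[p])
    return created, updated, deleted


def plan_index_actions(old, new):
    """Translate file changes into (action, path) pairs.

    Assumption: created and updated files are both (re)indexed,
    and deleted files are removed from the index.
    """
    created, updated, deleted = diff_snapshots(old, new)
    actions = [("index", p) for p in created + updated]
    actions += [("delete", p) for p in deleted]
    return actions
```

Taking a snapshot of `watched_folder` before and after a change and passing both to `plan_index_actions` yields the work the indexer would need to do. An event-driven watcher such as Watchdog avoids the rescans but ends up with the same three cases.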

Features

Real-Time File Monitoring: Automatically detect new/updated/deleted files.
Automatic Embedding: Use Sentence-Transformers to generate vector embeddings.
OpenSearch Vector Search: Store and search document embeddings.
Simple Gradio UI: Query documents naturally using LLM-augmented search.
Lightweight LLM: A custom fine-tuned qwen2.5-coder1.5b model, quantized to 8-bit so it runs on systems without GPUs while keeping response times reasonable.
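
To make the vector search feature concrete, here is a minimal sketch of the request bodies an OpenSearch k-NN index and query typically use. The field names (`path`, `content`, `embedding`), the helper names, and the 384-dimension default are assumptions for illustration; the actual values depend on the embedding model and on what `opensearch_utils.py` defines.

```python
import json

# Assumption: 384 matches models such as all-MiniLM-L6-v2;
# use the output size of whichever Sentence-Transformers model is configured.
EMBEDDING_DIM = 384


def build_index_body(dim=EMBEDDING_DIM):
    """Index settings and mappings for a k-NN enabled index."""
    return {
        "settings": {"index": {"knn": True}},
        "mappings": {
            "properties": {
                "path": {"type": "keyword"},      # hypothetical field
                "content": {"type": "text"},      # hypothetical field
                "embedding": {"type": "knn_vector", "dimension": dim},
            }
        },
    }


def build_knn_query(query_vector, k=3):
    """Query body that returns the k nearest documents to query_vector."""
    return {
        "size": k,
        "query": {"knn": {"embedding": {"vector": list(query_vector), "k": k}}},
        "_source": ["path", "content"],
    }


if __name__ == "__main__":
    print(json.dumps(build_index_body(), indent=2))
    print(json.dumps(build_knn_query([0.0] * EMBEDDING_DIM), indent=2))
```

With the opensearch-py client, bodies like these would be passed to `client.indices.create(index=..., body=...)` and `client.search(index=..., body=...)`. The query vector must come from the same embedding model used at indexing time.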

Architecture Diagram

Tech Stack

Python 3.10+
OpenSearch (Vector KNN Index)
Sentence-Transformers
Gradio
Watchdog
LLM Model

Use Cases

Knowledge bases.
Real-time document monitoring.
AI-assisted search apps.

Quickstart

1. Clone Repository

git clone https://github.com/Mandark-droid/rag-workflow-opensearch.git
cd rag-workflow-opensearch
pip install -r requirements.txt
# Start OpenSearch locally (via Docker):
docker run -p 9200:9200 -e "discovery.type=single-node" opensearchproject/opensearch:latest
# Note: OpenSearch 2.12 and later images may refuse to start unless OPENSEARCH_INITIAL_ADMIN_PASSWORD is also set with -e

python watcher.py      # start the folder watcher
# Add, update, or delete files in watched_folder; they are auto-indexed and can be queried through the UI in the next step
python gradio_app.py   # launch UI
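
Because the watcher needs OpenSearch to be reachable, it can help to wait for the container before starting it. The sketch below is an optional helper, not part of the repository. It assumes plain HTTP on `localhost:9200` as in the docker command above; with the security plugin enabled the endpoint is HTTPS and the check would need adjusting. The function names are hypothetical.

```python
import http.client
import time
import urllib.error
import urllib.request

# Bypass any configured HTTP proxy: this check targets a local service.
_OPENER = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def opensearch_is_up(url="http://localhost:9200", timeout=2.0):
    """Return True if something answers HTTP requests at `url`."""
    try:
        with _OPENER.open(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server answered with an error status (for example 401),
        # so it is running even though the request was rejected.
        return True
    except (OSError, http.client.HTTPException):
        # Connection refused, timed out, reset, or a malformed response.
        return False


def wait_for_opensearch(url="http://localhost:9200", attempts=30, delay=1.0):
    """Poll `url` up to `attempts` times, sleeping `delay` seconds between tries."""
    for attempt in range(attempts):
        if opensearch_is_up(url):
            return True
        if attempt < attempts - 1:
            time.sleep(delay)
    return False


if __name__ == "__main__":
    ready = wait_for_opensearch()
    print("OpenSearch is reachable" if ready else "OpenSearch did not respond")
```

Running this script after `docker run` and before `python watcher.py` gives a quick signal that the container has finished starting.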

