A Streamlit-based Question & Answer system for multiple PDF documents, using
Retrieval-Augmented Generation (RAG) powered by Llama 3 via Groq and ChromaDB.
- 📄 Upload multiple PDFs
- 🔍 Semantic search using HuggingFace embeddings
- 🧠 Grounded answers using Llama-3.3-70B (Groq)
- 🧩 Vector storage with ChromaDB
- ❓ Ask multiple questions at once
- 🧠 Clean Question → Answer UI
- ⚡ Fast inference via Groq API
├── app.py # Streamlit app
├── rag_utility.py # PDF processing + RAG logic
├── requirements.txt # Dependencies
├── env_template.txt # Environment variable template
├── .gitignore
├── LICENSE
└── README.md
- Upload PDFs using Streamlit
- PDFs are:
  - Loaded using `UnstructuredPDFLoader`
  - Split into chunks
  - Converted into embeddings
- Embeddings are stored in ChromaDB
- For each user question:
  - Relevant chunks are retrieved via similarity search
  - The question is passed to Llama 3 along with the retrieved context
- The model returns grounded answers
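The retrieval step above can be sketched in plain Python. This is a hypothetical, self-contained illustration: `chunk`, `embed`, and `retrieve` are stand-ins for the real text splitter, the all-MiniLM-L6-v2 embeddings, and the ChromaDB similarity search, using a toy word-count embedding so the flow runs without any dependencies.

```python
# Sketch of the RAG retrieval flow: chunk text, embed it, rank chunks by
# cosine similarity to the question. The toy bag-of-words embedding below
# stands in for all-MiniLM-L6-v2; ChromaDB performs this search in the app.
from collections import Counter
import math

def chunk(text: str, size: int = 200) -> list[str]:
    """Split text into fixed-size character chunks (stand-in for a text splitter)."""
    return [text[i:i + size] for i in range(0, len(text), size)]

def embed(text: str) -> Counter:
    """Toy embedding: a word-count vector (the app uses a sentence transformer)."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(question: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return the k chunks most similar to the question."""
    q = embed(question)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]
```

The top-`k` chunks returned by `retrieve` are what gets concatenated into the prompt context for Llama 3.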
git clone https://github.com/your-username/multi-pdf-rag-streamlit.git
cd multi-pdf-rag-streamlit

conda create -n rag python=3.10
conda activate rag
pip install -r requirements.txt
Create a .env file in the root directory (or copy env_template.txt to .env):

GROQ_API_KEY=your_groq_api_key_here
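A minimal sketch of how the key can be read from .env using only the standard library (the app itself may rely on a helper such as python-dotenv instead; this is an assumption, not the project's actual loader):

```python
# Parse KEY=VALUE lines from a .env file into os.environ (stdlib-only sketch).
import os
from pathlib import Path

def load_env(path: str = ".env") -> None:
    """Load simple KEY=VALUE pairs, skipping blanks and # comments."""
    for line in Path(path).read_text().splitlines():
        line = line.strip()
        if line and not line.startswith("#") and "=" in line:
            key, _, value = line.partition("=")
            os.environ.setdefault(key.strip(), value.strip())

if Path(".env").exists():          # load local secrets when present
    load_env()
api_key = os.environ.get("GROQ_API_KEY")  # None if not configured
```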
streamlit run app.py
- Upload one or more PDFs
- Enter questions (one per line), for example:
What is an ecosystem?
What are the types of ecosystems?
Forerunners of Evo-Devo?
- Click Answer
- Get structured Question → Answer results
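The "one question per line" flow above can be sketched as follows. This is a hypothetical illustration: `answer_all` and the stub `answer_fn` are not the app's real function names; in the app, the answering step is the RAG call to Llama 3.

```python
# Split multi-line input into questions and pair each with an answer,
# producing the structured Question -> Answer results the UI renders.
from typing import Callable

def answer_all(raw_input: str, answer_fn: Callable[[str], str]) -> list[dict[str, str]]:
    """Answer every non-empty line of the input as a separate question."""
    questions = [q.strip() for q in raw_input.splitlines() if q.strip()]
    return [{"question": q, "answer": answer_fn(q)} for q in questions]

# Example with a stub answerer in place of the RAG pipeline:
results = answer_all(
    "What is an ecosystem?\n\nWhat are the types of ecosystems?",
    lambda q: f"(answer for: {q})",
)
```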
- Frontend: Streamlit
- LLM: Llama-3.3-70B (Groq)
- Embeddings: all-MiniLM-L6-v2
- Vector Database: ChromaDB
- Framework: LangChain
- Language: Python
This application is ready to deploy on:
- Streamlit Cloud
- Docker
- Any cloud VM (AWS / GCP / Azure)
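For the Docker option, a minimal Dockerfile sketch might look like the following. This is an assumed setup (base image, port, and flags are illustrative, not shipped with the project); GROQ_API_KEY should be supplied at run time, e.g. `docker run -e GROQ_API_KEY=... -p 8501:8501 ...`, rather than baked into the image.

```dockerfile
# Illustrative image for the Streamlit app (not part of the repo).
FROM python:3.10-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .
EXPOSE 8501
CMD ["streamlit", "run", "app.py", "--server.address=0.0.0.0"]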
This project is licensed under the MIT License. You are free to use, modify, and distribute it.
- Groq
- LangChain
- HuggingFace
- Streamlit