Skip to content

aun151214/ai-document-chatbot-streamlit

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

2 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

AI Document Chatbot

A document question-answering app built with Python and Streamlit.

This project allows users to upload PDF or TXT documents, ask questions, retrieve the most relevant document sections, and optionally generate concise answers using the OpenAI API.

Project Purpose

Long documents are often difficult to search manually, especially when users need quick answers from FAQs, reports, policies, study notes, or business documents. This project demonstrates a simple document chatbot workflow that retrieves relevant information from uploaded files and avoids answering when the uploaded document does not contain enough information.

Features

  • Upload PDF or TXT files
  • Extract text from uploaded documents
  • Split long documents into searchable text chunks
  • Retrieve relevant sections using TF-IDF similarity
  • Ask natural-language questions
  • Show retrieved source sections
  • Detect when a question is not supported by the uploaded document
  • Optional OpenAI API support for generated answers
  • Simple, clean, and modular Python project structure

Demo Screenshots

App Home

App Home

Correct Answer Example

Correct Answer Demo

Not Found Example

Not Found Demo

Tech Stack

  • Python
  • Streamlit
  • scikit-learn
  • pypdf
  • OpenAI API
  • Git and GitHub

Folder Structure

ai_document_chatbot/
├── app.py
├── requirements.txt
├── README.md
├── assets/
│   ├── app_home.png
│   ├── correct_answer_demo.png
│   └── not_found_demo.png
├── sample_docs/
│   └── business_faq.txt
├── src/
│   ├── document_loader.py
│   ├── generator.py
│   ├── retriever.py
│   └── text_splitter.py
└── tests/
    └── test_splitter.py

How It Works

  1. The user uploads a PDF or TXT document.
  2. The document text is extracted.
  3. The text is split into smaller chunks.
  4. A TF-IDF retriever finds the most relevant chunks for the user question.
  5. If an OpenAI API key is available, the app generates an answer using the retrieved context.
  6. If no OpenAI API key is available, the app shows the most relevant document sections.
  7. If the question is not supported by the uploaded document, the app avoids inventing an answer.

Installation

Create and activate a virtual environment:

python -m venv .venv

On Windows PowerShell:

.venv\Scripts\Activate.ps1

If PowerShell blocks script activation, run this once:

Set-ExecutionPolicy -ExecutionPolicy RemoteSigned -Scope CurrentUser

Then activate the environment again:

.venv\Scripts\Activate.ps1

Install dependencies:

pip install -r requirements.txt

Run the app:

streamlit run app.py

Optional OpenAI API Setup

The app can work without an OpenAI API key by showing the most relevant document sections. To enable generated answers, set an OpenAI API key as an environment variable.

On Windows PowerShell:

$env:OPENAI_API_KEY="your_api_key_here"

Then run:

streamlit run app.py

Do not save API keys directly inside the code or upload them to GitHub.

Example Questions

For the sample clinic FAQ document, users can ask:

What services does the clinic provide?
How can new patients book an appointment?
What are the opening hours?
Who is the CEO of the clinic?

The final question is not answered because the uploaded document does not contain CEO information.

Business Use Case

This type of app can be adapted for:

  • Company FAQ assistants
  • Internal document search
  • Customer support knowledge bases
  • Student study assistants
  • Research paper question-answering
  • Policy and procedure document search

Limitations

  • This is a portfolio prototype, not a production-ready chatbot.
  • TF-IDF retrieval is useful for simple search, but it does not capture meaning as deeply as embedding-based retrieval.
  • Large document collections may require a vector database.
  • PDF extraction quality depends on the structure and formatting of the uploaded PDF.
  • OpenAI answer generation requires a valid API key.

Future Improvements

  • Add embedding-based semantic search
  • Add ChromaDB or FAISS vector storage
  • Add chat history
  • Add source highlighting
  • Add user authentication
  • Deploy the app online
  • Add support for DOCX files and web pages

Author

Aun Ali
Applied AI, Machine Learning, and Computer Vision Developer
GitHub: https://github.com/aun151214

About

Document Q&A chatbot built with Python and Streamlit. Upload PDF/TXT files, retrieve relevant sections with TF-IDF, and optionally generate answers with the OpenAI API.

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors

Languages