AI Document Chatbot

A document question-answering app built with Python and Streamlit.

This project allows users to upload PDF or TXT documents, ask questions, retrieve the most relevant document sections, and optionally generate concise answers using the OpenAI API.

Project Purpose

Long documents are often difficult to search manually, especially when users need quick answers from FAQs, reports, policies, study notes, or business documents. This project demonstrates a simple document chatbot workflow that retrieves relevant information from uploaded files and avoids answering when the uploaded document does not contain enough information.

Features

Upload PDF or TXT files
Extract text from uploaded documents
Split long documents into searchable text chunks
Retrieve relevant sections using TF-IDF similarity
Ask natural-language questions
Show retrieved source sections
Detect when a question is not supported by the uploaded document
Optional OpenAI API support for generated answers
Simple, clean, and modular Python project structure
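
The chunking feature listed above can be sketched roughly as follows. This is a minimal character-based splitter with overlap, not the project's actual `text_splitter.py`; the function name `split_text` and the default sizes are assumptions chosen for illustration.

```python
def split_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into chunks of at most chunk_size characters.

    Consecutive chunks share `overlap` characters, so text near a
    boundary appears in both neighbouring chunks.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be at least 0 and smaller than chunk_size")

    # Collapse runs of whitespace so chunk lengths are predictable.
    text = " ".join(text.split())

    chunks = []
    step = chunk_size - overlap  # how far the window advances each time
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current window has reached the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks
```

For example, `split_text("abcdefghij", chunk_size=4, overlap=1)` returns `["abcd", "defg", "ghij"]`. Splitting on sentence or paragraph boundaries would likely give cleaner chunks, at the cost of more code.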

Demo Screenshots

App Home: assets/app_home.png

Correct Answer Example: assets/correct_answer_demo.png

Not Found Example: assets/not_found_demo.png

Tech Stack

Python
Streamlit
scikit-learn
pypdf
OpenAI API
Git and GitHub

Folder Structure

ai_document_chatbot/
├── app.py
├── requirements.txt
├── README.md
├── assets/
│   ├── app_home.png
│   ├── correct_answer_demo.png
│   └── not_found_demo.png
├── sample_docs/
│   └── business_faq.txt
├── src/
│   ├── document_loader.py
│   ├── generator.py
│   ├── retriever.py
│   └── text_splitter.py
└── tests/
    └── test_splitter.py

How It Works

The user uploads a PDF or TXT document.
The document text is extracted.
The text is split into smaller chunks.
A TF-IDF retriever finds the most relevant chunks for the user question.
If an OpenAI API key is available, the app generates an answer using the retrieved context.
If no OpenAI API key is available, the app shows the most relevant document sections.
If the question is not supported by the uploaded document, the app avoids inventing an answer.
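
The retrieval and not-found steps above can be sketched as follows. The project lists scikit-learn for TF-IDF; to stay runnable without extra installs, this sketch swaps in a small standard-library TF-IDF, so its scores will not match scikit-learn's exactly. The function names, the stop-word list, the sample chunks, and the 0.1 cutoff are all assumptions for illustration, not the project's actual code.

```python
import math
import re
from collections import Counter

# Small illustrative stop-word list (an assumption); a real app would use a fuller one.
STOP_WORDS = {"a", "an", "and", "are", "by", "can", "how", "is", "of",
              "or", "the", "to", "what", "who"}

# Hypothetical chunks standing in for a split FAQ document.
SAMPLE_CHUNKS = [
    "We offer general checkups, vaccinations, and dental care.",
    "Opening hours are 9 am to 5 pm, Monday to Friday.",
    "New patients can book an appointment by phone or online.",
]


def tokenize(text: str) -> list[str]:
    """Lowercase, keep alphanumeric tokens, and drop stop words."""
    return [t for t in re.findall(r"[a-z0-9]+", text.lower()) if t not in STOP_WORDS]


def build_idf(token_lists: list[list[str]]) -> dict[str, float]:
    """Smoothed inverse document frequency for every term in the chunks."""
    n = len(token_lists)
    doc_freq = Counter(t for tokens in token_lists for t in set(tokens))
    return {t: math.log((1 + n) / (1 + df)) + 1 for t, df in doc_freq.items()}


def vectorize(tokens: list[str], idf: dict[str, float]) -> dict[str, float]:
    """Sparse TF-IDF vector; terms never seen in the chunks are ignored."""
    return {t: count * idf[t] for t, count in Counter(tokens).items() if t in idf}


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    """Cosine similarity of two sparse vectors (0.0 if either is empty)."""
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    return dot / (norm_a * norm_b)


def retrieve(question: str, chunks: list[str], top_k: int = 3,
             min_score: float = 0.1) -> list[tuple[float, str]]:
    """Return up to top_k (score, chunk) pairs scoring at least min_score.

    An empty result means no chunk is similar enough, which the caller
    can treat as "not found in the document".
    """
    token_lists = [tokenize(chunk) for chunk in chunks]
    idf = build_idf(token_lists)
    question_vec = vectorize(tokenize(question), idf)
    scored = [(cosine(question_vec, vectorize(tokens, idf)), chunk)
              for tokens, chunk in zip(token_lists, chunks)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [(score, chunk) for score, chunk in scored[:top_k] if score >= min_score]
```

With these sample chunks, `retrieve("What are the opening hours?", SAMPLE_CHUNKS)` returns only the opening-hours chunk, while `retrieve("Who is the CEO of the clinic?", SAMPLE_CHUNKS)` returns an empty list. A cutoff like this is sensitive to shared vocabulary, so it would need tuning against real documents.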

Installation

Create and activate a virtual environment:

python -m venv .venv

On Windows PowerShell:

.venv\Scripts\Activate.ps1

If PowerShell blocks script activation, run this once:

Set-ExecutionPolicy -ExecutionPolicy RemoteSigned -Scope CurrentUser

Then activate the environment again:

.venv\Scripts\Activate.ps1

Install dependencies:

pip install -r requirements.txt

Run the app:

streamlit run app.py

Optional OpenAI API Setup

The app works without an OpenAI API key: in that case it shows the most relevant document sections instead of a generated answer. To enable generated answers, set an OpenAI API key as an environment variable.

On Windows PowerShell:

$env:OPENAI_API_KEY="your_api_key_here"

Then run:

streamlit run app.py

Do not save API keys directly inside the code or upload them to GitHub.
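
As a rough illustration of how an app might read the key from the environment and fall back when it is missing, consider the sketch below. The helper names (`get_api_key`, `choose_mode`, `build_prompt`) and the prompt wording are hypothetical and not taken from the project's `generator.py`; the actual API call is left out.

```python
import os
from typing import Mapping, Optional


def get_api_key(env: Optional[Mapping[str, str]] = None) -> Optional[str]:
    """Return the OpenAI API key from the environment, or None if unset or blank."""
    env = os.environ if env is None else env
    key = env.get("OPENAI_API_KEY", "").strip()
    return key or None


def choose_mode(chunks: list[str], env: Optional[Mapping[str, str]] = None) -> str:
    """Pick 'not_found', 'generate', or 'show_sections' for the current question."""
    if not chunks:
        # Nothing relevant was retrieved, so do not try to answer.
        return "not_found"
    return "generate" if get_api_key(env) else "show_sections"


def build_prompt(question: str, chunks: list[str]) -> str:
    """Assemble a prompt that restricts the answer to the retrieved context."""
    context = "\n\n".join(f"[{i}] {chunk}" for i, chunk in enumerate(chunks, start=1))
    return (
        "Answer the question using only the context below. "
        "If the context does not contain the answer, say so.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )
```

Reading the key at call time, rather than storing it in a module-level constant, keeps it out of the source code and lets the same code run in both modes.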

Example Questions

For the sample clinic FAQ document, users can ask:

What services does the clinic provide?
How can new patients book an appointment?
What are the opening hours?
Who is the CEO of the clinic?

The app declines to answer the final question because the sample document contains no information about a CEO.

Business Use Case

This type of app can be adapted for:

Company FAQ assistants
Internal document search
Customer support knowledge bases
Student study assistants
Research paper question-answering
Policy and procedure document search

Limitations

This is a portfolio prototype, not a production-ready chatbot.
TF-IDF retrieval is useful for simple search, but it does not capture meaning as deeply as embedding-based retrieval.
Large document collections may require a vector database.
PDF extraction quality depends on the structure and formatting of the uploaded PDF.
OpenAI answer generation requires a valid API key.
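
To make the PDF extraction limitation concrete, here is a minimal loader sketch. It assumes the upload arrives as a filename plus raw bytes; the names `extract_text` and `has_usable_text` and the 20-character threshold are hypothetical, not the project's `document_loader.py`. The PDF branch uses pypdf's `PdfReader` and `extract_text()`, which can return little or nothing for scanned, image-only pages.

```python
import io
from pathlib import Path


def extract_text(filename: str, data: bytes) -> str:
    """Return the text of an uploaded PDF or TXT file given its raw bytes."""
    suffix = Path(filename).suffix.lower()
    if suffix == ".txt":
        # errors="replace" keeps going when the bytes are not valid UTF-8.
        return data.decode("utf-8", errors="replace")
    if suffix == ".pdf":
        # Imported lazily so TXT-only use does not require pypdf.
        from pypdf import PdfReader

        reader = PdfReader(io.BytesIO(data))
        # A page with no extractable text contributes an empty string.
        pages = [page.extract_text() or "" for page in reader.pages]
        return "\n".join(pages)
    raise ValueError(f"Unsupported file type: {suffix or filename}")


def has_usable_text(text: str, min_chars: int = 20) -> bool:
    """Heuristic check that extraction produced more than whitespace."""
    return len(text.strip()) >= min_chars
```

A check like `has_usable_text` lets the app warn the user about an unreadable PDF up front instead of returning "not found" for every question.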

Future Improvements

Add embedding-based semantic search
Add ChromaDB or FAISS vector storage
Add chat history
Add source highlighting
Add user authentication
Deploy the app online
Add support for DOCX files and web pages

Author

Aun Ali
Applied AI, Machine Learning, and Computer Vision Developer
GitHub: https://github.com/aun151214
