talhabinjaved/ai-document-assistant

AI Document Assistant

A secure, multilingual AI document assistant built on Retrieval-Augmented Generation (RAG). Users can upload documents, ask questions about their content in a conversational interface, and have email threads monitored and summarized automatically.

🚀 Features

Core Functionality

  • Document Management: Upload, view, and manage documents (PDF, DOCX, XLSX, TXT)
  • Intelligent Q&A: Ask questions about uploaded documents and receive AI-generated answers
  • Conversation History: Maintains context across conversations for natural interactions
  • Email Integration: Automatically monitor and summarize email threads
  • Multilingual Support: Interface and responses in English, Arabic, and French

User Management

  • Secure Authentication: JWT-based login/registration system
  • Role-based Access: Admin and user roles with different permissions
  • User Profiles: Manage personal information and preferences

Advanced Features

  • RAG Technology: Advanced retrieval-augmented generation for accurate document analysis
  • Vector Search: FAISS-based vector storage for efficient document retrieval
  • Real-time Processing: Live document analysis and email monitoring
  • Responsive Design: Modern, mobile-friendly interface
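
To make the retrieval step behind RAG and vector search concrete, here is a minimal sketch of similarity-based retrieval in pure Python. It is not the project's code: the bag-of-words `embed` function and the brute-force `retrieve` helper are hypothetical stand-ins for the learned embeddings and the FAISS index listed above.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding: lowercase word counts (a stand-in for a real embedding model)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[word] * b[word] for word in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(query: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return the k chunks most similar to the query (brute force, no index)."""
    query_vec = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(query_vec, embed(c)), reverse=True)
    return ranked[:k]


chunks = [
    "Invoices are due within 30 days of receipt.",
    "The office is closed on public holidays.",
    "Late invoices incur a 2 percent fee.",
]
top = retrieve("when are invoices due", chunks, k=1)
```

In a RAG pipeline, the retrieved chunks would then be placed into the prompt sent to the language model, so the answer is grounded in the uploaded documents.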

🏗️ Project Architecture

Backend Structure

app/
├── config/                 # Configuration management
│   └── config.py          # Environment variables and settings
├── models/                # Data models
│   ├── user.py           # User model and authentication
│   ├── document.py       # Document storage and metadata
│   ├── conversation.py   # Chat history and context
│   └── email.py          # Email processing models
├── routes/               # API endpoints and routing
│   ├── auth.py          # Authentication routes
│   ├── documents.py     # Document management
│   ├── conversations.py # Chat functionality
│   ├── email_management.py # Email processing
│   └── main.py          # Main application routes
├── services/            # Business logic layer
│   ├── auth/           # Authentication services
│   ├── document/       # Document processing
│   ├── rag/           # RAG implementation
│   └── email/         # Email services
├── static/            # Frontend assets
│   ├── css/          # Stylesheets
│   ├── js/           # JavaScript files
│   └── img/          # Images and icons
└── templates/        # HTML templates
    ├── auth/         # Authentication pages
    ├── documents/    # Document management UI
    ├── conversations/ # Chat interface
    └── email/        # Email management UI

Technology Stack

Backend

  • Framework: Flask (Python web framework)
  • Database: MongoDB (NoSQL document database)
  • Authentication: JWT tokens with secure password hashing
  • API Integration: OpenAI GPT models / Azure OpenAI

AI & ML

  • RAG Implementation: LangChain framework
  • Vector Storage: FAISS (Facebook AI Similarity Search)
  • Document Processing: PyPDF2, python-docx, openpyxl
  • Text Processing: NLTK, spaCy for text analysis
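
Before extracted text can be embedded and indexed, it is usually split into overlapping chunks. The sketch below shows one simple way to do that with fixed-size character windows. The function name `chunk_text` and its default sizes are assumptions for illustration; the project may rely on LangChain's text splitters with different settings.

```python
def chunk_text(text: str, chunk_size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into fixed-size character windows that overlap by `overlap` characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
    return chunks


pieces = chunk_text("abcdefghij", chunk_size=4, overlap=2)
```

The overlap keeps a sentence that straddles a chunk boundary retrievable from at least one chunk, at the cost of storing some text twice.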

Frontend

  • UI Framework: Bootstrap 5
  • Styling: Custom CSS with responsive design
  • JavaScript: Vanilla JS for dynamic interactions
  • Icons: Custom SVG icons and Bootstrap icons

Email Processing

  • IMAP: Email monitoring and retrieval
  • SMTP: Automated email responses
  • Thread Detection: Intelligent email thread identification
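
Thread detection can be approximated from standard mail headers alone. The following sketch groups raw messages by the first ID in `References`, falling back to `In-Reply-To` and then to the message's own `Message-ID`. The helpers `thread_root` and `group_threads` are hypothetical names, and the project's actual detection logic may use other signals such as subject matching.

```python
from collections import defaultdict
from email import message_from_string
from email.message import Message


def thread_root(msg: Message) -> str:
    """Pick a thread key: first References ID, else In-Reply-To, else the message's own ID."""
    references = (msg.get("References") or "").split()
    if references:
        return references[0]
    in_reply_to = (msg.get("In-Reply-To") or "").strip()
    if in_reply_to:
        return in_reply_to
    return (msg.get("Message-ID") or "").strip()


def group_threads(raw_messages: list[str]) -> dict[str, list[str]]:
    """Map each thread key to the subjects of its messages, in input order."""
    threads = defaultdict(list)
    for raw in raw_messages:
        msg = message_from_string(raw)
        threads[thread_root(msg)].append(msg.get("Subject", ""))
    return dict(threads)


raw_messages = [
    "Message-ID: <1@example.com>\nSubject: Budget\n\nFirst draft attached.",
    "Message-ID: <2@example.com>\nIn-Reply-To: <1@example.com>\n"
    "References: <1@example.com>\nSubject: Re: Budget\n\nLooks good.",
    "Message-ID: <3@example.com>\nSubject: Lunch\n\nNoon?",
]
threads = group_threads(raw_messages)
```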

📋 Prerequisites

System Requirements

  • Python: 3.12
  • MongoDB: Latest LTS release
  • Memory: Minimum 2GB RAM (4GB recommended)
  • Storage: 1GB free space for documents and vector storage

External Services

  • OpenAI API: Access to GPT-3.5/GPT-4 models
  • Email Account: IMAP/SMTP enabled email account (optional)

🛠️ Installation & Setup

1. Clone the Repository

git clone https://github.com/talhabinjaved/ai-document-assistant.git
cd ai-document-assistant

2. Create Virtual Environment

python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

3. Install Dependencies

pip install -r requirements.txt

4. Database Setup

# Install MongoDB (Ubuntu/Debian)
# Requires MongoDB's official apt repository to be configured first
sudo apt-get install -y mongodb-org

# Install MongoDB (macOS)
brew tap mongodb/brew
brew install mongodb-community

# Start MongoDB service
sudo systemctl start mongod  # Linux
brew services start mongodb-community  # macOS

5. Environment Configuration

Create a .env file in the project root:

# Flask Configuration
SECRET_KEY=your-super-secret-key-here
DEBUG=False
FLASK_ENV=production

# Database Configuration
MONGO_URI=mongodb://localhost:27017/document_assistant

# OpenAI Configuration
OPENAI_API_TYPE=openai
OPENAI_API_KEY=your-openai-api-key-here
OPENAI_API_VERSION=2023-05-15

# Azure OpenAI (Alternative)
# OPENAI_API_TYPE=azure
# AZURE_ENDPOINT=your-azure-endpoint
# AZURE_DEPLOYMENT=your-deployment-name

# Vector Store Configuration
VECTOR_STORE_TYPE=faiss
VECTOR_STORE_PATH=./vector_store

# Email Configuration (Optional)
EMAIL_IMAP_SERVER=imap.gmail.com
EMAIL_IMAP_PORT=993
EMAIL_SMTP_SERVER=smtp.gmail.com
EMAIL_SMTP_PORT=587
EMAIL_USERNAME=your-email@gmail.com
EMAIL_PASSWORD=your-app-password

# Upload Configuration
UPLOAD_FOLDER=./uploads
# 16 MiB, in bytes (kept on its own line because docker --env-file does not strip inline comments)
MAX_CONTENT_LENGTH=16777216
ALLOWED_EXTENSIONS=pdf,docx,txt,xlsx

# Language Configuration
DEFAULT_LANGUAGE=en
SUPPORTED_LANGUAGES=en,ar,fr
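
As an illustration of how these variables might be consumed, the sketch below parses a few of them into typed settings. The `Settings` class and `load_settings` function are hypothetical and are not taken from the project's `config.py`; the defaults simply mirror the example values above.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    mongo_uri: str
    max_content_length: int
    allowed_extensions: frozenset
    supported_languages: tuple
    debug: bool


def load_settings(env=None) -> Settings:
    """Build Settings from a mapping of environment variables (defaults to os.environ)."""
    env = os.environ if env is None else env
    return Settings(
        mongo_uri=env.get("MONGO_URI", "mongodb://localhost:27017/document_assistant"),
        max_content_length=int(env.get("MAX_CONTENT_LENGTH", 16 * 1024 * 1024)),
        allowed_extensions=frozenset(
            ext.strip().lower()
            for ext in env.get("ALLOWED_EXTENSIONS", "pdf,docx,txt,xlsx").split(",")
            if ext.strip()
        ),
        supported_languages=tuple(
            lang.strip() for lang in env.get("SUPPORTED_LANGUAGES", "en,ar,fr").split(",")
        ),
        debug=env.get("DEBUG", "False").strip().lower() in {"1", "true", "yes"},
    )


# Passing a plain dict instead of os.environ keeps the example deterministic.
settings = load_settings({"ALLOWED_EXTENSIONS": "pdf, DOCX", "DEBUG": "true"})
```

Parsing everything in one place means a malformed value, such as a non-numeric `MAX_CONTENT_LENGTH`, fails at startup rather than on the first upload.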

6. Initialize the Application

python run.py

The application will automatically:

  • Create the database collections
  • Set up default admin and user accounts
  • Initialize the vector store directory

🚀 Running the Application

Development Mode

export FLASK_ENV=development
python run.py

Production Mode

export FLASK_ENV=production
gunicorn -w 4 -b 0.0.0.0:5000 run:app

Docker Deployment (Optional)

# Build the image
docker build -t ai-document-assistant .

# Run the container
docker run -p 5000:5000 --env-file .env ai-document-assistant

👥 Default Accounts

The application comes with pre-configured test accounts:

Role   Username  Password  Email
Admin  admin     admin123  admin@test.com
User   user      user123   user@test.com

⚠️ Important: Change these default credentials in production!

📖 Usage Guide

1. Authentication

  • Navigate to http://localhost:5000
  • Login with default credentials or register a new account
  • Admin users have access to user management features

2. Document Management

  • Upload: Go to Documents → Upload new files
  • Supported Formats: PDF, DOCX, TXT, XLSX
  • Processing: Documents are automatically processed and indexed
  • View: Browse uploaded documents with metadata

3. Conversations

  • Create: Start a new conversation linked to specific documents
  • Chat: Ask questions about document content
  • History: View previous conversations and context
  • Export: Download conversation transcripts

4. Email Integration (Optional)

  • Setup: Configure IMAP/SMTP settings in Email section
  • Monitoring: Enable automatic email thread detection
  • Summarization: AI-generated summaries of email threads
  • Responses: Automated email responses based on document content

🔧 Configuration Options

Vector Store Types

  • FAISS: Fast similarity search (default)
  • ChromaDB: Alternative vector database

Language Support

  • English: Default language
  • Arabic: RTL support included
  • French: Full localization

Email Providers

  • Gmail: IMAP/SMTP with app passwords
  • Outlook: Office 365 integration
  • Custom: Any IMAP/SMTP server

🔒 Security Features

Authentication & Authorization

  • JWTs: Secure session management
  • Password Hashing: bcrypt with salt
  • Role-based Access: Admin and user permissions
  • Session Timeout: Automatic logout for security
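
For readers unfamiliar with how a signed, expiring token works, here is a standard-library sketch of issuing and verifying an HS256 JWT. It is an illustration only: `issue_token` and `verify_token` are hypothetical names, the sketch skips checks a real library performs (such as validating the `alg` header), and the project most likely uses a dedicated JWT package instead.

```python
import base64
import hashlib
import hmac
import json
import time


def _b64url(data: bytes) -> str:
    """Base64url-encode without padding, as JWTs require."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(text: str) -> bytes:
    """Decode base64url text, restoring the stripped padding first."""
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def _sign(signing_input: str, secret: str) -> bytes:
    """HMAC-SHA256 over the 'header.claims' string."""
    return hmac.new(secret.encode("utf-8"), signing_input.encode("ascii"), hashlib.sha256).digest()


def issue_token(payload: dict, secret: str, ttl_seconds: int = 3600) -> str:
    """Create an HS256-signed token whose `exp` claim is ttl_seconds from now."""
    header = {"alg": "HS256", "typ": "JWT"}
    claims = {**payload, "exp": int(time.time()) + ttl_seconds}
    signing_input = ".".join(
        _b64url(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, claims)
    )
    return f"{signing_input}.{_b64url(_sign(signing_input, secret))}"


def verify_token(token: str, secret: str):
    """Return the claims if the signature is valid and the token is unexpired, else None."""
    try:
        header_b64, claims_b64, signature_b64 = token.split(".")
        expected = _sign(f"{header_b64}.{claims_b64}", secret)
        if not hmac.compare_digest(expected, _b64url_decode(signature_b64)):
            return None
        claims = json.loads(_b64url_decode(claims_b64))
    except (ValueError, UnicodeError):
        return None  # malformed token
    if not isinstance(claims, dict) or claims.get("exp", 0) < time.time():
        return None  # expired or not a claims object
    return claims


token = issue_token({"sub": "admin", "role": "admin"}, "s3cret")
```

The `exp` claim is what makes the session timeout possible: once it passes, verification fails and the user has to log in again.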

Data Protection

  • Document Encryption: Secure file storage
  • API Key Security: Environment variable storage
  • Input Validation: XSS and injection protection
  • CORS Configuration: Cross-origin request security

Privacy

  • User Data Isolation: Documents restricted to owners
  • Audit Logging: User activity tracking
  • Data Retention: Configurable cleanup policies

🧪 Testing

Run Tests

# Install test dependencies
pip install pytest pytest-flask

# Run all tests
pytest

# Run with coverage
pytest --cov=app

Test Categories

  • Unit Tests: Individual component testing
  • Integration Tests: API endpoint testing
  • Authentication Tests: Login/registration flows
  • Document Tests: Upload and processing

📊 Monitoring & Logging

Application Logs

  • Level: INFO, WARNING, ERROR
  • Format: Structured JSON logging
  • Rotation: Daily log file rotation
  • Monitoring: Health check endpoints
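
Structured JSON logging with daily rotation can be assembled from Python's standard `logging` module, as in the sketch below. The `JsonFormatter` class, the `build_logger` function, the logger name, and the seven-day retention are assumptions for illustration and may not match the project's logging setup.

```python
import json
import logging
from logging.handlers import TimedRotatingFileHandler


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON object per line."""

    def format(self, record: logging.LogRecord) -> str:
        entry = {
            "time": self.formatTime(record),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        if record.exc_info:
            entry["exception"] = self.formatException(record.exc_info)
        return json.dumps(entry)


def build_logger(log_path: str, name: str = "document_assistant") -> logging.Logger:
    """Create an INFO-level logger that rotates its file at midnight and keeps 7 backups."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    if not logger.handlers:  # avoid attaching duplicate handlers on repeated calls
        handler = TimedRotatingFileHandler(log_path, when="midnight", backupCount=7)
        handler.setFormatter(JsonFormatter())
        logger.addHandler(handler)
    return logger
```

A call such as `build_logger("app.log").warning("upload failed")` would append one JSON line to `app.log`, which log aggregators can parse without custom patterns.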

Performance Metrics

  • Response Times: API endpoint performance
  • Document Processing: Upload and indexing times
  • Memory Usage: Application resource monitoring
  • Database Queries: MongoDB performance tracking

🚀 Deployment

Production Checklist

  • Change default admin credentials
  • Set strong SECRET_KEY
  • Configure production database
  • Set up SSL/TLS certificates
  • Configure reverse proxy (Nginx)
  • Set up monitoring and logging
  • Configure backup strategy
  • Test email integration
  • Optimize performance
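
For the "Set strong SECRET_KEY" item, one common approach is to generate the key with Python's `secrets` module. The helper name `generate_secret_key` and the 32-byte length are illustrative choices, not requirements stated by the project.

```python
import secrets


def generate_secret_key(num_bytes: int = 32) -> str:
    """Return a random hex string with num_bytes of entropy from the OS CSPRNG."""
    return secrets.token_hex(num_bytes)


# Prints a line that can be pasted into the .env file.
print(f"SECRET_KEY={generate_secret_key()}")
```

Generate the key once and store it outside version control. Changing it later invalidates any tokens or sessions signed with the old value.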

Environment Variables

# Production settings
export FLASK_ENV=production
export DEBUG=False
export SECRET_KEY=your-production-secret-key
export MONGO_URI=mongodb://your-production-db:27017/document_assistant

🀝 Contributing

Development Setup

  1. Fork the repository
  2. Create a feature branch
  3. Make your changes
  4. Add tests for new functionality
  5. Submit a pull request

Code Style

  • Python: PEP 8 compliance
  • JavaScript: ESLint configuration
  • CSS: BEM methodology
  • Documentation: Inline comments and docstrings

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.


Built with ❤️ using Flask, MongoDB, and OpenAI
