A secure, multilingual AI document assistant built on Retrieval-Augmented Generation (RAG). Users can upload documents, ask questions about their content, and have email threads monitored and summarized automatically.
- Document Management: Upload, view, and manage documents (PDF, DOCX, XLSX, TXT)
- Intelligent Q&A: Ask questions about uploaded documents and receive AI-generated answers
- Conversation History: Maintains context across conversations for natural interactions
- Email Integration: Automatically monitor and summarize email threads
- Multilingual Support: Interface and responses in English, Arabic, and French
- Secure Authentication: JWT-based login/registration system
- Role-based Access: Admin and user roles with different permissions
- User Profiles: Manage personal information and preferences
- RAG Technology: Advanced retrieval-augmented generation for accurate document analysis
- Vector Search: FAISS-based vector storage for efficient document retrieval
- Real-time Processing: Live document analysis and email monitoring
- Responsive Design: Modern, mobile-friendly interface
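The retrieval step behind the RAG and vector search features can be illustrated with a small sketch. It is not the project's implementation: it swaps FAISS and learned embeddings for a brute-force cosine similarity over bag-of-words counts so that it runs with the standard library alone. The function names `embed`, `retrieve`, and `build_prompt` are hypothetical.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: lowercase word counts."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return the k chunks most similar to the question (brute force, no index)."""
    query = embed(question)
    ranked = sorted(chunks, key=lambda c: cosine(query, embed(c)), reverse=True)
    return ranked[:k]


def build_prompt(question: str, chunks: list[str], k: int = 2) -> str:
    """Assemble the retrieved context and the question into one LLM prompt."""
    context = "\n".join(retrieve(question, chunks, k))
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"
```

In a real deployment the `embed` step would call an embedding model and the ranking would be delegated to the vector store; only the overall shape (embed, rank, build prompt) carries over.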
app/
├── config/                  # Configuration management
│   └── config.py            # Environment variables and settings
├── models/                  # Data models
│   ├── user.py              # User model and authentication
│   ├── document.py          # Document storage and metadata
│   ├── conversation.py      # Chat history and context
│   └── email.py             # Email processing models
├── routes/                  # API endpoints and routing
│   ├── auth.py              # Authentication routes
│   ├── documents.py         # Document management
│   ├── conversations.py     # Chat functionality
│   ├── email_management.py  # Email processing
│   └── main.py              # Main application routes
├── services/                # Business logic layer
│   ├── auth/                # Authentication services
│   ├── document/            # Document processing
│   ├── rag/                 # RAG implementation
│   └── email/               # Email services
├── static/                  # Frontend assets
│   ├── css/                 # Stylesheets
│   ├── js/                  # JavaScript files
│   └── img/                 # Images and icons
└── templates/               # HTML templates
    ├── auth/                # Authentication pages
    ├── documents/           # Document management UI
    ├── conversations/       # Chat interface
    └── email/               # Email management UI
- Framework: Flask (Python web framework)
- Database: MongoDB (NoSQL document database)
- Authentication: JWT tokens with secure password hashing
- API Integration: OpenAI GPT models / Azure OpenAI
- RAG Implementation: LangChain framework
- Vector Storage: FAISS (Facebook AI Similarity Search)
- Document Processing: PyPDF2, python-docx, openpyxl
- Text Processing: NLTK, spaCy for text analysis
- UI Framework: Bootstrap 5
- Styling: Custom CSS with responsive design
- JavaScript: Vanilla JS for dynamic interactions
- Icons: Custom SVG icons and Bootstrap icons
- IMAP: Email monitoring and retrieval
- SMTP: Automated email responses
- Thread Detection: Intelligent email thread identification
- Python: 3.12
- MongoDB: latest LTS release
- Memory: Minimum 2GB RAM (4GB recommended)
- Storage: 1GB free space for documents and vector storage
- OpenAI API: Access to GPT-3.5/GPT-4 models
- Email Account: IMAP/SMTP enabled email account (optional)
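The email thread detection mentioned above is not specified in detail here. One common approach, shown as a sketch under that assumption, groups messages by the standard `References` and `In-Reply-To` headers. The helper names `thread_root` and `group_threads` are hypothetical, and the messages are passed in as raw strings rather than fetched over IMAP.

```python
from email import message_from_string
from email.message import Message


def thread_root(msg: Message) -> str:
    """Pick a thread key: first ID in References, else In-Reply-To, else own Message-ID."""
    references = msg.get("References", "").split()
    if references:
        return references[0]
    return (msg.get("In-Reply-To") or msg.get("Message-ID", "")).strip()


def group_threads(raw_messages: list[str]) -> dict[str, list[str]]:
    """Map each thread key to the subjects of its messages, in input order."""
    threads: dict[str, list[str]] = {}
    for raw in raw_messages:
        msg = message_from_string(raw)
        threads.setdefault(thread_root(msg), []).append(msg.get("Subject", ""))
    return threads
```

Header-based grouping misses replies from clients that drop these headers; a fallback on normalized subject lines is a frequent addition but is left out of this sketch.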
git clone https://github.com/yourusername/chatbot-rag.git
cd chatbot-rag
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
pip install -r requirements.txt
# Install MongoDB (Ubuntu/Debian)
sudo apt-get install mongodb
# Install MongoDB (macOS)
brew install mongodb-community
# Start MongoDB service
sudo systemctl start mongod # Linux
brew services start mongodb-community # macOS
Create a .env file in the project root:
# Flask Configuration
SECRET_KEY=your-super-secret-key-here
DEBUG=False
FLASK_ENV=production
# Database Configuration
MONGO_URI=mongodb://localhost:27017/document_assistant
# OpenAI Configuration
OPENAI_API_TYPE=openai
OPENAI_API_KEY=your-openai-api-key-here
OPENAI_API_VERSION=2023-05-15
# Azure OpenAI (Alternative)
# OPENAI_API_TYPE=azure
# AZURE_ENDPOINT=your-azure-endpoint
# AZURE_DEPLOYMENT=your-deployment-name
# Vector Store Configuration
VECTOR_STORE_TYPE=faiss
VECTOR_STORE_PATH=./vector_store
# Email Configuration (Optional)
EMAIL_IMAP_SERVER=imap.gmail.com
EMAIL_IMAP_PORT=993
EMAIL_SMTP_SERVER=smtp.gmail.com
EMAIL_SMTP_PORT=587
EMAIL_USERNAME=your-email@gmail.com
EMAIL_PASSWORD=your-app-password
# Upload Configuration
UPLOAD_FOLDER=./uploads
MAX_CONTENT_LENGTH=16777216 # 16MB
ALLOWED_EXTENSIONS=pdf,docx,txt,xlsx
# Language Configuration
DEFAULT_LANGUAGE=en
SUPPORTED_LANGUAGES=en,ar,fr
python run.py
The application will automatically:
- Create the database collections
- Set up default admin and user accounts
- Initialize the vector store directory
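How `config/config.py` reads the `.env` values is not shown in this README. The sketch below is one plausible shape, assuming the variables have already been loaded into the process environment (for example by a dotenv loader) and that inline comments have been stripped. `load_settings` and `is_allowed` are hypothetical names; the variable names and defaults are the ones from the `.env` example.

```python
import os
from collections.abc import Mapping


def load_settings(env: Mapping[str, str] = os.environ) -> dict:
    """Parse typed settings from environment variables, with the README's defaults."""
    def csv(name: str, default: str) -> list[str]:
        # Split a comma-separated value and drop empty or padded entries.
        return [item.strip().lower() for item in env.get(name, default).split(",") if item.strip()]

    return {
        "mongo_uri": env.get("MONGO_URI", "mongodb://localhost:27017/document_assistant"),
        "debug": env.get("DEBUG", "False").strip().lower() in {"1", "true", "yes"},
        "max_content_length": int(env.get("MAX_CONTENT_LENGTH", str(16 * 1024 * 1024))),
        "allowed_extensions": set(csv("ALLOWED_EXTENSIONS", "pdf,docx,txt,xlsx")),
        "supported_languages": csv("SUPPORTED_LANGUAGES", "en,ar,fr"),
        "default_language": env.get("DEFAULT_LANGUAGE", "en"),
    }


def is_allowed(filename: str, allowed: set[str]) -> bool:
    """Accept a filename only if it has a non-empty stem and an allowed extension."""
    stem, dot, ext = filename.rpartition(".")
    return bool(stem) and bool(dot) and ext.lower() in allowed
```

Passing a plain dictionary instead of `os.environ` makes the parsing easy to exercise in tests without touching the real environment.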
export FLASK_ENV=development
python run.py
export FLASK_ENV=production
gunicorn -w 4 -b 0.0.0.0:5000 run:app
# Build the image
docker build -t ai-document-assistant .
# Run the container
docker run -p 5000:5000 --env-file .env ai-document-assistant
The application comes with pre-configured test accounts:
| Role | Username | Password | Email |
|---|---|---|---|
| Admin | admin | admin123 | admin@test.com |
| User | user | user123 | user@test.com |
- Navigate to http://localhost:5000
- Log in with the default credentials or register a new account
- Admin users have access to user management features
- Upload: Go to Documents → Upload new files
- Supported Formats: PDF, DOCX, TXT, XLSX
- Processing: Documents are automatically processed and indexed
- View: Browse uploaded documents with metadata
- Create: Start a new conversation linked to specific documents
- Chat: Ask questions about document content
- History: View previous conversations and context
- Export: Download conversation transcripts
- Setup: Configure IMAP/SMTP settings in Email section
- Monitoring: Enable automatic email thread detection
- Summarization: AI-generated summaries of email threads
- Responses: Automated email responses based on document content
- FAISS: Fast similarity search (default)
- ChromaDB: Alternative vector database
- English: Default language
- Arabic: RTL support included
- French: Full localization
- Gmail: IMAP/SMTP with app passwords
- Outlook: Office 365 integration
- Custom: Any IMAP/SMTP server
- JWT Tokens: Secure session management
- Password Hashing: bcrypt with salt
- Role-based Access: Admin and user permissions
- Session Timeout: Automatic logout for security
- Document Encryption: Secure file storage
- API Key Security: Environment variable storage
- Input Validation: XSS and injection protection
- CORS Configuration: Cross-origin request security
- User Data Isolation: Documents restricted to owners
- Audit Logging: User activity tracking
- Data Retention: Configurable cleanup policies
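To make the JWT-based session management and session timeout listed above concrete, here is a standard-library sketch of issuing and verifying an HS256 token with an expiry and a role claim. It is illustrative only: the README does not say which JWT library the application uses, and production code should rely on a maintained library rather than hand-rolled signing. `issue_token` and `verify_token` are hypothetical names.

```python
import base64
import hashlib
import hmac
import json
import time


def _b64url(data: bytes) -> str:
    """Base64url-encode without padding, as JWTs require."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(text: str) -> bytes:
    """Decode base64url text, restoring the stripped padding."""
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def issue_token(user: str, role: str, secret: str, ttl_seconds: int = 3600, now=None) -> str:
    """Create a signed HS256 token carrying the user, role, and expiry time."""
    now = int(time.time()) if now is None else now
    parts = ({"alg": "HS256", "typ": "JWT"}, {"sub": user, "role": role, "exp": now + ttl_seconds})
    signing_input = ".".join(_b64url(json.dumps(p, separators=(",", ":")).encode()) for p in parts)
    signature = hmac.new(secret.encode(), signing_input.encode("ascii"), hashlib.sha256).digest()
    return f"{signing_input}.{_b64url(signature)}"


def verify_token(token: str, secret: str, now=None):
    """Return the payload if the signature is valid and the token is unexpired, else None."""
    try:
        header_b64, payload_b64, signature_b64 = token.split(".")
        signing_input = f"{header_b64}.{payload_b64}".encode("ascii")
        expected = hmac.new(secret.encode(), signing_input, hashlib.sha256).digest()
        # Constant-time comparison avoids leaking how many bytes matched.
        if not hmac.compare_digest(expected, _b64url_decode(signature_b64)):
            return None
        if json.loads(_b64url_decode(header_b64)).get("alg") != "HS256":
            return None
        payload = json.loads(_b64url_decode(payload_b64))
    except ValueError:  # malformed structure, base64, JSON, or encoding
        return None
    now = int(time.time()) if now is None else now
    return payload if payload.get("exp", 0) > now else None
```

A role check for admin-only routes then reduces to inspecting `payload["role"]` after a successful `verify_token` call.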
# Install test dependencies
pip install pytest pytest-flask pytest-cov
# Run all tests
pytest
# Run with coverage
pytest --cov=app
- Unit Tests: Individual component testing
- Integration Tests: API endpoint testing
- Authentication Tests: Login/registration flows
- Document Tests: Upload and processing
- Level: INFO, WARNING, ERROR
- Format: Structured JSON logging
- Rotation: Daily log file rotation
- Monitoring: Health check endpoints
- Response Times: API endpoint performance
- Document Processing: Upload and indexing times
- Memory Usage: Application resource monitoring
- Database Queries: MongoDB performance tracking
- Change default admin credentials
- Set strong SECRET_KEY
- Configure production database
- Set up SSL/TLS certificates
- Configure reverse proxy (Nginx)
- Set up monitoring and logging
- Configure backup strategy
- Test email integration
- Performance optimization
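For the monitoring and logging item, the structured JSON format and daily rotation described earlier can be sketched with the standard `logging` module. The logger name, file path, and seven-file retention below are assumptions for illustration, not settings taken from the project.

```python
import json
import logging
from logging.handlers import TimedRotatingFileHandler


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON object per line."""

    def format(self, record: logging.LogRecord) -> str:
        entry = {
            "time": self.formatTime(record),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        if record.exc_info:
            entry["exception"] = self.formatException(record.exc_info)
        return json.dumps(entry)


def build_logger(path: str = "app.log", level: int = logging.INFO) -> logging.Logger:
    """Return a logger that writes JSON lines and rotates the file at midnight."""
    logger = logging.getLogger("document_assistant")  # hypothetical logger name
    logger.setLevel(level)
    if not logger.handlers:  # avoid stacking handlers on repeated calls
        handler = TimedRotatingFileHandler(path, when="midnight", backupCount=7)
        handler.setFormatter(JsonFormatter())
        logger.addHandler(handler)
    return logger
```

Calling `build_logger().warning("upload failed for %s", filename)` would append one JSON line to `app.log`, which log shippers can parse without custom patterns.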
# Production settings
export FLASK_ENV=production
export DEBUG=False
export SECRET_KEY=your-production-secret-key
export MONGO_URI=mongodb://your-production-db:27017/document_assistant
- Fork the repository
- Create a feature branch
- Make your changes
- Add tests for new functionality
- Submit a pull request
- Python: PEP 8 compliance
- JavaScript: ESLint configuration
- CSS: BEM methodology
- Documentation: Inline comments and docstrings
This project is licensed under the MIT License - see the LICENSE file for details.
Built with ❤️ using Flask, MongoDB, and OpenAI