A comprehensive AI-powered therapy assistant built with a FastAPI backend and a React frontend. It uses machine learning for psychological pattern detection, topic modeling, and personalized therapeutic responses.
- Dataset Integration: Uses Amod's "mental_health_counseling_conversations" dataset
- Psychological Pattern Detection: ML-enhanced detection of cognitive distortions, defense mechanisms, and emotional patterns
- Topic Modeling: BERTopic-based conversation clustering and classification
- Semantic Search: Vector-based retrieval using sentence transformers and ChromaDB
- Embedding Generation: Automated embedding creation for similarity matching
- Real-time Communication: WebSocket support for instant responses
- Contextual Memory: Session-based conversation history and context retention
- Psychological Insights: Real-time pattern analysis and therapeutic suggestions
- Emotional State Detection: Multi-level emotional intensity analysis
- Confidence Scoring: ML-based confidence metrics for responses
- Data Anonymization: Automatic removal of personal identifiers
- Session Isolation: Individual session management and cleanup
- Secure Communication: Encrypted connections (HTTPS/WSS) when deployed behind SSL/TLS
- Data Retention Control: Session data can be deleted on request
├── backend/
│ ├── src/
│ │ ├── api/ # FastAPI routes and endpoints
│ │ ├── core/ # Configuration and database setup
│ │ ├── ml/ # Machine learning components
│ │ │ ├── dataset_loader.py # Amod dataset integration
│ │ │ ├── embeddings.py # Sentence transformer embeddings
│ │ │ ├── topic_modeling.py # BERTopic implementation
│ │ │ ├── pattern_detection.py # ML pattern recognition
│ │ │ └── training_pipeline.py # Complete ML pipeline
│ │ ├── models/ # Database and Pydantic models
│ │ ├── services/ # Business logic services
│ │ └── utils/ # Utility functions
│ ├── setup_database.py # Database initialization
│ ├── train_models.py # ML model training
│ └── requirements.txt # Python dependencies
├── frontend/
│ ├── src/
│ │ ├── components/ # React components
│ │ ├── hooks/ # Custom React hooks
│ │ └── services/ # API communication
│ └── package.json # Node.js dependencies
└── README.md
- Python 3.11+
- Node.js 18+
- PostgreSQL (optional, defaults to SQLite)
- Install Backend Dependencies
cd backend
pip install -r requirements.txt
- Environment Configuration
cp .env.example .env
# Edit .env with your settings
- Database Setup
python setup_database.py
- Train ML Models
# Complete training pipeline
python train_models.py
# Or train specific components
python train_models.py --dataset-only
python train_models.py --embeddings-only
python train_models.py --topic-only
- Start Backend Server
python -m src.main
- Install Frontend Dependencies (in a new terminal, from the repository root)
cd frontend
npm install
- Start Development Server
npm run dev
The application will be available at:
- Frontend: http://localhost:5173
- Backend API: http://localhost:8000
- API Documentation: http://localhost:8000/docs
The system uses the Amod "mental_health_counseling_conversations" dataset:
- Automatic loading and preprocessing
- Text cleaning and anonymization
- Conversation pair extraction
- Database storage with metadata
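Here is a minimal sketch of the cleaning, anonymization, and pair-extraction steps, using only the standard library. It assumes rows shaped like the Hugging Face dataset's Context/Response columns, for example from datasets.load_dataset("Amod/mental_health_counseling_conversations", split="train"). The regex rules for e-mail addresses, URLs, and phone numbers are illustrative and are not the project's actual anonymization logic.

```python
import re
from typing import Iterable

# Hypothetical masking rules; the project's real rule set may differ
_PII_RULES = [
    (re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"), "[EMAIL]"),
    (re.compile(r"https?://\S+"), "[URL]"),
    (re.compile(r"\+?\d[\d\s().-]{7,}\d"), "[PHONE]"),
]


def clean_text(text: str) -> str:
    """Mask e-mail addresses, URLs and phone numbers, then normalize whitespace."""
    for pattern, placeholder in _PII_RULES:
        text = pattern.sub(placeholder, text)
    return re.sub(r"\s+", " ", text).strip()


def extract_pairs(records: Iterable[dict]) -> list[dict]:
    """Turn raw rows into cleaned (context, response) pairs, skipping incomplete rows."""
    pairs = []
    for row in records:
        context = clean_text(row.get("Context") or "")
        response = clean_text(row.get("Response") or "")
        if context and response:
            # Word count kept as simple metadata for storage alongside the pair
            pairs.append({"context": context, "response": response,
                          "length": len(context.split())})
    return pairs
```

Substring masking like this will miss names and other free-text identifiers. A production pipeline would typically add a named-entity step on top.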
Advanced psychological pattern recognition:
- Cognitive Distortions: All-or-nothing thinking, catastrophizing, mind reading
- Defense Mechanisms: Denial, projection, rationalization
- Emotional Patterns: Rumination, emotional suppression, perfectionism
- ML Enhancement: TF-IDF vectorization and similarity matching
- Severity Assessment: Automatic severity and confidence scoring
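The sketch below shows one way to combine TF-IDF similarity matching with severity and confidence scores, without external dependencies. The exemplar phrases, stopword list, threshold, and severity bands are hypothetical. The IDF uses the same smoothed formula as scikit-learn's TfidfVectorizer default, but the project's real definitions and scoring live in src/ml/pattern_detection.py and may differ.

```python
import math
import re
from collections import Counter

# Hypothetical exemplar phrases per pattern (not the project's real definitions)
PATTERN_EXEMPLARS = {
    "catastrophizing": ["everything is ruined", "the worst will happen", "this is a disaster"],
    "all_or_nothing": ["i always fail", "i never do anything right", "completely useless"],
    "mind_reading": ["they think i am stupid", "everyone judges me", "she must hate me"],
}
# Small illustrative stopword list so filler words do not dominate the scores
STOPWORDS = {"a", "an", "and", "am", "are", "i", "is", "it", "me", "my", "of", "the", "this", "to"}


def tokenize(text: str) -> list[str]:
    return [t for t in re.findall(r"[a-z']+", text.lower()) if t not in STOPWORDS]


def tfidf_vectors(docs: list[str]) -> list[dict[str, float]]:
    """Sparse TF-IDF vectors (term -> weight), one per document."""
    tokenized = [tokenize(d) for d in docs]
    n = len(docs)
    df = Counter(term for tokens in tokenized for term in set(tokens))
    # Smoothed IDF: ln((1 + n) / (1 + df)) + 1, as in scikit-learn's default
    idf = {term: math.log((1 + n) / (1 + count)) + 1 for term, count in df.items()}
    return [{term: tf * idf[term] for term, tf in Counter(tokens).items()} for tokens in tokenized]


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm = math.sqrt(sum(w * w for w in a.values())) * math.sqrt(sum(w * w for w in b.values()))
    return dot / norm if norm else 0.0


def detect_patterns(message: str, threshold: float = 0.2) -> list[dict]:
    """Return patterns whose exemplars are similar enough to the message, best first."""
    names = list(PATTERN_EXEMPLARS)
    docs = [" ".join(PATTERN_EXEMPLARS[name]) for name in names] + [message]
    vectors = tfidf_vectors(docs)
    query = vectors[-1]
    results = []
    for name, vector in zip(names, vectors[:-1]):
        score = cosine(query, vector)
        if score >= threshold:
            # Hypothetical severity bands derived from the similarity score
            severity = "high" if score >= 0.5 else "moderate" if score >= 0.35 else "low"
            results.append({"pattern": name, "confidence": round(score, 3), "severity": severity})
    return sorted(results, key=lambda r: r["confidence"], reverse=True)
```

For example, detect_patterns("I always fail and never get anything right") ranks all_or_nothing first with high severity. A message that shares no content words with any exemplar returns an empty list.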
BERTopic-based conversation clustering:
- UMAP: Dimensionality reduction for better clustering
- HDBSCAN: Hierarchical clustering for topic discovery
- Custom Vectorization: Optimized for mental health terminology
- Topic Prediction: Real-time topic classification for new messages
- Semantic Mapping: Human-readable topic names and descriptions
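This sketch shows how the UMAP, HDBSCAN, and custom vectorizer pieces plug into BERTopic, plus a helper that turns a topic's top words into a readable name. The parameter values are common illustrative defaults, not the project's tuned settings. The imports are deferred so the module loads even without the ML stack installed. Mapping the outlier topic -1 to a generic name is an assumption about how semantic mapping could work.

```python
def build_topic_model(min_topic_size: int = 10):
    """Assemble BERTopic from custom sub-models.

    Requires bertopic, umap-learn, hdbscan and scikit-learn (imported lazily).
    Values are illustrative defaults, not the project's settings.
    """
    from bertopic import BERTopic
    from hdbscan import HDBSCAN
    from sklearn.feature_extraction.text import CountVectorizer
    from umap import UMAP

    # Reduce sentence embeddings to a few dimensions before clustering
    umap_model = UMAP(n_neighbors=15, n_components=5, min_dist=0.0,
                      metric="cosine", random_state=42)
    # With a custom HDBSCAN, its min_cluster_size controls the minimum topic size
    hdbscan_model = HDBSCAN(min_cluster_size=min_topic_size, metric="euclidean",
                            cluster_selection_method="eom", prediction_data=True)
    # Unigrams and bigrams so phrases such as "panic attack" can appear as topic words
    vectorizer_model = CountVectorizer(stop_words="english", ngram_range=(1, 2))
    return BERTopic(embedding_model="all-MiniLM-L6-v2", umap_model=umap_model,
                    hdbscan_model=hdbscan_model, vectorizer_model=vectorizer_model)


def topic_label(topic_id: int, top_words: list[tuple[str, float]], n_words: int = 3) -> str:
    """Readable name from a topic's (word, score) pairs, e.g. as returned by get_topic()."""
    if topic_id == -1 or not top_words:
        return "General / uncategorized"
    ranked = sorted(top_words, key=lambda pair: pair[1], reverse=True)[:n_words]
    return f"Topic {topic_id}: " + ", ".join(word for word, _ in ranked)
```

A typical flow is topics, probs = model.fit_transform(docs) for training and model.transform([new_message]) for real-time prediction. Labels can then be built with topic_label(t, model.get_topic(t)).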
Sentence transformer-based embeddings:
- Batch Processing: Efficient embedding generation
- Similarity Search: Vector-based conversation retrieval
- Context Matching: Session-aware similarity scoring
- ChromaDB Integration: Persistent vector storage
- POST /api/v1/chat: Send a message and get a therapeutic response
- WS /api/v1/ws/{session_id}: WebSocket for real-time chat
- GET /api/v1/session/{session_id}/history: Get conversation history
- GET /api/v1/topics: List available conversation topics
- GET /api/v1/stats: Processing statistics
- GET /api/v1/health: System health check
- DELETE /api/v1/session/{session_id}: Delete session data
- POST /api/v1/admin/reindex: Reindex the vector store
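Here is a small standard-library client sketch for two of the endpoints. The JSON field names (message, session_id) are assumptions, since the request schema is not documented here. Check the interactive docs at http://localhost:8000/docs for the actual model.

```python
import json
import urllib.request
from urllib.parse import quote

BASE_URL = "http://localhost:8000/api/v1"


def build_chat_request(message: str, session_id: str, base_url: str = BASE_URL) -> urllib.request.Request:
    """Prepare POST /chat; the payload field names are assumed, not confirmed."""
    payload = json.dumps({"message": message, "session_id": session_id}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/chat",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def build_delete_session_request(session_id: str, base_url: str = BASE_URL) -> urllib.request.Request:
    """Prepare DELETE /session/{session_id}, which removes a session's stored data."""
    # Percent-encode the id so unusual characters cannot alter the path
    return urllib.request.Request(f"{base_url}/session/{quote(session_id, safe='')}", method="DELETE")


def send(request: urllib.request.Request, timeout: float = 30.0) -> dict:
    """Send a prepared request and decode the JSON response body."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling send(build_chat_request("I can't sleep lately", "demo-session")) requires the backend to be running on port 8000.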
# Run the complete pipeline
python train_models.py
# Load and process dataset only
python train_models.py --dataset-only
# Generate embeddings only
python train_models.py --embeddings-only
# Train topic model only
python train_models.py --topic-only
# Update existing models
python train_models.py --update
Options:
- --min-topic-size: Minimum cluster size for topic modeling (default: 10)
- --skip-dataset: Skip dataset loading
- --skip-embeddings: Skip embedding generation
- --skip-topics: Skip topic model training
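This is a hypothetical argparse reconstruction of how the training flags could combine; it is not the actual contents of train_models.py. The mutual exclusivity of the --*-only flags and the stage order are assumptions. --update is parsed but not interpreted in this sketch.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented train_models.py options."""
    parser = argparse.ArgumentParser(description="Train the therapy assistant ML models")
    only = parser.add_mutually_exclusive_group()
    only.add_argument("--dataset-only", action="store_true", help="Load and process the dataset only")
    only.add_argument("--embeddings-only", action="store_true", help="Generate embeddings only")
    only.add_argument("--topic-only", action="store_true", help="Train the topic model only")
    parser.add_argument("--update", action="store_true", help="Update existing models")
    parser.add_argument("--min-topic-size", type=int, default=10,
                        help="Minimum cluster size for topic modeling")
    parser.add_argument("--skip-dataset", action="store_true", help="Skip dataset loading")
    parser.add_argument("--skip-embeddings", action="store_true", help="Skip embedding generation")
    parser.add_argument("--skip-topics", action="store_true", help="Skip topic model training")
    return parser


def stages_to_run(args: argparse.Namespace) -> list[str]:
    """Resolve the flags into an ordered list of pipeline stages."""
    if args.dataset_only:
        return ["dataset"]
    if args.embeddings_only:
        return ["embeddings"]
    if args.topic_only:
        return ["topics"]
    stages = []
    if not args.skip_dataset:
        stages.append("dataset")
    if not args.skip_embeddings:
        stages.append("embeddings")
    if not args.skip_topics:
        stages.append("topics")
    return stages
```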
# Database
DATABASE_URL=sqlite:///./therapy_ai.db
# Models
SENTENCE_TRANSFORMER_MODEL=all-MiniLM-L6-v2
BERTOPIC_MODEL_PATH=./models/bertopic_model
# Vector Database
CHROMA_PERSIST_DIRECTORY=./data/chroma_db
# External APIs
OPENAI_API_KEY=your_openai_key_here
# Processing
MAX_CONTEXT_LENGTH=4000
MAX_RETRIEVED_DOCUMENTS=10
# Frontend
VITE_API_URL=http://localhost:8000
cd backend
pytest tests/
# Format code
black src/
# Type checking
mypy src/
- Edit src/ml/pattern_detection.py
- Add the pattern definition to _initialize_patterns()
- Include keywords, phrases, and therapeutic approaches
- Retrain models:
python train_models.py --topic-only
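The sketch below is a hypothetical shape for a pattern definition that carries keywords, phrases, and therapeutic approaches, with naive case-insensitive substring matching. The class, its fields, and the people_pleasing example are illustrative. The real structure returned by _initialize_patterns() may differ.

```python
from dataclasses import dataclass, field


@dataclass
class PatternDefinition:
    """Hypothetical pattern entry (not the project's actual class)."""
    name: str
    category: str  # e.g. "cognitive_distortion", "defense_mechanism", "emotional_pattern"
    keywords: list[str]
    phrases: list[str] = field(default_factory=list)
    therapeutic_approaches: list[str] = field(default_factory=list)

    def matches(self, text: str) -> list[str]:
        """Return the keywords and phrases that occur in the text, keywords first."""
        lowered = text.lower()
        return [term for term in self.keywords + self.phrases if term.lower() in lowered]


# Illustrative new pattern; the content is an example only
PEOPLE_PLEASING = PatternDefinition(
    name="people_pleasing",
    category="emotional_pattern",
    keywords=["disappoint", "approval"],
    phrases=["can't say no", "keep everyone happy"],
    therapeutic_approaches=["assertiveness training", "values clarification"],
)
```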
- Use PostgreSQL for production database
- Set production environment variables (database URL, API keys, model and vector store paths)
- Configure CORS for your domain
- Set up SSL/TLS certificates
- Use a process manager (PM2, systemd)
# Build and run
docker build -t therapy-assistant .
docker run -p 8000:8000 --env-file .env therapy-assistant
- Fork the repository
- Create a feature branch
- Make changes with tests
- Run code quality checks
- Submit a pull request
This project is licensed under the MIT License - see the LICENSE file for details.
- Dataset: Amod's "mental_health_counseling_conversations" from Hugging Face
- ML Libraries: BERTopic, sentence-transformers, scikit-learn
- Inspiration: Tranquilo project architecture and ML approaches