An advanced, flexible machine learning system for classifying metal detector audio signals. The system learns from any labeled training data you provide - no hardcoded assumptions about what the labels represent.
- **Completely Flexible**: Train on any labels (gold, iron, banana, type_A, etc.) with zero hardcoded assumptions
- **Advanced Audio Processing**: Handles continuous detector audio with intelligent silence-based segmentation
- **Time-Invariant**: Robust to different sweep speeds (slow and fast sweeps over the same target yield the same pattern)
- **State-of-the-Art ML**: Ensemble of CNN, Transformer, and traditional ML models
- **Real-time Processing**: Live audio streaming with WebSocket communication and visualization
- **Multiple Formats**: Supports WAV, MP3, and M4A audio files
- **Pattern Recognition**: Analyzes spectral patterns, temporal dynamics, and harmonic content
- **Smart Segmentation**: Identifies natural breaks in tone patterns (high-low or low-high) for complete event capture
- **Optimized Performance**: M1 GPU acceleration with efficient audio processing pipelines
1. Install dependencies:

   ```bash
   pip install -r requirements.txt
   ```

2. Set up the training data structure:

   ```bash
   # Create directories for your labels
   mkdir -p data/gold data/iron data/other

   # Or use the setup script
   python train_model.py --setup-data
   ```

3. Add your labeled audio files to the created directories:

   - `data/gold/` - audio samples of gold detection
   - `data/iron/` - audio samples of iron detection
   - `data/other/` - audio samples of other metals

   You can create any labels you want!

4. Train the models (choose one):

   ```bash
   # Option 1: Advanced ensemble (CNN + Transformer + Traditional ML)
   python train_model.py --data-dir data --epochs 10

   # Option 2: Deep learning with Wav2Vec2
   python train_dl.py --data-dir data --epochs 10

   # Option 3: Quick baseline model
   python train_baseline.py --data-dir data
   ```

5. Classify new audio:

   ```bash
   python classify.py path/to/audio.wav

   # Also supports .mp3 and .m4a files
   python classify.py recording.mp3
   python classify.py test_audio/gold_test.m4a
   ```

6. Use the web interface:

   ```bash
   ./start_web_portal.sh
   # or
   python enhanced_web_app.py
   ```

   Then visit http://localhost:5002
```
metal-detector-ai/
├── data/                    # Training data (flexible labels)
│   ├── [your_label_1]/      # Any label you want
│   ├── [your_label_2]/      # Another label
│   └── [your_label_3]/      # Yet another label
├── models/advanced/         # Trained ML models
├── src/
│   ├── audio/               # Audio processing (anomaly detection, features)
│   ├── ml/                  # Advanced ML (CNN, Transformer, ensemble)
│   ├── data/                # Data pipeline and event detection
│   └── web/                 # Web interface templates
├── static/
│   ├── js/components/       # Modular JavaScript components
│   ├── css/                 # Stylesheets
│   └── [libraries]/         # Local JavaScript libraries
├── templates/
│   ├── components/          # Reusable template components
│   ├── dataset.html         # Advanced dataset browser
│   ├── dashboard.html       # Main dashboard
│   ├── realtime.html        # Real-time streaming interface
│   └── [other pages]        # Additional interface pages
├── train_model.py           # Advanced ensemble training
├── train_dl.py              # Deep learning training
├── train_baseline.py        # Quick baseline training
├── classify.py              # Single file classification
├── stream_classify.py       # Real-time classification
├── enhanced_web_app.py      # Web portal (port 5002)
└── requirements.txt         # Dependencies
```
Unlike traditional approaches that look for silence between sounds, this system understands that metal detectors produce continuous background audio (warbling/humming). Detection events are anomalies or pattern changes in this baseline.
The system extracts features that are robust to sweep speed variations:
- Same pattern detected whether you sweep fast or slow over the same target
- Normalized temporal features (relative timing, not absolute duration)
- Spectral invariants (frequency relationships don't change with sweep speed)
- Envelope shape analysis (pattern morphology preserved)
- Temporal: Onset density, relative peak positions, rise/decay ratios
- Spectral: Centroid, bandwidth, contrast, rolloff statistics
- Harmonic: Pitch stability, harmonic ratios, chroma features
- Wavelet: Multi-scale time-frequency analysis across 4 wavelet types
- Statistical: Distribution shape, entropy measures, envelope characteristics
Combines multiple advanced models with weighted voting:
- CNN: 4-layer architecture with attention for spectral pattern recognition
- Transformer: 6-layer encoder with self-attention for temporal modeling
- Traditional ML: Random Forest + Gradient Boosting + SVM ensemble
- Final Prediction: Weighted ensemble (CNN: 40%, Transformer: 40%, Traditional: 20%)
The system automatically discovers labels from your directory structure:
```
data/
├── gold/      # ← System learns this as "gold"
├── iron/      # ← System learns this as "iron"
├── copper/    # ← System learns this as "copper"
├── aluminum/  # ← System learns this as "aluminum"
└── banana/    # ← System learns this as "banana" (if you want!)
```

No hardcoded metal types! Use any labels that make sense for your use case.
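The directory-to-label mapping can be sketched in a few lines (a hypothetical helper that mirrors the behavior described above; `discover_labels` is an assumed name, not the project's actual API):

```python
from pathlib import Path

def discover_labels(data_dir):
    """Discover class labels from the subdirectory names under data_dir.

    Whatever directories exist become the labels -- no metal types
    are hardcoded anywhere. Plain files are ignored.
    """
    return sorted(p.name for p in Path(data_dir).iterdir() if p.is_dir())
```

Adding a new class is then just `mkdir data/silver` plus some audio files; no code changes are needed.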
Full-featured web portal with real-time streaming, analytics, and training:

1. Start the portal:

   ```bash
   ./start_web_portal.sh
   # Or directly:
   python enhanced_web_app.py
   ```

2. Open your browser to http://localhost:5002

3. Features:

   - **Dashboard**: System overview, quick stats, recent activity
   - **Classification**: Upload files or record from the microphone
   - **Waveform Visualization**: Interactive audio analysis with click-to-seek
   - **Real-time Detection**: Stream from the microphone with live results
   - **Analytics**: Confusion matrices, accuracy charts, training history
   - **Dataset Management**: Browse samples with advanced filtering, search, and modal preview
   - **Advanced Visualizations**: Spectrograms with zoom/pan, feature analysis, CSV export
   - **Modular Components**: Reusable audio players, waveform displays, and controls
   - **Web Training**: Configure and train models from the browser
   - **API Integration**: RESTful endpoints and WebSocket support
```bash
# Segment audio files to detect individual metal detection events
python segment_dataset.py --input-dir data --output-dir data_segmented

# Segment in-place (adds segments to existing structure)
python segment_dataset.py --in-place

# Prepare dataset with custom splitting
python prepare_dataset.py --split-duration 3.0 --overlap 0.5
```

```bash
# Set up data directories
python train_model.py --setup-data

# Train with your data
python train_model.py --data-dir data --epochs 100

# Train with custom settings
python train_model.py --batch-size 16 --model-dir custom_models
```

```bash
# Classify a single file (supports .wav, .mp3, .m4a)
python classify.py recording.wav
python classify.py audio.mp3
python classify.py detector.m4a

# Save detailed results
python classify.py recording.wav --output results.json

# Verbose analysis
python classify.py recording.wav --verbose
```

```bash
# Start real-time classification
python stream_classify.py --device 0 --duration 60

# List available audio devices
python stream_classify.py --list-devices

# Save detected patterns
python stream_classify.py --save-detections output_dir
```

- Establishes baseline characteristics from the initial 2 seconds of continuous detector audio
- Uses 10-feature frame analysis (RMS, spectral centroid, bandwidth, rolloff, ZCR, MFCCs)
- Calculates anomaly scores via Euclidean distance, Mahalanobis-like distance, and max deviation
- Adaptive thresholding: mean + (sensitivity × std) of baseline scores
- Filters anomalies by duration (0.1 s to 3.0 s) to capture realistic detection events
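The baseline-plus-threshold logic above can be sketched in a few lines (an illustrative reimplementation of the stated formula, not the project's actual code; function names are assumed):

```python
import numpy as np

def adaptive_threshold(baseline_scores, sensitivity=2.0):
    """Anomaly threshold from the README's formula:
    mean + (sensitivity x std) of the baseline frame scores."""
    scores = np.asarray(baseline_scores, dtype=float)
    return scores.mean() + sensitivity * scores.std()

def flag_anomalies(frame_scores, threshold):
    """Boolean mask of frames whose anomaly score exceeds the threshold."""
    return np.asarray(frame_scores, dtype=float) > threshold
```

A higher `sensitivity` demands a larger deviation from the baseline before a frame counts as a detection event.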
- Temporal Normalization: All timing features converted to relative scales (0-1)
- Onset Analysis: Density and distribution patterns independent of absolute time
- Envelope Characteristics: Rise/decay ratios, peak positions, symmetry measures
- Spectral Preservation: Frequency relationships maintained across sweep speeds
- Statistical Invariants: Distribution shapes, entropy measures, harmonic ratios
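As an illustration of the normalization idea (hypothetical code, not the project's feature extractor): timings are expressed on a relative 0-1 scale, so stretching an event in time (i.e. a slower sweep) leaves the features unchanged.

```python
import numpy as np

def time_invariant_envelope_features(envelope):
    """Sweep-speed-robust envelope descriptors: all timings are relative,
    so a time-stretched copy of the same event yields identical values."""
    env = np.asarray(envelope, dtype=float)
    peak = int(np.argmax(env))
    rel_peak_pos = peak / (len(env) - 1)           # peak position on a 0-1 scale
    rise = env[:peak + 1]                          # frames up to and including peak
    decay = env[peak:]                             # frames from peak onward
    # Ratio of rise length to decay length (relative, not absolute seconds)
    rise_decay_ratio = len(rise) / max(len(decay), 1)
    return {"rel_peak_pos": rel_peak_pos, "rise_decay_ratio": rise_decay_ratio}
```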
- CNN: 32β64β128β256 filters, attention mechanism, adaptive pooling
- Transformer: 256-dim embeddings, 8 attention heads, positional encoding
- Traditional: Random Forest (200 trees) + Gradient Boosting (200 est.) + RBF SVM
- Ensemble Logic: Soft voting with weighted probabilities, confidence analysis
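The weighted soft-voting step can be sketched as follows (an assumed function shape, not the project's API; the 40/40/20 weights come from the ensemble description above):

```python
import numpy as np

# Ensemble weights stated above: CNN 40%, Transformer 40%, Traditional ML 20%
WEIGHTS = {"cnn": 0.4, "transformer": 0.4, "traditional": 0.2}

def ensemble_predict(probs_by_model, labels, weights=WEIGHTS):
    """Soft voting: weight each model's class probabilities, sum them,
    and take the argmax. Returns (predicted_label, combined_probability)."""
    combined = sum(weights[name] * np.asarray(p, dtype=float)
                   for name, p in probs_by_model.items())
    idx = int(np.argmax(combined))
    return labels[idx], float(combined[idx])
```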
Comprehensive evaluation tools for assessing model performance:
```bash
# Evaluate the advanced ensemble model
python evaluate_models.py --data-dir data --model advanced

# Evaluate the baseline model
python evaluate_models.py --data-dir data --model baseline

# Evaluate all models
python evaluate_models.py --data-dir data --model all
```

The evaluation script generates:
- Classification reports with precision, recall, and F1 scores
- Confusion matrices
- ROC curves for each class
- Feature importance analysis
- Cross-validation results
Results are saved to evaluation_results/ with plots and detailed metrics.
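For reference, the reported precision/recall/F1 values follow directly from the confusion matrix; a minimal numpy sketch (not the evaluation script's actual code):

```python
import numpy as np

def precision_recall_f1(confusion):
    """Per-class precision, recall, and F1 from a square confusion matrix
    (rows = true labels, columns = predicted labels)."""
    cm = np.asarray(confusion, dtype=float)
    tp = np.diag(cm)                                    # correct predictions per class
    precision = tp / np.maximum(cm.sum(axis=0), 1e-12)  # of predicted-as-class, how many right
    recall = tp / np.maximum(cm.sum(axis=1), 1e-12)     # of true-class, how many found
    f1 = 2 * precision * recall / np.maximum(precision + recall, 1e-12)
    return precision, recall, f1
```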
The system provides comprehensive evaluation:
- Individual Model Accuracies: CNN, Transformer, Traditional ML performance
- Ensemble Performance: Combined weighted voting accuracy with confidence analysis
- Feature Importance: Rankings of most discriminative features per model
- Cross-Validation: 5-fold stratified validation with mean ± std scores
- Processing Time: Real-time classification speed (typically < 0.5s per sample)
- Confidence Levels: 5-tier confidence system (Very High ≥95%, High ≥80%, Medium ≥60%, Low ≥40%, Very Low <40%)
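The 5-tier mapping is a simple threshold ladder; a sketch with a hypothetical helper name:

```python
def confidence_tier(p):
    """Map an ensemble probability to the 5-tier confidence label
    described above (Very High >= 95% ... Very Low < 40%)."""
    if p >= 0.95:
        return "Very High"
    if p >= 0.80:
        return "High"
    if p >= 0.60:
        return "Medium"
    if p >= 0.40:
        return "Low"
    return "Very Low"
```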
The system includes high-performance alternatives using torchaudio:
```bash
# Use efficient training with torchaudio pipeline
python train_efficient.py --data-dir data --epochs 50

# Benchmark standard vs efficient loading
python train_efficient.py --benchmark --data-dir data

# Configure system-wide efficiency settings
python config.py --show
```

Benefits of the efficient pipeline:
- 3-5x faster audio loading with torchaudio
- On-the-fly augmentation reduces storage needs
- Native M1 GPU support via Metal Performance Shaders
- Streaming data pipeline for large datasets
- Automatic class balancing for better training
Export trained models to TorchScript for optimized CPU/GPU inference:
```bash
# Export models to TorchScript format
python export_torchscript.py --model-dir models/advanced --output-dir models/torchscript

# Use the fast inference script
python models/torchscript/fast_inference.py models/torchscript
```

Benefits:

- 2-5x faster inference
- No Python dependencies required
- Optimized for production deployment
- Smaller memory footprint
The prepare_dataset.py tool helps you build high-quality training datasets:
```bash
# Split a long recording into 5-second segments
python prepare_dataset.py split long_recording.wav --duration 5.0 --overlap 0.5

# Creates:
# - segments/long_recording_segment_0001.wav
# - segments/long_recording_segment_0002.wav
# - segments/long_recording_annotations.json (for labeling)
```

Edit the generated annotations.json file:
```json
{
  "segments": [
    {
      "filename": "segment_0001.wav",
      "label": "gold",  // Change from "NEEDS_LABELING"
      "confidence": "high",
      "notes": "Clear gold signal"
    }
  ]
}
```

```bash
# Organize segments into label directories
python prepare_dataset.py organize annotations.json --output-dir data

# Generate augmented versions (time stretch, pitch shift, noise)
python prepare_dataset.py augment data --factor 3

# Get detailed statistics and balance information
python prepare_dataset.py analyze data

# Create 80/20 train/validation split
python prepare_dataset.py split-data data --val-ratio 0.2
```

Run the comprehensive test suite:
```bash
# Run all tests
pytest

# Run with coverage
pytest --cov=src --cov-report=html

# Run specific test categories
pytest -m unit          # Unit tests only
pytest -m integration   # Integration tests only
pytest -m "not slow"    # Skip slow tests
```

Use heuristic-based optimization to find the best parameters:
```bash
# Run parameter optimization
python test_model_optimization.py --optimize --data-dir data

# Generate optimization report
python test_model_optimization.py --report optimization_report.json

# Run optimization tests
python test_model_optimization.py --test
```

The optimization tool provides:
- Automatic Hyperparameter Tuning: Uses Optuna for Bayesian optimization
- Heuristic Analysis: Tests data quality, feature importance, model complexity
- Performance Benchmarking: Measures inference speed and memory usage
- Recommendations: Provides actionable insights for improvement
The test suite includes:
- Unit Tests: Audio processing, feature extraction, model components
- Integration Tests: End-to-end pipeline, file format handling
- Performance Tests: Speed benchmarks, memory usage monitoring
- Optimization Tests: Parameter tuning, heuristic analysis
All aspects are highly configurable:
- CNN layers, filter sizes, attention mechanisms
- Transformer heads, encoder layers, embedding dimensions
- Traditional ML estimators, depth, regularization
- Epochs, batch size, learning rates, optimizers
- Data augmentation, dropout rates, early stopping
- Cross-validation folds, test split ratios
- Sample rate, chunk sizes, buffer durations
- Anomaly sensitivity, baseline duration, thresholds
- Frame analysis parameters, feature extraction settings
- Classification intervals, confidence thresholds
- Visualization refresh rates, detection logging
- Audio device selection, buffer management
- Diverse Scenarios: Various distances, detector settings, environments
- Sweep Speed Variety: Mix of slow, medium, and fast sweep speeds
- Clean Signal: Minimize wind, background noise, handling sounds
- Balanced Dataset: Similar amounts of data per label (20-50 samples minimum)
- Realistic Conditions: Include typical field recording conditions
- Consistent Setup: Same detector settings within each label category
- Multiple Angles: Different coil orientations relative to targets
- Distance Variation: Close, medium, and far detection distances
- Duration: 10-30 second clips work well (system extracts patterns automatically)
- Format: WAV preferred for training, MP3/M4A acceptable for classification
- Start Small: Begin with 2-3 labels, expand gradually
- Iterative Improvement: Train β test β add data β retrain
- Cross-Validation: Monitor for overfitting with validation scores
- Feature Analysis: Use feature importance to understand what the model learns
- Ensemble Trust: Higher confidence when all models agree on prediction
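For the train/validate loop above, a held-out split can be as simple as the following sketch (illustrative only, seeded for reproducibility; `prepare_dataset.py split-data` handles this for you):

```python
import random

def train_val_split(files, val_ratio=0.2, seed=42):
    """Shuffle file paths deterministically and split them into
    (train, validation) lists."""
    rng = random.Random(seed)       # fixed seed => reproducible split
    shuffled = files[:]
    rng.shuffle(shuffled)
    n_val = int(len(shuffled) * val_ratio)
    return shuffled[n_val:], shuffled[:n_val]
```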
**Core System**

- Advanced audio processing with anomaly detection
- Time-invariant feature extraction
- Multiple ML architectures (CNN, Transformer, Wav2Vec2, Random Forest)
- Flexible label system (no hardcoded metal types)
- Multi-format support (WAV, MP3, M4A)

**Training Tools**

- Three training pipelines (baseline, deep learning, ensemble)
- Dataset preparation and augmentation
- Model evaluation with detailed metrics
- TorchScript export for fast inference

**User Interfaces**

- Command-line classification
- Real-time streaming with visualization
- Web interface with drag-and-drop
- JSON API endpoints

**Documentation**

- Comprehensive README
- Code comments and docstrings
- Usage examples
- Best practices guide
- Enhanced Sample Browser: Interactive grid with search, filtering, and sorting capabilities
- Modal Preview System: Detailed sample inspection with tabbed interface for Info/Events, Spectrograms, and Features
- Interactive Spectrograms: Canvas-based visualization with zoom, pan, and multiple colormaps
- Feature Visualization: Chart.js displays for temporal, spectral, and MFCC features with radar chart comparisons
- Advanced Audio Controls: Click-to-seek waveforms, playback speed control, smart time formatting
- Modular Architecture: Reusable components for audio players, waveform displays, and controls
- DRY Compliance: Removed 350+ lines of duplicate HTML template code
- Dead Code Removal: Eliminated 5 debug routes and 3 unused template files
- Enhanced Documentation: Added comprehensive docstrings and inline comments
- Performance Optimization: Cleaned up debug logs and unnecessary console output
- Modular Components: Extracted reusable JavaScript components for better maintainability
- Smart Event Detection: Replaced fixed-length segmentation with intelligent silence-based detection
- Pattern-Aware: Identifies natural breaks in tone patterns (high-low or low-high transitions)
- Variable Length Support: Segments are now properly sized based on actual audio content
- Complete Event Capture: Preserves full metal detection events instead of arbitrary cuts
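Conceptually, the silence-based segmentation described above scans a frame-level energy curve and cuts wherever quiet persists; a simplified sketch (an assumed thresholding scheme, not the project's implementation):

```python
def split_on_silence(rms, threshold, min_gap_frames=3):
    """Split a frame-level RMS energy curve into (start, end) index pairs,
    cutting wherever at least min_gap_frames consecutive quiet frames occur.
    Segments keep their natural, variable length instead of fixed-size cuts."""
    segments, start, gap = [], None, 0
    for i, level in enumerate(rms):
        if level > threshold:
            if start is None:
                start = i                              # event begins
            gap = 0
        elif start is not None:
            gap += 1
            if gap >= min_gap_frames:                  # sustained quiet: close event
                segments.append((start, i - gap + 1))  # trim trailing quiet frames
                start, gap = None, 0
    if start is not None:                              # event ran to end of buffer
        segments.append((start, len(rms)))
    return segments
```

Brief dips below the threshold shorter than `min_gap_frames` stay inside one event, which is what preserves complete detection patterns.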
- WebSocket Reliability: Fixed connection errors with improved threading architecture
- Session Management: Better handling of multiple client connections and disconnections
- Error Prevention: Added safeguards against duplicate events and session conflicts
- Canvas Optimization: Improved rendering performance with the `willReadFrequently` attribute
- M1 GPU Acceleration: Proper tensor device management for Apple Silicon
- Efficient Processing: Enhanced torchaudio-based pipeline for better performance
- Robust Error Handling: Comprehensive error handling throughout the training pipeline
- Memory Optimization: Improved memory usage during large dataset processing
The system is fully operational with all major components completed. Phase 4 (Advanced Visualizations) is complete, and the codebase has been thoroughly cleaned and optimized. Ready for Phase 5 performance and polish improvements.
- Issue: "No active stream" errors or no detections showing
- Solution: Ensure you click "Start Streaming" before making sounds, and check browser console for errors
- Note: Audio is processed in 2-second chunks, so wait a few seconds for first detection
- Issue: "Invalid frame header" or connection failures
- Solution: The system now uses threading mode for better compatibility - restart the server if issues persist
- Prevention: Avoid multiple browser tabs with real-time streaming open simultaneously
- Issue: "No training data found" during model training
- Solution: Ensure audio files are in the correct directories (`data/gold/`, `data/iron/`, etc.) and are valid audio formats
- Check: Run `python train_model.py --data-dir data --verbose` to see detailed processing logs
- Issue: Segments are too short or don't capture complete events
- Solution: The new silence-based segmentation automatically detects natural breaks - ensure your audio has clear quiet periods between detections
- Tip: Longer audio files with multiple clear detection events work best
- Issue: Slow processing or high memory usage
- Solution: The system now uses M1 GPU acceleration and efficient processing - ensure you have sufficient RAM (8GB+ recommended)
- Optimization: Use shorter audio files for training if memory is limited
Contributions are welcome! Please feel free to submit a Pull Request.
This project is open source and available under the MIT License.
https://github.com/onyxdigitaldev/metal-detector-ai
- Nikko Vellios - Designer & Primary Developer
- oskodiak / Onyx Digital Intelligence Development - Support Development
- API Reference - Detailed docs for all endpoints and WebSocket events
- SCHEDULE.md - Development schedule and phase tracking