Metal Detector AI

An advanced, flexible machine learning system for classifying metal detector audio signals. The system learns from any labeled training data you provide - no hardcoded assumptions about what the labels represent.

🌟 Key Features

  • 🎯 Completely Flexible: Train on any labels (gold, iron, banana, type_A, etc.) - zero hardcoded assumptions
  • 🔊 Advanced Audio Processing: Handles continuous detector audio with intelligent silence-based segmentation
  • ⚡ Time-Invariant: Robust to different sweep speeds (slow and fast sweeps over the same target produce the same pattern)
  • 🧠 State-of-the-Art ML: Ensemble of CNN, Transformer, and traditional ML models
  • 📊 Real-time Processing: Live audio streaming with WebSocket communication and visualization
  • 🎵 Multiple Formats: Supports WAV, MP3, and M4A audio files
  • 🔍 Pattern Recognition: Analyzes spectral patterns, temporal dynamics, and harmonic content
  • 🎛️ Smart Segmentation: Identifies natural breaks in tone patterns (high-low or low-high) for complete event capture
  • ⚡ Optimized Performance: M1 GPU acceleration with efficient audio processing pipelines

🚀 Quick Start

  1. Install Dependencies:

    pip install -r requirements.txt
  2. Set up training data structure:

    # Create directories for your labels
    mkdir -p data/gold data/iron data/other
    
    # Or use the setup script
    python train_model.py --setup-data
  3. Add your labeled audio files to the created directories:

    • data/gold/ - Audio samples of gold detection
    • data/iron/ - Audio samples of iron detection
    • data/other/ - Audio samples of other metals
    • You can create any labels you want!
  4. Train the models (choose one):

    # Option 1: Advanced ensemble (CNN + Transformer + Traditional ML)
    python train_model.py --data-dir data --epochs 10
    
    # Option 2: Deep Learning with Wav2Vec2
    python train_dl.py --data-dir data --epochs 10
    
    # Option 3: Quick baseline model
    python train_baseline.py --data-dir data
  5. Classify new audio:

    python classify.py path/to/audio.wav
    # Also supports: .mp3, .m4a files
    python classify.py recording.mp3
    python classify.py test_audio/gold_test.m4a
  6. Web Interface:

    ./start_web_portal.sh  # or python enhanced_web_app.py
    # Visit http://localhost:5002

📁 Project Structure

metal-detector-ai/
├── data/                    # Training data (flexible labels)
│   ├── [your_label_1]/     # Any label you want
│   ├── [your_label_2]/     # Another label
│   └── [your_label_3]/     # Yet another label
├── models/advanced/         # Trained ML models
├── src/
│   ├── audio/              # Audio processing (anomaly detection, features)
│   ├── ml/                 # Advanced ML (CNN, Transformer, ensemble)
│   ├── data/               # Data pipeline and event detection
│   └── web/                # Web interface templates
├── static/
│   ├── js/components/      # Modular JavaScript components
│   ├── css/                # Stylesheets
│   └── [libraries]/        # Local JavaScript libraries
├── templates/
│   ├── components/         # Reusable template components
│   ├── dataset.html        # Advanced dataset browser
│   ├── dashboard.html      # Main dashboard
│   ├── realtime.html       # Real-time streaming interface
│   └── [other pages]/      # Additional interface pages
├── train_model.py          # Advanced ensemble training
├── train_dl.py             # Deep learning training
├── train_baseline.py       # Quick baseline training
├── classify.py             # Single file classification
├── stream_classify.py      # Real-time classification
├── enhanced_web_app.py     # Web portal (port 5002)
└── requirements.txt        # Dependencies

🎵 How It Works

1. Audio Analysis Philosophy

Unlike traditional approaches that look for silence between sounds, this system understands that metal detectors produce continuous background audio (warbling/humming). Detection events are anomalies or pattern changes in this baseline.

2. Time-Invariant Processing

The system extracts features that are robust to sweep speed variations:

  • ✅ Same pattern detected whether you sweep fast or slow over the same target
  • ✅ Normalized temporal features (relative timing, not absolute duration)
  • ✅ Spectral invariants (frequency relationships don't change with sweep speed)
  • ✅ Envelope shape analysis (pattern morphology preserved)

3. Advanced Feature Extraction (33 Features)

  • Temporal: Onset density, relative peak positions, rise/decay ratios
  • Spectral: Centroid, bandwidth, contrast, rolloff statistics
  • Harmonic: Pitch stability, harmonic ratios, chroma features
  • Wavelet: Multi-scale time-frequency analysis across 4 wavelet types
  • Statistical: Distribution shape, entropy measures, envelope characteristics

4. Ensemble Learning Architecture

Combines multiple advanced models with weighted voting:

  • CNN: 4-layer architecture with attention for spectral pattern recognition
  • Transformer: 6-layer encoder with self-attention for temporal modeling
  • Traditional ML: Random Forest + Gradient Boosting + SVM ensemble
  • Final Prediction: Weighted ensemble (CNN: 40%, Transformer: 40%, Traditional: 20%)
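The weighted vote can be sketched in a few lines of plain Python. This is an illustrative stand-alone snippet with made-up probabilities, not the project's actual implementation in src/ml/; only the 40/40/20 weights come from the README.

```python
# Weighted soft voting across three models (illustrative; weights from the
# README, the per-model probabilities below are made up).
WEIGHTS = {"cnn": 0.4, "transformer": 0.4, "traditional": 0.2}

def ensemble_predict(per_model_probs: dict) -> tuple:
    """Combine per-class probabilities from each model into one prediction."""
    labels = next(iter(per_model_probs.values())).keys()
    combined = {
        label: sum(WEIGHTS[m] * probs[label] for m, probs in per_model_probs.items())
        for label in labels
    }
    best = max(combined, key=combined.get)
    return best, combined[best]

label, confidence = ensemble_predict({
    "cnn":         {"gold": 0.7, "iron": 0.3},
    "transformer": {"gold": 0.6, "iron": 0.4},
    "traditional": {"gold": 0.5, "iron": 0.5},
})
print(label, round(confidence, 2))  # gold 0.62
```

Soft voting averages probabilities rather than hard votes, so a model that is uncertain pulls the combined confidence down instead of casting a full vote.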

📊 Flexible Training System

The system automatically discovers labels from your directory structure:

data/
├── gold/           # ← System learns this as "gold"
├── iron/           # ← System learns this as "iron"
├── copper/         # ← System learns this as "copper"
├── aluminum/       # ← System learns this as "aluminum"
└── banana/         # ← System learns this as "banana" (if you want!)

No hardcoded metal types! Use any labels that make sense for your use case.
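Label discovery from a directory layout like the one above can be sketched with the standard library. The discover_labels helper below is hypothetical; the real pipeline lives in src/data/.

```python
# Sketch: each immediate subdirectory of the data directory becomes one label.
# (Hypothetical helper, shown on a throwaway temp directory.)
import tempfile
from pathlib import Path

def discover_labels(data_dir: str) -> list:
    """Return the sorted names of the subdirectories of data_dir."""
    return sorted(p.name for p in Path(data_dir).iterdir() if p.is_dir())

# Build a demo layout in a temp directory and list its labels.
with tempfile.TemporaryDirectory() as tmp:
    for label in ("gold", "iron", "banana"):
        (Path(tmp) / label).mkdir()
    labels = discover_labels(tmp)
print(labels)  # ['banana', 'gold', 'iron']
```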

Web Interface

Full-featured web portal with real-time streaming, analytics, and training:

  1. Start the portal:

    ./start_web_portal.sh
    # Or directly: python enhanced_web_app.py
  2. Open your browser to http://localhost:5002

  3. Features:

    • 📊 Dashboard: System overview, quick stats, recent activity
    • 🎯 Classification: Upload files or record from microphone
    • 🌊 Waveform Visualization: Interactive audio analysis with click-to-seek
    • 📡 Real-time Detection: Stream from microphone with live results
    • 📈 Analytics: Confusion matrices, accuracy charts, training history
    • 📁 Dataset Management: Browse samples with advanced filtering, search, and modal preview
    • 🔬 Advanced Visualizations: Spectrograms with zoom/pan, feature analysis, CSV export
    • 🎛️ Modular Components: Reusable audio players, waveform displays, and controls
    • 🧠 Web Training: Configure and train models from browser
    • 🔄 API Integration: RESTful endpoints and WebSocket support

🎯 Usage Examples

Dataset Preparation

# Segment audio files to detect individual metal detection events
python segment_dataset.py --input-dir data --output-dir data_segmented

# Segment in-place (adds segments to existing structure)
python segment_dataset.py --in-place

# Prepare dataset with custom splitting
python prepare_dataset.py --split-duration 3.0 --overlap 0.5

Training

# Set up data directories
python train_model.py --setup-data

# Train with your data
python train_model.py --data-dir data --epochs 100

# Train with custom settings
python train_model.py --batch-size 16 --model-dir custom_models

Classification

# Classify a single file (supports .wav, .mp3, .m4a)
python classify.py recording.wav
python classify.py audio.mp3
python classify.py detector.m4a

# Save detailed results
python classify.py recording.wav --output results.json

# Verbose analysis
python classify.py recording.wav --verbose

Real-time Streaming

# Start real-time classification
python stream_classify.py --device 0 --duration 60

# List available audio devices
python stream_classify.py --list-devices

# Save detected patterns
python stream_classify.py --save-detections output_dir

🔬 Technical Details

Anomaly Detection Algorithm

  • Establishes baseline characteristics from initial 2 seconds of continuous detector audio
  • Uses 10-feature frame analysis (RMS, spectral centroid, bandwidth, rolloff, ZCR, MFCCs)
  • Calculates anomaly scores via Euclidean distance, Mahalanobis-like distance, and max deviation
  • Adaptive thresholding: mean + (sensitivity × std) of baseline scores
  • Filters anomalies by duration (0.1s to 3.0s) to capture realistic detection events
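The adaptive threshold above can be illustrated with plain Python. The numbers and helper names here are illustrative only, not the project's actual detector code in src/audio/.

```python
# Sketch of the adaptive anomaly threshold:
#   threshold = mean + sensitivity * std of the baseline anomaly scores.
from statistics import mean, stdev

def anomaly_threshold(baseline_scores: list, sensitivity: float = 2.0) -> float:
    return mean(baseline_scores) + sensitivity * stdev(baseline_scores)

def flag_anomalies(scores, threshold):
    """Indices of frames whose anomaly score exceeds the threshold."""
    return [i for i, s in enumerate(scores) if s > threshold]

baseline = [0.9, 1.0, 1.1, 1.0, 0.95, 1.05]   # scores from the first ~2 s
thr = anomaly_threshold(baseline, sensitivity=2.0)
events = flag_anomalies([1.0, 1.1, 3.5, 3.2, 1.0], thr)
print(events)  # [2, 3]
```

Because the threshold is derived from the recording's own baseline, the same sensitivity setting adapts to detectors with louder or quieter background warble.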

Time-Invariant Feature Engineering

  • Temporal Normalization: All timing features converted to relative scales (0-1)
  • Onset Analysis: Density and distribution patterns independent of absolute time
  • Envelope Characteristics: Rise/decay ratios, peak positions, symmetry measures
  • Spectral Preservation: Frequency relationships maintained across sweep speeds
  • Statistical Invariants: Distribution shapes, entropy measures, harmonic ratios
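A minimal sketch of the temporal-normalization idea: absolute event times are mapped onto a 0-1 scale, so the same sweep pattern looks identical at any sweep speed. Helper name and values are hypothetical.

```python
# Sketch: express peak times as fractions of the event duration, so timing
# features no longer depend on how fast the coil was swept.
def normalize_positions(peak_times: list, duration: float) -> list:
    return [t / duration for t in peak_times]

# Same target swept slowly (2 s event) and quickly (0.5 s event):
slow = normalize_positions([0.5, 1.0, 1.5], duration=2.0)
fast = normalize_positions([0.125, 0.25, 0.375], duration=0.5)
print(slow == fast)  # True - identical relative pattern
```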

Model Architecture Details

  • CNN: 32→64→128→256 filters, attention mechanism, adaptive pooling
  • Transformer: 256-dim embeddings, 8 attention heads, positional encoding
  • Traditional: Random Forest (200 trees) + Gradient Boosting (200 est.) + RBF SVM
  • Ensemble Logic: Soft voting with weighted probabilities, confidence analysis

📊 Model Evaluation

Comprehensive evaluation tools for assessing model performance:

# Evaluate the advanced ensemble model
python evaluate_models.py --data-dir data --model advanced

# Evaluate the baseline model
python evaluate_models.py --data-dir data --model baseline

# Evaluate all models
python evaluate_models.py --data-dir data --model all

The evaluation script generates:

  • 📋 Classification reports with precision, recall, and F1 scores
  • 🔲 Confusion matrices
  • 📈 ROC curves for each class
  • 📊 Feature importance analysis
  • 🔄 Cross-validation results

Results are saved to evaluation_results/ with plots and detailed metrics.

📈 Performance Metrics

The system provides comprehensive evaluation:

  • Individual Model Accuracies: CNN, Transformer, Traditional ML performance
  • Ensemble Performance: Combined weighted voting accuracy with confidence analysis
  • Feature Importance: Rankings of most discriminative features per model
  • Cross-Validation: 5-fold stratified validation with mean ± std scores
  • Processing Time: Real-time classification speed (typically < 0.5s per sample)
  • Confidence Levels: 5-tier confidence system (Very High ≥95%, High ≥80%, Medium ≥60%, Low ≥40%, Very Low <40%)
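The 5-tier mapping can be written down directly (the thresholds are from this README; the function name is hypothetical):

```python
# Sketch of the 5-tier confidence mapping.
def confidence_tier(probability: float) -> str:
    """Map an ensemble probability to a human-readable confidence tier."""
    if probability >= 0.95:
        return "Very High"
    elif probability >= 0.80:
        return "High"
    elif probability >= 0.60:
        return "Medium"
    elif probability >= 0.40:
        return "Low"
    return "Very Low"

print(confidence_tier(0.97))  # Very High
print(confidence_tier(0.62))  # Medium
```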

🚀 Performance Optimization

Efficient Audio Processing

The system includes high-performance alternatives using torchaudio:

# Use efficient training with torchaudio pipeline
python train_efficient.py --data-dir data --epochs 50

# Benchmark standard vs efficient loading
python train_efficient.py --benchmark --data-dir data

# Configure system-wide efficiency settings
python config.py --show

Benefits of the efficient pipeline:

  • 3-5x faster audio loading with torchaudio
  • On-the-fly augmentation reduces storage needs
  • Native M1 GPU support via Metal Performance Shaders
  • Streaming data pipeline for large datasets
  • Automatic class balancing for better training

TorchScript Export for Fast Inference

Export trained models to TorchScript for optimized CPU/GPU inference:

# Export models to TorchScript format
python export_torchscript.py --model-dir models/advanced --output-dir models/torchscript

# Use the fast inference script
python models/torchscript/fast_inference.py models/torchscript

Benefits:

  • ⚡ 2-5x faster inference
  • 🔧 No Python dependencies required
  • 🎯 Optimized for production deployment
  • 💾 Smaller memory footprint

📊 Dataset Preparation

The prepare_dataset.py tool helps you build high-quality training datasets:

1. Split Long Recordings

# Split a long recording into 5-second segments
python prepare_dataset.py split long_recording.wav --duration 5.0 --overlap 0.5

# Creates:
# - segments/long_recording_segment_0001.wav
# - segments/long_recording_segment_0002.wav
# - segments/long_recording_annotations.json (for labeling)
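The segment boundaries implied by --duration and --overlap can be sketched as follows: each new segment starts (duration - overlap) seconds after the previous one. The segment_starts helper is hypothetical, not the tool's actual code.

```python
# Sketch: compute segment start times for a recording of a given length.
def segment_starts(total: float, duration: float, overlap: float) -> list:
    step = duration - overlap          # stride between consecutive segments
    starts, t = [], 0.0
    while t + duration <= total:       # only keep full-length segments
        starts.append(round(t, 6))
        t += step
    return starts

starts = segment_starts(total=20.0, duration=5.0, overlap=0.5)
print(starts)  # [0.0, 4.5, 9.0, 13.5]
```

The overlap means a detection event falling on a segment boundary still appears whole in at least one segment.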

2. Label Your Data

Edit the generated annotations.json file:

{
  "segments": [
    {
      "filename": "segment_0001.wav",
      "label": "gold",    // Change from "NEEDS_LABELING"
      "confidence": "high",
      "notes": "Clear gold signal"
    }
  ]
}

3. Organize by Labels

# Organize segments into label directories
python prepare_dataset.py organize annotations.json --output-dir data

4. Augment Dataset

# Generate augmented versions (time stretch, pitch shift, noise)
python prepare_dataset.py augment data --factor 3

5. Analyze Dataset

# Get detailed statistics and balance information
python prepare_dataset.py analyze data

6. Create Train/Val Split

# Create 80/20 train/validation split
python prepare_dataset.py split-data data --val-ratio 0.2
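A stratified split shuffles within each label so class proportions survive the split. The helper below is a standard-library sketch under that assumption; prepare_dataset.py split-data is the real tool.

```python
# Sketch of a stratified 80/20 train/validation split (hypothetical helper).
import random

def split_by_label(files_by_label: dict, val_ratio: float = 0.2,
                   seed: int = 0) -> tuple:
    rng = random.Random(seed)          # fixed seed for reproducible splits
    train, val = {}, {}
    for label, files in files_by_label.items():
        files = files[:]               # copy so the caller's list is untouched
        rng.shuffle(files)
        n_val = max(1, int(len(files) * val_ratio))
        val[label], train[label] = files[:n_val], files[n_val:]
    return train, val

train, val = split_by_label({"gold": [f"g{i}.wav" for i in range(10)],
                             "iron": [f"i{i}.wav" for i in range(10)]})
```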

🧪 Testing & Optimization

Automated Testing

Run the comprehensive test suite:

# Run all tests
pytest

# Run with coverage
pytest --cov=src --cov-report=html

# Run specific test categories
pytest -m unit          # Unit tests only
pytest -m integration   # Integration tests only
pytest -m "not slow"    # Skip slow tests

Model Parameter Optimization

Use heuristic-based optimization to find the best parameters:

# Run parameter optimization
python test_model_optimization.py --optimize --data-dir data

# Generate optimization report
python test_model_optimization.py --report optimization_report.json

# Run optimization tests
python test_model_optimization.py --test

The optimization tool:

  • Automatic Hyperparameter Tuning: Uses Optuna for Bayesian optimization
  • Heuristic Analysis: Tests data quality, feature importance, model complexity
  • Performance Benchmarking: Measures inference speed and memory usage
  • Recommendations: Provides actionable insights for improvement

Test Coverage

The test suite includes:

  • Unit Tests: Audio processing, feature extraction, model components
  • Integration Tests: End-to-end pipeline, file format handling
  • Performance Tests: Speed benchmarks, memory usage monitoring
  • Optimization Tests: Parameter tuning, heuristic analysis

🎛️ Advanced Configuration

All aspects are highly configurable:

Model Architecture

  • CNN layers, filter sizes, attention mechanisms
  • Transformer heads, encoder layers, embedding dimensions
  • Traditional ML estimators, depth, regularization

Training Parameters

  • Epochs, batch size, learning rates, optimizers
  • Data augmentation, dropout rates, early stopping
  • Cross-validation folds, test split ratios

Audio Processing

  • Sample rate, chunk sizes, buffer durations
  • Anomaly sensitivity, baseline duration, thresholds
  • Frame analysis parameters, feature extraction settings

Real-time Streaming

  • Classification intervals, confidence thresholds
  • Visualization refresh rates, detection logging
  • Audio device selection, buffer management

💡 Best Practices for Optimal Results

Training Data Quality

  1. Diverse Scenarios: Various distances, detector settings, environments
  2. Sweep Speed Variety: Mix of slow, medium, and fast sweep speeds
  3. Clean Signal: Minimize wind, background noise, handling sounds
  4. Balanced Dataset: Similar amounts of data per label (20-50 samples minimum)
  5. Realistic Conditions: Include typical field recording conditions

Audio Recording Tips

  • Consistent Setup: Same detector settings within each label category
  • Multiple Angles: Different coil orientations relative to targets
  • Distance Variation: Close, medium, and far detection distances
  • Duration: 10-30 second clips work well (the system extracts patterns automatically)
  • Format: WAV preferred for training, MP3/M4A acceptable for classification

Model Training Strategy

  • Start Small: Begin with 2-3 labels, expand gradually
  • Iterative Improvement: Train → test → add data → retrain
  • Cross-Validation: Monitor for overfitting with validation scores
  • Feature Analysis: Use feature importance to understand what the model learns
  • Ensemble Trust: Higher confidence when all models agree on prediction

📋 Project Status

✅ Completed Features

  • Core System

    • Advanced audio processing with anomaly detection
    • Time-invariant feature extraction
    • Multiple ML architectures (CNN, Transformer, Wav2Vec2, Random Forest)
    • Flexible label system (no hardcoded metal types)
    • Multi-format support (WAV, MP3, M4A)
  • Training Tools

    • Three training pipelines (baseline, deep learning, ensemble)
    • Dataset preparation and augmentation
    • Model evaluation with detailed metrics
    • TorchScript export for fast inference
  • User Interfaces

    • Command-line classification
    • Real-time streaming with visualization
    • Web interface with drag-and-drop
    • JSON API endpoints
  • Documentation

    • Comprehensive README
    • Code comments and docstrings
    • Usage examples
    • Best practices guide

🔧 Recent Improvements (Latest Updates)

Advanced Dataset Interface (Phase 4 - December 2025)

  • Enhanced Sample Browser: Interactive grid with search, filtering, and sorting capabilities
  • Modal Preview System: Detailed sample inspection with tabbed interface for Info/Events, Spectrograms, and Features
  • Interactive Spectrograms: Canvas-based visualization with zoom, pan, and multiple colormaps
  • Feature Visualization: Chart.js displays for temporal, spectral, and MFCC features with radar chart comparisons
  • Advanced Audio Controls: Click-to-seek waveforms, playback speed control, smart time formatting
  • Modular Architecture: Reusable components for audio players, waveform displays, and controls

Code Quality Improvements (December 2025)

  • DRY Compliance: Removed 350+ lines of duplicate HTML template code
  • Dead Code Removal: Eliminated 5 debug routes and 3 unused template files
  • Enhanced Documentation: Added comprehensive docstrings and inline comments
  • Performance Optimization: Cleaned up debug logs and unnecessary console output
  • Modular Components: Extracted reusable JavaScript components for better maintainability

Enhanced Audio Segmentation

  • Smart Event Detection: Replaced fixed-length segmentation with intelligent silence-based detection
  • Pattern-Aware: Identifies natural breaks in tone patterns (high-low or low-high transitions)
  • Variable Length Support: Segments are now properly sized based on actual audio content
  • Complete Event Capture: Preserves full metal detection events instead of arbitrary cuts
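The silence-based idea can be sketched with a toy RMS gate: frames whose RMS drops below a threshold count as silence, and contiguous runs of loud frames become events. Threshold and helper names here are illustrative; segment_dataset.py is the real implementation.

```python
# Sketch of silence-based event segmentation over framed audio samples.
import math

def frame_rms(frame: list) -> float:
    return math.sqrt(sum(x * x for x in frame) / len(frame))

def find_events(frames: list, silence_rms: float = 0.05) -> list:
    """Return (start_frame, end_frame) pairs for contiguous loud regions."""
    events, start = [], None
    for i, frame in enumerate(frames):
        loud = frame_rms(frame) > silence_rms
        if loud and start is None:
            start = i                      # event begins
        elif not loud and start is not None:
            events.append((start, i))      # event ends at the quiet frame
            start = None
    if start is not None:                  # event runs to end of audio
        events.append((start, len(frames)))
    return events

quiet, loud = [0.01] * 100, [0.5] * 100
events = find_events([quiet, loud, loud, quiet, loud])
print(events)  # [(1, 3), (4, 5)]
```

Because event boundaries come from the audio itself, segment lengths vary with the actual detection rather than being cut at a fixed duration.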

Real-time Processing Stability

  • WebSocket Reliability: Fixed connection errors with improved threading architecture
  • Session Management: Better handling of multiple client connections and disconnections
  • Error Prevention: Added safeguards against duplicate events and session conflicts
  • Canvas Optimization: Improved rendering performance with willReadFrequently attribute

Model Training Improvements

  • M1 GPU Acceleration: Proper tensor device management for Apple Silicon
  • Efficient Processing: Enhanced torchaudio-based pipeline for better performance
  • Robust Error Handling: Comprehensive error handling throughout training pipeline
  • Memory Optimization: Improved memory usage during large dataset processing

🚀 Production Ready

The system is fully operational with all major components completed. Phase 4 (Advanced Visualizations) is complete, and the codebase has been thoroughly cleaned and optimized. Ready for Phase 5 performance and polish improvements.

🔧 Troubleshooting

Common Issues and Solutions

Real-time Classification Not Working

  • Issue: "No active stream" errors or no detections showing
  • Solution: Ensure you click "Start Streaming" before making sounds, and check browser console for errors
  • Note: Audio is processed in 2-second chunks, so wait a few seconds for the first detection

WebSocket Connection Errors

  • Issue: "Invalid frame header" or connection failures
  • Solution: The system now uses threading mode for better compatibility - restart the server if issues persist
  • Prevention: Avoid multiple browser tabs with real-time streaming open simultaneously

Training Data Issues

  • Issue: "No training data found" during model training
  • Solution: Ensure audio files are in correct directories (data/gold/, data/iron/, etc.) and are valid audio formats
  • Check: Run python train_model.py --data-dir data --verbose to see detailed processing logs

Segmentation Problems

  • Issue: Segments are too short or don't capture complete events
  • Solution: The new silence-based segmentation automatically detects natural breaks - ensure your audio has clear quiet periods between detections
  • Tip: Longer audio files with multiple clear detection events work best

Performance Issues

  • Issue: Slow processing or high memory usage
  • Solution: The system now uses M1 GPU acceleration and efficient processing - ensure you have sufficient RAM (8GB+ recommended)
  • Optimization: Use shorter audio files for training if memory is limited

🤝 Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

📄 License

This project is open source and available under the MIT License.

Repository

https://github.com/onyxdigitaldev/metal-detector-ai

Contributors

  • Nikko Vellios - Designer & Primary Developer
  • oskodiak / Onyx Digital Intelligence Development - Support Development

Documentation

  • API Reference - Detailed docs for all endpoints and WebSocket events
  • SCHEDULE.md - Development schedule and phase tracking
