An advanced, flexible machine learning system for classifying metal detector audio signals. The system learns from any labeled training data you provide - no hardcoded assumptions about what the labels represent.
- **Completely Flexible**: Train on any labels (gold, iron, banana, type_A, etc.) with zero hardcoded assumptions
- **Advanced Audio Processing**: Handles continuous detector audio with intelligent silence-based segmentation
- **Time-Invariant**: Robust to different sweep speeds (slow and fast sweeps over the same target yield the same pattern)
- **State-of-the-Art ML**: Ensemble of CNN, Transformer, and traditional ML models
- **Real-time Processing**: Live audio streaming with WebSocket communication and visualization
- **Multiple Formats**: Supports WAV, MP3, and M4A audio files
- **Pattern Recognition**: Analyzes spectral patterns, temporal dynamics, and harmonic content
- **Smart Segmentation**: Identifies natural breaks in tone patterns (high-low or low-high) for complete event capture
- **Optimized Performance**: M1 GPU acceleration with efficient audio processing pipelines
1. Install dependencies:

   ```bash
   pip install -r requirements.txt
   ```

2. Set up the training data structure:

   ```bash
   # Create directories for your labels
   mkdir -p data/gold data/iron data/other

   # Or use the setup script
   python train_model.py --setup-data
   ```

3. Add your labeled audio files to the created directories:

   - `data/gold/` - audio samples of gold detection
   - `data/iron/` - audio samples of iron detection
   - `data/other/` - audio samples of other metals

   You can create any labels you want!

4. Train the models (choose one):

   ```bash
   # Option 1: Advanced ensemble (CNN + Transformer + Traditional ML)
   python train_model.py --data-dir data --epochs 10

   # Option 2: Deep learning with Wav2Vec2
   python train_dl.py --data-dir data --epochs 10

   # Option 3: Quick baseline model
   python train_baseline.py --data-dir data
   ```

5. Classify new audio:

   ```bash
   python classify.py path/to/audio.wav

   # Also supports .mp3 and .m4a files
   python classify.py recording.mp3
   python classify.py test_audio/gold_test.m4a
   ```

6. Use the web interface:

   ```bash
   ./start_web_portal.sh
   # or
   python enhanced_web_app.py
   ```

   Then visit http://localhost:5002
```
metal-detector-ai/
├── data/                    # Training data (flexible labels)
│   ├── [your_label_1]/      # Any label you want
│   ├── [your_label_2]/      # Another label
│   └── [your_label_3]/      # Yet another label
├── models/advanced/         # Trained ML models
├── src/
│   ├── audio/               # Audio processing (anomaly detection, features)
│   ├── ml/                  # Advanced ML (CNN, Transformer, ensemble)
│   ├── data/                # Data pipeline and event detection
│   └── web/                 # Web interface templates
├── static/
│   ├── js/components/       # Modular JavaScript components
│   ├── css/                 # Stylesheets
│   └── [libraries]/         # Local JavaScript libraries
├── templates/
│   ├── components/          # Reusable template components
│   ├── dataset.html         # Advanced dataset browser
│   ├── dashboard.html       # Main dashboard
│   ├── realtime.html        # Real-time streaming interface
│   └── [other pages]        # Additional interface pages
├── train_model.py           # Advanced ensemble training
├── train_dl.py              # Deep learning training
├── train_baseline.py        # Quick baseline training
├── classify.py              # Single file classification
├── stream_classify.py       # Real-time classification
├── enhanced_web_app.py      # Web portal (port 5002)
└── requirements.txt         # Dependencies
```
Unlike traditional approaches that look for silence between sounds, this system understands that metal detectors produce continuous background audio (warbling/humming). Detection events are anomalies or pattern changes in this baseline.
The system extracts features that are robust to sweep speed variations:
- Same pattern detected whether you sweep fast or slow over the same target
- Normalized temporal features (relative timing, not absolute duration)
- Spectral invariants (frequency relationships don't change with sweep speed)
- Envelope shape analysis (pattern morphology preserved)
- Temporal: Onset density, relative peak positions, rise/decay ratios
- Spectral: Centroid, bandwidth, contrast, rolloff statistics
- Harmonic: Pitch stability, harmonic ratios, chroma features
- Wavelet: Multi-scale time-frequency analysis across 4 wavelet types
- Statistical: Distribution shape, entropy measures, envelope characteristics
Combines multiple advanced models with weighted voting:
- CNN: 4-layer architecture with attention for spectral pattern recognition
- Transformer: 6-layer encoder with self-attention for temporal modeling
- Traditional ML: Random Forest + Gradient Boosting + SVM ensemble
- Final Prediction: Weighted ensemble (CNN: 40%, Transformer: 40%, Traditional: 20%)
The system automatically discovers labels from your directory structure:
```
data/
├── gold/      # ← System learns this as "gold"
├── iron/      # ← System learns this as "iron"
├── copper/    # ← System learns this as "copper"
├── aluminum/  # ← System learns this as "aluminum"
└── banana/    # ← System learns this as "banana" (if you want!)
```

No hardcoded metal types! Use any labels that make sense for your use case.
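The directory-to-label mapping can be sketched in a few lines (a hypothetical helper that mirrors the behavior described above; `discover_labels` is an assumed name, not the project's actual API):

```python
from pathlib import Path

def discover_labels(data_dir):
    """Discover class labels from the subdirectory names under data_dir.

    Whatever directories exist become the labels -- no metal types
    are hardcoded anywhere. Plain files are ignored.
    """
    return sorted(p.name for p in Path(data_dir).iterdir() if p.is_dir())
```

Adding a new class is then just `mkdir data/silver` plus some audio files; no code changes are needed.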
Full-featured web portal with real-time streaming, analytics, and training:

1. Start the portal:

   ```bash
   ./start_web_portal.sh
   # Or directly:
   python enhanced_web_app.py
   ```

2. Open your browser to http://localhost:5002

3. Features:

   - **Dashboard**: System overview, quick stats, recent activity
   - **Classification**: Upload files or record from the microphone
   - **Waveform Visualization**: Interactive audio analysis with click-to-seek
   - **Real-time Detection**: Stream from the microphone with live results
   - **Analytics**: Confusion matrices, accuracy charts, training history
   - **Dataset Management**: Browse samples with advanced filtering, search, and modal preview
   - **Advanced Visualizations**: Spectrograms with zoom/pan, feature analysis, CSV export
   - **Modular Components**: Reusable audio players, waveform displays, and controls
   - **Web Training**: Configure and train models from the browser
   - **API Integration**: RESTful endpoints and WebSocket support
```bash
# Segment audio files to detect individual metal detection events
python segment_dataset.py --input-dir data --output-dir data_segmented

# Segment in-place (adds segments to existing structure)
python segment_dataset.py --in-place

# Prepare dataset with custom splitting
python prepare_dataset.py --split-duration 3.0 --overlap 0.5
```

```bash
# Set up data directories
python train_model.py --setup-data

# Train with your data
python train_model.py --data-dir data --epochs 100

# Train with custom settings
python train_model.py --batch-size 16 --model-dir custom_models
```

```bash
# Classify a single file (supports .wav, .mp3, .m4a)
python classify.py recording.wav
python classify.py audio.mp3
python classify.py detector.m4a

# Save detailed results
python classify.py recording.wav --output results.json

# Verbose analysis
python classify.py recording.wav --verbose
```

```bash
# Start real-time classification
python stream_classify.py --device 0 --duration 60

# List available audio devices
python stream_classify.py --list-devices

# Save detected patterns
python stream_classify.py --save-detections output_dir
```

- Establishes baseline characteristics from the initial 2 seconds of continuous detector audio
- Uses 10-feature frame analysis (RMS, spectral centroid, bandwidth, rolloff, ZCR, MFCCs)
- Calculates anomaly scores via Euclidean distance, Mahalanobis-like distance, and max deviation
- Adaptive thresholding: mean + (sensitivity × std) of baseline scores
- Filters anomalies by duration (0.1 s to 3.0 s) to capture realistic detection events
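The baseline-plus-threshold logic above can be sketched in a few lines (an illustrative reimplementation of the stated formula, not the project's actual code; function names are assumed):

```python
import numpy as np

def adaptive_threshold(baseline_scores, sensitivity=2.0):
    """Anomaly threshold from the README's formula:
    mean + (sensitivity x std) of the baseline frame scores."""
    scores = np.asarray(baseline_scores, dtype=float)
    return scores.mean() + sensitivity * scores.std()

def flag_anomalies(frame_scores, threshold):
    """Boolean mask of frames whose anomaly score exceeds the threshold."""
    return np.asarray(frame_scores, dtype=float) > threshold
```

A higher `sensitivity` demands a larger deviation from the baseline before a frame counts as a detection event.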
- Temporal Normalization: All timing features converted to relative scales (0-1)
- Onset Analysis: Density and distribution patterns independent of absolute time
- Envelope Characteristics: Rise/decay ratios, peak positions, symmetry measures
- Spectral Preservation: Frequency relationships maintained across sweep speeds
- Statistical Invariants: Distribution shapes, entropy measures, harmonic ratios
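As an illustration of the normalization idea (hypothetical code, not the project's feature extractor): timings are expressed on a relative 0-1 scale, so stretching an event in time (i.e. a slower sweep) leaves the features unchanged.

```python
import numpy as np

def time_invariant_envelope_features(envelope):
    """Sweep-speed-robust envelope descriptors: all timings are relative,
    so a time-stretched copy of the same event yields identical values."""
    env = np.asarray(envelope, dtype=float)
    peak = int(np.argmax(env))
    rel_peak_pos = peak / (len(env) - 1)           # peak position on a 0-1 scale
    rise = env[:peak + 1]                          # frames up to and including peak
    decay = env[peak:]                             # frames from peak onward
    # Ratio of rise length to decay length (relative, not absolute seconds)
    rise_decay_ratio = len(rise) / max(len(decay), 1)
    return {"rel_peak_pos": rel_peak_pos, "rise_decay_ratio": rise_decay_ratio}
```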
- CNN: 32β64β128β256 filters, attention mechanism, adaptive pooling
- Transformer: 256-dim embeddings, 8 attention heads, positional encoding
- Traditional: Random Forest (200 trees) + Gradient Boosting (200 est.) + RBF SVM
- Ensemble Logic: Soft voting with weighted probabilities, confidence analysis
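The weighted soft-voting step can be sketched as follows (an assumed function shape, not the project's API; the 40/40/20 weights come from the ensemble description above):

```python
import numpy as np

# Ensemble weights stated above: CNN 40%, Transformer 40%, Traditional ML 20%
WEIGHTS = {"cnn": 0.4, "transformer": 0.4, "traditional": 0.2}

def ensemble_predict(probs_by_model, labels, weights=WEIGHTS):
    """Soft voting: weight each model's class probabilities, sum them,
    and take the argmax. Returns (predicted_label, combined_probability)."""
    combined = sum(weights[name] * np.asarray(p, dtype=float)
                   for name, p in probs_by_model.items())
    idx = int(np.argmax(combined))
    return labels[idx], float(combined[idx])
```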
Comprehensive evaluation tools for assessing model performance:
```bash
# Evaluate the advanced ensemble model
python evaluate_models.py --data-dir data --model advanced

# Evaluate the baseline model
python evaluate_models.py --data-dir data --model baseline

# Evaluate all models
python evaluate_models.py --data-dir data --model all
```

The evaluation script generates:
- Classification reports with precision, recall, and F1 scores
- Confusion matrices
- ROC curves for each class
- Feature importance analysis
- Cross-validation results
Results are saved to evaluation_results/ with plots and detailed metrics.
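For reference, the reported precision/recall/F1 values follow directly from the confusion matrix; a minimal numpy sketch (not the evaluation script's actual code):

```python
import numpy as np

def precision_recall_f1(confusion):
    """Per-class precision, recall, and F1 from a square confusion matrix
    (rows = true labels, columns = predicted labels)."""
    cm = np.asarray(confusion, dtype=float)
    tp = np.diag(cm)                                    # correct predictions per class
    precision = tp / np.maximum(cm.sum(axis=0), 1e-12)  # of predicted-as-class, how many right
    recall = tp / np.maximum(cm.sum(axis=1), 1e-12)     # of true-class, how many found
    f1 = 2 * precision * recall / np.maximum(precision + recall, 1e-12)
    return precision, recall, f1
```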
The system provides comprehensive evaluation:
- Individual Model Accuracies: CNN, Transformer, Traditional ML performance
- Ensemble Performance: Combined weighted voting accuracy with confidence analysis
- Feature Importance: Rankings of most discriminative features per model
- Cross-Validation: 5-fold stratified validation with mean ± std scores
- Processing Time: Real-time classification speed (typically < 0.5s per sample)
- Confidence Levels: 5-tier confidence system (Very High ≥95%, High ≥80%, Medium ≥60%, Low ≥40%, Very Low <40%)
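The 5-tier mapping is a simple threshold ladder; a sketch with a hypothetical helper name:

```python
def confidence_tier(p):
    """Map an ensemble probability to the 5-tier confidence label
    described above (Very High >= 95% ... Very Low < 40%)."""
    if p >= 0.95:
        return "Very High"
    if p >= 0.80:
        return "High"
    if p >= 0.60:
        return "Medium"
    if p >= 0.40:
        return "Low"
    return "Very Low"
```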
The system includes high-performance alternatives using torchaudio:
```bash
# Use efficient training with torchaudio pipeline
python train_efficient.py --data-dir data --epochs 50

# Benchmark standard vs efficient loading
python train_efficient.py --benchmark --data-dir data

# Configure system-wide efficiency settings
python config.py --show
```

Benefits of the efficient pipeline:
- 3-5x faster audio loading with torchaudio
- On-the-fly augmentation reduces storage needs
- Native M1 GPU support via Metal Performance Shaders
- Streaming data pipeline for large datasets
- Automatic class balancing for better training
Export trained models to TorchScript for optimized CPU/GPU inference:
```bash
# Export models to TorchScript format
python export_torchscript.py --model-dir models/advanced --output-dir models/torchscript

# Use the fast inference script
python models/torchscript/fast_inference.py models/torchscript
```

Benefits:

- 2-5x faster inference
- No Python dependencies required
- Optimized for production deployment
- Smaller memory footprint
The prepare_dataset.py tool helps you build high-quality training datasets:
```bash
# Split a long recording into 5-second segments
python prepare_dataset.py split long_recording.wav --duration 5.0 --overlap 0.5

# Creates:
# - segments/long_recording_segment_0001.wav
# - segments/long_recording_segment_0002.wav
# - segments/long_recording_annotations.json (for labeling)
```

Edit the generated annotations.json file:
```json
{
  "segments": [
    {
      "filename": "segment_0001.wav",
      "label": "gold",  // Change from "NEEDS_LABELING"
      "confidence": "high",
      "notes": "Clear gold signal"
    }
  ]
}
```

```bash
# Organize segments into label directories
python prepare_dataset.py organize annotations.json --output-dir data

# Generate augmented versions (time stretch, pitch shift, noise)
python prepare_dataset.py augment data --factor 3

# Get detailed statistics and balance information
python prepare_dataset.py analyze data

# Create 80/20 train/validation split
python prepare_dataset.py split-data data --val-ratio 0.2
```

Run the comprehensive test suite:
```bash
# Run all tests
pytest

# Run with coverage
pytest --cov=src --cov-report=html

# Run specific test categories
pytest -m unit          # Unit tests only
pytest -m integration   # Integration tests only
pytest -m "not slow"    # Skip slow tests
```

Use heuristic-based optimization to find the best parameters:
```bash
# Run parameter optimization
python test_model_optimization.py --optimize --data-dir data

# Generate optimization report
python test_model_optimization.py --report optimization_report.json

# Run optimization tests
python test_model_optimization.py --test
```

The optimization tool provides:
- Automatic Hyperparameter Tuning: Uses Optuna for Bayesian optimization
- Heuristic Analysis: Tests data quality, feature importance, model complexity
- Performance Benchmarking: Measures inference speed and memory usage
- Recommendations: Provides actionable insights for improvement
The test suite includes:
- Unit Tests: Audio processing, feature extraction, model components
- Integration Tests: End-to-end pipeline, file format handling
- Performance Tests: Speed benchmarks, memory usage monitoring
- Optimization Tests: Parameter tuning, heuristic analysis
All aspects are highly configurable:
- CNN layers, filter sizes, attention mechanisms
- Transformer heads, encoder layers, embedding dimensions
- Traditional ML estimators, depth, regularization
- Epochs, batch size, learning rates, optimizers
- Data augmentation, dropout rates, early stopping
- Cross-validation folds, test split ratios
- Sample rate, chunk sizes, buffer durations
- Anomaly sensitivity, baseline duration, thresholds
- Frame analysis parameters, feature extraction settings
- Classification intervals, confidence thresholds
- Visualization refresh rates, detection logging
- Audio device selection, buffer management
- Diverse Scenarios: Various distances, detector settings, environments
- Sweep Speed Variety: Mix of slow, medium, and fast sweep speeds
- Clean Signal: Minimize wind, background noise, handling sounds
- Balanced Dataset: Similar amounts of data per label (20-50 samples minimum)
- Realistic Conditions: Include typical field recording conditions
- Consistent Setup: Same detector settings within each label category
- Multiple Angles: Different coil orientations relative to targets
- Distance Variation: Close, medium, and far detection distances
- Duration: 10-30 second clips work well (system extracts patterns automatically)
- Format: WAV preferred for training, MP3/M4A acceptable for classification
- Start Small: Begin with 2-3 labels, expand gradually
- Iterative Improvement: Train β test β add data β retrain
- Cross-Validation: Monitor for overfitting with validation scores
- Feature Analysis: Use feature importance to understand what the model learns
- Ensemble Trust: Higher confidence when all models agree on prediction
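For the train/validate loop above, a held-out split can be as simple as the following sketch (illustrative only, seeded for reproducibility; `prepare_dataset.py split-data` handles this for you):

```python
import random

def train_val_split(files, val_ratio=0.2, seed=42):
    """Shuffle file paths deterministically and split them into
    (train, validation) lists."""
    rng = random.Random(seed)       # fixed seed => reproducible split
    shuffled = files[:]
    rng.shuffle(shuffled)
    n_val = int(len(shuffled) * val_ratio)
    return shuffled[n_val:], shuffled[:n_val]
```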
**Core System**

- Advanced audio processing with anomaly detection
- Time-invariant feature extraction
- Multiple ML architectures (CNN, Transformer, Wav2Vec2, Random Forest)
- Flexible label system (no hardcoded metal types)
- Multi-format support (WAV, MP3, M4A)

**Training Tools**

- Three training pipelines (baseline, deep learning, ensemble)
- Dataset preparation and augmentation
- Model evaluation with detailed metrics
- TorchScript export for fast inference

**User Interfaces**

- Command-line classification
- Real-time streaming with visualization
- Web interface with drag-and-drop
- JSON API endpoints

**Documentation**

- Comprehensive README
- Code comments and docstrings
- Usage examples
- Best practices guide
- Enhanced Sample Browser: Interactive grid with search, filtering, and sorting capabilities
- Modal Preview System: Detailed sample inspection with tabbed interface for Info/Events, Spectrograms, and Features
- Interactive Spectrograms: Canvas-based visualization with zoom, pan, and multiple colormaps
- Feature Visualization: Chart.js displays for temporal, spectral, and MFCC features with radar chart comparisons
- Advanced Audio Controls: Click-to-seek waveforms, playback speed control, smart time formatting
- Modular Architecture: Reusable components for audio players, waveform displays, and controls
- DRY Compliance: Removed 350+ lines of duplicate HTML template code
- Dead Code Removal: Eliminated 5 debug routes and 3 unused template files
- Enhanced Documentation: Added comprehensive docstrings and inline comments
- Performance Optimization: Cleaned up debug logs and unnecessary console output
- Modular Components: Extracted reusable JavaScript components for better maintainability
- Smart Event Detection: Replaced fixed-length segmentation with intelligent silence-based detection
- Pattern-Aware: Identifies natural breaks in tone patterns (high-low or low-high transitions)
- Variable Length Support: Segments are now properly sized based on actual audio content
- Complete Event Capture: Preserves full metal detection events instead of arbitrary cuts
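Conceptually, the silence-based segmentation described above scans a frame-level energy curve and cuts wherever quiet persists; a simplified sketch (an assumed thresholding scheme, not the project's implementation):

```python
def split_on_silence(rms, threshold, min_gap_frames=3):
    """Split a frame-level RMS energy curve into (start, end) index pairs,
    cutting wherever at least min_gap_frames consecutive quiet frames occur.
    Segments keep their natural, variable length instead of fixed-size cuts."""
    segments, start, gap = [], None, 0
    for i, level in enumerate(rms):
        if level > threshold:
            if start is None:
                start = i                              # event begins
            gap = 0
        elif start is not None:
            gap += 1
            if gap >= min_gap_frames:                  # sustained quiet: close event
                segments.append((start, i - gap + 1))  # trim trailing quiet frames
                start, gap = None, 0
    if start is not None:                              # event ran to end of buffer
        segments.append((start, len(rms)))
    return segments
```

Brief dips below the threshold shorter than `min_gap_frames` stay inside one event, which is what preserves complete detection patterns.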
- WebSocket Reliability: Fixed connection errors with improved threading architecture
- Session Management: Better handling of multiple client connections and disconnections
- Error Prevention: Added safeguards against duplicate events and session conflicts
- Canvas Optimization: Improved rendering performance with the `willReadFrequently` attribute
- M1 GPU Acceleration: Proper tensor device management for Apple Silicon
- Efficient Processing: Enhanced torchaudio-based pipeline for better performance
- Robust Error Handling: Comprehensive error handling throughout the training pipeline
- Memory Optimization: Improved memory usage during large dataset processing
The system is fully operational with all major components completed. Phase 4 (Advanced Visualizations) is complete, and the codebase has been thoroughly cleaned and optimized. Ready for Phase 5 performance and polish improvements.
- Issue: "No active stream" errors or no detections showing
- Solution: Ensure you click "Start Streaming" before making sounds, and check browser console for errors
- Note: Audio is processed in 2-second chunks, so wait a few seconds for first detection
- Issue: "Invalid frame header" or connection failures
- Solution: The system now uses threading mode for better compatibility - restart the server if issues persist
- Prevention: Avoid multiple browser tabs with real-time streaming open simultaneously
- Issue: "No training data found" during model training
- Solution: Ensure audio files are in the correct directories (`data/gold/`, `data/iron/`, etc.) and are valid audio formats
- Check: Run `python train_model.py --data-dir data --verbose` to see detailed processing logs
- Issue: Segments are too short or don't capture complete events
- Solution: The new silence-based segmentation automatically detects natural breaks - ensure your audio has clear quiet periods between detections
- Tip: Longer audio files with multiple clear detection events work best
- Issue: Slow processing or high memory usage
- Solution: The system now uses M1 GPU acceleration and efficient processing - ensure you have sufficient RAM (8GB+ recommended)
- Optimization: Use shorter audio files for training if memory is limited
Contributions are welcome! Please feel free to submit a Pull Request.
This project is open source and available under the MIT License.
https://github.com/onyxdigitaldev/metal-detector-ai
- Nikko Vellios - Designer & Primary Developer
- oskodiak / Onyx Digital Intelligence Development - Support Development
- API Reference - Detailed docs for all endpoints and WebSocket events
- SCHEDULE.md - Development schedule and phase tracking