| 1 | +# YOLO Document Segmentation Training with MIDV500 Dataset |
| 2 | + |
| 3 | +This directory contains a complete pipeline for training YOLO segmentation models for document detection using the [MIDV500 dataset](https://github.com/fcakyon/midv500/tree/master) and [ultralytics](https://pypi.org/project/ultralytics/). |
| 4 | + |
| 5 | +## 🚀 Quick Start |
| 6 | + |
| 7 | +### 1. Install Dependencies |
| 8 | +```bash |
| 9 | +pip install -r requirements.txt |
| 10 | +``` |
| 11 | + |
| 12 | +### 2. Prepare Dataset |
| 13 | +Download the MIDV500 dataset and convert it to YOLO format: |
| 14 | +```bash |
| 15 | +python data.py |
| 16 | +python prep_midv500_to_yolov11seg.py |
| 17 | +``` |
| 18 | + |
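Before opening the viewer, individual label lines can be sanity-checked in code. The sketch below assumes the conversion script writes the standard Ultralytics segmentation label format (one object per line: a class index followed by normalized `x y` polygon pairs). `parse_seg_label` is a hypothetical helper, not part of this repository.

```python
def parse_seg_label(line):
    """Parse one YOLO segmentation label line into (class_id, [(x, y), ...]).

    Assumes the Ultralytics segment format: "<class> x1 y1 x2 y2 ... xn yn"
    with all coordinates normalized to [0, 1]. Raises ValueError otherwise.
    """
    parts = line.split()
    if len(parts) < 7:
        raise ValueError("expected a class index and at least 3 polygon points")
    class_id = int(parts[0])
    coords = [float(v) for v in parts[1:]]
    if len(coords) % 2 != 0:
        raise ValueError("polygon coordinates must come in x y pairs")
    if any(not 0.0 <= v <= 1.0 for v in coords):
        raise ValueError("coordinates must be normalized to [0, 1]")
    # Pair up the flat coordinate list into (x, y) points.
    return class_id, list(zip(coords[0::2], coords[1::2]))


# Example: a document quadrilateral labeled as class 0 (values are illustrative).
class_id, polygon = parse_seg_label("0 0.1 0.2 0.9 0.2 0.9 0.8 0.1 0.8")
print(class_id, len(polygon))
```

Running a check like this over a handful of generated label files can catch unnormalized or truncated polygons before a training run is started.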
| 19 | +### 3. Visualize Dataset |
| 20 | +Verify dataset quality with the GUI viewer: |
| 21 | +```bash |
| 22 | +python dataset_viewer.py |
| 23 | +``` |
| 24 | + |
| 25 | + |
| 26 | + |
| 27 | +### 4. Train Model |
| 28 | +Train the YOLO11 segmentation model with the provided script: |
| 29 | +```bash |
| 30 | +python train_yolo_doc_detection.py |
| 31 | +``` |
| 32 | + |
| 33 | + |
| 34 | + |
| 35 | +### 5. Run GUI Application |
| 36 | +Launch the document detection GUI: |
| 37 | +```bash |
| 38 | +python document_detector_gui.py |
| 39 | +``` |
| 40 | + |
| 41 | + |
| 42 | + |
| 43 | +## GPU Setup for Faster Training |
| 44 | + |
| 45 | +### Prerequisites |
| 46 | + |
| 47 | +1. **NVIDIA GPU** with CUDA support (GTX 1060 or better recommended) |
| 48 | +2. **NVIDIA drivers** installed and up-to-date |
| 49 | +3. **CUDA runtime libraries** (bundled with the GPU-enabled PyTorch wheels, so a separate CUDA toolkit install is not needed) |
| 50 | + |
| 51 | +### Check GPU Availability |
| 52 | + |
| 53 | +First, verify your GPU is detected: |
| 54 | + |
| 55 | +```bash |
| 56 | +# Check NVIDIA GPU and driver |
| 57 | +nvidia-smi |
| 58 | + |
| 59 | +# Check current PyTorch GPU support |
| 60 | +python -c "import torch; print('CUDA available:', torch.cuda.is_available())" |
| 61 | +``` |
| 62 | + |
| 63 | +### Install GPU-Enabled PyTorch |
| 64 | + |
| 65 | +If CUDA is not available, install GPU-enabled PyTorch: |
| 66 | + |
| 67 | +```bash |
| 68 | +# Uninstall CPU-only PyTorch |
| 69 | +pip uninstall torch torchvision torchaudio -y |
| 70 | + |
| 71 | +# Install GPU-enabled PyTorch (CUDA 12.1) |
| 72 | +pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121 |
| 73 | +``` |
| 74 | + |
| 75 | +For other CUDA versions, visit: https://pytorch.org/get-started/locally/ |
| 76 | + |
| 77 | +### Verify GPU Setup |
| 78 | + |
| 79 | +```bash |
| 80 | +python -c "import torch; print('CUDA available:', torch.cuda.is_available()); print('GPU name:', torch.cuda.get_device_name(0) if torch.cuda.is_available() else 'N/A')" |
| 81 | +``` |
| 82 | + |
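The same check can live in a small script, which is easier to reuse than a one-liner. This is a minimal sketch: `describe_device` is a hypothetical helper name, and it treats a missing PyTorch install the same as a CPU-only one.

```python
def describe_device():
    """Return (device, description) for training.

    device is 0 (first CUDA GPU) when PyTorch reports CUDA, otherwise "cpu".
    """
    try:
        import torch
    except ImportError:
        return "cpu", "PyTorch is not installed"
    if torch.cuda.is_available():
        return 0, torch.cuda.get_device_name(0)
    return "cpu", "CUDA not available to PyTorch"


device, description = describe_device()
print(f"device={device} ({description})")
```

The returned `device` value has the same form as the `device=` argument used by the training commands (`0` or `cpu`).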
| 83 | +### GPU Training Commands |
| 84 | + |
| 85 | +Once PyTorch detects the GPU, pass `device=0` to train on the first GPU: |
| 86 | + |
| 87 | +```bash |
| 88 | +# GPU training (recommended) |
| 89 | +yolo task=segment mode=train model=yolov8s-seg.pt data=dataset/doc.yaml imgsz=640 epochs=80 batch=16 device=0 |
| 90 | + |
| 91 | +# CPU training (fallback) |
| 92 | +yolo task=segment mode=train model=yolov8s-seg.pt data=dataset/doc.yaml imgsz=640 epochs=80 batch=8 device=cpu |
| 93 | +``` |
| 94 | + |
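The same runs can also be started from Python with the `ultralytics` package instead of the `yolo` CLI. The sketch below mirrors the arguments of the two commands above. `train_args` and `main` are hypothetical names, and `main()` is deliberately not called here because it downloads weights and starts a full training run.

```python
def train_args(use_gpu):
    """Build keyword arguments matching the CLI commands above."""
    return {
        "data": "dataset/doc.yaml",
        "imgsz": 640,
        "epochs": 80,
        "batch": 16 if use_gpu else 8,      # smaller batch for the CPU fallback
        "device": 0 if use_gpu else "cpu",  # 0 selects the first CUDA GPU
    }


def main():
    # Imported lazily so train_args() can be used without these packages.
    import torch
    from ultralytics import YOLO

    # Load the pretrained segmentation checkpoint named in the CLI commands.
    model = YOLO("yolov8s-seg.pt")
    model.train(**train_args(use_gpu=torch.cuda.is_available()))


print(train_args(use_gpu=False))
```

Calling `main()` picks the GPU settings when CUDA is available and falls back to the CPU settings otherwise.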