Commit 92a1940

Add document detection
1 parent b8616d5 commit 92a1940

9 files changed, 1766 additions and 0 deletions

Lines changed: 94 additions & 0 deletions
@@ -0,0 +1,94 @@
# YOLO Document Segmentation Training with MIDV500 Dataset

This directory contains a complete pipeline for training YOLO segmentation models for document detection using the [MIDV500 dataset](https://github.com/fcakyon/midv500/tree/master) and [ultralytics](https://pypi.org/project/ultralytics/).

## 🚀 Quick Start

### 1. Install Dependencies
```bash
pip install -r requirements.txt
```

### 2. Prepare Dataset
Download the MIDV500 dataset and convert it to YOLO segmentation format:
```bash
python data.py
python prep_midv500_to_yolov11seg.py
```
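
For orientation, the conversion step comes down to normalizing each document quadrilateral by the image size and writing it as one YOLO segmentation label line (`class x1 y1 x2 y2 ...`). The sketch below illustrates that idea only; it is not the contents of `prep_midv500_to_yolov11seg.py`. The function name, the single class id `0`, the clamping of out-of-frame corners, and the example values are all assumptions.

```python
def quad_to_yolo_seg_line(quad, img_w, img_h, class_id=0):
    """Convert pixel-space corner points to one YOLO segmentation label line.

    quad: iterable of (x, y) pixel coordinates.
    """
    coords = []
    for x, y in quad:
        # Normalize to [0, 1] and clamp corners that fall outside the frame.
        nx = min(max(x / img_w, 0.0), 1.0)
        ny = min(max(y / img_h, 0.0), 1.0)
        coords.extend([f"{nx:.6f}", f"{ny:.6f}"])
    return " ".join([str(class_id)] + coords)


# Hypothetical quad in a 1080x1920 frame (values made up for illustration).
example_quad = [(108, 480), (972, 480), (972, 1440), (108, 1440)]
print(quad_to_yolo_seg_line(example_quad, 1080, 1920))
```

Clamping keeps every coordinate inside the valid range, at the cost of slightly distorting quads whose corners lie outside the frame; dropping such frames instead would be an equally reasonable choice.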

### 3. Visualize Dataset
Verify dataset quality with the GUI viewer:
```bash
python dataset_viewer.py
```

![YOLO dataset viewer](https://www.dynamsoft.com/codepool/img/2025/09/yolo-dataset-viewer.png)
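
A scripted check can complement the visual inspection. The sketch below validates a single label line against the YOLO segmentation layout (a class id followed by normalized `x y` pairs). `check_label_line` is a hypothetical helper written for this README, not part of `dataset_viewer.py`.

```python
def check_label_line(line):
    """Return a list of problems found in one YOLO segmentation label line."""
    parts = line.split()
    if not parts:
        return ["empty line"]
    problems = []
    if not parts[0].isdigit():
        problems.append("class id is not a non-negative integer")
    try:
        coords = [float(v) for v in parts[1:]]
    except ValueError:
        return problems + ["non-numeric coordinate"]
    if len(coords) % 2 != 0:
        problems.append("odd number of coordinate values")
    if len(coords) < 6:
        problems.append("fewer than 3 polygon points")
    if any(c < 0.0 or c > 1.0 for c in coords):
        problems.append("coordinate outside [0, 1]")
    return problems


# A well-formed quad, then a line with too few points and an out-of-range value.
print(check_label_line("0 0.1 0.25 0.9 0.25 0.9 0.75 0.1 0.75"))
print(check_label_line("0 0.1 0.25 1.2 0.25"))
```

Running this over every `.txt` file in the label folders would flag malformed annotations before training starts.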

### 4. Train Model
Train a YOLO11 segmentation model using the provided script:
```bash
python train_yolo_doc_detection.py
```

![YOLO segmentation training](https://www.dynamsoft.com/codepool/img/2025/09/yolo-training.png)
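
The internals of `train_yolo_doc_detection.py` are not shown here, so the following is only a sketch of what a training call through the ultralytics Python API can look like. The hyperparameters mirror the CLI commands in the GPU section of this README; the `yolo11s-seg.pt` checkpoint and the helper names are assumptions.

```python
def build_train_args(data_yaml="dataset/doc.yaml", device=0, batch=16):
    """Collect keyword arguments for ultralytics' YOLO.train()."""
    return {
        "data": data_yaml,  # dataset config path, taken from the CLI examples
        "imgsz": 640,
        "epochs": 80,
        "batch": batch,
        "device": device,   # 0 for the first GPU, "cpu" for CPU training
    }


def train(weights="yolo11s-seg.pt"):
    """Start training; requires ultralytics and the prepared dataset."""
    # Imported lazily so the helper above works without ultralytics installed.
    from ultralytics import YOLO

    model = YOLO(weights)  # assumed checkpoint; pretrained weights download on first use
    return model.train(**build_train_args())


# Show the arguments without starting a training run.
print(build_train_args())
```

Calling `train()` starts the actual run; keeping the arguments in one function makes it easy to log or adjust them first.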

### 5. Run GUI Application
Launch the document detection GUI:
```bash
python document_detector_gui.py
```

![ID detection with YOLO segmentation](https://www.dynamsoft.com/codepool/img/2025/09/document-id-yolo-segmentation.png)
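
Outside the GUI, a trained segmentation model can also be queried from a script. The sketch below is an assumption about one possible post-processing step, not a description of `document_detector_gui.py`: it reduces each predicted mask polygon to four corners with a simple sum/difference heuristic. `polygon_to_quad` and `detect_quads` are hypothetical names, and the image and weights paths are left to the caller.

```python
def polygon_to_quad(points):
    """Pick four corner points from a polygon given as (x, y) pixel pairs.

    Heuristic: top-left minimizes x + y and bottom-right maximizes it;
    top-right maximizes x - y and bottom-left minimizes it.
    """
    pts = [(float(x), float(y)) for x, y in points]
    top_left = min(pts, key=lambda p: p[0] + p[1])
    bottom_right = max(pts, key=lambda p: p[0] + p[1])
    top_right = max(pts, key=lambda p: p[0] - p[1])
    bottom_left = min(pts, key=lambda p: p[0] - p[1])
    return [top_left, top_right, bottom_right, bottom_left]


def detect_quads(image_path, weights_path):
    """Run a trained segmentation model and return one quad per detected mask."""
    from ultralytics import YOLO  # imported lazily; requires ultralytics

    result = YOLO(weights_path)(image_path)[0]
    if result.masks is None:  # nothing detected
        return []
    # masks.xy holds one array of pixel-space polygon points per detection.
    return [polygon_to_quad(poly) for poly in result.masks.xy if len(poly) >= 4]


# Toy polygon: a slightly noisy rectangle outline.
print(polygon_to_quad([(10, 12), (50, 9), (101, 11), (99, 60), (52, 62), (9, 58)]))
```

The heuristic assumes the document is roughly upright in the image; for strongly rotated documents, a minimum-area rectangle or a polygon approximation would likely be more robust.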

## GPU Setup for Faster Training

### Prerequisites

1. **NVIDIA GPU** with CUDA support (GTX 1060 or better recommended)
2. **NVIDIA drivers** installed and up to date
3. **CUDA runtime** (bundled with the GPU-enabled PyTorch wheels, so a separate CUDA toolkit install is not required)

### Check GPU Availability

First, verify that your GPU is detected:

```bash
# Check NVIDIA GPU and driver
nvidia-smi

# Check whether the installed PyTorch build can use CUDA
python -c "import torch; print('CUDA available:', torch.cuda.is_available())"
```

### Install GPU-Enabled PyTorch

If `torch.cuda.is_available()` reports `False`, replace the CPU-only build with GPU-enabled PyTorch:

```bash
# Uninstall CPU-only PyTorch
pip uninstall torch torchvision torchaudio -y

# Install GPU-enabled PyTorch (CUDA 12.1)
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
```

For other CUDA versions, see https://pytorch.org/get-started/locally/

### Verify GPU Setup

```bash
python -c "import torch; print('CUDA available:', torch.cuda.is_available()); print('GPU name:', torch.cuda.get_device_name(0) if torch.cuda.is_available() else 'N/A')"
```

### GPU Training Commands
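
Once the GPU is detected, pass `device=0` to train on the first GPU. The commands below use the `yolov8s-seg.pt` checkpoint; to train a YOLO11 model as in step 4, substitute a YOLO11 segmentation checkpoint such as `yolo11s-seg.pt`:
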
```bash
# GPU training (recommended)
yolo task=segment mode=train model=yolov8s-seg.pt data=dataset/doc.yaml imgsz=640 epochs=80 batch=16 device=0

# CPU training (fallback)
yolo task=segment mode=train model=yolov8s-seg.pt data=dataset/doc.yaml imgsz=640 epochs=80 batch=8 device=cpu
```
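
The two commands above differ only in `device` and `batch`. As a convenience, the choice can be automated; the sketch below is a hypothetical helper (not part of the repository) that picks the GPU settings when PyTorch reports CUDA and the CPU settings otherwise, then prints the matching command.

```python
def pick_device_and_batch():
    """Return (device, batch) matching the GPU and CPU commands above."""
    try:
        import torch

        has_cuda = torch.cuda.is_available()
    except ImportError:
        # PyTorch is missing, so report the CPU settings as the fallback.
        has_cuda = False
    return (0, 16) if has_cuda else ("cpu", 8)


device, batch = pick_device_and_batch()
print(
    "yolo task=segment mode=train model=yolov8s-seg.pt "
    f"data=dataset/doc.yaml imgsz=640 epochs=80 batch={batch} device={device}"
)
```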
Lines changed: 6 additions & 0 deletions
@@ -0,0 +1,6 @@
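import midv500

# Directory where the downloaded dataset is stored
dataset_dir = "midv500_data/"

# "all" requests every dataset variant the midv500 package supports
dataset_name = "all"
midv500.download_dataset(dataset_dir, dataset_name)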
