A comprehensive computer vision solution for multi-label shape and color classification in synthetic images using hybrid OpenCV-CNN architectures.
This project addresses the Shapes and Colors Prediction Challenge, which involves predicting geometric shapes (circles, squares, triangles) and their corresponding colors (red, blue, green) in synthetic images. Each image contains randomly placed, rotated geometric shapes without overlaps, presenting a multi-label image recognition task.
- Input: Synthetic images with geometric shapes
- Output: List of (shape, color) tuples for each detected object
- Evaluation: Jaccard similarity between predicted and ground truth sets
- Shapes: Circle, Square, Triangle
- Colors: Red, Blue, Green
| Model | Exact Match Ratio (EMR) | Jaccard Score | Architecture |
|---|---|---|---|
| OpenCV Baseline | 68.30% | 66.19% | HSV + Contours |
| Simple CNN | 88.00% | 81.51% | Basic ConvNet |
| Hybrid CNN | 98.00% | 98.68% | OpenCV + CNN |
| ResNet Hybrid | 99.50% | 99.22% | ResNet + OpenCV |
- Dataset Balance: Analyzed distribution of shapes, colors, and object counts
- Data Characteristics: Identified noise patterns, overlapping objects, and edge cases
- Class Imbalance: Images with 5 objects are significantly underrepresented
- Color Detection: HSV color space thresholding
- Shape Classification: Contour analysis with vertex counting
- Performance: 66.19% Jaccard score
- Issues: Struggles with overlapping objects and complex scenes
- Simple CNN: Basic convolutional architecture with multi-label classification
- Hybrid Approach: OpenCV for color masking + CNN for shape classification
- ResNet Enhancement: Residual connections for better feature learning
- Color Masking: HSV-based color isolation
- Noise Reduction: Morphological operations for clean segmentation
- Multi-channel Input: 4-channel input (RGB + mask) for enhanced feature extraction
-
Color Detection Module
HSV_RANGES = { 'red': [((0, 120, 70), (10, 255, 255)), ((170, 120, 70), (180, 255, 255))], 'green': [((40, 100, 100), (80, 255, 255))], 'blue': [((90, 100, 100), (130, 255, 255))] }
-
ResNet-based Shape Classifier
- 4-channel input (RGB + binary mask)
- Residual blocks for deep feature learning
- Multi-label output with BCEWithLogitsLoss
-
Hybrid Pipeline
- Color-specific masking using OpenCV
- Shape classification using trained CNN
- Ensemble predictions across color channels
Input (4-channel: RGB + Mask) →
Initial Conv + BN + ReLU →
Residual Blocks (32→64→128→256) →
Global Average Pooling →
Classifier (Linear + Dropout)
-
OpenCV Limitations:
- Severe under-prediction on complex images (3+ objects)
- Missed 1,585 circles out of 2,880 total
- All 465 false positives were misclassified as squares
-
CNN Improvements:
- Reduced false negatives from 2,072 to 129 objects
- Dramatically decreased hallucinations from 465 to 32 objects
- Better handling of overlapping and noisy objects
-
Hybrid Success Factors:
- Color masking isolates objects effectively
- CNN focuses on shape-specific features
- Residual connections enable deeper feature learning
- Hidden Objects: Objects of same color overlapping (e.g., red triangle inside red circle)
- Spurious Correlations: Model learning spatial arrangements rather than geometric features
- Edge Cases: Subtle features competing with dominant patterns
├── 00.BRF_report.ipynb # Main analysis notebook
├── Archive/ # Dataset files
│ ├── train.csv
│ ├── test.csv
│ ├── train_dataset/ # Training images
└── └── test_dataset/ # Test images
- Strategy: Stratified 5-fold cross-validation
- Stratification: Based on number of objects per image
- Reason: Address class imbalance (especially 5-object images)
- Image Size: 128×128 pixels
- Batch Size: 32
- Epochs: 30
- Optimizer: AdamW with learning rate 0.001
- Loss Function: BCEWithLogitsLoss (multi-label)
- Random horizontal flips
- Random rotations (±15°)
- Random resized crops
- D4 dihedral transformations
- 99.22% Jaccard Score: Near-perfect object detection and classification
- Robust Architecture: Handles overlapping objects and noise effectively
- Comprehensive Analysis: Deep insights into model failures and improvements
- Scalable Pipeline: Modular design for easy extension and modification
pip install torch torchvision opencv-python pandas numpy matplotlib seaborn pillow albumentations scikit-learn tqdmjupyter notebook 00.BRF_report.ipynb- Object Detection Integration: YOLO/R-CNN for instance segmentation
- Attention Mechanisms: Focus on relevant image regions
- Synthetic Data Augmentation: Generate more complex training scenarios
- Real-world Adaptation: Transfer learning for natural images
This project demonstrates advanced computer vision techniques combining classical image processing with deep learning for robust multi-label classification.
This project was developed as part of a Kaggle competition focused on shape and color recognition in synthetic images. The solution showcases a progression from basic OpenCV methods to sophisticated hybrid architectures achieving near-perfect performance.