This document describes the internal structure and organization of the Pixel Pointing Benchmark codebase. For usage instructions, see README.md.
PixelPointingBenchmark/
├── evaluation/ # Evaluation logic
│ ├── __init__.py # Module exports
│ ├── vlm_evaluator.py # VLM querying and coordinate extraction
│ ├── metrics.py # Accuracy metrics and statistics
│ ├── results_manager.py # Results storage, consolidation, and indexing
│ ├── runner.py # Main evaluation runner
│ └── utils.py # CLI utilities for result management
│
├── test_generation/ # Test image generation
│ ├── __init__.py # Module exports
│ └── image_generator.py # Synthetic image generation
│
├── test_suites/ # Test suite management
│ ├── __init__.py # Module exports
│ ├── base.py # Base classes for test suites
│ └── registry.py # Test suite registry
│
├── evaluate.py # Main entry point
├── serve_viewer.py # Web server for viewer
└── index.html # Results viewer
- vlm_evaluator.py: No internal dependencies
- metrics.py: No internal dependencies
- results_manager.py: No internal dependencies
- runner.py: Depends on vlm_evaluator, metrics, results_manager, test_suites.registry
- utils.py: Depends on results_manager
- image_generator.py: No internal dependencies
- base.py: Depends on test_generation.image_generator (lazy import)
- registry.py: Depends on base
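
The dependency list above can be written down as a graph and checked for cycles. The following is a minimal sketch, not part of the codebase: it assumes Python 3.9+ for the standard-library `graphlib` module, and the dotted module names are inferred from the directory tree. The lazy import in base.py is treated as an ordinary edge here.

```python
from graphlib import TopologicalSorter

# Internal dependencies as listed above: module -> modules it imports.
# The dotted names are inferred from the directory tree (an assumption).
DEPENDENCIES = {
    "evaluation.vlm_evaluator": set(),
    "evaluation.metrics": set(),
    "evaluation.results_manager": set(),
    "evaluation.runner": {
        "evaluation.vlm_evaluator",
        "evaluation.metrics",
        "evaluation.results_manager",
        "test_suites.registry",
    },
    "evaluation.utils": {"evaluation.results_manager"},
    "test_generation.image_generator": set(),
    "test_suites.base": {"test_generation.image_generator"},  # lazy import
    "test_suites.registry": {"test_suites.base"},
}


def import_order(deps):
    """Return modules ordered so that each one follows its dependencies.

    Raises graphlib.CycleError if the graph contains a cycle.
    """
    return list(TopologicalSorter(deps).static_order())


order = import_order(DEPENDENCIES)
```

Because `static_order()` raises `CycleError` on a cycle, a check like this could run in a test to catch an accidental circular import between the packages.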
- vlm_evaluator.py: Handles communication with VLMs via LiteLLM, extracts coordinates from responses
- metrics.py: Calculates accuracy metrics (distance, extraction rate, statistics across passes)
- results_manager.py: Manages result storage, consolidation, indexing, and utilities
- runner.py: Orchestrates the evaluation process
- utils.py: CLI tool for result management operations
- image_generator.py: Generates synthetic test images with various shapes and configurations
- base.py: Abstract base classes for test suites (SyntheticTestSuite, ScreenshotTestSuite)
- registry.py: Manages registration and retrieval of test suites
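
To make the base.py and registry.py entries concrete, here is a hedged sketch of how an abstract suite class and a name-based registry might fit together. Only the class name `SyntheticTestSuite` comes from this document. The `register`, `get_suite`, and `list_suites` helpers, the `targets` method, and the `center_dot` suite are hypothetical names chosen for illustration.

```python
from abc import ABC, abstractmethod

_REGISTRY = {}  # hypothetical storage: suite name -> suite class


def register(cls):
    """Class decorator (hypothetical) that records a suite under its `name`."""
    if cls.name in _REGISTRY:
        raise ValueError(f"duplicate test suite name: {cls.name}")
    _REGISTRY[cls.name] = cls
    return cls


def get_suite(name):
    """Instantiate a registered suite, with a readable error for unknown names."""
    if name not in _REGISTRY:
        known = ", ".join(sorted(_REGISTRY)) or "none"
        raise KeyError(f"unknown test suite {name!r}; registered: {known}")
    return _REGISTRY[name]()


def list_suites():
    """Return the registered suite names in sorted order."""
    return sorted(_REGISTRY)


class SyntheticTestSuite(ABC):
    """Base for suites whose images are generated rather than captured."""

    name = "synthetic"

    @abstractmethod
    def targets(self, width, height):
        """Return (label, x, y) pixel targets for the given screen size.

        This method and its return format are assumptions for the sketch.
        """


@register
class CenterDotSuite(SyntheticTestSuite):
    """Hypothetical suite with a single target at the screen center."""

    name = "center_dot"

    def targets(self, width, height):
        return [("dot", width // 2, height // 2)]


suite = get_suite("center_dot")
```

Keeping the registry keyed by a plain string lets a command-line entry point select a suite by name without importing each suite class directly.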
- save_run(): Save individual evaluation runs with timestamps
- load_runs(): Load runs for a test suite and screen size
- consolidate_results(): Create consolidated results JSON for viewer
- update_test_suites_index(): Update test_suites.json index file
- fix_consolidated_results(): Fix missing models list in consolidated results
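
The storage functions named above could work roughly as follows. This is an illustrative sketch only: the function names match the list, but the signatures, the `results_dir/suite/screen_size/` directory layout, the JSON fields, and the `basic_shapes` suite name are all assumptions, not the actual implementation.

```python
import json
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def save_run(results_dir, suite, screen_size, model, results):
    """Write one run to its own timestamped JSON file and return the path."""
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%S%fZ")
    run_dir = Path(results_dir) / suite / screen_size
    run_dir.mkdir(parents=True, exist_ok=True)
    # Model identifiers may contain "/", which is not safe in a file name.
    safe_model = model.replace("/", "_")
    path = run_dir / f"{safe_model}_{stamp}.json"
    payload = {"model": model, "timestamp": stamp, "results": results}
    path.write_text(json.dumps(payload, indent=2), encoding="utf-8")
    return path


def load_runs(results_dir, suite, screen_size):
    """Load every saved run for one test suite and screen size."""
    run_dir = Path(results_dir) / suite / screen_size
    if not run_dir.is_dir():
        return []
    return [
        json.loads(p.read_text(encoding="utf-8"))
        for p in sorted(run_dir.glob("*.json"))
    ]


def consolidate_results(runs):
    """Group runs by model and include an explicit `models` list."""
    by_model = {}
    for run in runs:
        by_model.setdefault(run["model"], []).append(run)
    return {"models": sorted(by_model), "runs": by_model}


# Usage with a throwaway directory and made-up results.
with tempfile.TemporaryDirectory() as tmp:
    save_run(tmp, "basic_shapes", "1920x1080", "model-a", [{"distance": 4.0}])
    save_run(tmp, "basic_shapes", "1920x1080", "vendor/model-b", [{"distance": 9.5}])
    consolidated = consolidate_results(load_runs(tmp, "basic_shapes", "1920x1080"))
```

Writing each run to a separate timestamped file keeps earlier runs intact, and the consolidated file can then be rebuilt from the run files at any time.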
- Test Generation: test_generation creates synthetic images
- Test Suites: test_suites manages test configurations
- Evaluation: evaluation.runner orchestrates VLM queries
- Results: evaluation.results_manager stores and consolidates results
- Visualization: index.html displays consolidated results
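
Within the Evaluation step of this flow, a free-text model reply has to become a coordinate pair before any metric can be computed. The sketch below shows one way that could look. The `(x, y)` reply format, the function names, and the sample replies are assumptions for illustration. The real extraction logic in vlm_evaluator.py and the statistics in metrics.py may differ.

```python
import math
import re
import statistics

# Assumed reply format: the first "(x, y)" pair of numbers in the text.
_COORD_RE = re.compile(r"\(\s*(-?\d+(?:\.\d+)?)\s*,\s*(-?\d+(?:\.\d+)?)\s*\)")


def extract_coordinates(response_text):
    """Return (x, y) as floats, or None when no coordinate pair is found."""
    match = _COORD_RE.search(response_text)
    if match is None:
        return None
    return float(match.group(1)), float(match.group(2))


def summarize(responses, target):
    """Compute extraction rate and distance statistics for one target."""
    points = [extract_coordinates(r) for r in responses]
    found = [p for p in points if p is not None]
    # Euclidean pixel distance between each predicted point and the target.
    distances = [math.dist(p, target) for p in found]
    return {
        "extraction_rate": len(found) / len(responses) if responses else 0.0,
        "mean_distance": statistics.fmean(distances) if distances else None,
        "max_distance": max(distances) if distances else None,
    }


# Three made-up replies for a target at (100, 200); one has no coordinates.
summary = summarize(
    ["The target is at (103, 204).", "I cannot see it.", "(100, 200)"],
    target=(100, 200),
)
```

Reporting the extraction rate separately from the distance statistics keeps unparseable replies from being silently counted as misses or dropped from the totals.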