Keywords: image captioning, scene graph, DAG, FastAPI, multimodal, machine learning, open source, artificial intelligence, datasets, open model initiative, OMI
graphcap is an open source system for generating image captions and scene graphs using multiple analytical perspectives. The project combines a React-based user interface, a TypeScript data service, and a Python inference bridge to produce structured captions that conform to declarative JSON schemas.
- Multi-perspective captioning – captions are produced using declarative "perspectives" that describe prompts and output schemas.
- Modular architecture – separate microservices for the UI, data service, inference bridge and media processing, all coordinated through a local workspace volume.
- Provider abstraction – easily integrate OpenAI, Ollama, Gemini or other vision-language providers through the provider factory API.
- Extensible dataset management – upload, edit and organise images directly from the web interface.
- Sphinx documentation – full developer and user documentation is located in the doc/ directory.
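The "perspective" idea above can be illustrated with a minimal sketch. The field names (`name`, `prompt`, `schema`) and the validator below are assumptions for illustration only, not graphcap's actual perspective format:

```python
# Hypothetical sketch of a declarative perspective: a prompt paired with a
# JSON-style schema describing the structured caption it should produce.
# Field names are illustrative, not graphcap's real format.
ART_CRITIC_PERSPECTIVE = {
    "name": "art_critic",
    "prompt": "Describe the composition, palette, and mood of the image.",
    "schema": {
        "required": ["composition", "palette", "mood"],
    },
}


def missing_fields(perspective: dict, caption: dict) -> list[str]:
    """Return the schema keys absent from a generated caption."""
    required = perspective["schema"]["required"]
    return [key for key in required if key not in caption]


caption = {"composition": "rule of thirds", "palette": "warm earth tones"}
print(missing_fields(ART_CRITIC_PERSPECTIVE, caption))  # → ['mood']
```

Because each perspective is plain data rather than code, new captioning styles can be added without touching the inference bridge itself.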
The easiest way to run graphcap is with Docker Compose and the provided Taskfile commands. Ensure that Docker and the Task runner are installed, then execute:
```shell
# prepare configuration and build base images
task setup

# start all services in the background
task start
```

Once the services are running, visit http://localhost:32200 in your browser. The default workspace is stored inside the workspace/ directory of this repository. For more details on configuration and available services, see the installation guide.
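The provider abstraction mentioned earlier can be sketched as a small registry that maps provider names to client constructors. All class and function names here are hypothetical, assumed for illustration rather than taken from graphcap's actual provider factory API:

```python
# Hypothetical sketch of a provider factory: vision-language providers
# register a constructor under a name, and callers look them up at runtime.
# Names (register_provider, create_provider, OllamaClient) are illustrative.
from typing import Callable

_PROVIDERS: dict[str, Callable[..., object]] = {}


def register_provider(name: str):
    """Decorator that registers a provider client class under a lookup name."""
    def wrap(cls):
        _PROVIDERS[name] = cls
        return cls
    return wrap


def create_provider(name: str, **kwargs):
    """Instantiate the provider registered under `name`."""
    try:
        return _PROVIDERS[name](**kwargs)
    except KeyError:
        raise ValueError(f"unknown provider: {name!r}") from None


@register_provider("ollama")
class OllamaClient:
    # Default Ollama port is assumed here for illustration.
    def __init__(self, base_url: str = "http://localhost:11434"):
        self.base_url = base_url


client = create_provider("ollama")
```

A registry like this keeps the rest of the system decoupled from any single vendor: swapping OpenAI for Ollama becomes a one-line configuration change rather than a code change.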
```
apps/       # frontend and service applications
packages/   # shared libraries (TypeScript and Python)
doc/        # Sphinx documentation
workspace/  # local configuration and persistent volumes
```
Each package or application contains its own README with development instructions.
This project is licensed under the Apache 2.0 License. See the LICENSE file for details.
