This repository was archived by the owner on Sep 30, 2025. It is now read-only.

CheesyLaZanya/graphcap

 
 

graphcap


Keywords: image captioning, scene graph, DAG, FastAPI, multimodal, machine learning, open source, artificial intelligence, datasets, open model initiative, OMI


graphcap is an open-source system for generating image captions and scene graphs from multiple analytical perspectives. The project combines a React-based user interface, a TypeScript data service, and a Python inference bridge to produce structured captions that conform to declarative JSON schemas.
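To make the idea of a schema-constrained caption concrete, here is a minimal sketch of how a declarative perspective might pair a prompt with an output contract. It is an illustration only: the `Perspective` class, its field names, and the sample model output are all hypothetical and are not taken from graphcap's code, whose real perspective format may differ.

```python
import json
from dataclasses import dataclass


# Hypothetical structure for illustration; not graphcap's actual format.
@dataclass(frozen=True)
class Perspective:
    name: str
    prompt: str
    required_fields: dict  # field name -> expected Python type

    def validate(self, raw_json: str) -> dict:
        """Parse model output and check it against the declared fields."""
        data = json.loads(raw_json)
        if not isinstance(data, dict):
            raise ValueError("expected a JSON object")
        for key, expected in self.required_fields.items():
            if key not in data:
                raise ValueError(f"missing field: {key}")
            if not isinstance(data[key], expected):
                raise ValueError(f"field {key!r} should be {expected.__name__}")
        return data


# An assumed "scene graph" perspective: one prompt plus the shape of its output.
scene_graph = Perspective(
    name="scene_graph",
    prompt="List the objects in the image and the relations between them.",
    required_fields={"caption": str, "objects": list, "relations": list},
)

# Stand-in for a response from a vision-language model.
model_output = (
    '{"caption": "A cat on a sofa", '
    '"objects": ["cat", "sofa"], '
    '"relations": [["cat", "on", "sofa"]]}'
)
caption = scene_graph.validate(model_output)
```

Keeping the prompt and the expected output shape in one declarative object means a malformed response is rejected before it reaches storage or the UI. A real deployment would more likely validate against a full JSON Schema than against this simple type map.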

Features

  • Multi-perspective captioning – captions are produced using declarative "perspectives" that describe prompts and output schemas.
  • Modular architecture – separate microservices for the UI, data service, inference bridge and media processing, all coordinated through a local workspace volume.
  • Provider abstraction – easily integrate OpenAI, Ollama, Gemini or other vision-language providers through the provider factory API.
  • Extensible dataset management – upload, edit and organise images directly from the web interface.
  • Sphinx documentation – full developer and user documentation is located in the doc/ directory.
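The provider abstraction mentioned above can be pictured as a small registry that maps a provider name to a class implementing a shared interface. The sketch below shows that general pattern only. `VisionProvider`, `register_provider`, `create_provider`, and `EchoProvider` are hypothetical names chosen for illustration and do not reflect graphcap's actual provider factory API.

```python
from typing import Callable, Dict, Protocol


class VisionProvider(Protocol):
    """Assumed common interface that every provider would implement."""

    def caption(self, image_path: str, prompt: str) -> str: ...


# Maps a provider name to a zero-argument constructor.
_REGISTRY: Dict[str, Callable[[], VisionProvider]] = {}


def register_provider(name: str):
    """Class decorator that adds a provider to the registry under `name`."""
    def decorator(cls):
        _REGISTRY[name] = cls
        return cls
    return decorator


def create_provider(name: str) -> VisionProvider:
    """Instantiate a registered provider, or fail with a clear error."""
    try:
        factory = _REGISTRY[name]
    except KeyError:
        raise ValueError(f"unknown provider: {name}") from None
    return factory()


@register_provider("echo")
class EchoProvider:
    """Offline stand-in that returns its inputs instead of calling a model."""

    def caption(self, image_path: str, prompt: str) -> str:
        return f"[{image_path}] {prompt}"


provider = create_provider("echo")
result = provider.caption("cat.png", "Describe the image.")
```

With this pattern, adding a backend means writing one class and registering it, while calling code keeps depending only on the shared interface.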

Quick start

The easiest way to run graphcap is with Docker Compose and the provided Taskfile commands. Ensure that Docker and the Task runner are installed, then execute:

# prepare configuration and build base images
task setup

# start all services in the background
task start

Once the services are running, visit http://localhost:32200 in your browser. The default workspace is stored in the workspace/ directory of this repository. For more details on configuration and the available services, see the installation guide.

Repository layout

apps/       # frontend and service applications
packages/   # shared libraries (TypeScript and Python)
doc/        # Sphinx documentation
workspace/  # local configuration and persistent volumes

Each package or application contains its own README with development instructions.

License

This project is licensed under the Apache 2.0 License. See the LICENSE file for details.

About

graphcap is an application that leverages directed acyclic graphs (DAGs) along with vision and language models to generate structured image captions and scene graphs from multimodal data.


Languages

  • TypeScript 76.3%
  • Python 16.7%
  • JavaScript 4.8%
  • CSS 1.2%
  • Other 1.0%