
Lab-RoCoCo-Sapienza/map-vlm



Multi-agent Planning using Visual Language Models

Michele Brienza1, Francesco Argenziano1, Vincenzo Suriani2, Domenico D. Bloisi3, Daniele Nardi1

1 Department of Computer, Control and Management Engineering, Sapienza University of Rome, Rome, Italy, 2 School of Engineering, University of Basilicata, Potenza, Italy, 3 International University of Rome UNINT, Rome, Italy



Dataset

This project uses the G-PlanET dataset, which must be downloaded from Hugging Face:

Dataset: yuchenlin/G-PlanET

Downloading the Dataset

You can download the dataset using the Hugging Face datasets library:

from datasets import load_dataset

dataset = load_dataset("yuchenlin/G-PlanET")

Or using the Hugging Face CLI:

huggingface-cli download yuchenlin/G-PlanET

Trials

The trials (IDs and images) used in this project are available on the project website:

Website: https://lab-rococo-sapienza.github.io/map-vlm/

Installation

pip install -r requirements.txt

Installing PG2S Metric

To evaluate the generated plans, you need to install the PG2S metric library:

pip install pg2s

Or install from source:

git clone https://github.com/Lab-RoCoCo-Sapienza/pg2s
cd pg2s
pip install .

Usage

The main script test.py processes JSONL entries and generates planning outputs using both table-based and vision-based agents.
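The JSONL schema itself is not documented here; assuming one JSON object per line with an `id` field (a hypothetical field name), a minimal reader mirroring the `--id` and `--limit` flags might look like:

```python
import json

def load_records(path, record_id=None, limit=1):
    """Read JSONL entries, optionally keeping only a matching (hypothetical) 'id'."""
    records = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            record = json.loads(line)
            if record_id is not None and record.get("id") != record_id:
                continue
            records.append(record)
            if limit is not None and len(records) >= limit:
                break
    return records
```

This is only a sketch of the input handling, not the repository's actual implementation.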

Basic Usage

python test.py

Command Line Arguments

  • --jsonl: Path to input JSONL file (default: example.jsonl)
  • --limit: Maximum number of records to process (default: 1)
  • --id: Process only the record with matching ID (optional)
  • --image: Path to image file for vision planning (default: 4.jpg)
  • --model-table: OpenAI model for table planning (default: gpt-4o)
  • --model-vision: OpenAI model for vision planning (default: gpt-4o)
  • --output-dir: Directory to save generated plans (default: output_plans)

Examples

Process a single record:

python test.py --jsonl example.jsonl --limit 1

Process multiple records with custom output directory:

python test.py --jsonl example.jsonl --limit 5 --output-dir ./results

Process a specific record by ID:

python test.py --jsonl example.jsonl --id "record_123"

Output Structure

For each processed record, the script creates a subdirectory record_{id} containing:

  • input_table.txt - Input table in markdown format
  • single_agent_table.txt - Plan generated by single-agent with table
  • multi_agent_table_env.txt - Environment summary from multi-agent with table
  • multi_agent_table_plan.txt - Plan generated by multi-agent with table
  • single_agent_vision.txt - Plan generated by single-agent with vision
  • multi_agent_vision.txt - Plan generated by multi-agent with vision
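A minimal sketch for gathering these outputs back into memory, e.g. for downstream evaluation (the file names come from the list above; the helper itself is an illustration, not part of the repository):

```python
from pathlib import Path

# Per-record output files, as listed above.
PLAN_FILES = [
    "input_table.txt",
    "single_agent_table.txt",
    "multi_agent_table_env.txt",
    "multi_agent_table_plan.txt",
    "single_agent_vision.txt",
    "multi_agent_vision.txt",
]

def collect_record_outputs(output_dir, record_id):
    """Return a {file_stem: text} dict for one record_{id} subdirectory."""
    record_dir = Path(output_dir) / f"record_{record_id}"
    outputs = {}
    for name in PLAN_FILES:
        path = record_dir / name
        if path.exists():  # some files may be absent if a stage was skipped
            outputs[path.stem] = path.read_text(encoding="utf-8")
    return outputs
```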

Evaluation

This project uses the PG2S metric to evaluate the quality of generated plans.

Using PG2S

from pg2s.metric import pg2s_score

plans = {
    "task-1": {
        'truth': [
            'Turn around and walk to the sink.',
            'Take the left glass out of the sink.',
            'Turn around and walk to the microwave.',
            'Heat the glass in the microwave.',
            'Turn around and face the counter.',
            'Place the glass in the left top cabinet.'
        ],
        'predict': [
            'Walk to the sink.',
            'Pick up the glass from the sink.',
            'Go to the microwave.',
            'Heat the glass.',
            'Walk to the counter.',
            'Put the glass in the cabinet.'
        ]
    },
}

# Calculate the similarity score with a custom alpha value
# alpha controls the balance between goal-wise and sentence-wise similarity
score = pg2s_score(plans, alpha=0.7)
print(f"PG2S Score: {score}")

PG2S Parameters

  • plans: Dictionary containing tasks with ground truth and predicted action sequences
  • alpha: Hyperparameter (default: 0.5) that balances:
    • Goal-wise similarity
    • Sentence-wise similarity
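The generated plan files are plain text; assuming one action per line (an assumption, not something documented here), a small helper can convert a pair of plan texts into the plans structure that pg2s_score expects:

```python
def build_plans_entry(task_id, truth_text, predicted_text):
    """Convert newline-separated plan texts into the pg2s_score input shape:
    {task_id: {'truth': [...], 'predict': [...]}} (one action per line assumed)."""
    def split_steps(text):
        return [line.strip() for line in text.splitlines() if line.strip()]
    return {task_id: {"truth": split_steps(truth_text),
                      "predict": split_steps(predicted_text)}}
```

Entries built this way can be merged into one dictionary and passed to `pg2s_score` as shown above.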

For more information, see the PG2S repository.
