Open-ended design exploration with large language models
Anton Savov, Angela Yoo, CheWei Lin, Benjamin Dillenburger
Digital Building Technologies, ETH Zurich
Design++, ETH Zurich
Paper: Savov, A., Yoo, A., Lin, C., & Dillenburger, B. (2025). Generalist Generative Agent: Open-ended design exploration with large language models. Proceedings of the 30th CAADRIA Conference, Vol. 1, 193–202. DOI: 10.52842/conf.caadria.2025.1.193
Architects often navigate ambiguity in early-stage design by using metaphors and conceptual models to transform abstract ideas into architectural forms. However, current computational tools struggle with such exploratory processes due to narrowly defined design spaces.
This repository presents an Agentic AI framework in which LLM agents interpret metaphors, formulate design tasks, and generate procedural 3D concept models. Using this framework, we produced 1,000 procedural designs and 4,000 rendered images based on 20 metaphors, demonstrating the emergent capabilities of LLMs for creating architecturally relevant conceptual models.
The framework consists of four LLM-powered agents arranged in a sequential pipeline:
Metaphor Agent ➜ Interpretation Agent ➜ Modelling Agent ➜ Evaluation Agent
For full end-to-end reproduction, insert the Blender render step between modelling and evaluation so the OBJ outputs become PNGs for scoring.
| Agent | Role |
|---|---|
| Metaphor | Generates novel architectural metaphors (e.g., "rippled grid", "cantilevering corners") as design drivers with formal and spatial qualities. |
| Interpretation | Translates a metaphor into concrete design implications and a succinct design task for an architectural concept model. |
| Modelling | Generates procedural Python code for Rhino/Grasshopper that produces parametric 3D concept models, with iterative error-correction and multiple parameter variations. |
| Evaluation | Uses a vision-capable LLM (GPT-4o) to score rendered models on metaphor alignment, conceptual strength, geometric complexity, and design-task adherence. |
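The sequential hand-off between the four agents can be sketched as follows. Every function here is a hypothetical stub standing in for the LLM-backed implementations in `ria/agents/`; only the stage order and data flow reflect this repository.

```python
"""Minimal sketch of the four-stage pipeline (stub functions only)."""

def expand_metaphor(seed: str) -> str:
    # Metaphor Agent: would prompt an LLM to enrich the seed metaphor.
    return f"{seed} with rippled, shifting surfaces"

def interpret(metaphor: str) -> str:
    # Interpretation Agent: metaphor -> succinct design task.
    return f"Design a concept model expressing: {metaphor}"

def generate_gh_code(task: str) -> str:
    # Modelling Agent: would emit Grasshopper Python code for Rhino.
    return f"# procedural GH code for: {task}"

def evaluate(render_path: str) -> dict:
    # Evaluation Agent: would score rendered PNGs with GPT-4o.
    return {"metaphor_alignment": None, "conceptual_strength": None}

def run_pipeline(seed: str) -> dict:
    metaphor = expand_metaphor(seed)
    task = interpret(metaphor)
    code = generate_gh_code(task)
    # In a full run, the Blender render step turns OBJ output into PNGs here.
    scores = evaluate("render.png")
    return {"metaphor": metaphor, "task": task, "code": code, "scores": scores}
```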
├── run_step1_metaphor_agent.py # Generate architectural metaphors
├── run_step2_interpretation_agent.py # Derive design tasks from metaphors
├── run_step3_gh_code_agent.py # Generate Grasshopper/RhinoCommon code
├── run_step3a_render.py # Render GH OBJ outputs in Blender 3.6
├── run_step4_eval_agent.py # Evaluate rendered concept models
│
├── ria/ # Core library
│ ├── agents/ # Agent implementations
│ │ ├── metaphor.py
│ │ ├── interpretation.py
│ │ ├── gh_code_agent.py
│ │ └── evaluation.py
│ ├── prompts/ # System prompts for each agent
│ ├── utils/ # File I/O and text utilities
│ └── visualization/ # Blender rendering & batch processing
│
├── rhino-gh-environment/ # Grasshopper definition & helper components
│ ├── geometric_environment.gh
│ └── gh_components/ # UDP listener, background tasks, OBJ export
│
├── output/ # Generated experiment data
│ └── experiment_data/
│ ├── 00_metaphors_list/
│ ├── 01_metaphors_expanded/
│ ├── 02_interpretations/
│ ├── 03_designs_H_gpt-4o/ # History mode
│ ├── 03_designs_gpt-4o_design_task/ # Design-task mode
│ └── 03_designs_gpt-4o_metaphor/ # Metaphor-only mode
│
├── pyproject.toml # uv project metadata and dependencies
├── uv.lock # Checked-in uv lockfile for reproducible environments
└── LICENSE # MIT
- uv for Python version and environment management
- Python 3.11 for the local orchestration environment
- Rhino 8 with Grasshopper for the Modelling Agent
- Blender 3.6 for rendering OBJ outputs before evaluation
- OpenAI API key for all four agents
Use the official uv installer or package manager instructions from the
uv installation guide.
```
git clone https://github.com/<your-username>/generalist-generative-agent.git
cd generalist-generative-agent
```

This repository is pinned to Python 3.11 via `.python-version` and `pyproject.toml`.

```
uv sync --locked
```

`uv sync --locked` creates `.venv/` and installs the dependencies declared in `pyproject.toml` using the checked-in `uv.lock`.

If you intentionally need to refresh the lockfile after dependency changes:

```
uv lock
uv sync
```

Create a `.env` file in the repository root:

```
OPENAI_API_KEY=sk-...
```

The agent scripts load `.env` automatically, so no extra shell export step is required.
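For reference, the automatic `.env` loading amounts to roughly the following. This is a hedged, stdlib-only sketch; the repository's actual scripts may use a library such as python-dotenv instead.

```python
"""Sketch of simple .env loading; the repo's actual mechanism may differ."""
import os

def load_env(path: str = ".env") -> None:
    """Read KEY=VALUE lines, skipping blanks and comments.

    Existing environment variables win over file values (setdefault).
    """
    with open(path) as f:
        for line in f:
            line = line.strip()
            if line and not line.startswith("#") and "=" in line:
                key, _, value = line.partition("=")
                os.environ.setdefault(key.strip(), value.strip())
```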
Use `uv run` so commands execute inside the project environment without manually activating `.venv`:

```
uv run --locked python run_step1_metaphor_agent.py
uv run --locked python run_step2_interpretation_agent.py
uv run --locked python run_step3_gh_code_agent.py
uv run --locked python run_step3a_render.py
uv run --locked python run_step4_eval_agent.py
```

If your editor needs the environment activated directly:

```
source .venv/bin/activate
```

On Windows PowerShell:

```
.venv\Scripts\activate
```

The Modelling Agent depends on Rhino 8 and the checked-in Grasshopper definition in `rhino-gh-environment/geometric_environment.gh`.
1. Install Rhino 8.
2. Open Rhino 8, launch Grasshopper, and open `rhino-gh-environment/geometric_environment.gh`.
3. In the Grasshopper file, first set `Toggle Listener` to `True`.
4. Then press `Reset Listener` once.
5. Confirm the listener component shows `Listening for incoming UDP messages...`.
6. If it shows `[Errno 48] Address already in use`, restart Rhino and repeat from step 2.
7. Keep Rhino and Grasshopper running while executing `run_step3_gh_code_agent.py`.
The modelling stage sends generated Grasshopper Python code to Rhino over UDP, executes it inside the open Grasshopper environment, and expects exported OBJ files in response.
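The sending side of that exchange can be sketched with a plain UDP socket. The port number and message framing below are assumptions for illustration, not the repository's actual protocol, which is implemented by the checked-in GH listener components.

```python
"""Hypothetical sketch of sending GH Python code to the Rhino-side UDP listener."""
import socket

def send_gh_code(code: str, host: str = "127.0.0.1", port: int = 6501) -> None:
    # UDP is connectionless: a single datagram carries the generated script.
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        sock.sendto(code.encode("utf-8"), (host, port))
    finally:
        sock.close()
```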
The Evaluation Agent scores rendered PNGs, not OBJ files directly, so Blender 3.6 is required for a full end-to-end run of all four agents.
- Install Blender 3.6.
- The default render launcher uses `ria/visualization/230125_render_enviroment.blend`.
- `run_step3a_render.py` automatically prefers Blender 3.6 in the standard install location. If Blender 3.6 is installed somewhere else, set `BLENDER_BIN` or pass `--blender-bin`.
- Ensure the Blender scene contains: `SCENE` for the camera and lights, `HIDDEN` for helper objects that should not render, and a mesh named `placeholderbox` that defines the target scale for imported OBJ models.
- Optional: a Grease Pencil object with a modifier can be present for outline effects.
- `run_step3a_render.py` reuses `render_material` and `wire_material` if they already exist in the scene.
The checked-in evaluation workflow assumes the render output names follow the existing dataset convention, such as `*_GHOSTED_1280x1280.png`.
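Collecting renders that follow this convention is a simple recursive glob. The pattern below is taken from the convention above; the helper function itself is illustrative, not part of the repository.

```python
"""Sketch: gather ghosted 1280x1280 renders below a design folder."""
from pathlib import Path

def find_renders(design_dir: str) -> list[Path]:
    # rglob matches the dataset naming convention at any nesting depth.
    return sorted(Path(design_dir).rglob("*_GHOSTED_1280x1280.png"))
```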
output/test-runs/ is git-ignored, so the following smoke-test outputs stay out
of the repository.
```
uv run --locked python run_step1_metaphor_agent.py \
  --output-dir output/test-runs/demo/01_metaphors_expanded \
  --expand "Shifted grid"
```

```
uv run --locked python run_step2_interpretation_agent.py \
  --metaphors-dir output/test-runs/demo/01_metaphors_expanded \
  --output-dir output/test-runs/demo/02_interpretations \
  --filename 0001_shifted_grid.json \
  --versions 1
```

Run one smoke test for each GH mode. All three commands use the same interpretation input, but each mode changes what is sent to the LLM:

- `metaphor` uses only the metaphor text.
- `design-task` uses the full interpretation JSON.
- `history` uses the full interpretation JSON plus prior GH runs from `output/experiment_data/03_designs_H_gpt-4o/` as few-shot history.
Metaphor-only mode:

```
uv run --locked python run_step3_gh_code_agent.py \
  --mode metaphor \
  --filename 0001_0001_shifted_grid.json \
  --interpretations-dir output/test-runs/demo/02_interpretations \
  --output-dir output/test-runs/demo/03_designs_gpt-4o_metaphor \
  --designs-per-interpretation 1
```

Design-task mode:

```
uv run --locked python run_step3_gh_code_agent.py \
  --mode design-task \
  --filename 0001_0001_shifted_grid.json \
  --interpretations-dir output/test-runs/demo/02_interpretations \
  --output-dir output/test-runs/demo/03_designs_gpt-4o_design_task \
  --designs-per-interpretation 1
```

History mode:

```
uv run --locked python run_step3_gh_code_agent.py \
  --mode history \
  --filename 0001_0001_shifted_grid.json \
  --interpretations-dir output/test-runs/demo/02_interpretations \
  --skills-dir output/experiment_data/03_designs_H_gpt-4o \
  --output-dir output/test-runs/demo/03_designs_H_gpt-4o \
  --designs-per-interpretation 1
```

```
uv run --locked python run_step3a_render.py \
  --input-dir output/test-runs/demo/03_designs_H_gpt-4o \
  --limit 1 \
  --resolution-x 256 \
  --resolution-y 256 \
  --samples 8
```

Replace the input directory with `03_designs_gpt-4o_metaphor/` or `03_designs_gpt-4o_design_task/` if you want to render those GH test outputs instead.

```
uv run --locked python run_step4_eval_agent.py \
  --data-dir output/test-runs/demo/03_designs_H_gpt-4o \
  --eval-crops-dir output/test-runs/demo/evaluation_crops \
  --limit 1
```

Replace the data directory with the render folder from the GH mode you want to score.
```
uv run --locked python run_step1_metaphor_agent.py
```

This reads the seed list from `output/experiment_data/00_metaphors_list/metaphors_to_expand.txt` and writes expanded metaphor JSON files to `output/experiment_data/01_metaphors_expanded/`.

```
uv run --locked python run_step2_interpretation_agent.py
```

This reads `output/experiment_data/01_metaphors_expanded/` and writes interpretation JSON files to `output/experiment_data/02_interpretations/`.
The GH runner supports three modes:

- `metaphor`: uses only the metaphor text and writes to `03_designs_gpt-4o_metaphor/`
- `design-task`: uses the full interpretation and writes to `03_designs_gpt-4o_design_task/`
- `history`: uses the full interpretation plus prior GH runs as few-shot history and writes to `03_designs_H_gpt-4o/`

Metaphor-only run:

```
uv run --locked python run_step3_gh_code_agent.py --mode metaphor
```

Design-task run:

```
uv run --locked python run_step3_gh_code_agent.py --mode design-task
```

History run:

```
uv run --locked python run_step3_gh_code_agent.py
```

The default command above is the same as `--mode history`. All three modes read from `output/experiment_data/02_interpretations/`, but write to different `03_*` output folders depending on the selected mode.
```
uv run --locked python run_step3a_render.py
```

This opens `ria/visualization/230125_render_enviroment.blend`, scans `output/experiment_data/03_designs_H_gpt-4o/` for OBJ outputs, and writes the rendered PNGs next to the OBJ files.

For the two ablation modes, point `--input-dir` at `output/experiment_data/03_designs_gpt-4o_metaphor/` or `output/experiment_data/03_designs_gpt-4o_design_task/`.

```
uv run --locked python run_step4_eval_agent.py
```

This scans `output/experiment_data/03_designs_H_gpt-4o/` for rendered PNG images, writes cropped evaluation images to `output/evaluation_crops/`, and stores an `evaluation.csv` in each rendered design folder.

For the two ablation modes, point `--data-dir` at `output/experiment_data/03_designs_gpt-4o_metaphor/` or `output/experiment_data/03_designs_gpt-4o_design_task/`.
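The per-design `evaluation.csv` can be pictured as one row per rendered image with one column per criterion. The column names below mirror the four scoring criteria listed for the Evaluation Agent; the exact header written by the repository's script may differ.

```python
"""Sketch of the shape of a per-design evaluation.csv (illustrative columns)."""
import csv
import io

FIELDS = ["image", "metaphor_alignment", "conceptual_strength",
          "geometric_complexity", "task_adherence"]

def write_evaluation(rows: list[dict]) -> str:
    # Render the rows to CSV text; a real run would write to a file
    # inside the rendered design folder instead of a string buffer.
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()
```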
- LangChain + OpenAI API — LLM orchestration
- Rhino 8 + Grasshopper — Geometry host environment
- Blender — Rendering (axonometric projection with neutral colours)
If you use this work in your research, please cite:
```bibtex
@inproceedings{savov2025generalist,
  title     = {Generalist Generative Agent: Open-ended Design Exploration with Large Language Models},
  author    = {Savov, Anton and Yoo, Angela and Lin, CheWei and Dillenburger, Benjamin},
  booktitle = {Proceedings of the 30th International Conference of the Association for Computer-Aided Architectural Design Research in Asia (CAADRIA)},
  volume    = {1},
  pages     = {193--202},
  year      = {2025},
  doi       = {10.52842/conf.caadria.2025.1.193}
}
```

This project is licensed under the MIT License.
This research is supported by an ETH Career Seed Award, a Hasler Stiftung Project Grant, and a Swiss National Science Foundation (SNSF) Spark grant (No. 228564) for the project "RIA: Novel Framework for Generative Architectural Design using AI-Agents". It is affiliated with the Center for Augmented Computational Design in Architecture, Engineering, and Construction (Design++), ETH Zurich.
