
Scene Aware Urban Design

This paper presents a human-in-the-loop computer vision system that uses Grounding DINO and a subset of ADE20K to detect urban objects and learn their spatial co-occurrence patterns. From these co-occurrence embeddings, users receive a short list of likely complements to a selected anchor object. A vision–language model then reasons over the scene to suggest another element, supporting micro-scale interventions and more continuous, locally grounded participation.
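To make the statistical mechanism concrete, here is a minimal sketch, assuming detections arrive as one set of object labels per scene; the helper names and example labels are hypothetical, not the project's code.

```python
# Illustrative sketch only: build an object co-occurrence table from
# per-scene detection labels, then rank the most likely complements
# for a user-selected anchor object.
from collections import Counter
from itertools import combinations

def build_cooccurrence(scenes):
    """scenes: list of sets of detected object labels, one set per image."""
    counts = Counter()
    for labels in scenes:
        for a, b in combinations(sorted(labels), 2):
            counts[(a, b)] += 1  # count each pair in both directions
            counts[(b, a)] += 1
    return counts

def top_complements(counts, anchor, k=5):
    """Return the k labels that co-occur most often with the anchor."""
    pairs = [(b, n) for (a, b), n in counts.items() if a == anchor]
    return sorted(pairs, key=lambda p: -p[1])[:k]

# Hypothetical detections standing in for ADE20K-derived scene labels.
scenes = [
    {"bench", "streetlight", "trash can"},
    {"bench", "tree", "streetlight"},
    {"bench", "bike rack", "tree"},
]
print(top_complements(build_cooccurrence(scenes), "bench", k=3))
# e.g. [('streetlight', 2), ('tree', 2), ('bike rack', 1)]
```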

Date: 2025


Project Type: Research


Keywords: AR, Urban Planning, VLM


Team: Rodrigo Gallardo, Oz Fishman, Alex Htet Kyaw


The pipeline runs lightweight object detection in the background as scenes are processed. When a scene qualifies for intervention, the user selects an anchor object, triggering a two-branch decision process. In the statistical branch, the system retrieves common co-occurring urban objects from the ADE20K-based analysis, and the user chooses one. In the semantic branch, the anchor–object pair is passed to a vision–language model, which proposes up to five additional context-aware objects for the scene.
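A rough sketch of the two branches follows, under the same assumptions as the earlier snippet; query_vlm is a hypothetical stand-in for the actual vision–language model call, and the prompt wording is illustrative, not the project's.

```python
# Hedged sketch of the two-branch decision step.

def statistical_branch(counts, anchor, k=5):
    # Look up the most frequent co-occurrence partners of the anchor
    # in the ADE20K-derived counts (same shape as the earlier sketch).
    pairs = [(b, n) for (a, b), n in counts.items() if a == anchor]
    return [b for b, _ in sorted(pairs, key=lambda p: -p[1])[:k]]

def query_vlm(image, prompt):
    # Placeholder for a real VLM call; returns canned text here so the
    # sketch runs end to end.
    return "tree\nplanter\nbike rack"

def semantic_branch(image, anchor, chosen, max_objects=5):
    # Pass the anchor-object pair to the VLM and ask for up to five
    # additional context-aware objects for this specific scene.
    prompt = (
        f"This street scene contains a {anchor}, and the user plans to "
        f"add a {chosen}. Suggest up to {max_objects} more urban objects "
        "that would fit this scene, one per line."
    )
    return query_vlm(image, prompt).splitlines()[:max_objects]

print(semantic_branch(image=None, anchor="bench", chosen="streetlight"))
# e.g. ['tree', 'planter', 'bike rack']
```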

[Figures: system pipeline diagram, object co-occurrence matrix, and interface screenshots]

