Zero-Shot Vision Language Reasoning via Dual-layer Scene Graph Chain of Thoughts
Introduces a scene-graph-first reasoning pipeline for vision-language models (VLMs) that structures answers by separating the extraction of object relationships from higher-level chain-of-thought reasoning.
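
As a rough illustration of the two-stage idea in the summary above, the sketch below first asks a model for object relationships and then reasons over them in a second call. It is a minimal sketch under stated assumptions, not the paper's implementation: the prompt wording, the `subject | predicate | object` line format, and the names `Relation`, `parse_relations`, `scene_graph_prompt`, `reasoning_prompt`, and `answer` are all hypothetical, and `model` stands in for any callable that sends a prompt (with the image already attached) to a VLM and returns its text reply.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Relation:
    """One scene-graph edge: subject, predicate, object (hypothetical schema)."""
    subject: str
    predicate: str
    obj: str


def parse_relations(text: str) -> List[Relation]:
    """Parse lines shaped like 'subject | predicate | object'; skip malformed lines."""
    relations = []
    for line in text.splitlines():
        parts = [part.strip() for part in line.split("|")]
        if len(parts) == 3 and all(parts):
            relations.append(Relation(*parts))
    return relations


def scene_graph_prompt(question: str) -> str:
    """Stage 1 prompt (assumed wording): ask only for object relationships."""
    return (
        "List the objects in the image that matter for the question below, "
        "one relationship per line as 'subject | predicate | object'.\n"
        f"Question: {question}"
    )


def reasoning_prompt(question: str, relations: List[Relation]) -> str:
    """Stage 2 prompt (assumed wording): reason step by step over the graph."""
    graph = "\n".join(f"{r.subject} {r.predicate} {r.obj}" for r in relations)
    return (
        f"Scene graph:\n{graph}\n\n"
        f"Question: {question}\n"
        "Reason step by step over the scene graph, then give the answer."
    )


def answer(question: str, model: Callable[[str], str]) -> str:
    """Run the two stages in order: relationships first, reasoning second."""
    relations = parse_relations(model(scene_graph_prompt(question)))
    return model(reasoning_prompt(question, relations))
```

Keeping the relationship extraction and the reasoning step in separate calls means the intermediate scene graph can be inspected or logged on its own, which is the structural benefit the summary points to.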