Acknowledgements
VQA Data Annotators: Siqi Zha, Yeyang Fu, Chuang Deng, Fenghong An, Hanying Li, Jiayi Fan
Internal Testers: Jialu Yu, Yingqi Guo
[2025-05-20] The arXiv article, dataset, and code will be released soon.
Planning maps in territorial and spatial planning visually present the concepts, objectives, strategies, and specific measures of a plan in map form, guiding and coordinating development, protection, and utilization activities across territorial space. They are not only a critical basis for planning decisions but also essential tools for public participation and for overseeing plan implementation. Given the complexity and specialized nature of planning work, fully understanding a planning map requires not only recognizing fine-grained elements (such as symbols, legends, and geographic features) but also the ability to analyze and judge them in conjunction with relevant policies. This complexity makes the interpretation of planning maps particularly challenging.
With the rapid advancement of multimodal large language models (MLLMs), we have established a benchmark for territorial and spatial planning maps to assess MLLMs’ map-understanding capabilities. Our contributions are as follows:
(1) Data: We constructed the Spatial Planning Map Database (SPMD), an expert-annotated repository with diverse map content and high-quality annotations from planning-domain specialists (an illustrative record sketch follows this list).
(2) Framework: We proposed a comprehensive, planning-discipline–based evaluation standard that measures MLLMs’ planning-map comprehension from four perspectives—perception, reasoning, association, and application—comprising eight fine-grained subcategories.
(3) Experiments: By designing question–answer tasks grounded in authoritative question banks (specifically, the practice exam questions for the Chinese Registered Urban Planner qualification), we significantly reduced the incidence of hallucinated normative references by the models.
(4) Results: All models exhibited their weakest performance in the application dimension, while Qwen2.5-VL-32B-Instruct achieved the highest overall score across all four evaluated dimensions.
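As a purely illustrative sketch (not the released format), the snippet below shows one way an SPMD-style benchmark record and a per-dimension score aggregation could be organized. All field names, the subcategory label, and the scoring scheme are assumptions; only the four dimension names come from the text, and the actual grading procedure used in the benchmark is out of scope here.

```python
from collections import defaultdict

# Hypothetical record layout for a benchmark item; field names are
# illustrative and are not taken from the SPMD release.
sample_item = {
    "map_id": "spmd_0001",
    "dimension": "perception",  # one of: perception, reasoning, association, application
    "subcategory": "task_oriented_image_summarization",  # assumed label for one of the eight subcategories
    "question": "What are the main issues presented in the map?",
    "reference_answer": "Urbanization rate, central towns, population migration, food industrial park.",
}

def aggregate_by_dimension(results):
    """Average per-item scores (0-1) within each of the four dimensions.

    `results` is a list of (item, score) pairs produced by whatever
    answer-grading procedure the benchmark uses.
    """
    sums, counts = defaultdict(float), defaultdict(int)
    for item, score in results:
        sums[item["dimension"]] += score
        counts[item["dimension"]] += 1
    return {dim: sums[dim] / counts[dim] for dim in sums}

# Example: aggregating a single graded item.
print(aggregate_by_dimension([(sample_item, 0.8)]))  # {'perception': 0.8}
```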
We propose a conceptual framework tailored to the domain of urban planning visualization. This framework comprises the following four dimensions and eight categories:
Task-Oriented Image Summarization assesses the ability to identify and extract key information from an image. Planning maps often contain rich information, only part of which is relevant to the correct answer and therefore needs to be summarized.
Example Question: “A county currently has a population of 980,000, including 520,000 urban residents. By 2035, the plan aims to increase the urbanization rate to 80%. Three central towns (A, B, and C) and the old district of the county seat will undergo upgrading and renovation. Sixty percent of the population will be relocated to newly built residential areas on the outskirts of the county seat. The plan also includes the development of a food industrial park and rural tourism, as shown in the figure. What are the main issues presented in the map? Please extract the key information from the question.”
Example Answer: “The key points in this question are the urbanization rate, the central towns, population migration, and the food industrial park.”
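As a hedged illustration of why the urbanization figures count as key information, the short calculation below works out the current urbanization rate and the urban population implied by the 80% target. Holding the 2035 total population at 980,000 is an assumption made only for illustration; the question does not state it.

```python
# Illustrative arithmetic only; the 2035 total population is assumed
# to remain 980,000, which the question does not state.
total_population = 980_000   # current county population
urban_population = 520_000   # current urban residents
target_rate = 0.80           # planned urbanization rate for 2035

current_rate = urban_population / total_population
print(f"Current urbanization rate: {current_rate:.1%}")            # ~53.1%

implied_urban_2035 = target_rate * total_population                # under the assumption above
print(f"Implied 2035 urban population: {implied_urban_2035:,.0f}") # 784,000
```

The gap between roughly 53% today and the 80% target is what makes the urbanization rate and population migration the pivotal items in the answer above.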
Example Question: “Identify the main issues shown in the image.”
Example Answer: “The main issues in the image include the spatial relationship between Central Town A, the food industrial park, and the floodplain; the topological relationship between the expressway and the nationally protected wetland; the distance between the planned interchange on the west side of the expressway and the existing one; and the overall distribution of central towns.”
Note: This content is part of a manuscript under submission. Please do not cite it until it is officially published. The arXiv article, dataset, and code will be released soon.