We began by reviewing urban planning curricula from leading institutions in both China and the United States, including Peking University, Tongji University, MIT, and Harvard GSD. This analysis identified core knowledge domains and learning objectives in contemporary planning education.
1. Syllabus Reference: We further examined professional qualification examinations for urban planners across multiple countries, drawing from China's Registered Urban Planner Exam, the U.S. AICP (American Institute of Certified Planners) certification, the UK's RTPI (Royal Town Planning Institute) accreditation, and the professional standards of Australia's PIA (Planning Institute of Australia) and Canada's CIP (Canadian Institute of Planners), to establish a foundational disciplinary framework.
2. Knowledge Classification: By integrating disciplinary knowledge systems and exam syllabi, we classified the dataset into 4 major categories, 24 intermediate classes, and 81 subcategories. Content validity was assessed with the Content Validity Index (CVI); a scale-level CVI (S-CVI) of 1.0 confirmed strong alignment between our classification system and international planner certification frameworks.
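The CVI procedure above can be sketched as follows. This is an illustrative example, not the study's actual code: the expert panel size and ratings are hypothetical, and relevance is assumed to be rated on the standard 4-point scale, where ratings of 3 or 4 count as "relevant".

```python
def item_cvi(ratings):
    """I-CVI: fraction of experts rating an item relevant (3 or 4 on a 4-point scale)."""
    return sum(1 for r in ratings if r >= 3) / len(ratings)

def scale_cvi_ave(items):
    """S-CVI/Ave: mean of the item-level CVIs across all classification items."""
    cvis = [item_cvi(r) for r in items]
    return sum(cvis) / len(cvis)

# Hypothetical panel of five experts rating three subcategories:
ratings = [
    [4, 4, 3, 4, 4],
    [3, 4, 4, 4, 3],
    [4, 3, 4, 4, 4],
]
print(scale_cvi_ave(ratings))  # 1.0: every expert rated every item relevant
```

An S-CVI of 1.0, as reported above, arises exactly when every item's I-CVI equals 1.0, i.e., all experts judged all items relevant.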
3. Competency Dimensions: Adopting Bloom’s revised taxonomy, we developed a structured assessment matrix covering five cognitive levels of planning competencies. Each knowledge point was mapped to specific cognitive tasks:
- Remember: Recalling facts, terminology, and fundamental concepts.
- Understand: Interpreting, paraphrasing, and explaining key information.
- Apply: Applying concepts and methods within contextualized planning scenarios.
- Analyze: Deconstructing text structures, identifying implicit assumptions, and diagnosing issues.
- Evaluate: Comparing standards, assessing judgments, and critiquing solutions. Higher-order tasks also incorporated preliminary Create-level challenges, such as generating actionable recommendations for urban development issues.
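The mapping of knowledge points to cognitive tasks can be represented as a simple lookup structure. The sketch below is a hypothetical illustration of such an assessment matrix; the level descriptions paraphrase the list above, and the function and field names are our own.

```python
# Hypothetical assessment matrix: each of the five assessed Bloom levels
# is paired with its characteristic cognitive task.
ASSESSMENT_MATRIX = {
    "Remember":   "recall facts, terminology, and fundamental concepts",
    "Understand": "interpret, paraphrase, and explain key information",
    "Apply":      "apply concepts within contextualized planning scenarios",
    "Analyze":    "deconstruct structures and identify implicit assumptions",
    "Evaluate":   "compare standards, assess judgments, and critique solutions",
}

def tag_question(knowledge_point, level):
    """Attach a cognitive-level tag and task description to a question record."""
    if level not in ASSESSMENT_MATRIX:
        raise ValueError(f"unknown cognitive level: {level}")
    return {
        "knowledge_point": knowledge_point,
        "level": level,
        "task": ASSESSMENT_MATRIX[level],
    }

q = tag_question("land-use zoning", "Analyze")
print(q["task"])  # deconstruct structures and identify implicit assumptions
```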
4. Reasoning Design: To enhance model performance evaluation, particularly for complex reasoning tasks, we systematically embedded Chain-of-Thought (CoT) principles into question design. Scenarios included contextual preconditions, guided prompts, and deliberate logical fallacies, accompanied by discipline-specific analytical pathways to support CoT-match scoring.
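One way to realize CoT-match scoring is to measure how many of the reference analytical-pathway steps a model's chain of thought covers. The sketch below uses naive keyword overlap as the matching criterion; the names, threshold, and matching rule are illustrative assumptions, not the study's actual metric.

```python
def step_matches(step, response, threshold=0.5):
    """A reference step counts as covered if enough of its keywords appear
    in the model response (simple substring matching, for illustration)."""
    keywords = set(step.lower().split())
    hits = sum(1 for w in keywords if w in response.lower())
    return hits / len(keywords) >= threshold

def cot_match(reference_steps, model_response):
    """Fraction of reference reasoning steps covered by the model's CoT."""
    covered = sum(step_matches(s, model_response) for s in reference_steps)
    return covered / len(reference_steps)

# Hypothetical reference pathway and model response:
steps = ["identify zoning constraint", "assess density impact"]
resp = "First identify the zoning constraint, then assess its impact on density."
print(cot_match(steps, resp))  # 1.0: both reference steps are covered
```

In practice, a production metric would replace keyword overlap with semantic matching, but the structure of the score, coverage of discipline-specific reasoning steps, is the same.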
5. Assessment Procedure: For validation, we convened a panel of expert urban planners from academia and practice to workshop core knowledge domains and review assessment items. Each question was paired with detailed scoring rubrics specifying evaluation criteria and acceptable response elements. We employed a dual human-machine scoring system, with algorithmic metrics calibrated against human-annotated benchmarks to ensure reliability.
Figure 1: PlanBench-Text Architecture.