AEC-Bench
A 2025 multimodal benchmark for evaluating AI agents on real-world architecture, engineering, and construction (AEC) tasks, including drawing understanding, cross-sheet reasoning, and project-level document coordination.
Definition
AEC-Bench is a rigorous evaluation framework published in 2025 for testing multimodal AI agents on authentic architecture, engineering, and construction tasks. The benchmark fills a critical gap: while general AI benchmarks like MMMU and DocVQA evaluate broad visual reasoning, none capture the specific challenges of construction documentation, such as tightly packed drawing annotations, cross-sheet reference chains, project-level discipline coordination, and the specialized visual grammar of AEC documents. Tasks include drawing understanding (reading dimensions, identifying elements, interpreting symbols), cross-sheet reasoning (following reference bubbles across detail, plan, and elevation sheets), and project-level coordination (identifying conflicts between structural, architectural, and MEP drawings). AEC-Bench reveals that state-of-the-art multimodal LLMs still struggle with AEC-specific tasks that experienced engineers handle routinely, particularly cross-sheet reasoning and dense annotation interpretation. Its results are already shaping the design of platforms such as BIMgent and other multimodal construction-document AI tools.
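The benchmark's published data format is not reproduced here, but its three task families suggest the shape of an evaluation harness. The sketch below is a hypothetical illustration: the TaskType names, the AECBenchTask record, and the exact-match scorer are all assumptions, not AEC-Bench's actual schema.

```python
# Hypothetical sketch of an AEC-Bench-style evaluation harness.
# The task schema, field names, and scoring rule are assumptions for
# illustration; they are not the benchmark's published format.
from dataclasses import dataclass
from enum import Enum
from typing import Callable


class TaskType(Enum):
    DRAWING_UNDERSTANDING = "drawing_understanding"  # dimensions, elements, symbols
    CROSS_SHEET_REASONING = "cross_sheet_reasoning"  # reference bubbles across sheets
    PROJECT_COORDINATION = "project_coordination"    # structural/architectural/MEP conflicts


@dataclass
class AECBenchTask:
    task_id: str
    task_type: TaskType
    sheet_images: list[str]  # paths to the drawing sheets the task references
    question: str            # e.g. "What is the depth of beam B-12 on S2.01?"
    gold_answer: str


def evaluate(tasks: list[AECBenchTask],
             model: Callable[[list[str], str], str]) -> dict[TaskType, float]:
    """Score a multimodal model per task family with exact-match accuracy."""
    correct: dict[TaskType, int] = {t: 0 for t in TaskType}
    total: dict[TaskType, int] = {t: 0 for t in TaskType}
    for task in tasks:
        prediction = model(task.sheet_images, task.question)
        total[task.task_type] += 1
        if prediction.strip().lower() == task.gold_answer.strip().lower():
            correct[task.task_type] += 1
    return {t: correct[t] / total[t] for t in TaskType if total[t]}
```

Scoring per task family rather than as a single aggregate matters here, since the headline finding is that models strong on drawing understanding can still fail at cross-sheet reasoning.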
Examples
Testing whether an AI can correctly identify that a window schedule references a detail contradicting the elevation
Benchmarking five multimodal models on extracting and cross-checking structural beam sizes across 20 drawing sheets (a minimal version of this cross-check is sketched after this list)
Using AEC-Bench scores to select the best foundation model for a construction document AI platform
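The cross-checking step in the second example can be made concrete. The sketch below is a hypothetical illustration: extract_beam_sizes stands in for whatever multimodal-model call pulls a {beam_id: size} mapping from one sheet, and only the grouping-and-comparison logic is the point.

```python
# Hypothetical sketch: flag beam-size disagreements across drawing sheets.
# extract_beam_sizes is an assumed wrapper around a multimodal model that
# returns {beam_id: size_string} for one sheet; it is not a real API.
from collections import defaultdict
from typing import Callable


def find_beam_conflicts(
    sheets: list[str],
    extract_beam_sizes: Callable[[str], dict[str, str]],
) -> dict[str, dict[str, str]]:
    """Return beams whose reported size differs between sheets."""
    sizes_by_beam: dict[str, dict[str, str]] = defaultdict(dict)
    for sheet in sheets:
        for beam_id, size in extract_beam_sizes(sheet).items():
            sizes_by_beam[beam_id][sheet] = size
    return {
        beam_id: by_sheet
        for beam_id, by_sheet in sizes_by_beam.items()
        if len(set(by_sheet.values())) > 1  # same beam, conflicting sizes
    }
```

Running the same comparison over each candidate model's extractions turns the example into a head-to-head benchmark: the model that reports the fewest spurious conflicts while still catching real disagreements is the stronger cross-sheet reasoner.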
Nomic Use Cases
See how Nomic applies this in production AEC workflows:
Automated Drawing Review: Automatically review drawings against building codes, internal standards, and client requirements.
Automated Code Compliance: Check drawings against 380+ building codes and standards with cited answers.


