Definition
A visual representation of unstructured data, such as a scanned PDF or screenshot, that requires layout-aware ingestion (typically OCR or a multimodal LLM) to preserve the spatial and semantic context of tables, headers, and diagrams so that retrieval stays accurate.
Disambiguation
In retrieval-augmented generation (RAG), it is a data source to be parsed, not a UI asset or a generic photograph.
Visual Analog
A topographical map where the physical layout of the peaks and valleys is as critical to the data as the names of the locations themselves.
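To make the idea of layout-aware ingestion more concrete, here is a minimal sketch of one small piece of it: rebuilding table rows from OCR word boxes so that position, not emission order, decides what belongs together. The `Word` structure, the function names, and the `y_tolerance` parameter are hypothetical and chosen for illustration; real OCR engines and multimodal models return boxes in their own formats and need more robust grouping than this.

```python
from dataclasses import dataclass


@dataclass
class Word:
    """One OCR token with its bounding box in pixels (hypothetical structure)."""
    text: str
    x: float  # left edge
    y: float  # top edge
    w: float  # width
    h: float  # height


def group_into_rows(words, y_tolerance=0.5):
    """Cluster words into rows by vertical center, then order each row left to right.

    A word joins the current row when its center lies within
    y_tolerance * word height of that row's first center.
    """
    rows = []
    for word in sorted(words, key=lambda t: t.y + t.h / 2):
        center = word.y + word.h / 2
        if rows and abs(center - rows[-1]["center"]) <= y_tolerance * word.h:
            rows[-1]["words"].append(word)
        else:
            rows.append({"center": center, "words": [word]})
    return [sorted(row["words"], key=lambda t: t.x) for row in rows]


def rows_to_text(rows):
    """Serialize rows with explicit cell separators so the table shape survives chunking."""
    return "\n".join(" | ".join(word.text for word in row) for row in rows)


# Tokens in arbitrary order with slightly jittered baselines (made-up sample data).
words = [
    Word("Revenue", 10, 52, 70, 12),
    Word("Q1", 120, 20, 20, 12),
    Word("Metric", 10, 21, 60, 12),
    Word("Q2", 200, 19, 20, 12),
    Word("4.1", 120, 51, 25, 12),
    Word("4.7", 200, 53, 25, 12),
]

print(rows_to_text(group_into_rows(words)))
# Metric | Q1 | Q2
# Revenue | 4.1 | 4.7
```

Joining the same tokens in their emitted order would give "Revenue Q1 Metric Q2 4.1 4.7", which no longer says which value belongs to which quarter. That loss of association is what the definition means by preserving spatial context.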