Definition
A visual representation of unstructured data, such as a scanned PDF or a screenshot, whose text exists only as pixels. It requires layout-aware ingestion, typically OCR or a multimodal LLM, to preserve the spatial and semantic context of tables, headers, and diagrams for accurate retrieval.
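A minimal sketch of what "layout-aware" means at the simplest level, assuming the OCR engine has already returned one bounding box per word. The `Word` schema, the `group_into_lines` and `serialize` helpers, and the 0.5 and 1.5 thresholds are hypothetical choices for illustration, not any particular library's API. The code uses box geometry to recover reading order and to flag header lines by their height, information a plain text dump would lose.

```python
from dataclasses import dataclass
from statistics import median


@dataclass
class Word:
    """One OCR token with its bounding box in page pixels (hypothetical schema)."""
    text: str
    x: float  # left edge
    y: float  # top edge
    w: float  # width
    h: float  # height


def group_into_lines(words: list[Word], tol_ratio: float = 0.5) -> list[list[Word]]:
    """Cluster words whose vertical centers lie within tol_ratio * median word height."""
    if not words:
        return []
    tol = tol_ratio * median(word.h for word in words)
    lines: list[list[Word]] = []
    # Visit words top to bottom by vertical center.
    for word in sorted(words, key=lambda wd: wd.y + wd.h / 2):
        center = word.y + word.h / 2
        # Compare against the first word of the current line; start a new line if too far.
        if lines and abs(center - (lines[-1][0].y + lines[-1][0].h / 2)) <= tol:
            lines[-1].append(word)
        else:
            lines.append([word])
    # Within each line, restore left-to-right reading order.
    return [sorted(line, key=lambda wd: wd.x) for line in lines]


def serialize(words: list[Word], header_ratio: float = 1.5) -> str:
    """Emit text in reading order, prefixing unusually tall lines with '# ' as headers."""
    if not words:
        return ""
    body_height = median(word.h for word in words)
    out = []
    for line in group_into_lines(words):
        text = " ".join(word.text for word in line)
        line_height = max(word.h for word in line)
        out.append(f"# {text}" if line_height >= header_ratio * body_height else text)
    return "\n".join(out)


# Made-up sample: words arrive out of order, as they might from a naive extractor.
words = [
    Word("Revenue", 10, 52, 60, 12),
    Word("Q3", 90, 50, 20, 12),
    Word("Report", 80, 10, 70, 24),
    Word("Annual", 10, 12, 60, 24),
    Word("$4.2M", 130, 51, 40, 12),
]
print(serialize(words))
# prints:
# # Annual Report
# Revenue Q3 $4.2M
```

Comparing each word to the first word of its line, rather than to a running average, keeps the sketch short but can mis-split lines on skewed scans; real pipelines often deskew the page first or delegate this step to a layout-analysis model.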
- Optical Character Recognition (OCR) (Prerequisite)
- Layout Analysis (Component)
- Multimodal RAG (Architectural Context)
- Table Extraction (Component)
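The Table Extraction and Layout Analysis components listed above matter because a table flattened into a stream of words loses the link between each value and its column header. A minimal sketch, assuming a layout or table-structure model has already assigned row and column indices to each cell and that there are no merged cells; the `Cell` schema and helper names are hypothetical. It renders the table two ways: as a Markdown table, and as one self-describing "header: value" record per row for chunk-level embedding.

```python
from dataclasses import dataclass


@dataclass
class Cell:
    """A table cell as a layout model might report it (hypothetical schema)."""
    row: int
    col: int
    text: str


def to_grid(cells: list[Cell]) -> list[list[str]]:
    """Place cells into a dense row-major grid; missing cells become empty strings."""
    n_rows = max(c.row for c in cells) + 1
    n_cols = max(c.col for c in cells) + 1
    grid = [["" for _ in range(n_cols)] for _ in range(n_rows)]
    for c in cells:
        grid[c.row][c.col] = c.text.strip()
    return grid


def to_markdown(grid: list[list[str]]) -> str:
    """Render the grid as a Markdown table, treating row 0 as the header row."""
    header, *rows = grid
    lines = [
        "| " + " | ".join(header) + " |",
        "| " + " | ".join("---" for _ in header) + " |",
    ]
    lines += ["| " + " | ".join(r) + " |" for r in rows]
    return "\n".join(lines)


def to_row_records(grid: list[list[str]]) -> list[str]:
    """Serialize each body row as 'header: value' pairs so it stays self-describing."""
    header, *rows = grid
    return ["; ".join(f"{h}: {v}" for h, v in zip(header, r)) for r in rows]


# Made-up sample cells for illustration only.
cells = [
    Cell(0, 0, "Region"), Cell(0, 1, "Q3 Revenue"),
    Cell(1, 0, "EMEA"), Cell(1, 1, "$1.8M"),
    Cell(2, 0, "APAC"), Cell(2, 1, "$2.4M"),
]
grid = to_grid(cells)
print(to_markdown(grid))
print(to_row_records(grid))
```

Row records trade compactness for self-contained chunks: if a retriever surfaces only one row, that row still carries its column names.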
Disambiguation
In RAG, it's a data source to be parsed, not a UI asset or a generic photograph.
Visual Analog
A topographical map where the physical layout of the peaks and valleys is as critical to the data as the names of the locations themselves.