OCR

Also called: optical character recognition

Extracting text from images or scanned PDFs so they become searchable and AI-queryable.

Optical Character Recognition (OCR) extracts machine-readable text from images, scanned documents, and PDFs that contain page images rather than embedded text. Modern OCR services (e.g., Amazon Textract, Google Document AI) are highly accurate on cleanly typed text and increasingly reliable on handwriting, multi-column layouts, and tables.

For document workflows, OCR is essential when handling scanned contracts, legacy paper records, or image-only PDFs. Such files have no text layer, so without OCR, full-text search and AI Q&A cannot find their content.
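As a rough illustration of how a workflow might decide which pages to send to OCR, the sketch below flags pages whose embedded text layer is empty or nearly empty. It assumes the per-page text has already been pulled out by some PDF text extractor. The function name `pages_needing_ocr` and the `min_chars` threshold are hypothetical choices made for this example, not part of any particular product or library.

```python
from typing import List


def pages_needing_ocr(page_texts: List[str], min_chars: int = 20) -> List[int]:
    """Return 0-based indices of pages whose embedded text layer looks empty.

    page_texts holds the text a PDF extractor returned for each page.
    min_chars is an assumed heuristic threshold, not a standard value.
    """
    needs_ocr = []
    for index, text in enumerate(page_texts):
        # Count only non-whitespace characters; image-only pages usually
        # yield an empty string or a handful of stray characters.
        visible = sum(1 for ch in text if not ch.isspace())
        if visible < min_chars:
            needs_ocr.append(index)
    return needs_ocr


# Hypothetical extractor output for a three-page PDF whose second page is a scan.
sample = [
    "This Agreement is entered into as of January 1 between the parties.",
    "",
    "Section 2. Term and termination of the agreement.",
]
print(pages_needing_ocr(sample))  # prints [1]
```

Only the flagged pages would then go to an OCR service, which avoids re-processing pages that already carry usable text. A character-count threshold is a crude signal; a real pipeline would likely also check whether the page contains images at all.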

Related terms

Often appears with

AI assistant (document)

An LLM-powered Q&A surface that answers questions about uploaded documents with cited sources — refusing when no source supports the answer.
