Optical Character Recognition (OCR) extracts machine-readable text from images, scanned documents, and PDFs that contain page images rather than embedded text. Modern OCR services (e.g., Amazon Textract, Google Document AI) are highly accurate on typed text and increasingly reliable on handwriting, multi-column layouts, and tables.
For document workflows, OCR is essential when handling scanned contracts, legacy paper records, or image-only PDFs. Without it, those files have no text to index, so full-text search and AI Q&A can't find their content.
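One way a workflow might decide which pages to send to OCR is to check how much embedded text each page already has. The sketch below is illustrative only: `pages_needing_ocr` and the `MIN_TEXT_CHARS` threshold are hypothetical names, and it assumes the per-page text has already been pulled out by whatever PDF library you use.

```python
from typing import List

# Hypothetical threshold: pages with fewer non-whitespace characters than this
# are treated as image-only. Tune it for your own documents.
MIN_TEXT_CHARS = 20


def pages_needing_ocr(page_texts: List[str], min_chars: int = MIN_TEXT_CHARS) -> List[int]:
    """Return 0-based indexes of pages whose embedded text layer is too sparse."""
    needs_ocr = []
    for index, text in enumerate(page_texts):
        # Count only non-whitespace characters so blank lines don't count as text.
        visible = sum(1 for ch in text if not ch.isspace())
        if visible < min_chars:
            needs_ocr.append(index)
    return needs_ocr


# Example input: embedded text per page, as a PDF library might return it.
sample_pages = [
    "This agreement is made between the parties listed below.",
    "",         # scanned page with no text layer
    "  \n 3 ",  # only a stray page number was embedded
]
print(pages_needing_ocr(sample_pages))
```

Pages flagged this way would go to an OCR service, while pages that already carry a text layer can be indexed directly, which avoids paying for OCR where it isn't needed.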