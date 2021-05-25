Google Cloud Search has added support for Optical Character Recognition (OCR) based text extraction for PDF files that contain images, improving discoverability of such PDFs and making it easier for users to search relevant documents.

Note that Cloud Search applies OCR only when two conditions are met: the PDF must be submitted using the Asynchronous Indexing mode, and the file must contain only scanned images. If a PDF contains any native text content, Cloud Search does not apply OCR to its images.
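The indexing-mode requirement above applies to items submitted through the Cloud Search Indexing API. As a minimal sketch, the Python below builds (but does not send) a request body for the API's `items.index` method, with `mode` set to `ASYNCHRONOUS` and the PDF passed as `RAW` inline content. The helper `build_index_request`, the data source ID, and the item ID are hypothetical. Authentication, ACLs, and the separate upload path that larger files would normally use are left out, so treat the field names as something to check against the current API reference rather than a finished connector.

```python
import base64
import json


def build_index_request(pdf_bytes: bytes, title: str, version: int,
                        source_id: str, item_id: str) -> dict:
    """Build a hypothetical items.index request body for a scanned PDF.

    The body would be POSTed to
    https://cloudsearch.googleapis.com/v1/{item.name}:index
    with OAuth credentials (not shown here).
    """
    item_name = f"datasources/{source_id}/items/{item_id}"
    return {
        "item": {
            "name": item_name,
            "itemType": "CONTENT_ITEM",
            # Bytes fields in the JSON API are base64-encoded strings.
            "version": base64.b64encode(str(version).encode()).decode("ascii"),
            "metadata": {"title": title, "mimeType": "application/pdf"},
            "content": {
                # RAW: the bytes are the original file, not pre-extracted TEXT/HTML.
                "contentFormat": "RAW",
                "inlineContent": base64.b64encode(pdf_bytes).decode("ascii"),
            },
            # Assumption: a real connector would also set item["acl"] here.
        },
        # OCR is only applied to image-only PDFs indexed asynchronously.
        "mode": "ASYNCHRONOUS",
    }


if __name__ == "__main__":
    # Hypothetical IDs and placeholder bytes, for illustration only.
    body = build_index_request(b"%PDF-1.4 placeholder", "Scanned contract", 1,
                               "example-source-id", "contract-0001")
    print(json.dumps(body, indent=2))
```

Keeping the request construction separate from the HTTP call makes it easy to inspect the payload before wiring it into whatever client library a connector already uses.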

Typical use cases for OCR text extraction in Google Cloud Search include indexing scanned PDFs of:

Physical contract documents

Engineering documents that contain annotations or labels

Physical customer invoices, and more

"Many critical business documents are either in physical form or as scanned versions of those physical documents. With OCR support, admins can now easily index these documents for Cloud Search, making it easier for users to quickly find relevant scanned documents," Google wrote in a blog post on Monday.

Google says the new feature also eliminates the need to extract text offline from image-based PDFs before indexing them in Cloud Search.

OCR-based text extraction for PDFs containing images is available to Google Workspace Enterprise Plus and Google Cloud Search customers. It is not available to Google Workspace Essentials, Business Starter, Business Standard, Business Plus, Enterprise Essentials, Enterprise Standard, Education Fundamentals, Education Plus, Frontline, and Nonprofits, as well as G Suite Basic and Business customers.