Use OCR in Google Docs to Extract Text from Images
Google Docs supports OCR (optical character recognition): upload a scanned PDF file or an image to Google Drive and Drive will extract the text from the file.
You can also run OCR from Apps Script through the Drive REST API v2. Set the ocr parameter to true when uploading the file and Drive will perform OCR on it.
How to Use OCR with Google Docs
function doOCR() {
  // Download the source image as a blob
  var image = UrlFetchApp.fetch('http://img.labnol.org/logo.png').getBlob();

  // Metadata for the file that will be created in Google Drive
  var file = {
    title: 'OCR File',
    mimeType: 'image/png',
  };

  // OCR is supported for PDF and image formats
  file = Drive.Files.insert(file, image, { ocr: true });

  // Write the URL of the new Google Document to the execution log
  Logger.log('File URL: %s', file.embedLink);
}
The function will fetch the web image and create a new Google Document in your Google Drive containing text and images extracted from the source file.
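If you want the recognized text itself rather than a link to the document, the same upload can be wrapped in a small helper. The sketch below is only an illustration under a few assumptions: buildOcrRequest, extractTextWithOCR and OCR_MIME_TYPES are hypothetical names, the list of accepted MIME types is an assumed reading of "PDF and image formats", and it assumes the OCR upload produces a Google Document that DocumentApp can open. The ocrLanguage option is the Drive API v2 language hint for OCR.

```javascript
// Sketch only: these helpers are hypothetical, not part of the Drive API.

// MIME types assumed to be accepted for OCR (PDF plus common image formats).
var OCR_MIME_TYPES = ['application/pdf', 'image/png', 'image/jpeg', 'image/gif'];

// Build the file metadata and optional parameters for an OCR upload.
function buildOcrRequest(title, mimeType, language) {
  if (OCR_MIME_TYPES.indexOf(mimeType) === -1) {
    throw new Error('OCR is not attempted for MIME type: ' + mimeType);
  }
  var options = { ocr: true };
  if (language) {
    options.ocrLanguage = language; // language hint, for example 'en'
  }
  return {
    resource: { title: title, mimeType: mimeType },
    options: options
  };
}

// Upload a blob with OCR enabled and return the recognized text.
// Runs only inside Apps Script with the Drive advanced service (v2) enabled.
function extractTextWithOCR(blob, title, language) {
  var request = buildOcrRequest(title, blob.getContentType(), language);
  var file = Drive.Files.insert(request.resource, blob, request.options);
  // Assumes the upload was converted to a Google Document.
  return DocumentApp.openById(file.id).getBody().getText();
}
```

Inside Apps Script you could call it with the blob from the earlier example, for instance extractTextWithOCR(UrlFetchApp.fetch('http://img.labnol.org/logo.png').getBlob(), 'OCR File', 'en'), and pass the result to Logger.log.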
To use the function, you’ll need to enable the Drive API for your Apps Script project: add the Drive advanced service (v2) in the script editor and, if prompted, enable the Drive API in the Google Developers Console.
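If the Drive service is not enabled, any reference to Drive in the script fails with a ReferenceError. A small guard like the one below can give a clearer message before attempting the upload. This is a minimal sketch: isDriveServiceEnabled and requireDriveService are hypothetical helper names, and the check only confirms that a Drive object with a Files collection is visible to the script.

```javascript
// Sketch only: hypothetical helpers for checking that Drive is available.

// typeof is safe to use on an undeclared identifier, so this never throws.
function isDriveServiceEnabled() {
  return typeof Drive !== 'undefined' && typeof Drive.Files !== 'undefined';
}

// Throw a descriptive error when the Drive service is missing.
function requireDriveService() {
  if (!isDriveServiceEnabled()) {
    throw new Error('Enable the Drive API for this Apps Script project before running OCR.');
  }
}
```

Calling requireDriveService() as the first line of doOCR would stop the script early with a readable error instead of a bare ReferenceError.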
Amit Agarwal
Google Developer Expert, Google Cloud Champion
Amit Agarwal is a Google Developer Expert in Google Workspace and Google Apps Script. He holds an engineering degree in Computer Science (I.I.T.) and is the first professional blogger in India.
Amit has developed several popular Google add-ons including Mail Merge for Gmail and Document Studio. Read more on Lifehacker and YourStory