A guide to OCR and PDF Data Extraction in Microsoft Power Automate

Tired of manual PDF data extraction workflows in Microsoft Power Automate? Let's make the process a whole lot easier. You can automatically extract text from images, scanned files, and PDFs, saving time and reducing errors.

Imagine never having to type out information from invoices, contracts, or other documents. That's what automated workflows make possible. In this article, we'll explore how you can integrate Power Automate and Nanonets to streamline your PDF data extraction process and boost your productivity.

What is OCR and PDF data extraction?

Before we dive into the specifics of Power Automate and Nanonets, let's take a moment to understand what OCR and PDF data extraction entail. OCR, short for Optical Character Recognition, is a technology that enables the conversion of images containing text into machine-readable text. This is particularly useful when dealing with scanned documents or PDFs that are not searchable.

A glimpse into how Nanonets OCR extracts structured data from PDFs and scanned documents

PDF data extraction, on the other hand, involves extracting specific data fields from PDF documents. This could include information like invoice numbers, dates, customer names, and more. By automating this process, businesses can save time, reduce errors, and free up their employees to focus on higher-value tasks.

What is Microsoft Power Automate?

Microsoft Power Automate, previously known as Microsoft Flow, is a cloud-based service that empowers users to create and automate workflows across various applications and services. By connecting to over 300 services, including Microsoft's offerings like SharePoint, Outlook, Excel, OneDrive, and Dynamics 365, as well as third-party apps such as Twitter, Dropbox, and Google Services, Power Automate streamlines business processes and boosts productivity. 

What is Power Automate used for?

Power Automate's key strengths include its ability to handle complex scenarios through conditional logic (if...then...else statements), making it a powerful tool for business process automation, not just single-task automation. It offers pre-built templates for routine tasks, reducing the technical barrier for non-programmers and providing robust tools for developers to create sophisticated automation.

How do document automation workflows work in Power Automate?

Document automation workflows are among Power Automate's most compelling features, enabling businesses to automate repetitive document management tasks, enhance efficiency, and minimize human errors. Power Automate allows users to create and manage workflows involving documents, including generating, editing, sharing, and storing, across various Microsoft services and third-party platforms.

For example, an approval workflow can be created to automate the document approval process. Power Automate can detect when a new document is added to a SharePoint folder or OneDrive directory, automatically email the appropriate person for review, update the status, and send notifications once the document is approved or rejected.

Power Automate templates

Another example is a document generation and storage workflow, where data from Microsoft Forms or Dynamics 365 can be used to automatically generate documents in Word or Excel, convert them to PDF, store them in a specific SharePoint folder, or send them via email.

Power Automate also enables data extraction and integration, allowing users to extract specific data from documents using tools like Nanonets and automatically update records in Dynamics 365 or Excel spreadsheets. Its flexibility allows developers to handle complex scenarios, create custom connectors, and implement error handling and conditional logic.

Power Automate's document automation workflows transform time-consuming manual processes into efficient automated tasks, freeing employees to focus on more value-driven activities. Its versatility, simplicity, and deep integration with various services make it an essential tool for organizations seeking to streamline their document management processes.

Enhancing OCR and PDF data extraction in Power Automate with Nanonets

Nanonets is a powerful tool that offers pre-trained data extraction models that can extract useful data from documents. We support all common document types and can easily train specialized models for custom document types. Leveraging the Nanonets API in Power Automate opens up possibilities for developing highly efficient automated workflows, particularly in document data extraction.

To understand this, let's look at a common business scenario. Imagine a company receiving a large volume of invoices daily. With Nanonets and Power Automate, you could automate the process of extracting the necessary data from these invoices and storing it in a database or using it in another application like Dynamics 365.

Here's an example of how to automate invoice processing using Power Automate and Nanonets:

  1. An invoice document is received and uploaded to a SharePoint folder or received as an email attachment in Office 365 Mail.
  2. A Power Automate workflow triggers upon the addition of this new document. Using the "When a file is created" or "When a new email arrives" trigger, Power Automate can automatically detect the new invoice.
  3. The workflow then sends the document to the Nanonets API via an HTTP POST request. This can be done with a custom connector or the built-in HTTP action in Power Automate.
  4. Nanonets processes the document with its machine learning model, specifically trained for invoice data extraction, and returns the extracted data in a structured format, like JSON.
  5. The Power Automate workflow receives this data and can then parse and use it as required. This could involve updating an Excel spreadsheet, creating a new item in a SharePoint list, or updating a record in Dynamics 365.
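
Steps 3 to 5 above can also be sketched outside Power Automate as a plain HTTP call. The following is a minimal Python sketch, not an official Nanonets client. The endpoint path, the Basic-auth scheme (API key as username, empty password), and the response shape (result, prediction, label, ocr_text) are assumptions to verify against the Nanonets API documentation; the helper names are hypothetical.

```python
import base64
import json
import os
import urllib.request
import uuid

# Assumed endpoint pattern; confirm it in the Nanonets API documentation.
API_URL = "https://app.nanonets.com/api/v2/OCR/Model/{model_id}/LabelFile/"


def build_request(api_key, model_id, filename, content):
    """Build a multipart/form-data POST that carries one file (step 3)."""
    boundary = uuid.uuid4().hex
    body = b"".join([
        f"--{boundary}\r\n".encode(),
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'.encode(),
        b"Content-Type: application/octet-stream\r\n\r\n",
        content,
        f"\r\n--{boundary}--\r\n".encode(),
    ])
    # Assumption: HTTP Basic auth, API key as username, empty password.
    token = base64.b64encode(f"{api_key}:".encode()).decode()
    return urllib.request.Request(
        API_URL.format(model_id=model_id),
        data=body,
        method="POST",
        headers={
            "Authorization": f"Basic {token}",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )


def parse_fields(response_json):
    """Flatten the assumed response shape into {label: text} (step 5)."""
    fields = {}
    for page in response_json.get("result", []):
        for pred in page.get("prediction", []):
            # Keep the first value seen for each label.
            fields.setdefault(pred["label"], pred["ocr_text"])
    return fields


def extract_invoice(api_key, model_id, path):
    """Send a local file and return its extracted fields (steps 3 to 5)."""
    with open(path, "rb") as fh:
        req = build_request(api_key, model_id, os.path.basename(path), fh.read())
    with urllib.request.urlopen(req) as resp:
        return parse_fields(json.load(resp))
```

In a flow, the same request corresponds to the built-in HTTP action (method, URI, headers, body), and the flattening step corresponds to a Parse JSON action followed by whatever action consumes the fields.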

In the Dynamics 365 context, Nanonets' ready-to-use integration with Dynamics 365 makes the process even more seamless. Let's explore another scenario to illustrate this:

  1. An invoice document is uploaded into Dynamics 365 as an attachment to a specific record.
  2. A Power Automate workflow is triggered based on this action. The workflow then sends the invoice document to the Nanonets API for processing, taking advantage of the ready-to-use integration that Nanonets offers with D365.
  3. Once the data is returned from Nanonets, the workflow then parses the structured data and updates the relevant fields in the Dynamics 365 record. This could include details like the invoice number, date, total amount, etc.
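
Step 3 is essentially a field mapping with some light validation. Here is a minimal Python sketch, assuming the extraction result has already been flattened into a label-to-text dictionary. The label names, the DD/MM/YYYY date format, and the Dynamics 365 column names (new_invoicenumber and so on) are hypothetical placeholders, not real schema names.

```python
from datetime import datetime
from decimal import Decimal


def _to_iso_date(text):
    # Assumption: the invoice prints dates as DD/MM/YYYY.
    return datetime.strptime(text.strip(), "%d/%m/%Y").date().isoformat()


def _to_amount(text):
    # Drop thousands separators, then normalise through Decimal.
    return str(Decimal(text.replace(",", "").strip()))


# Hypothetical mapping: extracted label -> (record column, converter).
FIELD_MAP = {
    "invoice_number": ("new_invoicenumber", str.strip),
    "invoice_date": ("new_invoicedate", _to_iso_date),
    "total_amount": ("new_totalamount", _to_amount),
}


def build_record_update(extracted):
    """Return (update payload, list of problems) for one record."""
    update, problems = {}, []
    for label, (column, convert) in FIELD_MAP.items():
        raw = extracted.get(label)
        if raw is None:
            problems.append(f"missing: {label}")
            continue
        try:
            update[column] = convert(raw)
        except (ValueError, ArithmeticError):
            # Bad dates raise ValueError; bad amounts raise a Decimal error.
            problems.append(f"invalid {label}: {raw!r}")
    return update, problems
```

Returning the problems alongside the payload lets the workflow decide whether to update the record directly or route the document to a person for review.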

These workflows help to automate what can typically be a labor-intensive process, saving significant amounts of time and reducing the risk of human error. Moreover, they leverage the power of machine learning to accurately extract required data, even from complex or varying invoice formats.

In addition to invoices, this process can be applied to a range of other document types - receipts, purchase orders, delivery notes, etc. Each document type would require a machine learning model trained for that specific document, which Nanonets is capable of providing.
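
Because each document type needs its own model, a workflow that handles several types needs a routing step. A minimal Python sketch of that lookup follows; the document type keys and the model IDs are hypothetical, and real IDs would come from your Nanonets account.

```python
# Hypothetical model IDs, one per document type.
MODEL_BY_DOCUMENT_TYPE = {
    "invoice": "model-id-for-invoices",
    "receipt": "model-id-for-receipts",
    "purchase_order": "model-id-for-purchase-orders",
    "delivery_note": "model-id-for-delivery-notes",
}


def pick_model(document_type):
    """Return the model ID configured for a document type."""
    key = document_type.strip().lower().replace(" ", "_")
    try:
        return MODEL_BY_DOCUMENT_TYPE[key]
    except KeyError:
        # Fail loudly rather than sending a document to the wrong model.
        raise ValueError(f"no model configured for document type {document_type!r}") from None
```

In a flow, the same idea can be expressed with a Switch control that branches on the folder name or a metadata column identifying the document type.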

How Nanonets and Power Automate can streamline your business processes

Nanonets' integration with Power Automate and Dynamics 365 opens up significant possibilities for businesses looking to automate their document data extraction workflows. These integrations make it easier for organizations to harness the power of machine learning in their everyday processes, leading to greater operational efficiency and accuracy.

Here are a variety of examples showcasing how Nanonets can be utilized in Power Automate for different automated document data extraction workflows:

  1. Expense reports: Scan uploaded receipts in SharePoint, extract data with Nanonets, and automatically populate an Excel sheet for expense tracking.
  2. Contract management: Upload contracts to a specific OneDrive folder, extract key details like parties involved, dates, and clauses using Nanonets, and update a SharePoint list for contract management.
  3. Invoice processing: Send invoices received via email to Nanonets for data extraction, and use the returned data to create or update records in Dynamics 365 Finance. Check out our deep dive into Business Central accounts payable automation.
  4. Order fulfillment: Extract data from purchase orders uploaded to a Teams channel using Nanonets, and trigger a Power Automate workflow to create a new order in Dynamics 365 Supply Chain Management.
  5. HR onboarding: When new employee documents are added to a SharePoint folder, extract key details like name, job title, and start date with Nanonets, and then create a new employee record in Dynamics 365 Human Resources.
  6. Customer correspondence: Extract key information from customer letters or emails using Nanonets, and automatically create or update a customer service case in Dynamics 365 Customer Service.
  7. Project management: When a new project proposal is added to a Teams channel, use Nanonets to extract key details like project title, proposed timeline, and budget, and create a new project record in Dynamics 365 Project Operations.
  8. Sales lead generation: Extract data from business cards using Nanonets and use the returned data to create new leads in Dynamics 365 Sales.
  9. Insurance claims: When an insurance claim form is uploaded to a SharePoint folder, extract the claim details with Nanonets, and update a claim record in a custom-built Power App.
  10. Health records: When medical documents are uploaded to a secure OneDrive folder, extract patient data with Nanonets, and update the patient's record in a healthcare management application.

How to set up Nanonets in Power Automate

Setting up Nanonets in Power Automate involves building a custom connector. Here is a step-by-step guide to creating a custom connector for the Nanonets API:

1. Get your API Key from Nanonets

The first step is to generate an API key from your Nanonets account. This key will be used to authenticate your requests to the Nanonets API. The Nanonets documentation explains how to get your API key.

2. Create a custom connector in Power Automate

  • Navigate to Power Automate (formerly Microsoft Flow) and sign in to your account.
  • From the left navigation bar, select "Data" and then "Custom connectors".
  • Click "+ New custom connector" and choose "Create from blank".
  • Give your connector a name and click "Continue".

3. Set up the general details

  • For "Scheme", choose "HTTPS".
  • In the "Host" field, enter the Nanonets API base URL (it should be something like "app.nanonets.com").
  • Click "Security" in the navigation panel on the left.

4. Set up the security details and connector actions

Note: For this section, you can use the Nanonets API Documentation to configure the security and action details.

  • Define the security details. You can use the api-key authentication method to authenticate using your API key.
  • Create a New Action.
  • Define and fill details of your Nanonets model prediction endpoint to create the action.
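
Custom connectors can also be created by importing an OpenAPI 2.0 definition instead of filling in the wizard by hand. The sketch below generates such a definition from Python so that steps 3 and 4 are visible in one place. It is an illustration under assumptions, not a definition published by Nanonets: the prediction path, the Authorization header name used for the API key, the operation ID, and the response schema are placeholders to check against the Nanonets API documentation.

```python
import json


def connector_definition(model_id):
    """Build a minimal OpenAPI 2.0 definition for one prediction action."""
    return {
        "swagger": "2.0",
        "info": {"title": "Nanonets OCR (sketch)", "version": "1.0"},
        "host": "app.nanonets.com",   # "Host" from step 3
        "schemes": ["https"],         # "Scheme" from step 3
        "basePath": "/",
        "consumes": ["multipart/form-data"],
        "produces": ["application/json"],
        # API key security; the header name is an assumption.
        "securityDefinitions": {
            "api_key": {"type": "apiKey", "in": "header", "name": "Authorization"}
        },
        "security": [{"api_key": []}],
        "paths": {
            # Assumed prediction endpoint for the given model.
            f"/api/v2/OCR/Model/{model_id}/LabelFile/": {
                "post": {
                    "operationId": "PredictFile",
                    "summary": "Extract fields from an uploaded file",
                    "parameters": [
                        {"name": "file", "in": "formData", "type": "file", "required": True}
                    ],
                    "responses": {
                        "200": {"description": "Extraction result", "schema": {"type": "object"}}
                    },
                }
            }
        },
    }


if __name__ == "__main__":
    # Print JSON that could be saved to a file and imported as a connector.
    print(json.dumps(connector_definition("your-model-id"), indent=2))
```

Keeping the definition in a file also makes it easier to review changes to the connector over time than clicking through the wizard again.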

5. Test the connector

  • Click "Test" in the navigation panel on the left.
  • You may need to create a new connection. If so, click "+ New connection".
  • Choose an action to test, fill in any required inputs, and click "Test operation".

Once the custom connector is set up, you can use it in your Power Automate flows just like any other connector. You'll be able to choose the actions you defined for the connector and use the data returned from Nanonets in other actions within your workflow.

Final thoughts

In today's fast-paced business environment, automating manual processes is no longer a luxury but a necessity. Power Automate's intuitive interface and wide range of connectors make it an excellent choice for automating tasks across various applications. However, its built-in OCR capabilities may fall short when it comes to document processing and data extraction.

That's where Nanonets can augment your workflow. With its advanced OCR engine and specialized document processing models, Nanonets seamlessly integrates with Power Automate to deliver unparalleled accuracy and efficiency in extracting data from invoices, forms, and other documents.

By combining the strengths of these two platforms, businesses can achieve end-to-end automation of their document workflows, saving time, reducing errors, and freeing up resources to focus on higher-value tasks. The future of work is automated, and those who embrace it will be well-positioned to thrive in this new landscape.

FAQs

Can I use Power Automate to extract data from scanned documents?

Yes, you can use Power Automate to extract data from scanned documents. Power Automate has built-in OCR capabilities under its AI Builder feature. By integrating with Nanonets, you can enhance the accuracy and efficiency of data extraction from scanned files.

How can I automate data entry using Power Automate?

Power Automate can automate data entry by extracting information from documents using its OCR features and inputting that data into various systems or applications. You can create workflows that process incoming documents, extract the relevant data, and then route that data to the appropriate destination. Nanonets can further improve this process by providing more accurate and efficient OCR, data validation, routing, and automated workflow capabilities.

Is it possible to extract text from invoices using Power Automate?

When integrated with Nanonets' OCR, Power Automate allows you to automate invoice data extraction. Set up flows that automatically process incoming invoices, extract key information like invoice numbers, dates, and amounts, and route the extracted data to your accounting system or trigger approval workflows, saving time and reducing errors.

Can I use Power Automate to automate document processing workflows?

Power Automate is a great tool for automating document processing workflows. You can create flows that automatically route documents through various stages, such as data extraction, validation, approval, and storage. Nanonets can be integrated into these workflows to handle the data extraction step, providing fast and accurate OCR results that can then be fed into the rest of the Power Automate workflow. This combination of tools allows for end-to-end automation of document processing tasks.

How does Nanonets improve the OCR capabilities of Power Automate?

Nanonets significantly improves Power Automate's OCR capabilities by providing a more advanced and accurate OCR engine. While Power Automate's built-in OCR can handle basic data extraction tasks, Nanonets offers a dedicated OCR solution with over 95% accuracy, support for multiple languages, and the ability to handle complex document layouts. By integrating Nanonets into Power Automate workflows, users can benefit from these enhanced OCR capabilities while taking advantage of Power Automate's workflow features.

Can I extract data from PDF forms using Power Automate and Nanonets?

Power Automate can be used to automate the process of routing and managing PDF forms, while Nanonets can be used to accurately extract the data from the forms using its advanced OCR capabilities. Nanonets can handle both standard and non-standard PDF forms and can extract data from specific fields or entire pages as needed. The extracted data can then be fed back into Power Automate workflows for further processing or storage.