What is OCR?

OCR, or Optical Character Recognition, is the technology that lets us process images containing text, extract that text and convert it into a machine-readable format. This means a person can take pictures of receipts, invoices, number plates, shipping container numbers, etc., and use OCR to extract the useful information in these images and put it in a format that a computer can read, edit, index and store for future use.

How does it work?

Older OCR methods used computer vision techniques such as thresholding and contour detection to separate the characters from the rest of the image. As technology has progressed, so have the methods for solving the character recognition problem. OCR technology now uses deep neural networks to localise and recognise the text in an image automatically. These networks are trained on images paired with their ground-truth text before they can be used on unseen data with high accuracy.
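
To make the older approach concrete, here is a minimal sketch of thresholding followed by character segmentation. It is an illustration under stated assumptions, not how any particular OCR engine is implemented: the toy image, the cutoff of 128 and the function names are all hypothetical, and connected-component labelling is used here as a simple stand-in for contour detection.

```python
from collections import deque

def threshold(gray, cutoff=128):
    """Binarise a grayscale image: True marks dark (ink) pixels."""
    return [[pixel < cutoff for pixel in row] for row in gray]

def find_components(binary):
    """Return bounding boxes (top, left, bottom, right) of 4-connected ink regions."""
    height = len(binary)
    width = len(binary[0]) if height else 0
    seen = [[False] * width for _ in range(height)]
    boxes = []
    for r in range(height):
        for c in range(width):
            if not binary[r][c] or seen[r][c]:
                continue
            # Breadth-first flood fill over one blob, tracking its extent.
            queue = deque([(r, c)])
            seen[r][c] = True
            top, left, bottom, right = r, c, r, c
            while queue:
                y, x = queue.popleft()
                top, bottom = min(top, y), max(bottom, y)
                left, right = min(left, x), max(right, x)
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    inside = 0 <= ny < height and 0 <= nx < width
                    if inside and binary[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            boxes.append((top, left, bottom, right))
    # Sort by left edge so the boxes come out in reading order.
    return sorted(boxes, key=lambda box: box[1])

# Hypothetical 5x7 image: 255 is background, low values are ink.
image = [
    [255, 255, 255, 255, 255, 255, 255],
    [255,  20, 255, 255,  30,  30, 255],
    [255,  20, 255, 255,  30, 255, 255],
    [255,  20, 255, 255,  30,  30, 255],
    [255, 255, 255, 255, 255, 255, 255],
]
boxes = find_components(threshold(image))
print(boxes)  # [(1, 1, 3, 1), (1, 4, 3, 5)]
```

Each box would then be cropped and passed to a classifier. A fixed cutoff and simple blob detection break down on noisy or unevenly lit images, which is part of why neural network methods took over.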

What is the best OCR library for Python?

There are several open source OCR libraries available in Python. These include the Tesseract engine, TensorFlow Attention OCR, Kraken OCR, etc. Tesseract is the most popular OCR engine in the open source community and is built on a convolutional plus recurrent neural network architecture that lets it work well on sequential data. Attention OCR uses attention mechanisms to learn long-range dependencies better while reading text in an image and hence performs better than Tesseract, but TensorFlow has a steeper learning curve, something you can avoid with Tesseract. A better and more intuitive alternative is the Nanonets OCR API, which allows you to build models on custom data and get predictions in a convenient format without the need for any machine learning or OCR expertise.
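
As a rough illustration of how little code Tesseract needs from Python, the sketch below uses the `pytesseract` wrapper together with Pillow. It assumes both packages and the Tesseract binary are installed; the helper names `extract_text` and `normalize_whitespace` are hypothetical, chosen only for this example.

```python
def normalize_whitespace(text):
    """Collapse runs of whitespace and drop the empty lines common in OCR output."""
    lines = (" ".join(line.split()) for line in text.splitlines())
    return "\n".join(line for line in lines if line)

def extract_text(image_path, lang="eng"):
    """Run Tesseract on an image file and return cleaned-up text."""
    # Imported lazily so the helper above works without the OCR dependencies.
    from PIL import Image
    import pytesseract

    with Image.open(image_path) as img:
        raw = pytesseract.image_to_string(img, lang=lang)
    return normalize_whitespace(raw)
```

Calling `extract_text("receipt.png")` (a placeholder path) would return the recognised text as a string. Results depend heavily on image quality, so real pipelines usually add preprocessing such as deskewing and denoising first.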

Is it safe to use online OCR services?

Organisations and individuals can apply OCR technology to a variety of tasks that involve processing large amounts of data. With different kinds of data and varying business needs, the accuracy of OCR algorithms can make or break your applications. Hence it is important to choose an OCR service that delivers high accuracy consistently and quickly. The Nanonets OCR API makes this possible by providing superior machine learning models trained on a variety of data. Nanonets also provides the option of deploying models on the cloud using Docker images or on-premise, depending on the sensitivity of the data the organisation is dealing with, among other concerns.

What is the best OCR software on the market?

Some of the popular OCR software currently available on the market includes ABBYY FineReader, Adobe Acrobat Pro DC and Nanonets. Most of these products come as part of software packages that can't be modified for your specific needs. They also do not support every format your images might be in. Moreover, they are not robust to images that are blurry or noisy, or in which the text is tilted or appears in different font sizes and formats. This is where the Nanonets API shines. Nanonets lets you build models on custom data that can work with noisy images and still deliver results with high accuracy and greater speed.

What can OCR be used for?

  • Number plates - number plate detection can be used to enforce traffic rules, track cars in your taxi service parking, and enhance security in public spaces, corporate buildings, malls, etc.
  • Legal documents - digitizing different forms of documents (affidavits, judgments, filings, etc.), storing them in a database and making them searchable.
  • Table extraction - automatically detecting tables in a document and getting the text in each cell and the column headings for research, data entry, data collection, etc.
  • Banking - analyzing cheques, reading and updating passbooks, ensuring KYC compliance, and analyzing applications for loans, accounts and other services.
  • Menu digitization - extracting information from the menus of different restaurants and putting it into a homogeneous template for food delivery apps like Swiggy, Zomato, Uber Eats, etc.
  • Healthcare - digitizing patients' medical records, history of illnesses, diagnoses, medication, etc. and making them searchable for the convenience of doctors.
  • Invoices - automating the reading of bills, invoices and receipts, and extracting products, prices, date-time data and company/service names for the retail and logistics industries.
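
Taking the invoice use case as an example, the step after OCR is usually turning raw text into structured fields. The sketch below is one simple way to do that with regular expressions. The sample text, the `YYYY-MM-DD` date format, the "name then price" line layout and the function name are all assumptions made up for this illustration; real invoices vary far more.

```python
import re

def parse_invoice_text(text):
    """Pull a date, line items and a total out of OCR'd invoice text."""
    date_match = re.search(r"\b(\d{4}-\d{2}-\d{2})\b", text)
    total_match = re.search(
        r"^total\s+(\d+\.\d{2})\s*$", text, re.IGNORECASE | re.MULTILINE
    )
    items = []
    for line in text.splitlines():
        # Assumed layout: item name, whitespace, price with two decimals.
        match = re.match(r"^(?P<name>.+?)\s+(?P<price>\d+\.\d{2})$", line.strip())
        if match and not match.group("name").lower().startswith("total"):
            items.append((match.group("name"), float(match.group("price"))))
    return {
        "date": date_match.group(1) if date_match else None,
        "items": items,
        "total": float(total_match.group(1)) if total_match else None,
    }

# Hypothetical OCR output for a small receipt.
sample = """ACME Stores
Date: 2021-03-14
Coffee beans 12.50
Paper filters 3.25
Total 15.75
"""
result = parse_invoice_text(sample)
print(result)
```

Hand-written patterns like these are brittle when layouts change, which is one reason trained extraction models are used for documents that do not follow a single template.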

How much money can you save by using an OCR service?

With Nanonets, we were able to reduce the time taken to process claims by 90% by automating invoice digitization. Though the accuracy was lower than that of humans, the number of manual reviewers was reduced, along with the number of passes required to make sure each invoice was free of errors. This meant that the company reduced its costs by 50% while also giving customers more convenient service and its employees less repetitive and more engaging work.
