How to retrieve results for PDF files using an API?

This document explains how to retrieve the results for a PDF file using the Nanonets API.

Before we start, let's keep an example in mind to make things clearer:

1. Make sure you have an OCR model on our platform
2. The PDF contains 3 pages


Step 1: Upload the PDF with the Prediction API
E.g.: https://app.nanonets.com/api/v2/OCR/Model/{MODEL_ID}/LabelFile/
Documentation: https://nanonets.com/documentation/#operation/OCRModelLabelUrlsByModelIdPost

For every page in the PDF you'll get a request_file_id, and it will be the same for all the pages. If you want to retrieve the data at a later date, save this request_file_id.
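
As a rough sketch of this step, the upload and the request_file_id lookup could look like the Python below, using only the standard library. Several details are assumptions that this article does not state and should be checked against the linked documentation: HTTP Basic authentication with the API key as the username, the multipart field name "file", and the per-page entries sitting under a "result" list in the response. The helper names are illustrative only.

```python
import base64
import json
import urllib.request
import uuid

# Endpoint from Step 1; the model ID is filled in per call.
UPLOAD_URL = "https://app.nanonets.com/api/v2/OCR/Model/{model_id}/LabelFile/"


def basic_auth_header(api_key):
    """Assumption: HTTP Basic auth, API key as username, empty password."""
    token = base64.b64encode(f"{api_key}:".encode()).decode()
    return f"Basic {token}"


def build_multipart(field, filename, content, boundary):
    """Encode a single PDF as a multipart/form-data request body."""
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        "Content-Type: application/pdf\r\n\r\n"
    ).encode()
    return head + content + f"\r\n--{boundary}--\r\n".encode()


def upload_pdf(model_id, api_key, pdf_path):
    """POST the PDF to the Prediction API and return the parsed JSON response."""
    boundary = uuid.uuid4().hex
    with open(pdf_path, "rb") as pdf:
        # Assumption: the API expects the upload in a form field named "file".
        body = build_multipart("file", "document.pdf", pdf.read(), boundary)
    request = urllib.request.Request(
        UPLOAD_URL.format(model_id=model_id),
        data=body,
        method="POST",
        headers={
            "Authorization": basic_auth_header(api_key),
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def extract_request_file_id(upload_response):
    """Return the single request_file_id shared by all pages of the PDF.

    Assumption: the per-page entries live under a "result" list.
    """
    ids = {page["request_file_id"] for page in upload_response.get("result", [])}
    if len(ids) != 1:
        raise ValueError(f"expected one request_file_id, found {sorted(ids)}")
    return ids.pop()
```

Calling extract_request_file_id(upload_pdf(model_id, api_key, "file.pdf")) would then give the value to store for later retrieval.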

Step 2: Retrieve Page IDs using request_file_id

Every page in the PDF has a unique page ID on the Nanonets side. Once you have the page IDs, you can retrieve the data for each particular page.

You can retrieve the page IDs by making a GET request to the following endpoint:
https://app.nanonets.com/api/v2/Inferences/Model/{YOUR_MODEL_ID}/InferenceRequestFiles/{request_file_id}

The response will be something like


{
    "request_file_id": "request_file_id",
    "page_ids": [
        "page_id_1",
        "page_id_2",
        "page_id_3"
    ]
}
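
A minimal Python sketch of this lookup follows, using only the standard library. The endpoint and the "page_ids" key come from this step; the HTTP Basic authentication (API key as username, empty password) is an assumption, and the function names are illustrative.

```python
import base64
import json
import urllib.request

# Endpoint from Step 2, with placeholders for the two IDs.
PAGE_IDS_URL = (
    "https://app.nanonets.com/api/v2/Inferences/Model/"
    "{model_id}/InferenceRequestFiles/{request_file_id}"
)


def page_ids_url(model_id, request_file_id):
    """Build the GET URL for one uploaded file."""
    return PAGE_IDS_URL.format(model_id=model_id, request_file_id=request_file_id)


def parse_page_ids(payload):
    """Pull the list of page IDs out of the JSON response body (str or bytes)."""
    return list(json.loads(payload)["page_ids"])


def fetch_page_ids(model_id, request_file_id, api_key):
    """GET the page IDs that belong to a request_file_id."""
    # Assumption: HTTP Basic auth with the API key as username, empty password.
    token = base64.b64encode(f"{api_key}:".encode()).decode()
    request = urllib.request.Request(
        page_ids_url(model_id, request_file_id),
        headers={"Authorization": f"Basic {token}"},
    )
    with urllib.request.urlopen(request) as response:
        return parse_page_ids(response.read())
```

For the 3-page example, fetch_page_ids would be expected to return a list of three page IDs.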

Step 3: Retrieve data for the model and then filter by Page IDs

Once you have the page IDs, you can retrieve the data using the Get All Prediction Files API.

Once you have this data, go through the moderated_images and unmoderated_images arrays and, for every page in them, compare the "id" key against page_id_1, page_id_2, and page_id_3.

If the image has been moderated, the moderated_boxes array will hold the information. The status of a moderated box will be "moderated".

{
    "moderated_images_count": 0,
    "unmoderated_images_count": 3,
    "moderated_images": [],
    "unmoderated_images": [
        {
            "model_id": "YOUR_MODEL_ID",
            "request_file_id": "request_file_id",
            "day_since_epoch": x,
            "is_moderated": false,
            "hour_of_day": 16,
            "id": "page_id_1",
            "url": "",
            "predicted_boxes" : [],
            "moderated_boxes" : [
                {
                    "label": "label_1",
                    "xmin": 1370,
                    "ymin": 315,
                    "xmax": 1533,
                    "ymax": 340,
                    "score": 0.8414416,
                    "ocr_text": "MUTUAL",
                    "status": "correctly_predicted"
                },
                {
                    "label": "label_2",
                    "xmin": 1152,
                    "ymin": 450,
                    "xmax": 1612,
                    "ymax": 476,
                    "score": 0.5940531,
                    "ocr_text": "ABCD",
                    "status": "moderated"
                }
            ],
            "size": {
                "width": 2479,
                "height": 3508
            },
            "page": 0,
            "original_file_name": "filename.pdf",
            "custom_response": null,
            "assigned_member": "",
            "is_deleted": false,
            "source": "api",
            "no_of_fields": x,
            "cost": 0,
            "payable_cost": 0,
            "status": "success",
            "retries": 0
        }
    ]
}
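
The filtering described in this step can be sketched in Python as below. It assumes the Get All Prediction Files response has already been parsed into a dictionary shaped like the sample above; the function names are illustrative and not part of the Nanonets API.

```python
def pages_for_file(prediction_files, page_ids):
    """Return {page_id: page} for the pages that belong to one uploaded PDF.

    Looks through both moderated_images and unmoderated_images and keeps
    only the entries whose "id" is one of the given page IDs.
    """
    wanted = set(page_ids)
    pages = (prediction_files.get("moderated_images") or []) + (
        prediction_files.get("unmoderated_images") or []
    )
    return {page["id"]: page for page in pages if page["id"] in wanted}


def moderated_boxes(page):
    """Return the boxes of a page whose status is "moderated"."""
    return [
        box
        for box in page.get("moderated_boxes") or []
        if box.get("status") == "moderated"
    ]


if __name__ == "__main__":
    # Trimmed-down version of the sample response above.
    sample = {
        "moderated_images": [],
        "unmoderated_images": [
            {
                "id": "page_id_1",
                "moderated_boxes": [
                    {"label": "label_1", "ocr_text": "MUTUAL", "status": "correctly_predicted"},
                    {"label": "label_2", "ocr_text": "ABCD", "status": "moderated"},
                ],
            },
            {"id": "some_other_page", "moderated_boxes": []},
        ],
    }
    pages = pages_for_file(sample, ["page_id_1", "page_id_2", "page_id_3"])
    for page_id, page in pages.items():
        print(page_id, [box["ocr_text"] for box in moderated_boxes(page)])
```

Keying the result by page ID makes it easy to see which of the expected pages are still missing from the response.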






