For engineering teams

Document extraction your pipeline can rely on.

A production-grade API that reads any document and returns structured JSON. Plug into LangChain, LlamaIndex, or your own agent stack. Ranked #1 on the IDP Leaderboard.

How it works

Send. Extract. Validate. Deliver. Your pipeline handles decisions. The API handles documents.

POST /api/v2/predict/urls
{
  "urls": [
    "invoice-q2.pdf"
  ],
  "model_id": "your-model"
}
200 OK · 1.2s
1. Send any document
POST a file URL, base64, or multipart upload. PDFs, images, Word docs, spreadsheets, scanned pages. No preprocessing, no template setup, no per-sender configuration required.
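
The request shown above could be assembled along these lines. This is a minimal sketch using only the Python standard library. Only the path `/api/v2/predict/urls` and the `urls` and `model_id` fields come from the example; the base URL, the bearer-token header, and the helper name `build_predict_request` are placeholders assumed for illustration, so check them against the actual API reference.

```python
import json
import urllib.request

# Hypothetical base URL; replace with the host from your account's API docs.
BASE_URL = "https://api.example.com"


def build_predict_request(urls, model_id, api_key):
    """Build (but do not send) a POST request for /api/v2/predict/urls."""
    payload = {"urls": list(urls), "model_id": model_id}
    return urllib.request.Request(
        url=f"{BASE_URL}/api/v2/predict/urls",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # Assumed auth scheme; the real API may use a different one.
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


request = build_predict_request(
    ["https://example.com/invoice-q2.pdf"], "your-model", "YOUR_API_KEY"
)
# To send it: urllib.request.urlopen(request, timeout=30)
```

Building the request separately from sending it keeps the payload easy to inspect and unit-test before any network call is made.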
Response
{
  "vendor": "Acme Corp",
  "amount": 4200.00,
  "gl_code": "6100",
  "confidence": 0.98
}
2. Get structured output
Fields extracted as validated JSON or Markdown. Tables preserved with row and column structure. Confidence scores per field. Ready to insert into your database, pass to your agent, or post to your ERP.
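
One way a consumer might use per-field confidence scores is to accept high-confidence fields directly and hold the rest for review. The sketch below assumes a hypothetical payload in which every field carries its own `value` and `confidence`; the sample response above shows only a single `confidence` key, so the real response shape may differ. The 0.90 cut-off is an illustrative choice, not a product default.

```python
REVIEW_THRESHOLD = 0.90  # illustrative cut-off, not a product default


def split_by_confidence(fields, threshold=REVIEW_THRESHOLD):
    """Separate fields that can be used directly from those needing review."""
    accepted, needs_review = {}, {}
    for name, field in fields.items():
        target = accepted if field["confidence"] >= threshold else needs_review
        target[name] = field["value"]
    return accepted, needs_review


# Hypothetical per-field payload modelled on the sample response.
extracted = {
    "vendor": {"value": "Acme Corp", "confidence": 0.98},
    "amount": {"value": 4200.00, "confidence": 0.97},
    "gl_code": {"value": "6100", "confidence": 0.72},
}
accepted, needs_review = split_by_confidence(extracted)
# accepted holds vendor and amount; needs_review holds gl_code
```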
Context graph rules
GL code: Map vendor → GL account
Duplicate: Flag if INV seen in 30d
PO match: 3-way (PO + receipt + INV)
Threshold: >$10k route to CFO
3. Apply your business rules
Validation logic, GL coding rules, approval thresholds, and exception conditions defined once in a context graph. The extraction layer applies them automatically, with no conditional logic in your application code.
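
To make the four example rules concrete, here is a sketch of what they mean when written out as plain code. It is purely illustrative of the rule semantics: in the product these rules are configured in the context graph rather than written by hand, and the function name, field names, and flag labels below are all hypothetical. The 3-way match is reduced to a presence check; a real match would also compare quantities and amounts.

```python
from datetime import date, timedelta

# Hypothetical vendor-to-GL mapping, used only for this illustration.
GL_ACCOUNTS = {"Acme Corp": "6100"}
DUPLICATE_WINDOW = timedelta(days=30)
CFO_THRESHOLD = 10_000


def apply_rules(invoice, seen_invoices, has_po, has_receipt):
    """Return the GL code plus any flags raised by the four example rules."""
    flags = []

    # GL code: map vendor to GL account.
    gl_code = GL_ACCOUNTS.get(invoice["vendor"])
    if gl_code is None:
        flags.append("unmapped_vendor")

    # Duplicate: same invoice number seen within the last 30 days.
    last_seen = seen_invoices.get(invoice["number"])
    if last_seen is not None and invoice["date"] - last_seen <= DUPLICATE_WINDOW:
        flags.append("duplicate")

    # PO match (simplified): PO and receipt must both exist for the invoice.
    if not (has_po and has_receipt):
        flags.append("po_mismatch")

    # Threshold: amounts above $10k route to the CFO.
    if invoice["amount"] > CFO_THRESHOLD:
        flags.append("route_to_cfo")

    return gl_code, flags


invoice = {"vendor": "Acme Corp", "number": "INV-1001",
           "amount": 12_500.00, "date": date(2024, 6, 30)}
seen = {"INV-1001": date(2024, 6, 10)}
gl_code, flags = apply_rules(invoice, seen, has_po=True, has_receipt=True)
```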
Delivery targets
ERP posting: SAP · NetSuite · Oracle
Webhook: POST to your endpoint
Agent hand-off: LangChain · LlamaIndex
Review queue: Exceptions only
4. Deliver to any system
Webhooks, REST callbacks, direct ERP posting, or downstream agent hand-off. Exceptions routed to your review queue with full context. The pipeline runs end-to-end without manual intervention.

Use cases

Every document problem engineering teams run into.
One API, one integration.

Document extraction API

POST any document, get structured JSON back. Invoices, contracts, forms, receipts, bank statements. Tables preserved as arrays. Line items extracted with quantities, amounts, and codes. Plugs into any backend.

REST API · JSON output · Table extraction

Agent pipeline integration

Drop Nanonets into your LangChain or LlamaIndex pipeline as a document reader tool. The extraction layer handles unstructured input so your agent logic can focus on decisions, not parsing.

LangChain · LlamaIndex · Agent tooling

Custom model training

Start with a pre-trained extraction model and fine-tune on your document types. Upload samples, label fields, and deploy a model specific to your vendors, formats, and business rules.

Fine-tuning · Custom models · Domain adaptation

Webhook-triggered automation

Configure webhooks to fire on document receipt, extraction completion, or a flagged exception. Build event-driven pipelines without polling. Retry logic and delivery guarantees included.
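
Because deliveries may be retried, a receiving endpoint generally benefits from being idempotent. The sketch below shows one way to dispatch each event once. The event names, the `id`, `type`, and `document_id` fields, and the `WebhookReceiver` class are assumptions made for illustration; the actual webhook payload format is not described here.

```python
import json

# Hypothetical event names mirroring the three triggers described above.
HANDLED_EVENTS = {"document.received", "extraction.completed", "exception.flagged"}


class WebhookReceiver:
    """Dispatch webhook events once, even if a delivery is retried."""

    def __init__(self):
        self._seen_ids = set()  # use a durable store in a real service
        self.processed = []

    def handle(self, raw_body):
        event = json.loads(raw_body)
        event_id, event_type = event["id"], event["type"]
        if event_type not in HANDLED_EVENTS:
            return "ignored"
        if event_id in self._seen_ids:
            return "duplicate"  # a retry of a delivery already processed
        self._seen_ids.add(event_id)
        self.processed.append((event_type, event.get("document_id")))
        return "processed"


receiver = WebhookReceiver()
body = json.dumps({"id": "evt_1", "type": "extraction.completed",
                   "document_id": "doc_42"})
first = receiver.handle(body)   # handled normally
retry = receiver.handle(body)   # same event id, so it is skipped
```

Keying on an event identifier rather than on the document means a retried delivery is skipped while a genuinely new event for the same document is still processed.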

Webhooks · Event-driven · Callbacks

Multi-format ingestion

One API endpoint handles PDFs, scanned images, Word documents, spreadsheets, and email attachments. No per-format routing, no format detection code. The same structured output regardless of input type.

PDF · Images · Multi-format

ERP and database posting

Pre-built connectors push extracted data directly to SAP, NetSuite, Oracle, and Dynamics. Or deliver clean JSON to your own database. The pipeline ends at the system of record, not a staging file.

ERP connectors · Database delivery · Direct posting

Built for production

Document extraction that holds up in production. Not just in demos.

Ranked #1 on the IDP Leaderboard

Not a generic LLM wrapper. Nanonets is built specifically for document extraction accuracy: field-level validation, table structure preservation, and business rule application. Benchmarked against every major IDP platform.

Production-ready, not demo-ready

99%+ field accuracy on real documents from real vendors, not clean test sets. Handles handwriting, poor scans, multi-page invoices, and unusual layouts without manual intervention or per-sender template maintenance.

Business rules outside your application code

GL coding logic, duplicate detection, approval routing, and validation rules live in a context graph, not in your codebase. Change a rule without a deployment. Your application stays clean.

Works with the stack you already have

LangChain, LlamaIndex, REST, webhooks, direct ERP connectors. Nanonets fits into the pipeline you are building; it does not ask you to rebuild your architecture around it.

#1 on the IDP Leaderboard for document understanding and business rule application
"We evaluated every major IDP vendor. Nanonets was the only one that handled our document variety out of the box and gave us an API we could actually build on without maintaining per-sender templates."
Engineering Lead, Enterprise customer

See it run on your process, with your documents.

Start free. No credit card. Or talk to our team about your workflow.