Today, we're excited to announce the release of Nanonets-OCR-s, a state-of-the-art image-to-markdown OCR model that goes far beyond traditional text extraction. This powerful model transforms documents into structured markdown with intelligent content recognition and semantic tagging.
Most publicly available image-to-text models focus mainly on extracting plain text from images. However, they typically fail to distinguish between regular content and elements like watermarks, signatures, or page numbers. Visual elements such as images are often ignored, and complex structures like tables, checkboxes, and equations are not handled effectively, making these models less suitable for downstream tasks.
Unlike conventional OCR systems that simply extract plain text, Nanonets-OCR-s understands document structure and content context (like tables, equations, images, plots, watermarks, checkboxes, etc.), delivering intelligently formatted markdown output that's ready for downstream processing by Large Language Models.
Download the model from Hugging Face.
Key Features & Capabilities
- LaTeX Equation Recognition
- Intelligent Image Description
- Signature Detection & Isolation
- Watermark Extraction
- Smart Checkbox Handling
- Complex Table Extraction
Let's explore each capability in detail:
1. LaTeX Equation Recognition
Automatically converts mathematical equations and formulas into properly formatted LaTeX syntax. Inline mathematical expressions are converted to LaTeX inline equations, while displayed equations are converted to LaTeX display equations.
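As a hypothetical illustration of this output format (assuming standard `$…$` and `$$…$$` delimiters; the equations below are examples, not actual model output):

```latex
% Inline expression embedded in running text:
The relation $E = mc^2$ holds for a body at rest.

% Displayed equation on its own lines:
$$
x = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a}
$$
```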

2. Intelligent Image Description
Describes images within documents using structured tags, making them digestible for LLM processing. The model can describe single or multiple images (logos, charts, graphs, QR codes, etc.) in terms of their content, style, and context. The model predicts the image description within the <img> tag. Page numbers are predicted within the <page_number> tag.
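Because the descriptions arrive in predictable tags, downstream code can pull them out with a simple parser. A minimal sketch (the sample text is invented for illustration; the `<img>` and `<page_number>` tag names follow the model's tagging scheme described above):

```python
import re

# Hypothetical sample of tagged OCR output.
sample = (
    "Quarterly results improved across all regions.\n"
    "<img>Bar chart comparing Q1-Q4 revenue, with Q4 the highest.</img>\n"
    "<page_number>12</page_number>"
)

def extract_tag(text: str, tag: str) -> list[str]:
    """Return the contents of every <tag>...</tag> span in the text."""
    return re.findall(rf"<{tag}>(.*?)</{tag}>", text, flags=re.DOTALL)

descriptions = extract_tag(sample, "img")
page_numbers = extract_tag(sample, "page_number")  # ["12"]
```

The extracted descriptions can then be indexed or fed to an LLM alongside the surrounding text, so the document's visual content is not lost.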

3. Signature Detection & Isolation
Identifies and isolates signatures from other text in documents, crucial for legal and business document processing. The model predicts the signature text within the <signature> tag.

4. Watermark Extraction
Similar to signature detection, the model can detect and extract watermark text from documents. The model predicts the watermark text within the <watermark> tag.
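Since signatures and watermarks arrive in the same tag pattern, a downstream pipeline can either collect them or strip them to recover clean body text. A minimal sketch (the sample document is invented; the `<signature>` and `<watermark>` tag names come from the model's output scheme):

```python
import re

# Hypothetical sample output containing watermark and signature tags.
page = (
    "<watermark>CONFIDENTIAL</watermark>\n"
    "This agreement is entered into by the parties below.\n"
    "<signature>John A. Smith</signature>"
)

def tagged_spans(text: str, tag: str) -> list[str]:
    """Collect the contents of every <tag>...</tag> span."""
    return re.findall(rf"<{tag}>(.*?)</{tag}>", text, flags=re.DOTALL)

def strip_tag(text: str, tag: str) -> str:
    """Remove <tag>...</tag> spans, leaving only the body text."""
    return re.sub(rf"<{tag}>.*?</{tag}>\s*", "", text, flags=re.DOTALL)

signatures = tagged_spans(page, "signature")   # ["John A. Smith"]
watermarks = tagged_spans(page, "watermark")   # ["CONFIDENTIAL"]
body = strip_tag(strip_tag(page, "watermark"), "signature").strip()
```

Keeping signatures and watermarks out of the body text prevents them from polluting search indexes or LLM prompts, while still preserving them for audit or compliance checks.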

5. Smart Checkbox Handling
Converts form checkboxes and radio buttons into standardized Unicode symbols for consistent processing. The model predicts the checkbox status within the <checkbox> tag.
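Standardized symbols make form states trivial to parse. A minimal sketch, assuming the model emits the Unicode ballot-box characters ☐ (unchecked) and ☑ (checked) at the start of each option line (the form text is invented for illustration):

```python
# Hypothetical sample: checkbox states rendered as Unicode symbols.
form = (
    "Payment method:\n"
    "☑ Credit card\n"
    "☐ Bank transfer\n"
    "☐ Cash"
)

def parse_checkboxes(text: str) -> dict[str, bool]:
    """Map each checkbox label to True (checked) or False (unchecked)."""
    states = {}
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("☑"):
            states[line[1:].strip()] = True
        elif line.startswith("☐"):
            states[line[1:].strip()] = False
    return states

selected = [label for label, checked in parse_checkboxes(form).items() if checked]
```

Because every checkbox is reduced to one of two known symbols, the same parser works across forms regardless of how the original document drew its boxes.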

6. Complex Table Extraction
Extracts complex tables from documents and converts them into Markdown and HTML tables.
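Once a table is in Markdown form, turning it into structured records is straightforward. A minimal sketch for simple pipe-delimited tables (the sample table is invented; complex tables with merged cells would arrive as HTML instead and need an HTML parser):

```python
def markdown_table_to_records(table: str) -> list[dict[str, str]]:
    """Parse a simple pipe-delimited Markdown table into row dicts."""
    lines = [ln.strip() for ln in table.strip().splitlines()]

    def split_row(ln: str) -> list[str]:
        return [cell.strip() for cell in ln.strip("|").split("|")]

    header = split_row(lines[0])
    # lines[1] is the |---|---| separator row; data rows follow it.
    return [dict(zip(header, split_row(ln))) for ln in lines[2:]]

sample_table = """
| Item | Qty | Price |
|------|-----|-------|
| Pen  | 2   | $1.50 |
| Book | 1   | $12.00 |
"""
records = markdown_table_to_records(sample_table)
```

Row dicts like these drop straight into a dataframe or database insert, which is where table extraction usually needs to end up.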

Training Details
To train our new Visual-Language Model (VLM) for precise optical character recognition (OCR), we curated a dataset comprising over 250,000 pages. The dataset includes the following document types: research papers, financial documents, legal documents, healthcare documents, tax forms, receipts, and invoices. Additionally, the collection features documents containing images, plots, equations, signatures, watermarks, checkboxes, and complex tables.
We have used both synthetic and manually annotated datasets. We first trained the model on the synthetic dataset and then fine-tuned it on the manually annotated dataset.
We selected Qwen2.5-VL-3B as the base model for our VLM, and fine-tuned it on the curated dataset to improve its performance on document-specific OCR tasks.
Limitations:
- The model has not been trained on handwritten text.
- The model can suffer from hallucinations.
Use cases
Nanonets-OCR-s streamlines complex document workflows across industries by unlocking structured data from unstructured formats.
Academic & Research: Digitizes papers with LaTeX equations and tables.
Legal & Financial: Extracts data from contracts and financial documents, including signatures and tables.
Healthcare & Pharma: Accurately captures text and checkboxes from medical forms.
Corporate & Enterprise: Transforms reports into searchable, image-aware knowledge bases.
In a world moving towards LLM-driven automation, unstructured data is the biggest bottleneck. Nanonets-OCR-s bridges that gap, transforming messy documents into the clean, structured, and context-rich markdown that modern AI applications demand.
Try it today
We have integrated Nanonets-OCR-s with docext, so feel free to try it out. If you have any questions, start a discussion on GitHub or Hugging Face.