Structured LLM outputs

LLMs mostly produce syntactically valid outputs when we try generating JSON, XML, code, etc., but they can occasionally fail due to their probabilistic nature. This is a problem for developers as we use LLMs programmatically, for tasks like data extraction, code generation, tool calling, etc.

breaktasks — LLMs came with the promise of agents and automation. But without structured outputs, it’s just a pipe dream.

There are many deterministic ways to ensure structured LLM outputs. If you are a developer, this handbook covers everything you need.

What happens under-the-hood?
What are the best tools & techniques?
How to pick the right tools & techniques?
How to build, deploy, and scale systems?
How to optimize for latency and cost?
How to improve the quality of output?

Motivation

Structured generation is moving too fast. Most resources you find today are already outdated. You have to dig through multiple academic papers, blogs, GitHub repos, and other resources.

This handbook brings it all together in a living document that updates regularly.

How to use this

You can read it start-to-finish, or treat it like a lookup table.

Who are we?

We're the maintainers of Nanonets-OCR models (VLMs to convert documents into clean, structured Markdown) and docstrange (open-source document processing library).

Subscribe to our newsletter

Updates from the LLM developer community in your inbox. Twice a month.

Developer insights
Latest breakthroughs
Useful tools & techniques

Motivation​

How to use this​

Who are we?​

Subscribe to our newsletter

Motivation

How to use this

Who are we?