Commercial APIs

Commercial APIs implement the inference engine and constrained decoding on their side: they run the LLM inference server-side and expose APIs that return structured outputs.

The big three APIs

Constrained decoding needs access to the model's logits (essentially its probability distribution over the next token) to mask invalid tokens. This is only possible with open models that expose their logits, not with closed models like GPT-5, Gemini 2.5, Sonnet 3.5, etc.
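To make the masking step concrete, here is a toy sketch (the function name and the 5-token vocabulary are illustrative, not from any library): tokens that would violate the schema get their logit set to negative infinity, so softmax assigns them zero probability and they can never be sampled.

```python
import math

def mask_invalid_tokens(logits, valid_token_ids):
    # Set the logit of every schema-violating token to -inf so that
    # softmax gives it probability zero and sampling can never pick it.
    return [
        logit if i in valid_token_ids else -math.inf
        for i, logit in enumerate(logits)
    ]

# Toy vocabulary of 5 tokens; suppose only tokens 1 and 3 would keep
# the output valid at this decoding step.
masked = mask_invalid_tokens([2.0, 1.5, 0.3, 1.0, -0.5], {1, 3})
```

This is exactly the step that requires raw logits, which is why closed models cannot be constrained client-side.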

Instead, the big three providers (OpenAI, Google, and Anthropic) offer constrained decoding as a paid server-side feature. You don't get access to the logits; the provider implements constrained decoding on their end. This limits the capabilities and optimizations available to you, and complex schemas are not supported.
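As one concrete example, OpenAI exposes this through the `response_format` parameter of the Chat Completions API. The sketch below only builds the request body as a dict (the model name, schema, and prompt are placeholder choices); actually sending it requires an API key and an HTTP client or SDK.

```python
import json

# A strict JSON Schema for the structure we want back.
schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "age": {"type": "integer"},
    },
    "required": ["name", "age"],
    "additionalProperties": False,
}

# Request body for OpenAI's Chat Completions API with server-side
# structured outputs enabled via response_format.
payload = {
    "model": "gpt-4o-2024-08-06",  # placeholder model name
    "messages": [{"role": "user", "content": "Extract: Ada Lovelace, 36."}],
    "response_format": {
        "type": "json_schema",
        "json_schema": {"name": "person", "strict": True, "schema": schema},
    },
}

body = json.dumps(payload)
```

Note that the schema itself is subject to provider-side restrictions (e.g., `strict` mode requires `additionalProperties: false`), which is one way the "no complex schemas" limitation shows up in practice.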

Specialized APIs

Specialized APIs have emerged that return structured outputs for specific tasks. All the points above apply to them as well. The good ones are cheaper, faster, and more accurate than the big three APIs on the tasks they specialize in.

A useful hybrid: use an unconstrained method to get outputs, then retry with constrained decoding only on invalid outputs. This works in situations where poor output quality is caused by constrained decoding shifting the model away from its natural token probability distributions.
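The retry pattern above can be sketched as follows. Here `generate(prompt, constrained=...)` is a hypothetical stand-in for whatever call you make to your provider; only the validate-then-fallback logic is the point.

```python
import json

def generate_with_fallback(prompt, generate, required_keys):
    # Try the model's natural (unconstrained) output first; fall back
    # to constrained decoding only when the output fails validation.
    raw = generate(prompt, constrained=False)
    try:
        parsed = json.loads(raw)
        if set(required_keys) <= parsed.keys():
            return parsed
    except json.JSONDecodeError:
        pass  # unconstrained output was not valid JSON
    return json.loads(generate(prompt, constrained=True))

# Stub generator for illustration: the unconstrained attempt is
# malformed prose, the constrained retry returns valid JSON.
def fake_generate(prompt, constrained):
    return '{"name": "Ada"}' if constrained else 'Sure! The name is Ada.'

result = generate_with_fallback("Extract the name.", fake_generate, ["name"])
```

The unconstrained path preserves output quality in the common case, and the constrained retry guarantees a parseable result in the failure case, at the cost of a second call.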

Should you use these APIs?

  1. It is very convenient if you have simple schemas and can tolerate moderate latency. You don't need to build anything; just send the API calls. You and your dev team save days of effort that may be better spent on other work.
  2. It is the only way to get structured outputs from the absolute best models available today, i.e., models from the big three providers and the specialized private models behind the specialized APIs.
  3. If output volumes are low, the cost difference between self-hosting and commercial APIs is minimal. Commercial APIs can even be cheaper in some cases.
  4. You need to be okay with data leaving your physical servers and VPCs.
  5. The schemas you can express are limited. And if you want to try new models in the future, you'll eventually have to set up self-hosting anyway.
note

Commercial models have safety filters, and their APIs will return a refusal string for prompts that trigger a safety violation (e.g., hate speech, self-harm, or sometimes just "sensitive topics"). If your data routinely trips these generic filters (medical, legal, etc.), you'll have to use open-source models.
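In practice you should check for the refusal before parsing. The sketch below assumes an OpenAI-style response message, where the assistant message carries either `content` (the JSON string) or a non-null `refusal` field when a safety filter fires; the helper function itself is illustrative.

```python
def extract_or_flag_refusal(message):
    # With structured outputs, a refused request yields a populated
    # `refusal` field instead of schema-conforming content.
    if message.get("refusal"):
        raise ValueError(f"Model refused: {message['refusal']}")
    return message["content"]

# Normal case: refusal is None, content holds the structured output.
ok = extract_or_flag_refusal({"content": '{"dx": "flu"}', "refusal": None})
```

Treating refusals as a distinct error path (rather than a JSON parse failure) makes it easy to measure how often your data trips the filters before committing to a provider.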