Templates
Some libraries, built on top of constrained decoding backends, can take input via:
- common schema templates (e.g., JSON templates)
- programming constructs (e.g., Pydantic models)
- proprietary schemas optimized for latency, costs, output quality (e.g., BAML)
Under the hood, these libraries compile the templates into Regex or CFGs, abstracting away the pain of writing, maintaining, debugging them directly. For example, here is a pydantic model that represents a JSON schema of fields and tables extracted from a receipt:
Regex
\{\s*"store_name":\s*"[^"]+",\s*"store_address":\s*"[^"]+",\s*"store_number":\s*(null|[0-9]+),\s*"items":\s*\[\s*((\{\s*"name":\s*"[^"]+",\s*"quantity":\s*(null|[0-9]+),\s*"price_per_unit":\s*(null|[0-9]+(\.[0-9]+)?),\s*"total_price":\s*(null|[0-9]+(\.[0-9]+)?)\s*\})(,\s*(\{\s*"name":\s*"[^"]+",\s*"quantity":\s*(null|[0-9]+),\s*"price_per_unit":\s*(null|[0-9]+(\.[0-9]+)?),\s*"total_price":\s*(null|[0-9]+(\.[0-9]+)?)\s*\}))*)?\s*\],\s*"tax":\s*(null|[0-9]+(\.[0-9]+)?),\s*"total":\s*(null|[0-9]+(\.[0-9]+)?),\s*"date":\s*(null|"[0-9]{4}-[0-9]{2}-[0-9]{2}"),\s*"payment_method":\s*"(cash|credit|debit|check|other)"\s*\}
Pydantic model
class Item(BaseModel):
name: str
quantity: Optional[int]
price_per_unit: Optional[float]
total_price: Optional[float]
class ReceiptSummary(BaseModel):
store_name: str
store_address: str
store_number: Optional[int]
items: List[Item]
tax: Optional[float]
total: Optional[float]
date: Optional[str] = Field(pattern=r'\d{4}-\d{2}-\d{2}', description="Date in the format YYYY-MM-DD")
payment_method: Literal["cash", "credit", "debit", "check", "other"]