Few-shot prompting
We have already seen an example of few-shot prompting, but let's discuss it in a bit more detail:
SYSTEM: You are an expense auditing engine.
You output only valid JSON. You do not output conversational text.
USER:
### INSTRUCTIONS
Analyze the expense report below.
1. Extract "billable_items" (transport, lodging). Ignore personal items (food, entertainment).
2. Calculate "total_claim" by summing the cost of only the billable items.
3. Calculate "trip_duration_days" by counting the days between the start and end dates (inclusive).
Output valid JSON matching this schema:
{
"billable_items": ["string", "string"],
"total_claim": number,
"trip_duration_days": number
}
### EXAMPLES
Input: "Flew to London on March 1st ($300). Stayed at Marriott until March 3rd ($150). Dinner was $50."
Output:
{
"billable_items": ["Flight ($300)", "Marriott Hotel ($150)"],
"total_claim": 450,
"trip_duration_days": 3
}
Input:
"Rental car from June 10 to June 15 ($200). Bought snacks ($20)."
Output:
{
"billable_items": ["Rental Car ($200)"],
"total_claim": 200,
"trip_duration_days": 6
}
Input:
"Rental car from June 10 to June 15 ($200). Bought concert tickets ($100)."
Output:
{
"billable_items": ["Rental Car ($200)"],
"total_claim": 200,
"trip_duration_days": 6
}
### DATA
[expense_message]. <-- We insert the actual message here.
### RESPONSE
JSON Output:
Here's what it does:
- It gives examples to show both the format and the logic. The LLM sees that:
  - $300 + $150 equals 450
  - June 10 to June 15 equals 6
  - March 1st to March 3rd equals 3
  - dinner and concert tickets (personal) are excluded
  - flight and rental car (billable) are included
- It guides the LLM to inherit writing style and tone in string outputs:
  - include emails
  - include how you summarize
  - include reasoning fields
- It guides the LLM on how to handle edge cases:
  - raising exceptions
  - out-of-scope examples
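As a sketch, the EXAMPLES, DATA, and RESPONSE sections of a prompt like this can be assembled programmatically, so examples live as data rather than hand-edited strings. The `build_prompt` helper and `EXAMPLES` structure below are illustrative, not part of the original prompt; the INSTRUCTIONS section is omitted for brevity:

```python
import json

# Few-shot examples as (input text, expected output) pairs,
# taken from the prompt above.
EXAMPLES = [
    (
        "Flew to London on March 1st ($300). Stayed at Marriott "
        "until March 3rd ($150). Dinner was $50.",
        {
            "billable_items": ["Flight ($300)", "Marriott Hotel ($150)"],
            "total_claim": 450,
            "trip_duration_days": 3,
        },
    ),
    (
        "Rental car from June 10 to June 15 ($200). Bought snacks ($20).",
        {
            "billable_items": ["Rental Car ($200)"],
            "total_claim": 200,
            "trip_duration_days": 6,
        },
    ),
]

def build_prompt(expense_message: str) -> str:
    """Assemble the EXAMPLES / DATA / RESPONSE sections of the prompt."""
    parts = ["### EXAMPLES"]
    for text, output in EXAMPLES:
        parts.append(f'Input: "{text}"')
        parts.append("Output:")
        parts.append(json.dumps(output, indent=2))
    parts.append("### DATA")
    parts.append(expense_message)
    parts.append("### RESPONSE")
    parts.append("JSON Output:")
    return "\n".join(parts)
```

Keeping examples in a list like this also makes it easy to swap them per query, which matters for the dynamic selection strategies discussed below.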
But it is important to be careful with few-shot prompting. Whatever you write in your prompt biases the model one way or another, and few-shot prompting is a really easy way to bias the model towards a few examples you didn't intend to emphasize.
If a user question happens to be very close to one of your examples but differs in one defining way, the model may end up echoing your example rather than addressing the user's actual question. e.g. "I have a stomach ulcer, what should I do? I am allergic to antacids" paired with an example about another patient with a stomach ulcer who is not allergic to antacids.
To really hyper-optimize few-shot prompting, you can do more sophisticated things like dynamically choosing few-shot examples based on the query.
- Strategy 1: Pick the examples that are most similar to your input, e.g. by finding the matches with the highest cosine similarity (cosine similarity closest to 1).
- Strategy 2: Represent the full spectrum of examples by carefully choosing a few of each kind:
  - a few similar examples with the highest cosine similarity (closest to 1)
  - a few opposite examples with the lowest cosine similarity (closest to -1)
  - a few unrelated examples with cosine similarity closest to 0
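Both strategies can be sketched over precomputed embedding vectors. The helper names below (`pick_examples`, `pick_spectrum`) are illustrative, and how you actually embed the query and examples is left out:

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def pick_examples(query_vec, example_vecs, k=2):
    """Strategy 1: indices of the k examples most similar to the query."""
    ranked = sorted(range(len(example_vecs)),
                    key=lambda i: cosine(query_vec, example_vecs[i]),
                    reverse=True)
    return ranked[:k]

def pick_spectrum(query_vec, example_vecs):
    """Strategy 2: one most-similar, one most-opposite, and one
    most-unrelated (similarity nearest 0) example."""
    sims = [cosine(query_vec, v) for v in example_vecs]
    most_similar = max(range(len(sims)), key=lambda i: sims[i])
    most_opposite = min(range(len(sims)), key=lambda i: sims[i])
    most_unrelated = min(range(len(sims)), key=lambda i: abs(sims[i]))
    return [most_similar, most_opposite, most_unrelated]
```

The selected indices would then be used to pull the corresponding examples into the prompt's EXAMPLES section at request time.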
Embeddings are not the only way to find similar examples. You can also use heuristics and filters such as "only examples similar to the patient's profile" or "only examples from the patient's medical history".
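A minimal sketch of such a heuristic filter, assuming each stored example carries metadata; the field names `condition` and `patient_id` are hypothetical, not from the source:

```python
def filter_examples(examples, *, condition=None, patient_id=None):
    """Keep only examples matching the given metadata.
    Typically run before any embedding-based ranking."""
    kept = []
    for ex in examples:
        if condition is not None and ex.get("condition") != condition:
            continue
        if patient_id is not None and ex.get("patient_id") != patient_id:
            continue
        kept.append(ex)
    return kept
```

Filtering first narrows the candidate pool cheaply; similarity ranking (if used at all) then only has to score the examples that pass the filter.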