In AI agents, a context graph is the part of agent memory that captures decisions.
This post explains why AI agents need to capture decisions, how context graphs capture them, how agents use them, and how agents enhance their capabilities and performance with them.
An agent failure case
It's the last week of the quarter. A renewal agent is working a $480k account. The customer wants 20% off or they walk. The agent's instructions say accounts over $100k should not churn, but its policy caps renewal discounts at 10%. Now what?
A human in this seat would probably lean on experience and memory to resolve it.
Didn't we do this exact thing with Globex last quarter? It was a similar story. They were threatening to churn, and someone signed off on 20% because the CEO wanted to retain Fortune 500 logos and the risk was worth taking on a $300k account. It worked and Globex renewed shortly after.
This reasoning chain is not written down anywhere your agent can read. The agent will find the Globex discount in Salesforce, but Salesforce will not tell it that the number was an exception, who approved it, why it was approved, or whether the current situation is comparable.
The why lives in:
- Old Slack threads where the finance team admits a $300k account is worth the risk.
- Zoom calls where sales veterans mention that these kinds of accounts pay eventually.
- Emails from the CEO saying retaining Fortune 500 logos is critical.
These are critical pieces of information needed to make the decision, but the agent cannot access them.
So your agent does one of two things:
- It sends an email saying the policy caps discounts at 10%, and you lose the account.
- It escalates to a human, who spends 24 hours doing Slack archaeology to reconstruct a decision the company already made once.
Either way, the organization got nothing out of adopting an AI agent.
This is the gap. We have gotten extremely good at recording what changed, but we systematically throw away why it changed, which is the one thing your agents need most.
Context graphs store the why in an agent's memory.
Foundation Capital called it "a trillion-dollar AI opportunity", which is the kind of phrase that sends people reaching for the back button. But stick around anyway. The term is new and a little overloaded, but it points at something real. If you use or build agents, you'll end up using a context graph soon.
Flat context is bad
The bad way to give your AI agents context is to hand them all the data, records, documents, rules, and policies in one flat context window.
Say you have an invoice processing agent. You dump the invoice, PO, vendor record, contract, and policy document into the context. Then you watch the agent fail. Why?
1. Context rot
Suppose the agent asks itself, “Can I pay invoice #842?” To answer, it has to hop across several facts. Which PO does this invoice reference? Does that PO still have budget? Was the delivery received? Is the vendor on payment hold? Does $12,400 cross the approval threshold? What do the contract's payment terms say? Is the policy doc accurate and up to date? Are there undocumented nuances about this payment living in Slack messages and Zoom call transcripts?
Flat retrieval hands all of this to the LLM as a pile of disconnected chunks: some invoice text here, some PO text there, some language from random docs and Slack channels, all delivered as one flat wall of text.
Figure: what flat retrieval hands the agent for “Can I pay invoice #842?”
Dana (10:21): Acme always pays end of quarter; don’t chase them.
Wes (10:22): noted 👍 leaving #842 as-is
| # | Vendor | On hold |
|---|---|---|
| 1 | Acme Corporation Ltd | No |
| 2 | Globex LLC | Yes |
| 3 | Initech | No |
The agent is forced to re-derive every one of those connections between these pieces of data from scratch on each turn. Flattening business data into loose text destroys exactly the structure it needs.
And as the context grows, LLMs struggle to cope with its size and start failing at their tasks. They stop following instructions, drop rules at random, misread the relationship between pieces of information that sit far apart, ignore data in the middle of the context, and apply rules and constraints out of order.
Surge AI documents this in its instruction-following benchmark, where the best frontier model solves fewer than 41% of such complex tasks.

2. Lack of decision traces
As we saw in our first example, AI agents run into the same ambiguity humans resolve every day with precedent, experience, and organizational memory. But you can't give these things to an agent in a flat context window.
- Tribal knowledge. "We always waive the $5k onboarding fee for logistics companies but only if they push back on the timeline first." That's not in the CRM. It's tribal knowledge passed down through internal conversations.
- Past decisions. "We structured a deal for account X where they split payments into installments. We should offer the same structure to account Y, which is similar." No system links the two deals to explain why Y's contract was drafted this way.
- Context across systems of record. An account manager sees usage sliding in the product dashboard, an unpaid invoice in NetSuite, a cold one-line email. They flag the account as "churn risk" in the CRM. The reasoning happened in their head, but the CRM record just shows "churn risk".
- Manual approvals. A VP approves a discount on a Zoom call. The HubSpot record shows the changed price. It doesn't show why the decision was made.
The reasoning behind data, decisions, and actions isn't captured in a flat context window.
If you are a developer, this concept hits even harder. Why did we pick this queue over that one in 2019? Why is there a sleep(200) in the retry path that breaks everything when you remove it? It was obvious to whoever wrote it, but that information is gone now. Remember Architecture Decision Records? They were invented back in 2011 to fix exactly this. But most ADR folders die at three entries, because writing them is friction and nobody reads them later.
This is a universal problem. Companies are good at storing what happened. They are bad at storing why. This is because the why is unstructured, spread across systems, and nobody reads it even if you store it.
Both problems, context rot and lack of decision traces, are solved by context graphs.
What is a context graph?
A context graph is a way of structuring an LLM's context as a graph, where nodes hold pieces of information and edges hold the relationships between them. It's optimized for the model to read, not for a human to browse.
Most agent memory today is flat. AI agents split your data into chunks, embed them, and return the few chunks that look most similar to the ongoing task. The LLM gets a pile of text with no sense of how these chunks connect to one another. This is vector RAG, the standard memory used in AI agents today.
A context graph keeps those connections. Instead of "here are five similar paragraphs," it can say "Service A –depends on–> Service B," "this release –caused–> that outage," or "this invoice –follows–> that policy." The edges carry meaning, and the model can traverse them.
This matters because similarity is not relevance. Two chunks can share words and have nothing to do with your actual question. A typed edge tells the model how two things relate, so it can trace a chain instead of guessing from word overlap.
One caveat, stated plainly: the graph doesn't reason. Your LLM does. The graph just hands it a connected map instead of a shoebox of clippings.
In practice, a context graph is two ideas bolted together.
First, store every link once. Treat the things your business cares about as nodes: invoices, POs, vendors, contracts, policies, people. Treat the relationships as edges: this invoice references that PO, that PO is fulfilled by this receipt, this vendor signed that contract. Now the agent doesn't rebuild the web each turn. It walks it. It pulls the small slice a decision touches and leaves the other ten thousand records out of the window. Context rot goes away, because the window stays small and on-point.
Figure: answering “Can I pay invoice #842?” by walking the graph.
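The first idea, storing every link once and walking only the slice a decision touches, fits in a few lines of Python. This is a minimal sketch, assuming an in-memory adjacency list of typed edges; the node IDs (`invoice:842`, `po:17`, and so on), the relation names, and the `context_slice()` helper are hypothetical. A production system would sit on a graph database, but the idea is the same: a bounded breadth-first walk returns the handful of facts invoice #842 depends on and leaves unrelated records out of the window.

```python
from collections import deque

# Hypothetical typed edges: (source, relation, target). Each link is stored once.
EDGES = [
    ("invoice:842", "references", "po:17"),
    ("po:17", "fulfilled_by", "receipt:9"),
    ("po:17", "issued_to", "vendor:acme"),
    ("vendor:acme", "signed", "contract:acme-2024"),
    ("contract:acme-2024", "governed_by", "policy:ap-approvals"),
    ("invoice:999", "references", "po:55"),  # unrelated record that should stay out
]


def build_index(edges):
    """Index edges in both directions so the walk can follow links either way."""
    index = {}
    for src, rel, dst in edges:
        index.setdefault(src, []).append((rel, dst))
        index.setdefault(dst, []).append((f"inverse:{rel}", src))
    return index


def context_slice(index, start, max_hops=4):
    """Breadth-first walk from `start`; return the forward typed edges within max_hops."""
    seen = {start}
    frontier = deque([(start, 0)])
    found = []
    while frontier:
        node, depth = frontier.popleft()
        if depth == max_hops:
            continue
        for rel, neighbor in index.get(node, []):
            if not rel.startswith("inverse:"):
                found.append((node, rel, neighbor))
            if neighbor not in seen:
                seen.add(neighbor)
                frontier.append((neighbor, depth + 1))
    return found


index = build_index(EDGES)
for src, rel, dst in context_slice(index, "invoice:842"):
    print(f"{src} -{rel}-> {dst}")
```

The `max_hops` bound is what keeps the window small: raise it and the slice grows, lower it and the agent may miss the policy at the end of the chain.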
Second, store the why as data. The unit here is a decision trace: a structured record of one decision and everything that led to it. Not just "Acme moved to Net-60" but the problem that triggered it, the options weighed, why the rejected ones lost, who decided, and the reasoning they used.
Figure: the raw material behind one decision trace, a Slack exchange plus the payment history it refers to.
Sales (14:31): Acme wants Net-60 to renew. Options?
Finance (14:33): Net-30 is safer but they may walk. Net-90 is too exposed given the late history.
Priya, AP lead (14:35): Net-60 it is. I’ll sign off under the exception rule. ✓
| Invoice | Due | Paid | Status |
|---|---|---|---|
| #602 | 14 Jan | +14 days | Late |
| #588 | 02 Nov | +9 days | Late |
| #570 | 20 Sep | On time | Paid |
This is what a decision trace stores. It is also what an employee keeps in their head. The difference is that in a context graph of decision traces, an agent (through its harness) can read it.
A context graph is those two things together. Entities and relationships, plus a decision trace on every choice that mattered, stitched across systems and time. Foundation Capital's one-liner for it is a "system of record for decisions" instead of objects. That lands. Most of your systems store the current state of things. A context graph stores how the state got that way.
Example schema of a decision trace:
Figure: The decision-trace schema, centered on what matters. DecisionTrace is the hub; every person, policy, part, and vehicle hangs off it by a named relationship. PK = primary key.
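In code, a decision trace can be as plain as a record with a few required fields and a map of named edges. The sketch below is a hypothetical Python rendering, not the schema in the figure: the field names (`problem`, `options`, `rejected_because`, `decided_by`, `reasoning`, `edges`) are assumptions that mirror the parts of a trace described above, filled in with the Acme Net-60 decision.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional


@dataclass
class Option:
    """One alternative that was weighed for a decision."""
    label: str
    rejected_because: Optional[str] = None  # None means this option was chosen


@dataclass
class DecisionTrace:
    """Hypothetical decision-trace record; field names are illustrative assumptions."""
    decision_id: str                 # primary key
    problem: str                     # what triggered the decision
    options: list[Option]            # everything that was weighed
    decided_by: str                  # person or agent that made the call
    reasoning: str                   # the why, in the decider's words
    # Named relationships to other nodes: relation name -> node IDs.
    edges: dict[str, list[str]] = field(default_factory=dict)
    decided_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))

    def chosen(self) -> Option:
        """Return the single option that was not rejected."""
        picked = [o for o in self.options if o.rejected_because is None]
        if len(picked) != 1:
            raise ValueError("a trace must record exactly one chosen option")
        return picked[0]


acme_terms = DecisionTrace(
    decision_id="dt-0001",
    problem="Acme wants Net-60 terms to renew",
    options=[
        Option("Net-30", rejected_because="safer, but Acme may walk"),
        Option("Net-60"),
        Option("Net-90", rejected_because="too exposed given late-payment history"),
    ],
    decided_by="Priya (AP lead)",
    reasoning="Net-60 keeps the renewal without Net-90 exposure; signed off under the exception rule",
    edges={
        "about_vendor": ["vendor:acme"],
        "applies_policy": ["policy:exception-rule"],
        "evidence": ["invoice:602", "invoice:588", "invoice:570"],
    },
)
print(acme_terms.chosen().label, acme_terms.edges["evidence"])
```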
The rest of the job is turning those stored decisions into something the agent can lean on.
Capture it on the way in
You capture the why at the moment the decision is made, on the write path, not by mining emails afterward. This sounds like an implementation detail, but it is the most critical aspect of your context graph. Reconstructing context after the fact is guesswork: the meeting is over, the Slack thread scrolled away, the person left.
Capture it in the flow and you get the real reason at full fidelity, for almost no extra effort. When a human overrides the agent's refund denial, that override is the moment to ask why and store the answer. Right then, while it's still true.
Figure: record_decision(), how one call fans out. Solid arrows are requests, dashed arrows are responses. The MCP tool orchestrates every hop; the engineer only sees the opening call and the final decision ID.
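Here is roughly what capturing on the write path looks like, as a minimal in-process Python sketch. It is not the MCP tool from the figure: `record_decision()`, `deny_refund()`, and `human_override()` are hypothetical stand-ins, and plain dicts play the role of the trace store and the graph. What it illustrates is that the override handler cannot finish without a reason, so the why is stored at the moment the human disagrees with the agent.

```python
import itertools
from datetime import datetime, timezone

_ids = itertools.count(1)
TRACES: dict[str, dict] = {}       # decision_id -> trace record (hypothetical store)
GRAPH: dict[str, set[str]] = {}    # node id -> ids of decisions that touched it


def record_decision(*, action, decided_by, reason, touches, overrides=None):
    """Hypothetical write-path hook: store one decision and link it to every node it touched."""
    if not reason or not reason.strip():
        raise ValueError("a decision must be recorded with its reason")
    touches = list(touches)
    decision_id = f"dt-{next(_ids):04d}"
    TRACES[decision_id] = {
        "action": action,
        "decided_by": decided_by,
        "reason": reason.strip(),
        "touches": touches,
        "overrides": overrides,  # what the agent had proposed, if a human overrode it
        "at": datetime.now(timezone.utc).isoformat(),
    }
    for node in touches:
        GRAPH.setdefault(node, set()).add(decision_id)
    return decision_id


def deny_refund(order_id):
    """Stand-in for the agent's own proposal."""
    return {"action": "deny_refund", "order": order_id}


def human_override(agent_proposal, new_action, who, why):
    """Capture the why at the moment the human overrides the agent, not afterwards."""
    return record_decision(
        action=new_action,
        decided_by=who,
        reason=why,
        touches=[agent_proposal["order"]],
        overrides=agent_proposal["action"],
    )


proposal = deny_refund("order:5531")
did = human_override(proposal, "grant_refund", "Sam (support lead)",
                     "Carrier lost the package; customer has six years of clean history")
print(did, TRACES[did]["overrides"], GRAPH["order:5531"])
```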
This is also why agents change the economics of organizational memory. We have always known we lose the why. Wikis, Confluence, post-mortems, ADRs: every one of them tries to save it, and every one decays, for the same two reasons. Writing it down is friction, and nobody reads it back.
Agents break both at once. The agent sits in the execution path, so capture is a side effect of doing the work, not a chore bolted on afterward. And the agent is a tireless reader that will happily consult ten thousand past decisions before making the next one.
Organizational memory finally has a reader worth writing for. That flips it from a cost you nag people about into an asset that compounds.
Stored decisions become precedent
Once decisions live in the graph, search turns them into precedent.
You embed each decision and match on meaning, so "missed three deliveries" surfaces "late shipments" and "vendor SLA breach" with no shared words. A new case shows up, the agent pulls the closest past cases, sees what was decided and why, and proposes the same, with a citation.
Figure: Finding precedents by meaning, not keywords. The new decision is turned into a vector and matched by cosine similarity, surfacing past decisions that mean the same thing, ranked by closeness.
So you use vector embeddings to find semantically similar decisions, then apply graph-based filters to narrow by entity type, supplier, category, date range, or outcome.
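A minimal sketch of that two-step search in Python, assuming each past decision already has an embedding. The three-number vectors below are hand-made stand-ins for an embedding model's output, and the attribute filters (`supplier`, `category`, and so on) are a simplification of graph filters, which in a real graph would follow edges to supplier or category nodes. `PastDecision` and `find_precedents()` are hypothetical names.

```python
import math
from dataclasses import dataclass


@dataclass
class PastDecision:
    decision_id: str
    summary: str
    supplier: str
    category: str
    outcome: str
    vector: list[float]  # stand-in for an embedding of `summary`


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def find_precedents(query_vec, decisions, *, k=3, **filters):
    """Keep decisions matching every attribute filter, then rank them by cosine similarity."""
    candidates = [d for d in decisions
                  if all(getattr(d, key) == value for key, value in filters.items())]
    ranked = sorted(candidates, key=lambda d: cosine(query_vec, d.vector), reverse=True)
    return ranked[:k]


DECISIONS = [
    PastDecision("dt-101", "Late shipments, penalty waived once", "acme", "logistics", "waived", [0.9, 0.1, 0.0]),
    PastDecision("dt-102", "Vendor SLA breach, contract credit issued", "acme", "logistics", "credited", [0.8, 0.2, 0.1]),
    PastDecision("dt-103", "Discount approved for Fortune 500 renewal", "globex", "sales", "approved", [0.1, 0.9, 0.2]),
    PastDecision("dt-104", "Late shipments from another supplier", "initech", "logistics", "escalated", [0.85, 0.15, 0.05]),
]

query = [0.88, 0.12, 0.02]  # pretend embedding of "missed three deliveries"
for d in find_precedents(query, DECISIONS, k=2, supplier="acme"):
    print(d.decision_id, d.summary, round(cosine(query, d.vector), 3))
```

Filtering before ranking is a deliberate choice here: if you took the top k by similarity first and filtered afterwards, a strict filter could leave you with no precedents at all.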
A pile of old decisions becomes memory the agent can actually use. An agent handling a refund pulls up similar past refunds and sees what was decided, and why. This is also how an agent improves against its end goal without anyone retraining it.
Figure: new cases matched to past decisions by meaning.
| New case | Closest past decision |
|---|---|
| Paid late once last year; $500k renewal at risk; #348 on the Fortune 500 | Late twice but still granted; $540k judged worth the risk; #211 on the Fortune 500, a great logo |
| Segment: healthcare; 9-month procurement cycle; net-new logo | Segment: healthcare; procurement cycles are brutal; build +10% into the quote (captured from the onboarding Zoom call transcript, call_id 4329) |
| Usage −30% (Product); invoice 38 days overdue (NetSuite); one-line cold reply (Slack channel #C0BCH243GFE) | Usage slid 28% (Product); unpaid invoice for 37 days (NetSuite); cold renewal reply (Outlook); CSM changed status to churn risk |
There's a second payoff here. Because traces record exceptions, not just the clean path, you can see when a rule keeps getting overridden. If AP grants the same late-payment exception to twenty vendors, the policy is wrong, not the vendors. The graph turns that pattern into a signal to fix the underlying rule itself.
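Spotting that pattern can start as a simple count over stored traces. The sketch below assumes each trace records the policy it touched, the vendor, and whether an exception was granted; `policies_to_review()`, the thresholds (`min_vendors=10`, `min_rate=0.5`), and the example data are arbitrary assumptions, not recommendations.

```python
from collections import Counter, defaultdict

# Hypothetical exception log: 20 vendors granted the same late-payment exception,
# plus one normal case and one unrelated one-off exception.
EXCEPTION_LOG = [
    {"policy": "late-payment-fee", "vendor": f"vendor:{i}", "exception": True}
    for i in range(20)
] + [
    {"policy": "late-payment-fee", "vendor": "vendor:x", "exception": False},
    {"policy": "po-match", "vendor": "vendor:y", "exception": True},
]


def policies_to_review(traces, min_vendors=10, min_rate=0.5):
    """Flag policies whose exceptions are both frequent and spread across many vendors."""
    totals = Counter(t["policy"] for t in traces)
    exceptions = Counter(t["policy"] for t in traces if t["exception"])
    vendors = defaultdict(set)
    for t in traces:
        if t["exception"]:
            vendors[t["policy"]].add(t["vendor"])
    flagged = {}
    for policy, count in exceptions.items():
        rate = count / totals[policy]
        if len(vendors[policy]) >= min_vendors and rate >= min_rate:
            flagged[policy] = {
                "exceptions": count,
                "vendors": len(vendors[policy]),
                "rate": round(rate, 2),
            }
    return flagged


print(policies_to_review(EXCEPTION_LOG))
```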
Over time, the context graph becomes the real source of truth for what your agents do autonomously, and a record your company can audit and debug.
The recent ACE paper, "Agentic Context Engineering", makes the mechanism concrete:
Treat the accumulated context as a playbook that grows through generation, reflection, and curation, and let real outcomes refine it. The agent gets better by editing what it knows, not by touching a single weight. A correction today becomes a rule tomorrow. A trace today becomes precedent next quarter. This feedback loop enables learning in agents.
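The curation part of that loop can be sketched as small, targeted edits to a list of lessons rather than a wholesale rewrite of the context. This is a toy Python illustration loosely inspired by the generation, reflection, and curation loop described above, not the paper's implementation: `Playbook`, `Bullet`, and the helpful/harmful counters are assumptions, and the generation and reflection steps that would produce each lesson and its outcome are left out.

```python
from dataclasses import dataclass


@dataclass
class Bullet:
    text: str
    helpful: int = 0
    harmful: int = 0


class Playbook:
    """Toy playbook edited by small deltas instead of being rewritten wholesale."""

    def __init__(self):
        self.bullets: dict[str, Bullet] = {}

    @staticmethod
    def _key(text):
        # Normalize case and whitespace so near-identical lessons merge.
        return " ".join(text.lower().split())

    def curate(self, lesson, worked):
        """Merge one reflected lesson: bump counters on a known bullet, or append a new one."""
        bullet = self.bullets.setdefault(self._key(lesson), Bullet(lesson))
        if worked:
            bullet.helpful += 1
        else:
            bullet.harmful += 1

    def render(self, max_bullets=5):
        """Return net-helpful bullets for the next prompt, most helpful first."""
        ranked = sorted(self.bullets.values(), key=lambda b: b.helpful - b.harmful, reverse=True)
        return [b.text for b in ranked[:max_bullets] if b.helpful > b.harmful]


pb = Playbook()
pb.curate("Offer installments to accounts that push back on price", worked=True)
pb.curate("offer installments to accounts that push back on price ", worked=True)
pb.curate("Quote list price to Fortune 500 renewals", worked=False)
print(pb.render())
```

Merging repeats into counters instead of re-summarizing the whole playbook is one way to keep detail from eroding as the context grows.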
Example of an agent using context graphs
Figure: How the agent finds, and uses, a precedent.
Isn't this just a knowledge graph with extra steps?
Fair question. The honest answer: the parts are old, the combination is new.
Knowledge graphs went mainstream when Google shipped one in 2012. Vector search and RAG are standard issue. Event sourcing, storing the sequence of events instead of just the latest state, is a pattern any backend engineer already knows. A context graph is close to event sourcing for decisions, where each event drags along its rationale and its links to everything it touched, and the store is a graph so you can walk relationships and search by meaning.
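To make the event-sourcing comparison concrete, here is a minimal Python sketch, assuming an in-memory list as the event store; `Event` and `DecisionLog` are hypothetical names. Each event carries its rationale, current state is a fold over the log (roughly what a CRM would show), and `history()` answers the why question a current-state record cannot. A real context graph would also link each event to everything it touched, which this sketch leaves out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    entity: str    # e.g. "account:acme"
    field: str     # which attribute changed
    value: object  # the new value
    why: str       # rationale captured with the change
    by: str        # who or what made the change


class DecisionLog:
    """Append-only log: state is derived by replaying events, so the why is never dropped."""

    def __init__(self):
        self._events: list[Event] = []

    def append(self, event):
        self._events.append(event)

    def state(self, entity):
        """Fold the log into the latest value of every field."""
        current = {}
        for e in self._events:
            if e.entity == entity:
                current[e.field] = e.value
        return current

    def history(self, entity, field):
        """What a current-state store throws away: each value, who set it, and why."""
        return [(e.value, e.by, e.why) for e in self._events
                if e.entity == entity and e.field == field]


log = DecisionLog()
log.append(Event("account:acme", "payment_terms", "Net-30", "standard terms", "system"))
log.append(Event("account:acme", "payment_terms", "Net-60",
                 "renewal at risk; approved under the exception rule", "Priya (AP lead)"))
print(log.state("account:acme"))
print(log.history("account:acme", "payment_terms"))
```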
So no, there's no new primitive here. What's new is narrow and real. You capture the why on the write path, as structured data, because for the first time there's a consumer hungry enough to make it worth the trouble. The novelty isn't the graph. It's that agents finally give the why somewhere to be used.
RAG, for contrast, retrieves documents that look similar to your question. A context graph retrieves decisions, with their reasoning and their edges to everything they affected. One hands the model more text to read. The other hands it structure to walk and precedent to reason from.
Why your CRM won't just do this for you
Systems-of-record vendors are probably getting this wrong. Salesforce released Agentforce, ServiceNow released Now Assist, and Workday is doing something similar. Their bet is to add intelligence where the data already lives.
But their agents will inherit the same limitations as the platforms they run on.
- Salesforce mostly stores current state. It knows what a deal looks like today, not how it got there. When someone approves a discount, the system keeps the new price and drops the reasons why. You cannot see the context of the choice.
- These systems also see only part of the data. A support ticket doesn't just live in Zendesk. It needs user tiers from the CRM, SLA terms from billing, recent outages from PagerDuty, and the Slack thread flagging churn risk. No single system of record sees the whole picture.

When an agent triages an escalation, responds to an incident, or decides on a discount, it pulls context from multiple systems and time periods. The orchestration layer sees the full picture: what inputs were gathered, what policies applied, what exceptions were granted, and why. Because it's executing the workflow, it can capture that context at decision time instead of bolting on governance afterwards.
This is the essence of a context graph, and that will be the single most valuable asset for your company in the era of AI.
The hard parts
Here's where it gets difficult:
Garbage in, garbage precedent. If the captured rationale is lazy ("approved, see Slack"), your precedents are landfill. The graph is worth exactly the quality of the why you put in it, and writing a good why is real work, even with the prompt sitting right there.
Who writes the trace. If a human has to type thoughtful rationale every time, you've reinvented the wiki and it will rot the same way. If the agent infers the rationale, you have to trust the inference, and "the model guessed why we did this" is a shaky base for an audit. The real answer is somewhere in between, and getting that split right is most of the engineering.
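One way to push back on both problems is a gate on the write path that refuses lazy rationale and flags inferred rationale for review. The sketch below is a hypothetical Python heuristic: `rationale_problems()`, the patterns in `LAZY_PATTERNS`, the `MIN_WORDS` cutoff, and the `source` labels are assumptions you would tune to your own traces, and a word count is a crude proxy for a good why, not a measure of it.

```python
import re

# Hypothetical heuristics for "lazy" rationale; tune or replace for your own data.
LAZY_PATTERNS = [
    r"^\s*(approved|ok|done|lgtm)\W*$",   # a bare verdict with no reason
    r"\bsee (slack|email|thread)\b",       # points somewhere else instead of explaining
    r"^\s*per [a-z]+\W*$",                 # "per Dana" and nothing more
]
MIN_WORDS = 8


def rationale_problems(reason, source):
    """Return reasons a trace should not be stored as-is (an empty list means it passes)."""
    problems = []
    text = reason.strip().lower()
    if len(text.split()) < MIN_WORDS:
        problems.append("too short to serve as precedent")
    if any(re.search(p, text) for p in LAZY_PATTERNS):
        problems.append("points elsewhere instead of stating the why")
    if source == "agent_inferred":
        problems.append("inferred by the agent; needs human confirmation")
    return problems


print(rationale_problems("approved, see Slack", source="human"))
print(rationale_problems(
    "Granted 20% because losing a Fortune 500 logo costs more than the discount on a $480k renewal",
    source="human"))
```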
The decision swamp. We turned data lakes into data swamps by dumping everything in with no schema and no curation. A graph of millions of contradictory, half-true traces is the same failure with extra edges. Without curation, more traces make precedent search worse, not better. ACE has a name for the small-scale version of this, context collapse, where rewriting erodes detail over time. Same disease, bigger blast radius.
And the plain one: this is early. Most vendor decks make it sound shipped. It mostly isn't. The pattern is sound and the early results are good, but "good early results" is not "proven," and anyone who tells you otherwise is pitching.
The full stack
To recap, an AI-native workflow has four layers:
- At the bottom sit your systems of record. Salesforce, SAP, Zendesk, GitHub, Slack, the Zoom transcript from this morning's call. They hold the raw state of your business. Scattered and current-state-only, but they're the ground truth for what is.
- Above them runs the harness. It sits in the execution path and runs the reason → act → observe loop. It holds the tools, picks what goes into the model on each step, stores corrections as memory, checkpoints long runs, enforces permissions, logs every decision, and catches errors before they crash the run. This engine turns a stateless LLM into something that finishes work.
- As the harness runs, it builds the context graph. Every decision it routes leaves a trace: what inputs it gathered, which rule it applied, what exception it took, who approved, and why. The graph stitches those traces across entities and time. It answers "why," and over time it becomes the real source of truth.
- At the top sit the agents and the humans. The agents do the bulk of the work. Humans stay in the loop for the calls the agent flags as uncertain. Every correction a human makes flows back down into memory and the graph, so the next case runs cleaner.
Figure: The four layers of an AI-native company.
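The behavior of the top two layers (act when confident, escalate when not, keep every trace, and feed corrections back) can be sketched as a single step function. This is a toy Python illustration: `Harness`, `propose()`, and `ask_human()` are hypothetical, the model call and the human reviewer are stubbed out, and the 0.8 confidence threshold is an arbitrary assumption.

```python
from dataclasses import dataclass, field


@dataclass
class Harness:
    """Toy harness step: act when confident, escalate when not, and keep every trace."""
    threshold: float = 0.8                       # assumed confidence cutoff
    traces: list = field(default_factory=list)   # one record per decision
    memory: dict = field(default_factory=dict)   # case kind -> (action, why) from corrections

    def step(self, case, propose, ask_human):
        action, confidence, why = propose(case, self.memory)
        decided_by = "agent"
        if confidence < self.threshold:
            human_action, human_why = ask_human(case, action)
            if human_action != action:
                # The correction flows back into memory for the next case of this kind.
                self.memory[case["kind"]] = (human_action, human_why)
            action, why, decided_by = human_action, human_why, "human"
        self.traces.append({"case": case["id"], "action": action, "why": why, "by": decided_by})
        return action


def propose(case, memory):
    """Stand-in for the model: confident only when a stored correction covers this case kind."""
    if case["kind"] in memory:
        action, why = memory[case["kind"]]
        return action, 0.9, f"precedent: {why}"
    return "deny", 0.5, "policy caps discounts at 10%"


def ask_human(case, proposed):
    """Stand-in for the human reviewer."""
    return "grant_exception", "strategic logo; a similar renewal was granted last quarter"


h = Harness()
h.step({"id": "c1", "kind": "big-renewal-discount"}, propose, ask_human)
h.step({"id": "c2", "kind": "big-renewal-discount"}, propose, ask_human)
print([(t["case"], t["by"], t["action"]) for t in h.traces])
```

The second case runs without a human because the first correction is now in memory, which is the loop the layers above describe.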
This maps to the two core features of an AI-native workflow:
- Universal context is your systems of record made queryable through the context graph. The agent doesn't re-derive the links between an invoice, a PO, a contract, and a Slack message on every turn. The graph already holds them.
- Loops are the harness closing feedback on every run. A correction today becomes a rule tomorrow. A trace today becomes precedent next quarter.
Where to start
Don't try to automate everything at once. Pick one workflow or team and prove the approach before you expand. Look for one with one or more of these three traits:
- High headcount, because that labor exists to handle messy logic.
- Exception-heavy decisions, because precedent matters a lot there.
- Cross-functional roles, because they exist just to carry context that no system currently holds.
If all three line up, that's your first target. Procurement, finance, claims, deal desk, underwriting, and escalation management are a few examples.
Don't be afraid to tokenmaxx. If your monthly AI bill doesn't make you uncomfortable these days, you are doing something wrong. Plus, what you spend on AI today can be more than offset by payroll savings tomorrow.
Conclusion
A clean way to hold all of this in your head:
The model is your brain, the agent (the agentic harness) is your limbs, and the context graph is the map of your specific world, your company. A superb body with no map stalls at every fork in the road that requires knowing the terrain, and enterprise processes are nothing but those forks.
So the question worth asking about your own company is small and uncomfortable. Where do you remember why things happened, and what actually worked? If the honest answer is "in a few people's heads," you already know where the risk is, and you already know what's worth building.
Capturing the “why” behind decisions is the next great leap in business intelligence.