The Cost of the Answer

2026-07-02

Most people keep both a checking account and a savings account, not because one is better but because they do different jobs. A checking account gives back what you put in: deposits go in, withdrawals come out, and the balance is roughly what you deposited. A savings account puts the balance itself to work. Interest earns interest, and given enough time the growth outweighs the original deposit.

Most teams open a checking account when they start with AI. Take a process, hand it to agents, and the work speeds up. The gain is proportional to the deposit: more effort into the design, more throughput out, and to gain again you go back in and deposit more. That is real value. It is also a ceiling. If all your investments are in checking, you grow only as fast as you can deposit.

What separates the two accounts is not the money. It is whether anything in the system can turn this period's balance into next period's growth. AI has the same dividing line: it compounds only when the organization lowers the cost of knowing whether the work got better. The cost that matters is not the cost of the work but the cost of the answer.

The machinery and the economics

The machinery of compounding is the loop: a system that does a piece of work, evaluates what it produced, and feeds the evaluation back into the next round. A first draft gets a critique, the critique produces a second draft, and each round starts where the last one ended rather than where the first one began. Anything that closes that circuit is a loop. Anything that doesn't, no matter how fast or how automated, is a single pass repeated.

Every loop has two halves: the thing producing the work, and the thing judging whether the work improved. Call the second one the evaluator. Agents have made the first half nearly free. The work itself, the drafting, the coding, the generating, now costs so little that it is no longer where the economics of the loop are decided; the evaluator is. A loop compounds only if something can reliably tell, round after round, that this version beats the last one, and feedback without that judgment does not compound. An unvalidated loop can reinforce a mistake as easily as fix one, and it can get more wrong every round, which is worse than never looping at all, because it looks like progress the whole way down.
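As a rough illustration of the two halves, here is a minimal Python sketch of a loop that keeps a candidate only when an evaluator says it beats the current version. Everything in it is an assumption made for illustration: the names run_loop, propose, and score are hypothetical, and the toy evaluator knows the target in advance, which real work rarely allows.

```python
import random

def run_loop(start, propose, score, rounds, validate=True):
    """Run a produce-and-judge loop and return the final version and score history.

    With validate=True a candidate replaces the current version only if the
    evaluator scores it higher. With validate=False every candidate is accepted,
    which is the "single pass repeated" case.
    """
    current = start
    current_score = score(current)
    history = [current_score]
    for _ in range(rounds):
        candidate = propose(current)        # the producing half
        candidate_score = score(candidate)  # the judging half
        if not validate or candidate_score > current_score:
            current, current_score = candidate, candidate_score
        history.append(current_score)
    return current, history

# Toy stand-ins (assumptions): the "work" is a number, and better means closer to TARGET.
TARGET = 10.0

def score(x):
    return -abs(x - TARGET)

rng = random.Random(0)

def propose(x):
    return x + rng.uniform(-1.0, 1.0)

final, history = run_loop(0.0, propose, score, rounds=200)
```

In the validated run the score history can never go down, because a worse candidate is simply discarded. Passing validate=False removes that guarantee: the loop keeps moving, but nothing ensures it moves toward the target.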

Agents make the work cheap. Evaluators make the answer cheap. Evaluators that persist make judgment compound. Most teams are working hard on the first, and the returns have moved to the other two.

The theory of value

If validated feedback is what compounds, then the speed of any loop is set by how cheaply you can get the validation: how much it costs to know whether the last round was better. When the answer is free, you can check after every single round, and the loop compounds as fast as it can run. When the answer is expensive, you can only afford to check occasionally, and the loop crawls no matter how fast the work itself runs.
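The arithmetic behind that claim can be sketched in a few lines of Python. The function name validated_rounds and all the numbers are made up for illustration; the only assumption is that each validated round pays once for the work and once for the check.

```python
def validated_rounds(budget, work_cost, answer_cost):
    """Rounds affordable when every round pays for the work and for its check."""
    return budget // (work_cost + answer_cost)

# Illustrative units only (assumed, not measured).
baseline = validated_rounds(1000, work_cost=10, answer_cost=90)
cheap_work = validated_rounds(1000, work_cost=1, answer_cost=90)
cheap_answer = validated_rounds(1000, work_cost=10, answer_cost=1)

print(baseline, cheap_work, cheap_answer)  # prints: 10 10 90
```

Cutting the work cost by a factor of ten leaves the number of validated rounds unchanged in this toy case, while cutting the answer cost multiplies it by nine.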

Answer cost, not work cost, decides which work compounds and which merely speeds up. Coding compounded first, and the tempting explanation is that coding gets its answer for free: the test suite runs itself, so the loop can iterate as often as it likes. But a test suite only tells you whether the code satisfied the tests, not whether the product judgment behind them was right, and tests do not write themselves. Coding compounded first because software engineering spent decades building cheap evaluators: tests, builds, linters, type systems, benchmarks, continuous integration. Fields compound when they invest in answer infrastructure. Writing has a little of that infrastructure: a read is quick, so a draft can get an answer nearly every round. Strategy and synthesis have almost none, which is why the verdict on a strategy can take months to arrive.

The work leaders are often held accountable for (the strategy, the synthesis, the judgment) is also the work that is difficult to check. Difficult is not the same as fixed, though. Answer cost depends on how much answer infrastructure a field, or a firm, has built, and infrastructure can be built. And the final answer is not the only answer. For a strategy, the verdict may genuinely stay expensive; you may not know for six months whether the call was right. But the verdict decomposes into intermediate answers that are cheap if you ask for them: whether this version is clearer than the last, whether the assumptions are explicit, whether the objections are addressed, whether the plan survives the plausible futures. None of these is the final truth. AI cannot know the final truth cheaply, and it does not need to: it can lower the cost of the intermediate answers that make the final judgment better, and those are the answers a loop runs on. The verdict stays with whoever owns the call; the loop hands them better material to make it with.

Making answers cheap, and keeping them honest

A cheaper answer is not automatically a good one, only a faster one, and a fast answer that is still wrong buys you nothing. An answer has to be cheap enough to run often and trustworthy enough that running often means learning more. Both properties can be engineered.

You lower cost by asking a smaller question. Judging whether a draft is good in any absolute sense is slow and expensive. Judging whether this version beats the last one is cheaper, and it is usually all the loop needs. Comparison can still be hard when dimensions conflict, one version clearer, the other bolder, but the question stays bounded: two candidates, one call, no need to define good in the abstract. Bounded questions can be asked often.
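A bounded comparison needs no absolute score at all. The sketch below, in Python, keeps a running champion and asks only one question per challenger. The function pick_champion and the word-count comparator are hypothetical stand-ins; a real comparator would be a reviewer or a model making the call between two drafts.

```python
def pick_champion(candidates, beats):
    """Return the candidate left standing after pairwise comparisons.

    beats(a, b) answers one bounded question: is a better than b?
    No absolute definition of "good" is ever needed.
    """
    champion = candidates[0]
    for challenger in candidates[1:]:
        if beats(challenger, champion):
            champion = challenger
    return champion

# Stand-in comparator (assumption): the draft with fewer words wins.
def tighter(a, b):
    return len(a.split()) < len(b.split())

drafts = [
    "We should probably consider maybe expanding into the new market at some point",
    "Expand into the new market next quarter",
    "Expand next quarter",
]
champion = pick_champion(drafts, tighter)
```

Each new draft costs exactly one comparison against the current champion, which is what makes the question cheap enough to ask every round.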

You lower it again by watching the plateau instead of waiting for a finish line. Nobody can tell you in advance what done looks like for a piece of strategy, so do not make the loop wait for that answer. Watch the size of the changes each round: when they shrink toward nothing, or the same fix keeps recurring, you have a signal, not a verdict, and it cost nothing but attention to a trend you were already tracking.
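One way to watch the plateau is to measure how much each round changed the draft and flag when the recent changes are all small. The Python sketch below uses the standard library's difflib for the measurement. The function names, the window of three rounds, and the 0.05 threshold are assumptions chosen for illustration, not recommended values.

```python
from difflib import SequenceMatcher

def change_size(previous, current):
    """Fraction of the text that changed between two drafts (0.0 means identical)."""
    return 1.0 - SequenceMatcher(None, previous, current).ratio()

def has_plateaued(change_sizes, window=3, threshold=0.05):
    """True when each of the last `window` rounds changed less than `threshold`."""
    if len(change_sizes) < window:
        return False
    return all(size < threshold for size in change_sizes[-window:])

# Hypothetical per-round change sizes for one document.
early = [0.40, 0.20, 0.04]
late = [0.40, 0.20, 0.04, 0.03, 0.01]
```

With these numbers has_plateaued(early) is False and has_plateaued(late) is True. The result is a signal that the loop has stopped moving, not a verdict that the work is done.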

You raise trustworthiness by requiring evidence before you count a win. A loop with no free answer can talk itself into progress that is not there, so make it show its work: the specific change, the specific result. This does not make any single check faster. It makes the check honest, which is what keeps you from compounding on a false gain. And you raise it again by waiting for a pattern to repeat before you trust it. A lesson drawn from one round might be noise wearing a lesson's clothes. Waiting for it to show up twice costs one extra round and buys an answer that is harder to fool.
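Both rules can be expressed as a small filter: a claimed lesson counts only if it names a specific change and a specific result, and it is trusted only after it has shown up at least twice. This is a minimal Python sketch; the Claim fields, the function trusted_lessons, and the sample data are all hypothetical.

```python
from collections import Counter
from dataclasses import dataclass

@dataclass(frozen=True)
class Claim:
    lesson: str   # what the loop thinks it learned
    change: str   # the specific change that was made
    result: str   # the specific result that was observed

def trusted_lessons(claims, min_repeats=2):
    """Return lessons backed by evidence in at least `min_repeats` claims."""
    evidenced = Counter(c.lesson for c in claims if c.change and c.result)
    return [lesson for lesson, count in evidenced.items() if count >= min_repeats]

# Made-up claims from four rounds of a loop.
claims = [
    Claim("lead with customer evidence", "moved interview quotes to section 1", "objection dropped"),
    Claim("shorten the summary", "", ""),  # asserted without evidence, so not counted
    Claim("lead with customer evidence", "added churn data up front", "objection did not recur"),
    Claim("shorten the summary", "cut summary to 100 words", "readers finished it"),
]
trusted = trusted_lessons(claims)
```

Here only "lead with customer evidence" is trusted. "shorten the summary" has a single evidenced claim, so it waits for one more round before it counts.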

The durable asset

A loop with a fixed evaluator makes the work better: this draft, this strategy, this website. The artifact improves until it ships, and the next artifact starts from zero. The artifact was never the asset. It ships, it leaves, and it starts depreciating the day it is done. The evaluator can stay.

A company writes a strategy memo. The first pass is an agent's draft. The second pass is an evaluator holding that draft against a rubric: assumptions explicit, options distinct, tradeoffs named. The third pass notices that the same objection keeps recurring across rounds (weak customer evidence, say) and starts raising it earlier. The fourth pass writes that objection into the standing rubric, so the next memo, on a different question entirely, faces it from the first draft. Run this for a year and the company has not merely produced better memos. It has accumulated better strategic taste, held somewhere that does not walk out the door.
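The step where a recurring objection joins the standing rubric might look something like the Python sketch below. The function promote_recurring, the two-round threshold, and the extra objections in the sample data are assumptions for illustration; only the three rubric items and the weak-customer-evidence objection come from the example above.

```python
from collections import Counter

def promote_recurring(rubric, objections_by_round, min_rounds=2):
    """Return a new rubric that includes objections raised in at least `min_rounds` rounds."""
    rounds_seen = Counter()
    for objections in objections_by_round:
        for objection in set(objections):  # count each objection once per round
            rounds_seen[objection] += 1
    updated = list(rubric)
    for objection in sorted(rounds_seen):  # sorted for a stable order
        if rounds_seen[objection] >= min_rounds and objection not in updated:
            updated.append(objection)
    return updated

rubric = ["assumptions explicit", "options distinct", "tradeoffs named"]
objections_by_round = [
    ["weak customer evidence", "unclear owner"],
    ["weak customer evidence"],
    ["weak customer evidence", "timeline vague"],
]
next_rubric = promote_recurring(rubric, objections_by_round)
```

The next memo is then checked against next_rubric from its first draft. One-off objections stay out of the rubric until they recur, which keeps it from growing with every passing complaint.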

The model may be exactly the same at the end of that year as it was at the start. What sharpened is the apparatus around it: the rubrics, the objection sets, the worked examples, the recorded failure cases, the standing comparisons. A loop that improves its own evaluative apparatus is lowering the cost of the answer on its own, permanently, for every future piece of work that passes through it. It makes more of the organization's work eligible to compound without anyone going back in to fix the economics by hand. AI is well suited to running that apparatus, because it can hold a draft against a rubric, yesterday's synthesis against today's, or two strategies against each other, fast enough to answer every round. The apparatus is what compounds, and it belongs to whoever built it.

Throughput is consumed, artifacts depreciate, and evaluators appreciate. They can also ossify: a rubric can harden into bureaucracy, an objection set can overfit to the last failure, accumulated taste can quietly become accumulated caution. The asset needs maintenance: new evidence, new counterexamples, an occasional reset. Kept honest and held long enough, an evaluator changes what the work is. A strategy stops being a document you finish and file, and starts being something you keep running: more variants worth generating because generating them got cheap, decisions revisited on a cadence instead of once a year, a living draft still answering to reality on the same terms it always did, just able to try more before reality weighs in.

AI does not compound because generation got cheap. It compounds when judgment gets cheap enough to run continuously, and durable enough to improve the next piece of work, not just the current one. That second investment is harder to fund than the first. Automation pays back in throughput, and throughput shows up in the first week; an evaluator looks like overhead right up until it becomes the asset. So the natural question, what work can I automate, keeps winning the budget, and it leads to checking accounts every time: you speed up the work, pocket the gain, and the next gain costs another deposit.

The better question is: where is the cost of the answer too high to loop? Those places are not failures of AI; they are the map of where to invest. Lower the cost of the answer and the work starts compounding. Lower it persistently and pieces of judgment that have sat in checking forever, one deposit at a time, start behaving like savings.