What is the three-proof rule?

A working trust standard for AI-produced research artefacts. Open the packet, pick any three material claims — not decorative ones, claims the thesis would have to rely on — and try to verify them in five minutes. If the claims hold, the artefact has earned attention. If they don't, it has not earned a place in the workflow.

What does 'material claim' mean in the rule?

A claim the thesis depends on. A revenue driver, a margin assumption, an adoption barrier, a regulatory risk, a customer behaviour, a supply constraint, a competitive dynamic. A decorative claim — background colour, generic context — doesn't count. The rule is about whether the load-bearing parts of the argument survive checking.

What does the rule punish that other trust standards miss?

Two opposite failures at once. First, the wrong claim — an assertion that doesn't hold when checked. Second, the vague claim — 'market sentiment is shifting,' 'customers are becoming more selective' — language so general that nothing can be verified or contradicted. AI tools often fail one of these tests cleanly while failing the other quietly.

Why does negative evidence have to be in the packet?

Because a research artefact that only proves its own argument accelerates confirmation bias and dresses it up as diligence. A packet that earns trust includes what would weaken the thesis — the counter-evidence, the source that contradicts the trend, the reason the risk might be overstated. A packet that performs certainty fails the rule by definition.

Can the three-proof rule be applied by buyers who don't run the research themselves?

Yes — that is one of its useful properties. Compliance teams, PMs, research leaders, and procurement can all apply the same five-minute test to the same artefact. The rule converts an abstract debate about AI tools into a concrete inspection of a specific piece of work.

The Three-Proof Rule

I have been in enough boardrooms where a research deck got demolished by a single question to know what the question looks like. It is rarely sophisticated. It is something like "where did this number come from?" or "what is the date on the source?" or "is this an industry average or a single example?" The deck cracks not because the team was lazy but because the system that produced the deck did not expect anyone to check.

That is the wrong system for investment research.

A simple test

The three-proof rule is the bar I have come to use when I look at any AI-produced research artefact, my own or anyone else's. Open the packet. Pick any three material claims — not decorative ones, claims the thesis would have to rely on. Each one should be tied to a line-level source under eighteen months old, with at least one negative datapoint inside the citation chain. Verify the three in five minutes. If they hold, the work has earned attention. If they don't, the conversation is over.

That sounds simple because it is supposed to. The complexity is in the system that has to survive the test, not in the test itself.

Confidence is not credibility

Generic AI is very good at producing confident language. That is useful for drafting and risky for diligence. A confident paragraph can hide weak sourcing, stale data, a missing caveat, an unsupported causal step. The prose runs smoothly enough that the reader stops asking whether the underlying evidence is strong, and by the time the analyst realises a load-bearing claim was thin, the memo has already been read by the PM.

The question worth asking is not whether the output sounds right. It is whether the analyst can check the output, quickly, without leaving the artefact.

Most generic AI fails that test. Most purpose-built research tools fail it too, for a slightly different reason — they put citations in the output, but not where the analyst can actually use them. A footnote at the end of a paragraph is not the same as a source attached to a specific claim, with the date, the relevant excerpt, and a confidence marker.

What the rule punishes

A packet that passes the three-proof rule has two qualities most AI output lacks: specificity and provenance.

It punishes two opposite failures. The first is the wrong claim — the assertion that does not hold when checked. The second is subtler, and more common in practice: the claim that is too vague to be checked at all. "Market sentiment is shifting." "Customers are becoming more selective." "Competition is intensifying." All of those might be true; none of them are useful without specificity, and none of them can be verified, which means none of them belong in a decision artefact.

A claim has to be concrete enough to be checked. If a packet cannot produce three checkable material claims, it is not research. It is commentary, and the analyst has plenty of commentary already.

Negative evidence is part of the test

A serious research packet does not only prove its own argument. It shows what would weaken the argument.

If the packet says a thesis is strong, what evidence works against it? If it identifies a risk, what suggests the risk may be overstated? If it highlights a trend, what source contradicts it? AI tools that only gather supporting evidence are dangerous because they accelerate confirmation bias and dress it up as diligence. A packet that earns trust includes the counter-evidence; one that performs certainty does not.

I learned this the hard way in agency work. Decks that present only the supporting case look strong in the room and fall apart the moment someone with experience asks the second question.

Auditability has to be designed in, not bolted on

Many AI workflows add citations the way a deck adds logos — for decoration. A proper research artefact treats citations as infrastructure. The source trail is not there to make the output look serious; it is there so the analyst can challenge the work quickly and know where to dig next.

Auditability is more than a link. It is the claim, the source, the date, the relevant excerpt or paragraph reference, the confidence assigned, and the role the claim plays in the argument. A background claim should not carry the same weight as a central thesis assumption, and the artefact should make that hierarchy obvious. The load-bearing claims are the ones the analyst tests first. If three of those survive five minutes of checking, the packet has done its job, whatever the rest of the deck looks like. If they don't, no amount of supporting prose recovers the artefact.

Why this matters before procurement

Analysts may be willing to experiment with tools that feel useful. Institutions need a higher bar than feel.

The research leader needs to know whether outputs are consistent. The compliance team needs to know whether the process is defensible. The PM needs to know whether the workflow improves judgment or adds noise. Procurement needs to know whether the value is explainable to people who will never use the tool themselves.

The three-proof rule gives all of them the same test. Instead of debating AI in the abstract, the buying committee can inspect the artefact. Pick three claims. Check them. If they hold, the conversation moves forward. If they don't, the meeting is over and nobody loses an afternoon. That is a more useful trust standard than enthusiasm, demo polish, or model mystique.

What Thesis Lab should be judged on

Thesis Lab should not be judged on whether anyone likes the idea of synthetic research. It should be judged on the packet.

Can the thesis brief surface the load-bearing assumptions? Can the evidence map show where the claims come from? Can the counter-thesis pressure-test the house view? Can the source table survive inspection? Can the analyst find one non-obvious gotcha, or save an hour of intake?

And can three material claims be verified in five minutes?

That is the bar I want the product judged on, because that is the bar I would apply to anyone else's research before I let it influence a decision. The future of AI in research belongs to systems that make verification easy — not because users are cynical, but because serious buyers should be skeptical.

The Three-Proof Rule

A simple test

Confidence is not credibility

What the rule punishes

Negative evidence is part of the test

Auditability has to be designed in, not bolted on

Why this matters before procurement

What Thesis Lab should be judged on

Further reading

Frequently Asked Questions

What is the three-proof rule?

What does 'material claim' mean in the rule?

What does the rule punish that other trust standards miss?

Why does negative evidence have to be in the packet?

Can the three-proof rule be applied by buyers who don't run the research themselves?

Andreas Duess

Ready to Experience Synthetic Persona Intelligence?

Stay in the loop.

A simple test

Confidence is not credibility

What the rule punishes

Negative evidence is part of the test

Auditability has to be designed in, not bolted on

Why this matters before procurement

What Thesis Lab should be judged on

Further reading

Frequently Asked Questions

What is the three-proof rule?

What does 'material claim' mean in the rule?

What does the rule punish that other trust standards miss?

Why does negative evidence have to be in the packet?

Can the three-proof rule be applied by buyers who don't run the research themselves?

Andreas Duess

Related Articles

The New Math of Expert-Network Spend

First Word, Not Last Word

The Work Isn't the Decision. It's the Intake.

Ready to Experience Synthetic Persona Intelligence?

Stay in the loop.