Six Pillars of Trustworthy Financial AI
Financial AI earns trust only when its reasoning is constrained, inspectable, and replayable. Outside that boundary, it isn’t really a system – it’s uncontrolled behaviour.
Simon Gregory | CTO & Co-Founder
Pillar 1: Auditability
When you can’t see how an answer was formed, you can’t trust it
Pillar 2: Authority
When AI can’t tell who is allowed to speak, relevance replaces legitimacy
Pillar 3: Provenance
When you can’t see the lineage, the system invents it
Pillar 4: Context Integrity
When the evidential world breaks, the model hallucinates the missing structure
Pillar 5: Temporal Integrity
When time collapses, financial reasoning collapses with it
Pillar 6: Determinism
When behaviour is unstable, trust must come from the architecture, not the model
Pillar 6: Determinism
When behaviour is unstable, trust must come from the architecture, not the model
You run the same query twice. You get different answers.
Not slightly different. Materially different. Different sources cited. Different conclusions reached. Different confidence implied. No error message. No audit trail. Just… different.
You run it a third time. A fourth. Each time, the system sounds certain. Each time, it sounds slightly different from the last.
This isn’t a bug or a configuration issue. This is how these systems work.
It is called non-determinism. Understanding it is the architectural foundation beneath everything else in this series.
The question isn’t whether to use AI in finance, but whether you have the architecture to use it safely.
Two Worlds
Deterministic systems always produce the same output for the same input. They are stable, repeatable, testable, and auditable. Financial infrastructure depends on this property at every layer where trust and auditability are required: pricing engines, ledgers, reconciliation systems, regulatory reporting, risk models. The trust they carry comes from the fact that their behaviour does not vary.
Non-deterministic systems can produce multiple valid outputs from the same input. They are probabilistic, context-sensitive, high-dimensional, parallel, and non-linear. Their potential comes from the fact that their behaviour can vary.
LLMs are non-deterministic by design. Vector embedding models are deterministic in principle, but in practice can exhibit non-deterministic behaviour (through hardware parallelism and versioning), meaning information retrieval itself can be a source of silent variation.
These are two fundamentally different worlds. Financial AI sits directly at their boundary, where non-deterministic reasoning must feed deterministic infrastructure.
The goal is not to eliminate non-determinism from every layer, but to ensure it never operates without deterministic controls where decisions are consequential.
Why LLMs are non-deterministic
The architecture is designed for variability.
The model doesn’t retrieve a correct answer. It constructs a probable one from thousands of possibilities. Vectors encode meaning as positions in high-dimensional space rather than as rules or lookups. The same question, surrounded by different context, activates different parts of the model. Even with fixed random seeds, micro-differences in context, timing, retrieval order, or hardware can compound into qualitatively divergent outputs.
The system explores a possibility space, rather than executing a fixed path.
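The variability described above enters at the sampling step. The sketch below is a toy illustration, not any production model's sampler: it shows how a temperature-scaled softmax over hypothetical logits yields one fixed token when decoding greedily, but several different tokens across runs when sampling. All numbers are invented for illustration.

```python
import math
import random

def softmax(logits, temperature):
    """Convert logits to probabilities; temperature rescales the distribution."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def next_token(logits, temperature, rng):
    """Greedy (deterministic) at temperature 0, sampled otherwise."""
    if temperature == 0:
        return max(range(len(logits)), key=lambda i: logits[i])
    probs = softmax(logits, temperature)
    return rng.choices(range(len(logits)), weights=probs, k=1)[0]

# Hypothetical logits over four candidate continuations.
logits = [2.0, 1.8, 0.5, 0.1]

greedy = {next_token(logits, 0, random.Random(i)) for i in range(100)}
sampled = {next_token(logits, 1.0, random.Random(i)) for i in range(100)}

print(greedy)   # always the single most probable token
print(sampled)  # several distinct tokens across runs
```

Note that even greedy decoding only removes variability at this one layer; upstream retrieval order, context assembly, and hardware parallelism remain sources of divergence, which is why the article treats determinism as an architectural property rather than a sampling setting.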
The butterfly effect in financial AI
LLMs behave like chaotic systems: tiny changes produce massive differences.
Small changes in prompt phrasing can produce large shifts in tone, conclusion, or emphasis. Different retrieval results change which evidence the model reasons from. A different model version introduces silent changes in model behaviour: the same query, the same system, different outputs, with no change made by any user. A different sampling temperature changes how the model weighs competing continuations.
In practice: a risk analyst reruns a research summary on the same earnings call. The first run flags a covenant concern. The second does not. Both outputs are fluent. Neither is flagged as uncertain. The analyst does not know which to trust, or whether to trust either.
This behaviour isn’t an edge case; it’s what these systems do.
Why non-determinism breaks financial workflows
Financial workflows are built on five assumptions that non-deterministic systems violate:
Stability – the same inputs produce the same outputs. LLMs cannot guarantee this.
Traceability – outputs can be traced to specific, verifiable sources with their full lineage intact. Generative synthesis obscures this.
Repeatability – processes can be re-run for audit, validation, or review. Non-deterministic processes cannot be replayed.
Contextual consistency – the same question, same evidence, should give the same answer. But LLMs change their answer based on what else is in the prompt.
Temporal consistency – the same question today should match the same question in three months. But model updates, retrieval inconsistency, embedding variations, and the model’s tendency to flatten time mean behaviour changes with no notification and no audit event.
A non-deterministic output may not be the same next time. It may not follow the same reasoning path. It may not be grounded in the same sources. It may change silently with model updates. It may change with context variations in ways no one intended.
These failures compound. The output, the reasoning, and the sources may change. Silently.
This means raw LLM output should be considered untrusted input for any consequential downstream process.
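Treating raw output as untrusted implies, at minimum, that variation must be detectable rather than silent. A minimal sketch, assuming outputs can be rendered into a canonical JSON form (the `canonical_hash` helper and the run payloads are illustrative, not a real system's schema): hashing each run makes divergence between "identical" queries an explicit event instead of an invisible one.

```python
import hashlib
import json

def canonical_hash(output: dict) -> str:
    """Hash a canonical JSON rendering so any material change is detectable."""
    blob = json.dumps(output, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(blob.encode("utf-8")).hexdigest()

# Two hypothetical runs of the "same" query (values invented for illustration).
run_1 = {"conclusion": "covenant concern flagged", "sources": ["doc-17", "doc-42"]}
run_2 = {"conclusion": "no covenant concern", "sources": ["doc-42"]}

if canonical_hash(run_1) != canonical_hash(run_2):
    # Divergence is surfaced explicitly instead of flowing silently downstream.
    print("DIVERGENT: outputs differ; route to review, do not auto-consume")
```

The design point is that the comparison itself is deterministic: the same payload always hashes to the same value, so the check can sit inside the trust boundary even though the outputs it inspects cannot.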
The replayability problem
In regulated finance, the ability to reconstruct a decision is not optional. Regulators, auditors, and risk committees require it. If a firm made a consequential decision six months ago (such as a credit assessment, a risk classification, or a research-driven trade) and is asked to show its reasoning, it must be able to replay the process that produced it.
By design, non-deterministic systems cannot satisfy this requirement on their own.
The process that produced a given output does not persist. The sampling state, the retrieval results, the context window, the model weights at that moment: none of these is preserved or reproducible. Running the same query again does not reconstruct the original reasoning. It produces a new and different output that may reach different conclusions from different sources via a different path.
There is no audit trail because there is no fixed path to audit. There is no replay because the process was never deterministic in the first place.
This is not a logging problem, and capturing outputs after the fact doesn’t solve it. An output without a reproducible process is evidence of nothing. A regulator asking how a decision was reached cannot be satisfied by showing them the answer. They need to see the reasoning, the sources, the lineage, and the logic that connected them. Non-determinism makes that structurally impossible.
Human review can’t fix this. You can’t validate what you didn’t see. Even if you were there, you can’t guarantee the next run will match. The system offers no such guarantee.
This is the compliance exposure that non-determinism creates, not at the edges, but in the core of every consequential process it touches.
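To make the replayability requirement concrete, here is a sketch of the state a decision record would have to pin down at the moment of generation. Every field name and value is illustrative (there is no standard schema for this); the point is that model version, prompt, retrieval results, and sampling state must all be captured as content-addressed facts, because none of them survives otherwise.

```python
import hashlib
import json
from dataclasses import dataclass, asdict

def sha256(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

@dataclass(frozen=True)
class DecisionRecord:
    """Illustrative: everything needed to reconstruct how an output was produced."""
    model_id: str        # exact model + version, never "latest"
    prompt_hash: str     # hash of the full rendered prompt
    retrieved_docs: tuple  # (doc_id, content_hash) pairs, in retrieval order
    sampling: dict       # temperature, seed, top_p -- the sampling state
    output_hash: str     # hash of the output actually consumed downstream
    timestamp: str

record = DecisionRecord(
    model_id="example-llm-2024-06-01",          # hypothetical identifier
    prompt_hash=sha256("...rendered prompt..."),
    retrieved_docs=(("doc-17", sha256("filing text")),),
    sampling={"temperature": 0.0, "seed": 1234, "top_p": 1.0},
    output_hash=sha256("...model output..."),
    timestamp="2024-06-01T09:30:00Z",
)
print(json.dumps(asdict(record), indent=2))
```

Even a record like this only makes replay possible, not guaranteed: if the hosted model version behind `model_id` is retired or silently updated, the captured state documents the decision without being able to re-execute it, which is exactly the exposure the article describes.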
The deterministic trust boundary
Probabilistic black boxes do not fail randomly. They fail in predictable ways, and those ways have names.
The first five pillars of this series each describe a distinct failure mode. Separately, each is serious. Together, they describe something larger: a system under a common pressure. The sixth pillar is that pressure. It is what makes the other five necessary, and what the deterministic trust boundary exists to contain.
Auditability is the requirement that every output can be traced, verified, and replayed. It is both a direct consequence of deploying a probabilistic black box (which cannot trace its own reasoning) and the ultimate proof that the architecture is working.
Authority is the requirement that institutional legitimacy, not semantic proximity, determines who shapes an answer. The system is structurally blind to this distinction. It must be enforced from outside.
Provenance is the continuity layer that preserves full lineage from source to output. Without it, nothing downstream can be trusted. It is what holds all other pillars together.
Context Integrity is the requirement that the evidential world presented to the model remains structurally intact. Non-deterministic retrieval can undermine it, and the model’s chaotic sensitivity magnifies any instabilities and gaps.
Temporal Integrity is the requirement that time is treated as a first-class dimension. Outdated evidence that crosses the trust boundary does not announce itself. It looks correct.
Non-determinism is the failure mode that makes replayability structurally impossible. There is no fixed path to reconstruct. The same query does not reproduce the same reasoning. Every output is unrepeatable by design.
These six failure modes are not independent; they arise from the same underlying condition: probabilistic black box systems that explore a possibility space rather than executing a fixed path. That is the source of their capability, and the reason their failures compound rather than sit in isolation.
The model’s chaotic sensitivity to context variation means small differences in what enters the system can produce large differences in what comes out, magnifying every failure mode it touches.
Containment is the only viable architecture
The answer to non-determinism isn’t to eliminate it, as that would eliminate the capability. The correct response is to contain it.
The non-deterministic layer must be bounded by deterministic controls at every point where trust is required. That means deterministic inputs, deterministic lineage, deterministic retrieval paths, and deterministic authority signals, so that the generative step operates on a stable, validated foundation rather than an unstable one.
The LLM synthesis layer (where much of the non-determinism lives) must sit outside the trust boundary. It should receive structured, validated inputs. Its outputs should be anchored to auditable lineage. What it cannot reach, it cannot cite.
The other five pillars are not independent problems to be solved separately. They are the deterministic containment architecture that must exist precisely because the generative layer cannot be trusted to provide them. Together, they are what makes non-determinism safe to use.
The right architecture doesn’t constrain what AI can do. It expands what is possible. Reasoning, synthesis, and abstraction all become available when the non-deterministic layer is properly bounded.
The containment architecture is not the price you pay for using AI safely. It is what makes AI genuinely useful in finance rather than merely impressive.
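One concrete form the boundary can take is a deterministic citation gate, sketched below under assumed names (`enforce_citation_boundary` and the `context` structure are hypothetical, not a real API): the generative layer may only cite documents that entered through the validated context, so "what it cannot reach, it cannot cite" becomes an enforced invariant rather than a hope.

```python
def enforce_citation_boundary(output_sources, validated_context):
    """Deterministic gate: every cited source must come from the
    validated context actually supplied to the model."""
    allowed = {doc_id for doc_id, _lineage in validated_context}
    unknown = [s for s in output_sources if s not in allowed]
    if unknown:
        raise ValueError(f"Output cites sources outside the trust boundary: {unknown}")
    return True

# Hypothetical validated context: (doc_id, lineage) pairs that passed upstream checks.
context = [("doc-17", "filing/2024-Q1"), ("doc-42", "transcript/2024-05-02")]

print(enforce_citation_boundary(["doc-17"], context))  # contained: passes
try:
    enforce_citation_boundary(["doc-99"], context)     # never ingested: rejected
except ValueError as e:
    print(e)
```

The gate itself is trivially simple, which is the point: it is stable, repeatable, and testable in exactly the way the generative layer is not, so it can live inside the trust boundary and police what crosses it.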
Non-determinism is a property, not a feature. In finance, it must be contained by deterministic architecture.