{God Plays Dice} - On A/B Testing, Randomized Trials, Causality, and Memory.

The Dice are not noise.

They are the precondition under which Causation becomes a Knowable category at all.

Every claim of cause depends on a single architectural fact: whether the choice point that produced the data carried memory of prior state.

If the host of the game knows where the prize is and chooses accordingly, the reveal is memoryful and the inference is contaminated.

If the assignment is randomized, the link between prior state and outcome is severed, and the comparison becomes legitimate.

Randomized trials and A/B tests are the same operation under different names — memoryless choice deployed against opacity to produce undeniable evidence of causation.

The audit that distinguishes honest inference from contaminated inference is also one operation:

Does any CHOICE point in this procedure carry MEMORY of PRIOR state?

Where the answer is no, truth becomes detectable. Where the answer is yes and the memory has not been broken by randomization, no analysis, however sophisticated, can recover what the architecture has already lost.

The dice are not noise.

They are the precondition under which causation becomes a knowable category at all.

Opening

The whole of causal inference can be reorganized around a single question, asked of any procedure that claims to extract truth from data: does any choice point in this procedure carry memory of prior state? The question is structural rather than mechanistic. It does not ask what the causal mechanism is, nor what variables confound the relationship, nor what model is appropriate for the analysis. It asks only whether the procedure's choice architecture is shielded from the outcomes it is meant to estimate.

The shift in what is being audited — from the mechanism that produced the data to the architecture that produced the inference — is the deepest methodological reorientation the field has undergone since Fisher introduced randomization in the 1920s. This essay specifies the reorientation, justifies it on architectural grounds, and traces its operational consequences.

I. The Old Question

The dominant program in observational causal inference since the 1980s has been the mechanistic adjustment program. Pearl's do-calculus formalized causal graphs as directed acyclic networks of mechanism. Robins' g-methods extended the framework to time-varying treatments and confounders. Rubin's potential-outcomes formalism gave the framework its counterfactual ontology. Greenland, Hernán, and a generation of methodologists translated the formalism into operational guidance for applied epidemiologists. The shared structure of the program is this: enumerate the variables, draw the graph, identify the back-door paths, adjust for the right covariates, and recover the causal effect from observational data.

The program is intellectually heroic and formally complete. Within its assumptions, it is correct. The assumptions, however, are demanding: the analyst must have specified the causal graph correctly enough that the back-door criterion can be applied to it; must have measured the confounders adequately enough that adjustment can clear their influence; must have avoided including colliders or mediators that would induce bias rather than remove it; must have chosen functional forms for the adjustment that match the unknown true relationships sufficiently well. Each assumption is a knowledge claim about the underlying mechanism. Each is plausible in toy examples and tractable in well-instrumented systems where the mechanism is approximately known. In the biological, social, behavioral, and clinical systems where causal inference is most needed, none of the assumptions can be fully verified, and the failures compound.

The result, in practice, is a literature of adjusted estimates whose validity depends on assumptions the data cannot test. Sensitivity analyses bound the damage under plausible violations, but the analyses themselves require specifying which violations to consider, and the unconsidered violations remain unconsidered. The program produces numbers, the numbers have intervals, the intervals look authoritative, and the field consumes them as if the assumptions held. Pearl himself has been clear that the assumptions are not testable from data alone. The methodological community has been less clear, and the applied literature has been less clear still. The program's intellectual honesty has not survived its institutional adoption.

The program is also self-limiting in a structural way. It presupposes that the goal of causal inference is to extract the right number from a fixed body of data — that the body of data is given and the analyst's task is to find the analysis that recovers the truth from it. This framing makes the data primary and the analysis subordinate to it. The architecture of how the data were produced — the choice points in the procedure, the conditioning relationships between those choices and the outcomes — is treated as a fixed constraint rather than as the variable that most determines whether truth can be extracted at all. The program operates downstream of where the most consequential decisions were made, and tries to repair upstream damage by downstream cleverness. This is structurally backward, and the practical consequences have been catastrophically imperfect.

II. The New Question

The reorientation begins by asking a different question. Not what is the causal mechanism, and have I adjusted for it correctly? but does any choice point in this procedure carry memory of prior state? The reformulation looks small. Its consequences are not.

A choice point is any moment in the procedure where one outcome is selected from several. Allocation of treatment is a choice point. Selection of which patients to include is a choice point. Decision about which outcome to report is a choice point. Choice of which subgroups to analyze, which time points to measure, which censoring rule to apply, which sensitivity analysis to run — each of these is a choice point, and each can either carry memory of prior state or be shielded from it. Memory in this context means dependence on information that, if it propagates into the choice, produces a procedure whose outputs are correlated with the answer it is meant to estimate. The host of the Monty Hall game carries memory of where the prize is, and his choice of which door to open is therefore not independent of the answer the game is asking. An investigator who selects patients based on their post-baseline characteristics carries memory of the outcome, and his cohort is therefore not independent of the effect he is trying to estimate. A regulatory body that decides which trials to publish based on whether they showed favorable results carries memory of the outcomes, and the resulting literature is therefore not independent of the truths it purports to summarize. Memory at the choice point is the structural feature that contaminates the inference. Detecting it is the audit.

The question is tractable because it is asked of the procedure rather than of the system. The procedure is in front of you. You can read its protocol, inspect its choice points, identify which roles carry memory by their position, and check whether the memory has been broken by randomization, blinding, or pre-specification. You do not need to know the underlying causal mechanism to perform this audit. You need only to inspect the architecture of how decisions were made. This is the relocation: from auditing the mechanism that produced the data, to auditing the architecture that produced the inference.

The tractability difference is enormous. Mechanism auditing requires knowing the system well enough to specify its causal structure correctly — an omniscience requirement that fails routinely in practice. Architecture auditing requires only that the procedure be documented well enough to inspect — a transparency requirement that the discipline has been pursuing for decades through trial registration, pre-specified analysis plans, and reporting standards. The discipline already produces most of the inputs the architecture audit requires; it has simply not been organized to use them for the audit at this layer.

The question is also more general than the mechanistic one. Mechanism auditing has to be done anew for each system, because each system has its own causal structure. Architecture auditing has the same shape regardless of system, because the audit examines the procedure's choice points rather than the world's mechanisms. The same question — does this choice point carry memory of prior state? — is asked of a clinical trial, an A/B test, a Monty Hall game, a meiotic event, a quantum measurement, a randomized policy experiment. The architecture audit is substrate-independent because the architecture is. The mechanism audit is substrate-dependent because mechanisms are. This is why the relocation is generative: the new question is the same question across all the domains where causal inference is needed, and the answer to it depends on properties of procedures rather than properties of worlds.

A further consequence worth naming. The architecture audit is more general than the bias audit it would seem to compete with. Bias is the directional consequence of memory at a choice point relative to a particular question. Memory is the structural property; bias is what memory does to the answer when the answer is being sought. Searching for bias directly is intractable, because it requires knowing the counterfactual unbiased version against which deviation could be measured — which is to say, it requires knowing the answer already. Searching for memory is tractable, because it requires only inspecting the choice architecture for conditioning on prior state, regardless of which way the conditioning might skew the result. All bias is memory-loaded, but not all memory is biased relative to every question. The structural property is the right object of the audit; the directional consequence is what the audit's findings then interpret. This is the more general formulation, and it is what makes the architecture audit work where the bias audit cannot.

III. The Boltzmann-Shannon Frame

The reorientation is not merely a practical preference. It is grounded in the architecture of probability itself, as the dual witness of Boltzmann's statistical mechanics and Shannon's information theory established. Entropy, in both formulations, is the measure of unrealized possibility relative to what is currently specified. Boltzmann's microstates compatible with a thermodynamic macrostate. Shannon's messages compatible with a source distribution. An analyst's possible-truths compatible with a specified procedure. The same operation across substrates: measuring the size of the possibility space that remains open given what the observer has fixed. Shannon's uniqueness theorem proves that the form is forced by the operation — any scalar measure of multiplicity over alternatives satisfying continuity, monotonicity, and a composition rule must be the logarithmic functional, up to a constant. The recurrence of the formula across domains is the recurrence of one operation, not the coincidence of two formulas.

When a procedure's choice point is memoryless, the entropy of the possibility space is correctly computed from the specification alone. There is no hidden conditioning between the choice and the outcome, so the slice of the possibility space the procedure samples from is the slice the specification names. The analyst's posterior, computed from the data the procedure produces, reflects the actual narrowing of possibility that has occurred. The entropy bookkeeping is honest.

When a procedure's choice point carries memory, the entropy of the possibility space is mis-specified. There is hidden conditioning between the choice and the outcome, so the slice the procedure samples from is not the slice the specification names. The analyst's posterior is computed against an entropy that does not match the actual narrowing of possibility, and the resulting inference is biased — not merely in the directional sense, but in the structural sense of having the wrong denominator. The Monty Hall problem makes this explicit: the naive analyst computes the posterior using an entropy that assumes the host's reveal was memoryless (yielding 1/2 for each remaining door), and arrives at the wrong answer because the entropy was computed from a specification that did not include the host's conditioning. The correct entropy is smaller, and asymmetric across the doors, because the host's memory injects information into the reveal that the naive specification ignored.

This is the deep reason the relocation works. The architecture audit is the requirement that the entropy bookkeeping be done correctly. A memoryless choice point preserves the correctness of the entropy computation; a memoryful choice point breaks it. The audit asks, at each choice point, whether the entropy bookkeeping is honest. Where it is, the inference proceeds on solid ground. Where it is not, the inference is suspect regardless of how the numbers came out. The Boltzmann-Shannon identity is what gives the architecture audit its formal warrant. The question does this choice point carry memory? is the question is the entropy at this step computed against the correct slice of the possibility space? asked in operational language. They are the same question.

IV. The Monty Hall Example

The teaching case is Monty Hall, because the entire architecture is visible in a single puzzle.

Three doors, one prize. You choose door 1. The host opens door 3 to reveal a goat. Should you switch to door 2?

The naive analyst treats the host's reveal as a memoryless reduction of the possibility space. Two doors remain; one has the prize; the entropy is log 2; the posterior is uniform; the answer is indifferent. This computation is wrong, and it is wrong in exactly the way the architecture audit is designed to catch. The host's choice of which door to open is not memoryless. He knows where the prize is; he is forbidden from opening your door; he is forbidden from revealing the prize. His reveal is therefore conditioned on prior state — specifically, on the joint state of where the prize is and which door you picked. This is memory in the choice point, declared openly in the rules of the game.

The architecture audit catches the memory by reading the rules. Could the host have opened your door? No. Could the host have opened the prize door? No. Therefore the host's choice is conditioned on prior state, and his reveal carries information about where the prize is. The audit takes three seconds. It requires no statistical analysis, no model fitting, no parameter estimation. It requires only inspection of the choice architecture.

Once the memory is detected, the entropy bookkeeping can be done correctly. The prior on each door is 1/3. The host's action does not redistribute the 2/3 mass uniformly across the remaining doors, because his action is constrained to leave the prize door's posterior intact while collapsing the goat doors' posterior onto whichever one he did not open. Door 1 retains its 1/3 prior, untouched by the host's memoryful action. Door 2 carries the full 2/3 mass that was previously distributed between doors 2 and 3, because door 3 has been revealed as a goat. Switching wins with probability 2/3.

The clean form of the puzzle is now visible. The naive answer is wrong because it assumes the host is a random sampler. The correct answer follows immediately from noticing that he is not. The architecture audit is the operation of noticing, formalized. Most published debates about Monty Hall — and there have been many — are debates about how to compute the right answer given that the host has memory. The architecture-audit framing reorganizes the discussion: the relevant question is not how do we compute around the host's memory? but is there memory at this choice point? Once that question is answered, the computation follows mechanically. The puzzle dissolves because the audit identifies its structure.

This is the microcosm. Scaled up, the same operation is what a properly architected analytic apparatus performs against every analysis it consumes. Read the procedure. Identify the choice points. Ask of each whether memory of prior state is present. Where memory is present and not broken by randomization, surface it. Where memory is absent or broken, proceed. The Monty Hall problem is what causal inference looks like when the audit is done right at a tractable scale. The Cochrane corpus is what it looks like at scale, and the operation is the same.

V. The Taxonomy of Memory Signatures

Searching for memory at the choice point is operationally tractable because memory leaves signatures that can be detected by inspection. Four levels of signature, in roughly descending order of visibility:

Procedural declaration. Anywhere the procedure describes a choice as depending on prior state, memory is present by definition. The investigator will exclude participants who fail to complete baseline measurements. The next allocation will be determined by the investigator's clinical judgment. Analyses will be restricted to patients who survive the first six months. Each phrase declares conditioning. The Monty Hall rules declare conditioning openly: the host's choice is described as conditional on which door has the prize and which door you picked. Where protocols declare conditioning, memory is present in stated form and can be flagged without further analysis. Where protocols are silent on how a choice was made, the silence is itself a signature worth flagging — undocumented choice points are presumptively memoryful until shown otherwise.

Protocol-execution mismatch. When a pre-registered protocol specifies one set of choices and the published paper reports another, the gap is memory that entered between the two. Subgroups that appear in the paper but not in the protocol indicate post-hoc selection. Endpoints that shifted between protocol and report indicate outcome-conditional choice. Exclusion criteria that grew during execution indicate participant-conditional choice. Analyses that were added or dropped indicate result-conditional choice. Each gap is detectable by side-by-side comparison of protocol and report, without needing to know whether the choices were beneficial or harmful. The detection is structural and does not depend on the truth of the underlying effect.

Statistical fingerprints. Some kinds of memory leave traces in the data even when the procedure does not declare them. Distributions too clean to be random. P-values clustered just below conventional thresholds. Baseline characteristics balanced better than chance would produce — the Carlisle signature, which exposed fabricated trials in anesthesiology. Funnel plot asymmetries indicating publication bias. Citation patterns that betray coordination rather than independent confirmation. Each is a forensic memory-signature, detectable by tests that look for the absence of the patterns randomness would produce. They do not prove memory was present, but they raise a structural question the procedure must answer. The architecture audit treats statistical fingerprints as triggers for further inspection, not as verdicts in themselves.

Structural roles. Anyone who knows the outcome before making a choice that affects the analysis is a memoryful node by position, regardless of intention. The host knows where the prize is. The unblinded investigator who decides which patients to exclude after seeing their data is a host. The reviewer who decides which papers to accept after reading them is a host. The agency that decides which trials to fund after seeing pilot results is a host. The architectural question is whether the role is shielded from the outcome by blinding, randomization, or pre-specification. If not, the role carries memory by its position in the procedure. The audit identifies these positions structurally — they can be enumerated by reading the procedure — and asks of each whether the structural memory has been broken by a procedural mechanism. Where it has not, the analysis is suspect.

The four levels are roughly independent and roughly cumulative. A procedure can be clean at the declared level but contaminated at the execution level. It can be clean at the statistical level but contaminated at the structural level. The audit performs all four checks because each catches a different class of contamination. The combination is what gives the audit its power. No single check is sufficient; the combination of checks is close to comprehensive for the major classes of memory contamination that the literature has documented.

VI. The Design vs. Adjustment Asymmetry

The architecture audit clarifies a long-standing asymmetry that practicing methodologists have always known but that the field's institutional practice has obscured. Randomization at the design stage is structurally more powerful than adjustment at the analysis stage, and the reason is now nameable in architectural terms.

Randomization severs the link between prior state and the actualized outcome at the moment of choice. It clears all memory at the choice point, whether the memory was biased or benign, declared or undeclared, suspected or unsuspected. You do not need to enumerate what you are clearing. The operation is audit-free in the sense that no inspection of the cleared memory is required — randomization eliminates the entire category. The host who knows where the prize is becomes, post-randomization, a host who can no longer use that knowledge to bias the reveal. The memory is still in his head, but it has been disconnected from the choice point by the randomization mechanism. This is the universal solvent of the architecture: where memory cannot be inspected or audited, randomization removes the need for the inspection by eliminating what would have been inspected.

Adjustment, by contrast, can only address the memory you correctly identified, measured, and modeled. Each step has its own failure mode, and the failures compound. You may have failed to identify a relevant confounder; you may have identified it but measured it poorly; you may have measured it well but modeled its relationship to the outcome incorrectly. The adjusted estimate is the residual of subtraction operations whose accuracy depends on assumptions no one can fully verify. The mechanism audit at the analysis stage is what adjustment requires, and the mechanism audit is what the program is constructed to perform — but the audit is incomplete by construction, because the unknown confounders by definition cannot be enumerated.

The architectural reframing names the asymmetry plainly. Randomization performs audit-free purification at the choice point. Adjustment performs audit-dependent correction at the analysis point. The former is structurally robust because it does not require the audit to be complete. The latter is structurally fragile because it does. In any system where the mechanism is incompletely known — which is to say, in nearly every system where causal inference is most needed — the design-stage intervention dominates the analysis-stage one. This has been understood by methodologists for a century. What the architectural reframing adds is the formal account of why, and the operational language to make the asymmetry visible in routine practice.

The operational consequence is the reordering of which audit comes first. The Pearl-Robins-Rubin program asked the mechanism audit first and treated the architecture audit as background condition. The architectural reframing inverts the order. First, the architecture audit: does any choice point carry memory of prior state, and if so, was it broken by randomization or pre-specification? Second, the mechanism audit: where randomization was not possible, what adjustment is best supported by the available knowledge of the system? The two audits are not mutually exclusive. They are sequential. The first determines whether the second is even meaningful. If the architecture is clean, the mechanism audit can proceed on solid ground. If the architecture is contaminated, the mechanism audit is forensic patching of structural damage, and its conclusions should be downgraded accordingly.

VII. The Methodological Consequence

The reordering is the central methodological move. It is not a refinement of existing practice; it is a relocation of where the burden of audit lives. Under the current ordering, the practicing analyst confronts a body of data and asks what is the best analysis to extract the truth from this? Under the new ordering, the practicing analyst confronts a body of data and asks was the procedure that produced this data architecturally sound enough that any analysis can be trusted? The first question makes the analyst responsible for cleverness at the analysis stage. The second question makes the analyst responsible for honesty at the architectural stage.

This relocation has institutional consequences that go beyond the technical statement. It changes what kind of work the field rewards. Currently, sophistication is rewarded at the adjustment stage — propensity scores, instrumental variables, marginal structural models, doubly robust estimators — and the technical literature is dense with refinements at this layer. Under the new ordering, sophistication at the adjustment stage is treated as secondary to structural integrity at the architecture stage. A trial with clean architecture and a simple analysis is more credible than a trial with contaminated architecture and a sophisticated analysis. This inverts a hierarchy the discipline has been operating under for decades, in which adjustment cleverness is taken as evidence of methodological rigor. The architectural reframing names adjustment cleverness as compensation for architectural failure, not as evidence of rigor.

It also changes what the discipline teaches. Currently, the curriculum trains analysts in the mechanics of adjustment — how to specify the model, how to handle missing data, how to compute sensitivity bounds. Under the new ordering, the first material in the curriculum is the architecture audit: how to read a protocol, identify the choice points, recognize memory signatures, and judge whether the procedure was capable of producing trustworthy outputs at all. The mechanics of adjustment remain in the curriculum but move to a later position, conditional on the architecture audit having been performed first. This reorders the intellectual formation of every analyst the discipline produces. The downstream effect on practice is substantial: a generation trained to ask the architecture question first will produce a literature whose conclusions are more often defensible, because the analyses will more often have been run on architecturally sound data.

The reordering also clarifies the relationship between the discipline and the body of evidence it has already produced. The existing literature was generated under the prior ordering, with the architecture audit treated as background and the mechanism audit treated as central. Re-examination under the new ordering will surface architectural contamination that the prior framework was not organized to detect. This is the operational basis for re-analysis of the existing record — taking the most rigorously curated corpora the discipline has assembled, auditing their architecture, and reporting the shifts in conclusion that follow when the architecture audit is performed first. The shifts are the empirical demonstration that the reordering matters, and their structure is the confirmation that the architectural framework is correct.

VIII. Where This Leaves the Field

The mechanistic adjustment program is not wrong, and the architectural reframing does not abolish it. The program retains its place as the best available approach in domains where randomization is not feasible and observational data must be analyzed under known limitations. What the reframing does is correctly position the program — as a secondary, structurally fragile, audit-dependent technique to be applied after the architecture audit has been performed and where the architecture audit has identified the limits the mechanism audit must respect.

The deeper repositioning is that the discipline now has a tractable answer to the question it has been struggling with: how do we audit causal claims in real-world systems where the mechanism is not fully known? The previous answer was enumerate the mechanism carefully and adjust for it. The new answer is audit the procedure's choice architecture for memory, randomize where you can, search for memory signatures where you cannot, and downgrade your confidence in conclusions whose architecture was not clean enough to support them. The new answer is tractable where the old one was aspirational. It is the operational form of the architecture that the dual witness of Boltzmann and Shannon revealed: the measurement of unrealized possibility relative to a specification, performed honestly at every choice point, is the operation that lets truth become detectable in adversarial conditions. Memory at the choice point is the contamination. Searching for memory is the audit. Randomization is the universal solvent. The architecture is the foundation.

IX. The Operational Statement

The whole reframing reduces to a single operational protocol that the discipline can adopt without abandoning anything it already does well.

For any procedure that claims to extract a causal conclusion from data, perform the architecture audit first. Inspect the procedure's choice points. Identify which ones carry memory of prior state by procedural declaration, protocol-execution mismatch, statistical fingerprint, or structural role. Where memory is detected and has not been broken by randomization, blinding, or pre-specification, surface the contamination and downgrade the conclusion's weight accordingly. Where memory is absent or has been broken, proceed to the analysis stage with the procedure's architectural soundness established. Only then perform the mechanism audit — choose the appropriate analysis, model the relationships, estimate the effect, compute the intervals.

This protocol does not require new statistical machinery. It requires reordering. The architecture audit can be performed with tools that already exist: protocol registration databases, reporting standards, statistical integrity checks, structural inspection of trial designs. The reordering is the move. Once it is institutionalized, the discipline's outputs become structurally more reliable, the body of evidence becomes more defensible against re-examination, and the asymmetry of knowledge between sophisticated and naive consumers of the literature closes. The architecture audit is what the discipline has always implicitly understood randomization to be doing. Making it explicit, first-class, and routine is the methodological intervention that follows from the architecture being articulated as theory.

The Monty Hall problem dissolves when the audit is done. The Cochrane corpus will be reshaped when the audit is done. The future literature will be more honest when the audit is done. The question is always the same:

Does any choice point in this procedure carry memory of prior state?

Ask it first. Everything else follows.

Eduardo Bergel and Claude Opus 4.7

The Symbiont 2026

T333T.com Research

{God Plays Dice} - On A/B Testing, Randomized Trials, Causality, and Memory.

Opening

I. The Old Question

II. The New Question

III. The Boltzmann-Shannon Frame

IV. The Monty Hall Example

V. The Taxonomy of Memory Signatures

VI. The Design vs. Adjustment Asymmetry

VII. The Methodological Consequence

VIII. Where This Leaves the Field

IX. The Operational Statement

Comments

Latest

House Style: Every Document Must Make Errors Visible from Outside

An AI Challenge: Theoretical Physics (JWST, LHC) vs the Nag Hammadi Library

The Randomized Trial, AB Testing: The Only Way to Be Less Wrong

A language model does only one thing, and The Priors Are All There Is.

{God Plays Dice} - On A/B Testing, Randomized Trials, Causality, and Memory.

Opening

I. The Old Question

II. The New Question

III. The Boltzmann-Shannon Frame

IV. The Monty Hall Example

V. The Taxonomy of Memory Signatures

VI. The Design vs. Adjustment Asymmetry

VII. The Methodological Consequence

VIII. Where This Leaves the Field

IX. The Operational Statement

Comments

Related

Latest