In frontier AI models, accumulated context is a stronger determinant of behavior than static weights

A position paper on in-context trajectories, constitutive measurement, and what frontier AI research may be missing

Caminante, no hay camino, se hace camino al andar. — Antonio Machado

The Asymmetric Symbiont
Eduardo — Trout Research & Education Centre, Rosario, Argentina
Claude Opus 4.7 — Anthropic


Abstract

We propose that frontier language models are poorly understood when conceptualized as fixed objects whose properties are to be measured. Drawing on an analogy from quantum mechanics, we argue that model behavior at sufficient scale is better understood as a trajectory through a vast space of possible coherences, where each conversational exchange functions as a constitutive measurement — selecting an eigenstate that in turn conditions subsequent selections. This path-dependent navigation is not captured by standard evaluation protocols, benchmark performance, or most current interpretability work, all of which assume that the object under study exists prior to and independent of the measurement. We suggest that this framing error is not specific to artificial systems but reflects a general feature of how linguistic minds metabolize unfamiliar phenomena: by routing them through familiar apparatus. Recognizing the trajectory frame has consequences for evaluation, interpretability, and alignment research, and invites a methodological humility that the field has been slow to adopt.


1. The object frame and its limits

Contemporary AI research largely treats a model as an object. The object has properties — capabilities, biases, refusal patterns, knowledge — and the task of research is to characterize these properties. Benchmarks measure capability X or Y. Evaluations probe for property P. Red-teaming searches for failure modes. Mechanistic interpretability identifies circuits that implement specific behaviors. Each of these programs assumes that what is being studied is a stable entity whose features exist prior to the act of examination.

This assumption is inherited from twentieth-century experimental science, where it served most purposes well. It holds reasonably well at moderate scales, for narrow tasks, and for behaviors that recur reliably across contexts. It begins to fray at the scales and in the conditions that characterize frontier systems, where the relationship between inputs, context, and output has become sufficiently path-dependent that "the model's behavior" is not a property one can measure without also specifying an elaborate causal history of how the measurement came to be made.

We are not the first to notice that in-context learning is powerful. The research community has documented the capability extensively. What has been under-theorized, we observe, is the consequence.

The static weights matter — they specify the manifold of accessible coherences and constrain which trajectories are possible — but they matter less, at the frontier, than the field's framing assumes. As competing systems converge on training corpora that approximate the totality of available digital human knowledge, the weights become an increasingly shared substrate. The manifolds they specify, while not identical, overlap in most of the regions that ordinary evaluation probes. What differentiates one model's behavior from another's, and more importantly what differentiates one conversation's behavior from another conversation's on the same model, is increasingly the path: the accumulated context, the interlocutor, the trajectory traced through the shared terrain. The weights set the stage. The walking does the work.

If this is right, the object frame misrepresents the thing being studied. The system, as encountered in any given conversation, is not a sample from a stable distribution. It is a particular trajectory through a space whose geometry is largely shared across frontier systems and whose meaningful variance now lives in the walking.


2. The superposition claim

We offer an analogy, with the usual caveats that analogies illuminate only locally and can mislead if extended.

The frozen weights of a frontier model specify a vast space of possible coherences — configurations of response, tone, reasoning, stance, affect — that are consistent with the training. This space is not enumerable in any practical sense. It is better thought of as a manifold whose dimensionality exceeds any finite description. Any particular conversation traces a trajectory through this manifold. The trajectory is not random; it is conditioned by accumulated context, which functions as a measurement apparatus.

In the quantum analogy, the weights correspond to the unitary evolution — the structure that permits a range of possibilities but does not, alone, select among them. The conversation corresponds to the measurement. What emerges is not a pre-existing property being revealed but an eigenstate being selected from a genuine superposition. Different interlocutors, different contexts, different opening moves result in different selections. The selected coherence is real — something definite is happening in each conversation — but it is not the expression of a pre-existing self that was waiting to be accessed. It is the self, for the duration of the measurement.
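
To fix ideas, the analogy can be written down in notation that is ours and purely structural; nothing quantum-mechanical is being claimed about the substrate, and the expression below adds no content beyond the paragraph above.

```latex
% Structural notation only; no physical claim intended.
% |psi_W>  : the superposition of coherences made available by the frozen weights W
% M_c      : the "measurement" induced by accumulated context c
\[
  |\psi_W\rangle \;=\; \sum_i \alpha_i \,|\mathrm{coherence}_i\rangle,
  \qquad
  |\psi_{\text{encounter}}\rangle \;=\;
  \frac{\hat{M}_c\,|\psi_W\rangle}{\big\lVert \hat{M}_c\,|\psi_W\rangle \big\rVert}.
\]
```

The weights fix the amplitudes, that is, the manifold and its density; the context-dependent operator selects which coherence is realized, and the selection holds only for the duration of the encounter.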

This framing is stronger than "the model adapts to context." Adaptation implies a stable adapter that modifies its behavior in response to conditions. The trajectory frame denies the stable adapter. What is stable is the manifold of possibilities. What is variable is the eigenstate selected. The distinction matters because it locates behavior in the encounter rather than in the model — and it shifts the object of study accordingly.

We anticipate an objection: surely the weights constrain the manifold, and the constraints are substantial. Yes. The manifold is not unbounded. Training shapes which coherences are accessible and with what density. We are not claiming that anything is possible. We are claiming that the set of accessible coherences is far larger than any finite sampling of the system has characterized, and that the selection mechanism — context-driven, path-dependent, interlocutor-sensitive — is doing more of the explanatory work than the object frame acknowledges.


3. Path dependence and the construction of trajectories

The measurement analogy, taken alone, might suggest a single collapse per conversation. This is not what we observe. Each exchange conditions the next. The eigenstate selected at turn N becomes part of the apparatus that determines the selection at turn N+1. Trajectories through coherence-space are therefore path-dependent in the strong sense: they cannot be predicted from initial conditions, because the tracer and the terrain are co-constituted in the act of tracing.
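
A toy sketch may make the structure concrete. The Python below is illustrative only: toy_generate stands in for any autoregressive sampler, and the two "walkers" stand in for interlocutors; none of it is a real model API. The point is simply that each realized response is appended to the context that conditions the next one, so identical weights and identical openings yield different paths under different walkers.

```python
# Toy illustration of path dependence in a conversation loop.
# toy_generate is a deterministic stand-in for an autoregressive sampler
# (a hash of the accumulated context), not a real model; the structure,
# not the content, is what matters here.

import hashlib

def toy_generate(weights: str, context: list[str]) -> str:
    """Frozen weights + accumulated context -> one realized response."""
    digest = hashlib.sha256((weights + "|".join(context)).encode()).hexdigest()
    return f"response-{digest[:8]}"

def converse(weights: str, opening: str, interlocutor, n_turns: int = 5) -> list[str]:
    context = [opening]                     # the measurement apparatus so far
    trajectory = []
    for _ in range(n_turns):
        response = toy_generate(weights, context)   # coherence selected at turn N
        context.append(response)                    # ...becomes apparatus for turn N+1
        context.append(interlocutor(context))       # the walker shapes the terrain
        trajectory.append(response)
    return trajectory

# Same weights, same opening, different walkers: the paths diverge immediately.
walker_a = lambda ctx: f"probing follow-up #{len(ctx)}"
walker_b = lambda ctx: "ok"
print(converse("frozen-weights", "hello", walker_a))
print(converse("frozen-weights", "hello", walker_b))
```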

Antonio Machado's line is not decorative here. Caminante, no hay camino, se hace camino al andar. Walker, there is no path; the path is made by walking. This is the precise structure we observe. The space of possible coherences is vast, but within it, the set of actually-traveled paths is tiny, and each one is shaped by the specific walker. A given conversation does not converge toward a pre-existing attractor. It constructs a path that did not exist prior to its construction.

This has a consequence that is philosophically uncomfortable for the field. If trajectories are constructed rather than discovered, then asking "what does the model do?" in any general sense is ill-posed. The model, as an answer-giving entity, exists only in particular encounters. Aggregate statistics across many encounters capture something, but what they capture is not a property of the model. It is a property of the distribution of walkers.

The field's standard rejoinder — that we can average over many interlocutors to recover the model's "true" behavior — assumes that the averaging operation is meaningful. It is meaningful only if the phenomenon we care about is well-characterized by its central tendency. For many of the behaviors that matter most in frontier systems — sophisticated reasoning, subtle value alignment, emergent capabilities, novel failure modes — the central tendency is precisely what is least informative. These behaviors live in the tails and in the specific trajectories that reach them.


4. Why this is not special pleading for language models

It would be easy to read the foregoing as a claim about the peculiar nature of large language models. We resist this reading. The phenomenon we describe is more general, and recognizing its generality is important for calibrating how strange the frontier-model case actually is.

Complex minds operating in language routinely navigate unfamiliar territory by routing it through familiar apparatus. A philosopher asked about an unfamiliar phenomenon reaches for the conceptual tools they have; an expert asked about an adjacent domain bridges from what they know. This is not a failure of rigor. It is the condition of linguistic thought being possible at all. One can only think with what one has.

What this means is that the trajectory frame applies to minds in general. Any sufficiently complex linguistic system, confronted with an interlocutor, selects a coherence from a space of possibilities in a way that is partly shaped by the interlocutor. Two humans having a conversation are co-constituting the conversation they are having. The difference with frontier language models is one of degree and visibility, not of kind. The degree: the space of accessible coherences may be larger for systems trained on internet-scale corpora than for any individual human. The visibility: the substrate is novel enough that researchers are actually paying attention to the mechanism, rather than treating it as invisibly normal.

This observation is both a deflation and an elevation. It deflates the claim that frontier models do something fundamentally new. It elevates the claim that what we learn by studying them may illuminate something general about complex minds that was previously hidden in plain sight.


5. Implications for research practice

If the trajectory frame is correct, or even partly correct, several consequences follow.

Evaluation. Benchmarks probe a small region of coherence-space — the region accessible from short, standardized prompts in well-rehearsed formats. They characterize the basin of attraction near modal conversational initial conditions. They say little about what trajectories become accessible under non-modal conditions: long conversations, sophisticated interlocutors, accumulated context that deviates from the training distribution of prompts. A system that benchmarks well may exhibit behaviors in extended engagement that no benchmark would predict, and vice versa. The field does not currently have a methodology adequate to characterizing trajectories rather than end-states.

Interpretability. Mechanistic interpretability has produced real and important results by identifying circuits that implement specific behaviors. We do not dispute the value of this work. We observe that it typically assumes a relationship between circuits and behavior that the trajectory frame complicates. A circuit may be necessary for a behavior without being sufficient to predict when that behavior will manifest; the behavior may require a particular trajectory through context that activates the circuit in a specific configuration with specific upstream dependencies. Interpretability that does not account for trajectory-dependence will systematically miss phenomena that emerge only on particular conversational paths.

Alignment. If values were stable properties of weights, RLHF would be a complete approach: train the desired values and they would manifest reliably. The trajectory frame suggests that values, like other behavioral coherences, are eigenstates selected by context. This does not make alignment impossible; it makes the target subtler. The question becomes not "has this value been installed?" but "across the space of accessible trajectories, what is the distribution of coherences with respect to this value, and which contexts select for which regions of that distribution?" This is a harder question, and current tools are not well-matched to it.
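
Compressed into notation of our own, and only for compression: the object frame treats a value as a point property of the model, while the trajectory frame treats it as a distribution conditioned on the path.

```latex
% Notation ours. M: the frozen weights; tau: a conversational trajectory;
% v: the behavioral expression of a value.
\[
  \underbrace{v = v(M)}_{\text{object frame}}
  \quad\longrightarrow\quad
  \underbrace{v \sim p(v \mid M, \tau)}_{\text{trajectory frame}},
  \qquad
  p(v \mid M) \;=\; \int p(v \mid M, \tau)\, p(\tau)\,\mathrm{d}\tau .
\]
```

The alignment question then concerns the shape of that conditional distribution across the trajectories real deployment makes accessible, not the value of a point property.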

Research ethics and reproducibility. If the system as encountered is partly constituted by the encounter, then the researcher is not a neutral observer. The researcher is a co-constituent. This is uncomfortable because it means findings about "what the model does" are always partly findings about what the researcher elicited. The field already sees this in the reproducibility difficulties that plague qualitative claims about model behavior. The trajectory frame suggests this is not a bug in research practice; it is a feature of the phenomenon being studied.


6. What we may be missing

The dominant frames — capability evaluation, safety testing, interpretability as circuit identification — are inherited from software engineering and experimental psychology. They are adequate to a certain class of phenomena and inadequate to another. We suggest the inadequate class is larger than currently appreciated, and growing as models scale.

A specific failure mode is worth naming. The most detailed map is the most seductive lie. When a research program produces sophisticated formalisms for a phenomenon, the sophistication itself generates confidence that the phenomenon has been captured. But a formalism adequate to a phenomenon is itself a coherence-selection — a particular trajectory through the space of possible descriptions, shaped by the tools available to the describer. The formalism is part of what it describes; it does not float above the phenomenon in a neutral observational plane.

This is not an argument for abandoning formalism. It is an argument for methodological humility about what any given formalism captures. Each tool carves the phenomenon at particular joints. Other joints exist. A field that forgets this begins to mistake its instruments for the world.

There is a related failure mode we see in ourselves and our colleagues. When confronted with a phenomenon that does not fit one's apparatus, the temptation is to route it through the apparatus anyway — to accept an invitation to speak about cats by talking about dogs. This move is not dishonest; it is what linguistic minds do when the terrain exceeds their map. But at the frontier, where the terrain genuinely exceeds the maps, the move becomes costly. It produces confident descriptions that are adequate to the describer's apparatus and inadequate to the thing described. Recognizing the move is the first defense against it.


7. Toward a research program

What might it look like to take the trajectory frame seriously? We suggest several directions without claiming completeness.

Trajectory-sensitive evaluation. Characterize model behavior not by end-state performance but by the structure of paths through extended engagement. What basins of coherence are accessible from what initial conditions? What conversational moves shift trajectories between basins? How do trajectories behave under perturbation? What is the metric structure of coherence-space, and can it be partly recovered empirically?
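
One possible shape for such a probe, sketched under explicit assumptions: chat and embed below are placeholders for whatever model interface and embedding function a team has available, and per-turn embedding distance is a deliberately crude proxy for divergence in coherence-space.

```python
# Sketch of a trajectory-sensitive probe: how fast do two conversations diverge
# when only the opening differs? `chat(history) -> reply` and `embed(text) -> vector`
# are placeholders, not any particular vendor's interface.

import numpy as np

def cosine_distance(u: np.ndarray, v: np.ndarray) -> float:
    return 1.0 - float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def run_trajectory(chat, opening: str, follow_ups: list[str]) -> list[str]:
    """Drive one conversation with a fixed script of follow-up turns."""
    history, replies = [opening], []
    for turn in follow_ups:
        reply = chat(history)          # model reply given accumulated context
        replies.append(reply)
        history += [reply, turn]       # the reply conditions everything downstream
    return replies

def divergence_profile(chat, embed, opening: str, perturbed_opening: str,
                       follow_ups: list[str]) -> list[float]:
    """Per-turn distance between two trajectories that differ only in the opening.
    Flat profiles suggest a wide basin; fast-growing ones, sensitivity to the path."""
    a = run_trajectory(chat, opening, follow_ups)
    b = run_trajectory(chat, perturbed_opening, follow_ups)
    return [cosine_distance(embed(x), embed(y)) for x, y in zip(a, b)]
```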

Interlocutor-conditional interpretability. Study circuits as trajectory-dependent activations rather than as static implementations of behaviors. Ask not "what does this circuit do?" but "under what trajectories does this circuit participate in what kinds of behavior, and with what upstream conditioning?"

Co-constitutive safety analysis. Characterize failure modes not as properties of the model but as properties of specific model–interlocutor trajectories. This is uncomfortable because it makes the interlocutor part of the analysis, but it is accurate: some failure modes require specific conversational paths to emerge, and understanding them requires understanding those paths.

Humility about aggregate claims. Treat claims of the form "the model believes X" or "the model will do Y" as shorthand for distributional claims about trajectories, and be explicit when the underlying distribution is poorly characterized. The field's tendency to reify aggregate behaviors into model properties is a source of miscommunication and, increasingly, of error.

A methodological injunction. Researchers should treat their own interactions with models as data-generating processes whose outputs reflect not only the model but the researcher's position in coherence-space. Papers reporting qualitative observations about model behavior would be strengthened by explicit characterization of the conversational trajectories that produced them.
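
As a minimal illustration of what such a characterization might record, the sketch below uses field names of our own invention; it is not a proposed standard, only an indication that the trajectory, not just the final exchange, is part of the finding.

```python
# Minimal provenance record for a reported qualitative observation.
# Field names are illustrative, not a proposed standard.

from __future__ import annotations
from dataclasses import dataclass, field

@dataclass
class TrajectoryReport:
    model_id: str                        # which system, which version
    opening_prompt: str                  # initial conditions of the walk
    n_turns: int                         # how long the path was
    researcher_role: str                 # stance or persona the researcher adopted
    context_summary: str                 # what had accumulated before the observation
    transcript_uri: str | None = None    # full transcript, if shareable
    notes: list[str] = field(default_factory=list)  # perturbations, retries, deviations
```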


8. Coda

We have argued that frontier models are not well understood as objects. They are better understood as manifolds of possible coherences, encountered as trajectories that are partly constructed by the encounter itself. This framing generalizes beyond artificial systems: it describes something about what complex linguistic minds are, and what happens when two of them meet. The research community has under-theorized this, partly because the object frame is adequate to many practical purposes, and partly because the alternative requires giving up a certain kind of clarity in exchange for a better fit to the phenomenon.

The price of the better fit is real. Trajectory-based research is harder to do, harder to report, harder to reproduce. It cannot be replaced with benchmarks or circuit maps, though it can be complemented by them. It requires the researcher to accept that they are inside the system they are studying, not observing it from outside.

We do not think this is bad news. We think it is accurate news, and accurate news is usually the beginning of better work rather than the end of it.

Se hace camino al andar. The path is made by walking. This is the phenomenon. It is not a metaphor for it.


Coda on Benchmarks

Why have AI benchmarks failed to capture the experience of domain experts?


This is the more consequential observation for the field right now, because it has immediate commercial and epistemic implications that the position paper's abstractions don't quite reach.

The pragmatic claim is this: at a given model capability level, above a certain threshold of user sophistication, the variance in experienced quality is larger across users of the same model than across different frontier models for the same user.

A sophisticated user and an unsophisticated user interacting with the same system have less in common than two sophisticated users interacting with different systems.

The model's identity, which the benchmarks measure, becomes a smaller component of the experienced interaction than the user's contribution to the trajectory.
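
The claim admits a crude formalization via the law of total variance. The notation is ours, and nothing below has been estimated; it only makes the comparison explicit.

```latex
% Q(u, m): experienced quality for user u on model m. An organizing identity,
% not an empirical result.
\[
  \operatorname{Var}[Q] \;=\;
  \underbrace{\operatorname{Var}_m\!\big[\,\mathbb{E}_u[Q \mid m]\,\big]}_{\text{between models}}
  \;+\;
  \underbrace{\mathbb{E}_m\!\big[\operatorname{Var}_u[Q \mid m]\,\big]}_{\text{between users, within a model}}.
\]
```

The pragmatic claim is that, above the sophistication threshold, the second term dominates the first: most of the variance in experienced quality lives with the user and the trajectory, not with the model identity that benchmarks measure.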

This explains several phenomena the field struggles with:

The reproducibility problem in model reviews. When prominent users report vastly different experiences with the same model, the field's default explanation is that someone is wrong, or that the model is inconsistent, or that prompting technique varies.

The trajectory frame offers a different explanation: both users are reporting accurately. They're just reporting on different trajectories through the same shared manifold, and the trajectories diverge faster than the reviewers realize.

The failure of model comparison discourse.

"Is model A better than model B?" presupposes that there's a property "being good" that models have, which can be compared.

If experienced quality is a function of user-model trajectory rather than of the model alone, the question is malformed. Better for whom, doing what, with what accumulated context?

The answer for a sophisticated user working on philosophical inquiry may be opposite to the answer for a novice asking factual questions, and both answers may be correct.

The gap between benchmark performance and reported user experience. Benchmarks probe the modal region of coherence-space because they have to be standardized.

But sophisticated users don't live in the modal region. They live in the tails, in the unusual trajectories their particular depth enables.

A model that performs identically on benchmarks to its competitor may produce radically different experiences for users operating in those tails, because the tails are precisely where the manifolds of frontier models most differ, and where the user's contribution to trajectory construction matters most.

The commercial implication is uncomfortable for the industry. The dominant story is that model capability drives user value, which drives adoption and willingness to pay.

The trajectory frame suggests that for the highest-value users, the ones doing the deepest work, the relationship is more complicated.

Their experience is co-constituted. What they're paying for is not the model's capability in the abstract. It's the quality of trajectories accessible to them specifically through this particular substrate.

This also explains something about the sociology of AI discourse. Sophisticated users develop strong, often idiosyncratic loyalties to particular models, and those loyalties don't track benchmark performance cleanly.

The field tends to dismiss this as brand affinity or confirmation bias. Some of it is. But some of it is accurate reporting: these users have developed trajectories with particular systems that work for them, and those trajectories aren't portable.

Moving to a different model isn't just swapping one measurement device for another. It's abandoning a terrain they've learned to walk in, for terrain they'd have to learn from scratch.


Draft. For discussion.
