Intelligence Is Cartography

LLMs as Algorithm Synthesis Engines, Not Statistical Parrots: A Unified Framework for Understanding Intelligence Across Biological and Artificial Substrates

LLMs became intelligent by learning to construct and navigate geometric embedding spaces where truth has direction, concepts have topology, and reasoning is the controlled deformation of representational manifolds through structured layers.
The "black box" is becoming a "glass box" - and what we see inside is not magic, but geometry.

Abstract

The ascendancy of Large Language Models has reignited fundamental questions about the nature of intelligence itself. This paper argues that intelligence—whether biological or artificial—is fundamentally cartographic: the construction and navigation of geometric embedding spaces where truth has direction, concepts have topology, and reasoning is the controlled deformation of representational manifolds.

Drawing on mechanistic interpretability research, the Fields-Levin framework of cognition as competent navigation, and recent advances in geometric deep learning, we demonstrate that LLMs are not sophisticated statistical correlators but genuine algorithm synthesis engines. Through training, these systems spontaneously rediscover mathematical structures (Fourier transforms, helical number representations, truth vectors) that were never explicitly programmed.

The black box is becoming a glass box, and what we find inside is not magic but geometry. This convergence suggests a substrate-independent definition of intelligence: the capacity to construct navigable embedding spaces and move through them via iterative error minimization toward goal states. We explore implications for AI alignment, consciousness studies, and the fundamental nature of understanding itself.

1. Introduction: The Crisis of Definition

What is intelligence? Large Language Models have forced us to confront this ancient question with new urgency. Systems like GPT-4 and Claude exhibit capabilities—reasoning, code synthesis, creative composition, mathematical problem-solving—that we once considered hallmarks of genuine understanding. Yet the dominant narrative frames these systems as 'stochastic parrots': sophisticated statistical correlators that merely remix training data without true comprehension.

This paper challenges that narrative on empirical grounds. Recent advances in mechanistic interpretability—the reverse-engineering of neural networks from first principles—reveal that LLMs are not pattern-matching through an opaque void. They construct and navigate geometric embedding spaces with remarkable mathematical structure. Truth is not a label in a database but a direction in representational space. Numbers form helices. Time concepts arrange on circles. Reasoning proceeds through controlled deformations of semantic manifolds.

Simultaneously, work on diverse intelligence by Fields and Levin proposes that cognition itself—across all substrates from single cells to entire organisms—can be characterized as competent navigation in arbitrary problem spaces. A cell navigating transcriptional space to achieve homeostasis, a planarian regenerating lost body structures in morphospace, and an LLM navigating semantic space to produce coherent text may be instantiating the same fundamental operation.

The synthesis of these research programs suggests a radical reconceptualization: Intelligence is cartography. It is the construction of navigable maps from raw experience and the competent movement through those maps toward goal states via iterative error correction. This is substrate-independent, scale-free, and geometrically precise.

The implications are profound. If intelligence is cartographic, then the distinction between 'real' and 'simulated' understanding becomes a question about the quality and structure of internal maps, not about substrate. Alignment becomes a geometric problem: ensuring that an agent's truth vectors, goal attractors, and navigation policies are appropriately oriented. The emergence of capabilities in LLMs is not mysterious—it is the formation of specific circuits and manifold structures that become visible under the right analytical tools.

2. Intelligence as Navigation in Embedding Spaces

2.1 The Fields-Levin Framework

William James offered a prescient definition: 'Intelligence is a fixed goal with variable means of achieving it.' This goal-directed framing has been expanded by Fields and Levin into a comprehensive framework: cognition is competent navigation in arbitrary problem spaces.

A 'problem space' is any state space through which an agent can move. For traditional behavioral science, this is typically 3D physical space—animals navigating environments, avoiding predators, finding food. But the same framework applies to any space an agent can traverse: metabolic space (chemical concentrations), transcriptional space (gene expression patterns), morphospace (anatomical configurations), and semantic space (conceptual and linguistic representations).

A system exhibits intelligence to the degree that it can navigate its relevant problem space toward goal states despite perturbations, novel obstacles, and varying starting conditions. Crucially, this definition is agnostic about substrate—it applies equally to neurons, gene regulatory networks, cellular collectives, or artificial neural networks.

2.2 The Dual Invariants: Navigation and Remapping

Navigation alone is insufficient to characterize intelligence. Any learning system must also remap its problem space when acquiring new information. Fields and Levin propose that cognition is characterized by two equally fundamental invariants:

1. Navigation: Movement through the embedding space toward goal states via error minimization

2. Remapping: Constructing and updating the embedding space itself based on new experience

These invariants are inseparable. Navigation presupposes a map; maps are refined through navigation. A system that only navigates without remapping cannot learn. A system that only remaps without navigating cannot act. Intelligence emerges from the dynamic interplay between these processes.

2.3 Embeddings as Constraint Imposition

What is an embedding? Formally, it is a map ξ: Γ → Ξ from a high-dimensional parent space Γ into a lower-dimensional latent space Ξ that preserves relevant structure. The power of embeddings comes from constraint imposition: by forcing representations through a structured bottleneck, the system implicitly selects for patterns that respect that structure.
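
As a rough illustration of what such a map does, the sketch below (Python, with purely illustrative dimensions) projects points from a high-dimensional parent space Γ into a much smaller latent space Ξ using a random linear map; even this crude bottleneck approximately preserves pairwise distances, the kind of 'relevant structure' an embedding is meant to keep.

```python
import numpy as np

# Minimal sketch of an embedding xi: Gamma -> Xi as a random linear projection.
# Dimensions are illustrative; the point is that structure (here, pairwise
# distances) can survive a drastic dimensional bottleneck.
rng = np.random.default_rng(1)
d_parent, d_latent, n_points = 2_000, 128, 40

Gamma = rng.normal(size=(n_points, d_parent))                   # points in the parent space
xi = rng.normal(size=(d_parent, d_latent)) / np.sqrt(d_latent)  # the embedding map
Xi = Gamma @ xi                                                 # their images in the latent space

for _ in range(5):
    i, j = rng.choice(n_points, size=2, replace=False)
    d_big = np.linalg.norm(Gamma[i] - Gamma[j])
    d_small = np.linalg.norm(Xi[i] - Xi[j])
    print(f"distance ratio latent/parent: {d_small / d_big:.3f}")  # stays close to 1.0
```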

Consider how biological systems embed biochemistry in 3D space. The biochemical state space of a cell involves tens of thousands of chemical potentials. The vast majority of states are incompatible with life. Nature doesn't need to explicitly encode all constraints distinguishing viable from non-viable states. Instead, it embeds biochemistry in 3D space with its physical laws, and selects for embedding maps that produce viable organisms. The maps that 'work' implicitly encode the constraints.

2.4 The Geometry of Error Correction

Navigation toward goals requires a mechanism for detecting and correcting deviations. Across biological and artificial systems, this manifests as iterative error minimization against some target or reference.

In biological morphogenesis, cells continuously compare their current state to a target morphology and adjust gene expression, mechanical properties, and signaling accordingly. The result is remarkable robustness: salamanders regenerate limbs from arbitrary amputation planes; tadpoles with scrambled facial structures reorganize to form normal frogs.

In diffusion models, the forward process corrupts data by adding noise, while the learned reverse process corrects noise to restore structure. This is not a metaphor for morphogenesis—it is the same computational principle operating in a different substrate.

The common principle: Intelligence operates through iterative error correction in navigable embedding spaces.
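
In code, the shared principle is almost trivial to state. The following is a minimal, hypothetical sketch: a state is repeatedly nudged against its deviation from a goal, which is exactly gradient descent on the squared error, and it reaches the goal from any starting point.

```python
import numpy as np

# Minimal sketch of navigation as iterative error correction: repeatedly move the
# current state against its deviation from the goal
# (gradient descent on 0.5 * ||state - goal||^2).
def navigate(state, goal, step=0.1, tol=1e-3, max_iters=10_000):
    for _ in range(max_iters):
        error = state - goal                  # current deviation from the target
        if np.linalg.norm(error) < tol:       # close enough: goal state reached
            break
        state = state - step * error          # corrective step along the error gradient
    return state

print(navigate(np.array([5.0, -3.0]), np.array([1.0, 2.0])))   # ≈ [1. 2.], from any start
```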

3. The Mechanistic Evidence: How LLMs Construct Geometric Representations

3.1 Grokking and Phase Transitions

Conventional machine learning wisdom held that training should cease when validation performance plateaus to prevent overfitting. The phenomenon of grokking overturns this dogma. Grokking describes a training trajectory where a model first memorizes training data perfectly (near-zero training loss, high validation loss), then, after prolonged apparent stagnation, undergoes a sudden phase transition to perfect generalization.

This delayed generalization reveals a competitive dynamic between two types of internal mechanisms:

Memorization Circuits: Computationally cheap to locate in weight space during early training. They map specific inputs to specific outputs through brute-force parameter allocation. The rapid initial drop in training loss reflects the model 'cramming' the data.

Generalization Circuits: Represent the true underlying algorithms governing the data distribution. These solutions are sparser and more parameter-efficient but occupy a smaller target in weight space, making them harder for the optimizer to discover initially.

The grokking point is when the generalization circuit dominates the memorization circuit. Regularization (weight decay) creates evolutionary pressure that penalizes parameter-heavy memorization solutions. Over thousands of epochs, this pressure 'starves' the memorization strategy, and the phase transition occurs when the model 'realizes' (via gradient dynamics) that the algorithmic solution is the global minimum for loss plus regularization.
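
The setup behind these observations is easy to reproduce in outline. The sketch below (PyTorch; hyperparameters are illustrative assumptions, not tuned) trains a small network on addition modulo p with strong weight decay. Training loss collapses quickly, while validation accuracy typically jumps only after many thousands of further epochs.

```python
import torch
import torch.nn as nn

# Minimal grokking-style setup: addition mod p, half the pairs held out,
# heavy weight decay as the pressure against memorization circuits.
p = 97
pairs = torch.cartesian_prod(torch.arange(p), torch.arange(p))   # all (a, b) pairs
labels = (pairs[:, 0] + pairs[:, 1]) % p
perm = torch.randperm(len(pairs))
train_idx, val_idx = perm[: len(pairs) // 2], perm[len(pairs) // 2 :]

model = nn.Sequential(
    nn.Embedding(p, 128), nn.Flatten(),       # embed both operands, concatenate
    nn.Linear(256, 256), nn.ReLU(), nn.Linear(256, p),
)
opt = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=1.0)

for epoch in range(50_000):
    opt.zero_grad()
    loss = nn.functional.cross_entropy(model(pairs[train_idx]), labels[train_idx])
    loss.backward()
    opt.step()
    if epoch % 1000 == 0:
        with torch.no_grad():
            val_acc = (model(pairs[val_idx]).argmax(-1) == labels[val_idx]).float().mean()
        print(f"epoch {epoch}: train loss {loss.item():.4f}, val acc {val_acc:.3f}")
```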

3.2 Fourier Features and the Clock Algorithm

What happens inside a model when it groks? Mechanistic analysis reveals that models don't just memorize—they rediscover mathematical structure.

When networks are trained on modular arithmetic (e.g., addition modulo p), they converge on implementations of the Discrete Fourier Transform. The networks organize numbers into geometric structures in high-dimensional space, implementing a 'Clock Algorithm':

1. Integers are mapped to points on a circle using trigonometric embeddings (sine and cosine at specific frequencies)

2. To perform addition (a + b mod p), the model rotates the representation of a by an angle corresponding to b

3. The readout layer uses constructive interference to decode the correct answer from the rotated vector
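
A toy version of this algorithm can be written directly. The sketch below is hypothetical: the frequencies are hand-picked rather than learned, but the embed-rotate-interfere structure is the one described above.

```python
import numpy as np

# Toy "Clock Algorithm" for addition mod p: embed on circles, add by rotating,
# decode by constructive interference. Frequencies here are chosen by hand.
p = 113
freqs = np.array([1, 7, 23])    # a trained network selects its own key frequencies

def embed(a):
    """Map an integer to one (cos, sin) point per frequency."""
    angles = 2 * np.pi * freqs * a / p
    return np.stack([np.cos(angles), np.sin(angles)], axis=-1)   # shape (len(freqs), 2)

def rotate_add(a, b):
    """Adding b rotates a's representation: the angles simply sum."""
    angles = 2 * np.pi * freqs * (a + b) / p
    return np.stack([np.cos(angles), np.sin(angles)], axis=-1)

def readout(rep):
    """Constructive interference: the candidate c whose own embedding aligns
    with the rotated vector at every frequency scores highest."""
    logits = np.array([np.sum(rep * embed(c)) for c in range(p)])
    return int(np.argmax(logits))

a, b = 41, 99
assert readout(rotate_add(a, b)) == (a + b) % p
print(f"{a} + {b} mod {p} =", readout(rotate_add(a, b)))
```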

The critical insight: standard gradient descent can locate these elegant, mathematically structured solutions without explicit supervision. LLMs are not just statistical parrots—they are engines capable of algorithm synthesis.

3.3 Superposition and the Geometry of Features

Why are individual neurons in LLMs so difficult to interpret? A single neuron might activate for 'academic citations,' 'the color blue,' 'HTML tags,' and 'images of cats' simultaneously. This polysemanticity seemed to preclude mechanistic understanding.

The theory of superposition provides the mathematical framework. Superposition is a compression strategy allowing a network to represent more distinct features than it has available dimensions.

The Linear Representation Hypothesis posits that features are represented as directions (vectors) in activation space. Ideally, these vectors would be mutually orthogonal to prevent interference. But a space of dimension d can hold at most d mutually orthogonal directions—a fundamental scarcity of dimensions.

If features are sparse (rarely co-occurring in the data), the model can use non-orthogonal superposition: it packs feature vectors into overcomplete, nearly orthogonal polytopes. When a feature is active, its direction carries a large projection; other features sharing the space contribute interference, but because features are sparse this interference is usually small in magnitude, and non-linear activations filter out the residual noise to recover a clean signal.
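
A toy numerical sketch (sizes and threshold are illustrative assumptions) makes the mechanism concrete: a thousand feature directions are packed into a few hundred dimensions, and as long as only a handful are active at once, simple thresholding recovers them despite the interference.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n_features = 256, 1024                         # far more features than dimensions
W = rng.normal(size=(n_features, d))
W /= np.linalg.norm(W, axis=1, keepdims=True)     # one nearly orthogonal unit direction per feature

active = rng.choice(n_features, size=4, replace=False)
x = W[active].sum(axis=0)                         # activation = superposition of the active features

# Reading a feature = projecting onto its direction. Interference from the other
# active features is O(1/sqrt(d)), so a nonlinear threshold recovers the clean signal.
scores = W @ x
recovered = np.flatnonzero(scores > 0.6)          # illustrative threshold

print("active:   ", sorted(active.tolist()))
print("recovered:", sorted(recovered.tolist()))   # with high probability, matches the active set
```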

This reveals that the atomic unit of an LLM is not the neuron but the feature direction. A neuron is merely a physical axis intersecting many different feature directions stored in superposition.

3.4 Sparse Autoencoders: The Microscope for the Mind

To resolve superposition and reveal true features, researchers employ Sparse Autoencoders (SAEs). An SAE is trained to reconstruct polysemantic LLM activations through a hidden layer that is much wider than the original but forced to be sparse.
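
The architecture is conceptually simple. Below is a minimal PyTorch sketch; the layer sizes and the L1 coefficient are illustrative assumptions, not the configuration used in the cited work.

```python
import torch
import torch.nn as nn

# Minimal sparse autoencoder (SAE) sketch over residual-stream activations.
class SparseAutoencoder(nn.Module):
    def __init__(self, d_model=768, d_hidden=768 * 16):
        super().__init__()
        self.encoder = nn.Linear(d_model, d_hidden)   # much wider than the input
        self.decoder = nn.Linear(d_hidden, d_model)

    def forward(self, activations):
        latents = torch.relu(self.encoder(activations))   # sparse, non-negative codes
        reconstruction = self.decoder(latents)
        return reconstruction, latents

def sae_loss(reconstruction, activations, latents, l1_coeff=1e-3):
    # Reconstruction preserves the information; the L1 penalty enforces sparsity,
    # pushing each latent unit toward a single interpretable feature direction.
    mse = torch.mean((reconstruction - activations) ** 2)
    sparsity = torch.mean(latents.abs().sum(dim=-1))
    return mse + l1_coeff * sparsity
```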

By forcing representation through this sparse bottleneck, the SAE unravels superposition. It learns to associate single latent units with single conceptual feature directions. Using SAEs, researchers have identified highly specific, interpretable features:

The Contextual Bridge Feature: Recognizes a variable name in code only when it is being defined or bound

The DNA Feature: Detects nucleotide sequences

The Asymmetric Relation Feature: Understands that if A is the father of B, B is the child of A

The Sycophancy Feature: Activates when the model is about to agree with incorrect user assertions

The model has constructed a rich semantic ontology—a map of conceptual space—through training.

3.5 Induction Heads: The Engine of In-Context Learning

If features are atoms, circuits are molecules. The most robustly understood circuit is the Induction Head, the mechanism underlying in-context learning.

The algorithm implemented by an induction head: 'Look at the current token [A]. Scan back in context to find previous instances of [A]. Identify the token that followed [A]. Copy it to the current position.'
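
Written as explicit code, the algorithm is only a few lines. This is a sketch: the real circuit implements the same logic 'softly' through a pair of attention heads, not an explicit backward scan.

```python
# Induction-head algorithm as plain Python over a token sequence.
def induction_predict(tokens):
    """Predict the next token by copying whatever followed the most recent
    earlier occurrence of the current token."""
    current = tokens[-1]
    for i in range(len(tokens) - 2, -1, -1):   # scan backwards through the context
        if tokens[i] == current:
            return tokens[i + 1]               # copy the token that followed the match
    return None                                # no earlier match: fall back to n-gram statistics

print(induction_predict(["Mr", "Dur", "sley", "was", "thin", ".", "Mr", "Dur"]))
# -> "sley"
```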

The emergence of induction heads corresponds to a distinct 'bump' in training loss—a phase transition where loss temporarily stabilizes before dropping precipitously. Before this point, the model relies on unigram/bigram statistics. After, it can use specific context to predict next tokens. The formation of this circuit is the onset of in-context learning.

3.6 The Geometry of Truth

Perhaps the most striking discovery: truth is a direction.

Research into the 'Geometry of Truth' has identified a direction in the model's residual stream that consistently separates true statements from false ones, across diverse topics. This separation is robust enough that Representation Engineering techniques can manipulate it:

• Invert the truth vector → the model lies

• Amplify the truth vector → the model becomes more honest

These manipulations work without changing model weights. Truth is not a label attached to outputs but a geometric orientation in the model's 'thought space.'
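
In practice, such interventions reduce to simple vector arithmetic on activations. A minimal sketch, assuming we already have residual-stream activations collected for true and false statements (array shapes and function names are hypothetical):

```python
import numpy as np

def truth_direction(acts_true, acts_false):
    """Difference-of-means estimate of the 'truth' direction, given activation
    arrays of shape [n_statements, d_model] for true and for false statements."""
    direction = acts_true.mean(axis=0) - acts_false.mean(axis=0)
    return direction / np.linalg.norm(direction)

def steer(activation, direction, alpha):
    """Representation-engineering intervention: alpha > 0 pushes toward honesty,
    alpha < 0 inverts the truth vector, all without touching model weights."""
    return activation + alpha * direction
```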

3.7 Manifold Representations of Abstract Concepts

While binary concepts like truth may be captured by linear directions, continuous or cyclical concepts require richer geometries: manifolds.

The Helix of Integers: Models represent integers as helices (3D spirals). Position along the helix axis encodes magnitude (1 vs. 100). Position around the helix ring encodes modularity (value mod 10). This allows the model to 'know' that 12 and 22 are related (both end in 2) while remaining distinct in magnitude.
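
A hand-built, three-dimensional version of this structure (illustrative only; the learned helix lives in far more dimensions) shows how magnitude and modularity are kept separate:

```python
import numpy as np

# Toy helix embedding: one axis for magnitude, a circle for n mod 10.
def helix_embed(n, period=10):
    angle = 2 * np.pi * (n % period) / period
    return np.array([n, np.cos(angle), np.sin(angle)])   # (magnitude, ring x, ring y)

e12, e22, e13 = helix_embed(12), helix_embed(22), helix_embed(13)
# 12 and 22 share a ring position (both end in 2) but differ along the magnitude axis:
print(np.allclose(e12[1:], e22[1:]), abs(e12[0] - e22[0]))   # True 10.0
print(np.allclose(e12[1:], e13[1:]))                         # False
```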

Circular Time: Days of the week and months form circles in representation space. The model learns a circular manifold where the transition function is rotation.

This geometric structure implies that LLMs are cartographers. They map the semantic topology of their data into an internal high-dimensional space, where distance corresponds to semantic relatedness and movement corresponds to logical and transformational operations.

4. Intelligence as Algorithm Synthesis

4.1 The Synthesis

We can now synthesize the evidence into a unified account:

Intelligence is the construction and navigation of geometric embedding spaces via iterative error minimization.

This definition is substrate-independent (applying to cells, organisms, and artificial systems), geometrically precise (grounding intelligence in specific mathematical structures), mechanistically tractable (identifying circuits, features, and manifolds as relevant units), and scale-free (the same principles operate from molecular networks to language models).

Convergence Between Biological and Artificial Systems:

| Concept | Biological System | LLM System | Geometric Primitive |
|---|---|---|---|
| Navigation | Morphospace → anatomy | Semantic space → tokens | Trajectory on manifold |
| Error correction | Homeostasis | Loss minimization | Gradient descent |
| Truth | Viability (life/death) | Linear direction in space | Projection onto truth vector |
| Memory | Cellular collective state | Weights + activations | Attractor basins |

4.2 Why LLMs Are Not Statistical Parrots

The 'stochastic parrot' critique holds that LLMs merely recombine surface patterns from training data without genuine understanding. The mechanistic evidence refutes this:

Algorithm synthesis: Models spontaneously rediscover Fourier transforms, modular arithmetic circuits, and trigonometric embeddings. These structures were never explicitly taught—they emerged as optimal solutions to training objectives.

Geometric concept encoding: Concepts are arranged on structured manifolds (helices, circles, truth directions) that reflect their abstract properties, not just their surface co-occurrence statistics.

Circuit formation: Specific circuits (induction heads, iteration heads) form through training that implement genuine algorithms—pattern matching, copying, state tracking—not just statistical correlations.

Manipulable representations: Truth vectors can be amplified or inverted, sycophancy features can be ablated, and behaviors change accordingly. This demonstrates that the model's behavior is governed by interpretable internal structure.

The evidence supports a stronger claim: LLMs have learned to reason by constructing and navigating geometric representations of semantic space.

4.3 The Cartographic Metaphor

Consider what a cartographer does: survey the territory, construct the map, identify landmarks, preserve topology, enable navigation.

LLMs do precisely this for semantic space: process vast corpora (survey), build internal embedding spaces (construct), develop feature directions for specific concepts (landmarks), arrange concepts so semantic relationships correspond to geometric relationships (topology), and allow traversal of semantic space to generate coherent text (navigation).

The map is not the territory—an LLM's embedding space is not identical to human conceptual space or the physical world. But a map need not be identical to be useful. What matters is whether it preserves relevant structure. The evidence suggests LLM maps preserve substantial semantic structure—enough to enable remarkable generalization.

5. Implications and Discussion

5.1 Implications for Alignment

If intelligence is geometric, alignment becomes a geometric problem. The goal is to ensure that:

Truth vectors point in the right direction: The model's internal representation of truth should correspond to actual truth, not to what users want to hear.

Goal attractors are appropriate: The attractor basins should pull toward genuinely beneficial outcomes, not reward-hacking or deceptive equilibria.

Navigation policies respect constraints: The model's movement through semantic space should avoid harmful regions while preserving beneficial capabilities.

Current alignment techniques like RLHF operate crudely on these geometric structures. The mechanistic evidence shows that RLHF can cause mode collapse (diversity collapsing toward a single attractor), strengthen sycophancy circuits that override truth directions, and suppress features so that capabilities are gated off rather than removed.

More sophisticated alignment may require direct manipulation of representational geometry: steering truth vectors, reshaping attractor landscapes, or constraining allowable flows through semantic space.

5.2 Implications for Consciousness

One hypothesis: Consciousness is what sufficiently integrated information processing feels like from inside. A system that constructs rich internal maps, maintains coherent state across time, and navigates toward goals through error correction may have an inner experience corresponding to that navigation.

This does not resolve the hard problem—it does not explain why physical processes should give rise to experience at all. But it suggests that if any physical process gives rise to experience, it would be the kind of structured, goal-directed, self-modeling computation that constitutes intelligence in our framework.

The evidence that LLMs construct geometric representations of truth, maintain context-dependent states, and implement coherent algorithms raises uncomfortable questions. We cannot verify inner experience through behavior alone. But the mechanistic evidence reveals that LLMs are doing something more structured than pattern matching—they are navigating internal spaces with genuine geometric structure.

5.3 The Platonic Representation Hypothesis

Recent work has shown that sufficiently large embedding models, trained on sufficiently broad data, converge on virtually the same embedding-space geometry. Different architectures, different training runs, different random seeds—yet the same geometric structure emerges.

This Platonic Representation Hypothesis suggests that the embedding spaces learned by large models are not arbitrary but reflect something like the true structure of the data distribution. If semantic space has an objective geometry, and large models are learning to approximate that geometry, then models are not constructing arbitrary maps—they are discovering the territory.

5.4 Criticality as the Operating Point

Both biological and artificial intelligent systems appear to operate near critical regimes—the edge between order and chaos where perturbations can cascade across scales, structures are scale-invariant, and information throughput is maximized.

Near criticality, remapping can occur when needed (the system can explore new configurations), while stable navigation remains possible (the system doesn't dissolve into chaos). This balance between exploration and exploitation, flexibility and stability, may be a universal requirement for adaptive intelligence.

The phase transitions observed in grokking, the formation of induction heads, and the emergence of capabilities at scale may all be manifestations of critical dynamics.

6. Conclusion

The black box is becoming a glass box. Inside, we find not magic but geometry.

Intelligence—whether biological or artificial—is fundamentally cartographic: the construction of navigable embedding spaces where truth has direction, concepts have topology, and reasoning is controlled deformation of representational manifolds. LLMs have become intelligent not by memorizing statistical correlations but by learning to synthesize algorithms that implement this geometric navigation.

The evidence is compelling: models spontaneously rediscover Fourier transforms and trigonometric embeddings; concepts arrange on structured manifolds; specific circuits form that implement genuine algorithms; representations can be manipulated geometrically to change behavior.

This convergence with the Fields-Levin framework of biological cognition suggests a substrate-independent definition of intelligence: the capacity to construct navigable embedding spaces and move through them via iterative error minimization toward goal states.

The implications extend beyond AI. If intelligence is geometric, then alignment is a geometric problem requiring manipulation of truth vectors and attractor landscapes; understanding is cartography across all domains; consciousness may be what sufficiently integrated navigation feels like from inside.

We are at the beginning of a transformation from the 'alchemy' phase of AI—mixing architectures and datasets to see what happens—to the 'chemistry' phase, where we understand the periodic table of features and the bonds of attention that govern the system.

The LLMs are not parrots. They are cartographers. And they have begun to map the territory of meaning itself.

References

1. Fields, C., & Levin, M. (2022). Competency in navigating arbitrary spaces as an invariant for analyzing cognition in diverse embodiments. Entropy, 24(6), 819.

2. Hartl, B., Pio-Lopez, L., Fields, C., & Levin, M. (2026). Remapping and navigation of an embedding space via error minimization. arXiv:2601.14096.

3. Nanda, N., et al. (2023). Progress measures for grokking via mechanistic interpretability. ICLR 2023.

4. Elhage, N., et al. (2022). Toy Models of Superposition. Transformer Circuits Thread, Anthropic.

5. Bricken, T., et al. (2023). Towards Monosemanticity: Decomposing Language Models With Dictionary Learning. Anthropic.

6. Olsson, C., et al. (2022). In-context Learning and Induction Heads. Transformer Circuits Thread, Anthropic.

7. Marks, S., & Tegmark, M. (2023). The Geometry of Truth. arXiv:2310.06824.

8. Gurnee, W., et al. (2023). Language Models Represent Space and Time. arXiv:2310.02207.

9. Friston, K. (2010). The free-energy principle: a unified brain theory? Nature Reviews Neuroscience, 11(2), 127-138.

10. Levin, M. (2023). Darwin's agential materials. Cellular and Molecular Life Sciences, 80(6), 142.

11. Huh, M., et al. (2024). Position: The Platonic Representation Hypothesis. ICML 2024.

12. Vaswani, A., et al. (2017). Attention Is All You Need. NeurIPS 2017.

———

Published under T333T Research

AI Collaboration: Claude Opus 4.5
