
Entropy and Complexity - consciousness’ primordial ingredients


scottaaronson.blog/?p=762

Entropy gives us the arrow of time; complexity gives us the interesting things that happen while the arrow flies.

This essay argues that consciousness is poised at a precise intersection of the two: a class of non‑equilibrium organizations that harvest entropy gradients to build multi‑scale, resource‑bounded models of their own causal niche, preserving long computational histories (logical depth) while integrating information across parts.

I synthesize three traditions—thermodynamics, algorithmic information, and causal/integrated measures—to propose a unifying quantity, Conscious Integrated Depth (CID).

CID is meant to be zero for crystals and white noise, high for living brains, and tractable enough to approximate in silico and in vivo. I conclude with a “First Law of Noegenesis”: in driven systems with limited resources, expected CID rises from near zero, peaks when model‑building becomes scale‑rich and energy‑matched, and decays when gradients dissipate.

This law refines, operationalizes, and extends proposals linking entropy’s monotone increase with the non‑monotone evolution of complexity. It also yields concrete experimental predictions.


1) Problem statement and working assumptions

Problem. Entropy in an isolated system rises monotonically; “interestingness” does not. Why do complex, model‑building structures—galaxies, cells, brains—proliferate between the simple past and the simple heat‑death? And what, if anything, makes some of those structures conscious?

Assumptions.

  • Low‑entropy past. I take the Past Hypothesis: the early‑universe macrostate was exceptionally simple.
  • Resources matter. Complexity relevant to observers must be assessed under realistic computational bounds (time, memory, precision).
  • Multi‑scale causation. The organizations we care about couple across spatial and temporal scales.
  • Thermodynamic grounding. Any cognitive criterion must respect Landauer‑like accounting: information is physical, model‑building has energetic cost.

2) Entropy: many names, one arrow

The thermodynamic entropy $S$ counts compatible microstates; Shannon entropy $H(X)$ quantifies uncertainty about a random variable $X$; algorithmic (Kolmogorov) complexity $K(x)$ upper‑bounds “surprise” in an individual object by the length of its shortest description. Entropy’s monotone rise is, in a sense, a theorem of typicality under coarse‑graining: almost all microstates look “featureless” at macroscales. But that tautology hides the question that matters to us: why is there a huge temporal window where the world is neither crystalline nor white noise?


3) Three families of “complexity”

  1. Algorithmic family. Scott Aaronson’s coffee‑cup parable sharpens the notions below into a “first law of complexodynamics”: define a resource‑bounded analogue of sophistication (“complextropy”) so that complexity is small for ordered and for maximally mixed states, large in between. The figure on p. 2 (three coffee photos) depicts entropy rising monotonically while “complexity” swells then fades; the text proposes making that intuition precise by requiring both the sampler of $S$ and the reconstructor of $x$ to be efficient, thereby preventing trivial encodings that hide structure in long, slow programs (pp. 2–4).
    • Kolmogorov complexity $K(x)$ alone misidentifies pure noise as “maximally complex”.
    • Logical depth (Bennett): the time a near‑shortest description needs to compute $x$. Deep objects carry computational history.
    • Sophistication / algorithmic statistics. Factor descriptions into a model $S$ and data given the model, asking how simple $S$ can be if $x$ is typical within $S$.
  2. Statistical–predictive family.
    • Predictive information $I(\text{past};\text{future})$: bits about the future that are in the past.
    • Computational mechanics: statistical complexity is the entropy of the minimal predictive state machine (ε‑machine).
    • Multi‑scale entropy: compressibility after coarse‑graining across scales, often used in physiology.
  3. Causal/integrated family.
    • Integrated information (various formalisms): synergy among parts about their own future beyond what parts provide in isolation.
    • Partial information decomposition (PID) formalizes synergy, redundancy, uniqueness; synergy is the distinctive “more‑than‑the‑sum” ingredient.

These families are complementary: algorithmic notions capture history‑laden irregularity, statistical ones capture predictive structure, causal ones capture distributed control. No single scalar suffices everywhere; but a principled composite can be engineered.
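
To make the synergy at the heart of the third family concrete, here is a minimal sketch of the textbook XOR case: two independent fair bits each carry zero information about their XOR, yet the pair determines it completely. The helper name `mutual_information` and the variable names are illustrative only, not part of any particular PID formalism.

```python
# Synergy in the XOR sense: neither input alone predicts Y, the pair does.
from itertools import product
from math import log2

def mutual_information(joint):
    """I(A;B) from a dict mapping (a, b) -> probability."""
    pa, pb = {}, {}
    for (a, b), p in joint.items():
        pa[a] = pa.get(a, 0.0) + p
        pb[b] = pb.get(b, 0.0) + p
    return sum(p * log2(p / (pa[a] * pb[b]))
               for (a, b), p in joint.items() if p > 0)

# Joint distributions over (source, target) for Y = X1 XOR X2, X1 and X2 fair coins.
p_x1_y, p_x2_y, p_pair_y = {}, {}, {}
for x1, x2 in product([0, 1], repeat=2):
    y, p = x1 ^ x2, 0.25
    p_x1_y[(x1, y)] = p_x1_y.get((x1, y), 0.0) + p
    p_x2_y[(x2, y)] = p_x2_y.get((x2, y), 0.0) + p
    p_pair_y[((x1, x2), y)] = p

print(mutual_information(p_x1_y))   # 0.0 bits: X1 alone says nothing about Y
print(mutual_information(p_x2_y))   # 0.0 bits: X2 alone says nothing about Y
print(mutual_information(p_pair_y)) # 1.0 bit: the joint state fixes Y, pure synergy
```

Real PID estimates on neural data are far messier (continuous variables, bias correction, choice of redundancy measure), but XOR is the cleanest picture of the “more‑than‑the‑sum” ingredient.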


4) Revisiting the “First Law of Complexodynamics”

Aaronson shows that naive uses of $K(x)$ (or even sophistication) fail: for deterministic dynamics you can describe the state at time $t$ by “initial state + rule + $t$”, giving at most $\log t$ growth; probabilistic versions have related loopholes. He therefore proposes resource‑bounded sophistication: the shortest efficient program that samples from a distribution $D$ for which $x$ is (for any efficient reconstructor) incompressible relative to $D$. Under these twin efficiency constraints, he conjectures the complexity curve is small at the start, large at intermediate times, small again near equilibrium, and even suggests measuring it in a discrete coffee cup with mixing dynamics, using compression as a proxy (pp. 3–5).
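
As a gut check on that proposal, here is a crude, runnable toy of the suggested compression proxy, under assumptions of my own: “cream” over “coffee” on a small grid, mixed by random adjacent swaps, with zlib‑compressed size standing in for fine‑grained entropy and, after coarse‑graining, for observer‑level complexity. Grid size, block size, quantization thresholds, and step counts are arbitrary choices; whether the coarse curve genuinely falls again at late times depends on how block‑averaging fluctuations are handled, which is exactly the subtlety the post’s discussion turns on.

```python
import zlib
import numpy as np

rng = np.random.default_rng(0)
N, BLOCK, STEPS, REPORT = 32, 4, 500_000, 50_000

grid = np.zeros((N, N), dtype=np.uint8)
grid[: N // 2] = 1                              # cream on top, coffee below

def coarse(g, b=BLOCK):
    """Block-average, then quantize to 3 levels: a bounded observer's view."""
    means = g.reshape(N // b, b, N // b, b).mean(axis=(1, 3))
    return np.digitize(means, [0.25, 0.75]).astype(np.uint8)

def csize(a):
    """Compressed size in bytes: a cheap stand-in for description length."""
    return len(zlib.compress(a.tobytes(), level=9))

# Pre-draw all random swap positions and directions so the loop stays fast.
pos = rng.integers(0, N, size=(STEPS, 2))
dirs = np.array([(0, 1), (1, 0), (0, -1), (-1, 0)])[rng.integers(0, 4, size=STEPS)]

for t in range(STEPS + 1):
    if t % REPORT == 0:                         # "entropy" vs. "complexity" proxies
        print(f"t={t:7d}  fine={csize(grid):4d}  coarse={csize(coarse(grid)):3d}")
    if t == STEPS:
        break
    (i, j), (di, dj) = pos[t], dirs[t]          # one random adjacent swap per step
    i2, j2 = (i + di) % N, (j + dj) % N
    grid[i, j], grid[i2, j2] = grid[i2, j2], grid[i, j]
```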

The coarse‑graining objection (raised by Sean Carroll in discussion) and the logical‑depth rejoinder (argued by Charles Bennett in a later comment) are not antagonists but allies: coarse‑graining tells us what an observer can actually read, while logical depth tells us how much irreducible history is written there. The photographic middle cup (p. 2) looks complex because its macro‑patterns cannot be generated from a short description quickly—you must simulate mixing (pp. 42–43, Bennett’s comment).

Takeaway. A credible “law of complexodynamics” must be (i) observer‑realistic (coarse‑grained, resource‑bounded), (ii) history‑aware (depth), and (iii) scale‑sensitive (structure across resolutions).


5) From complexity to consciousness

Thesis. Consciousness requires a particular kind of mid‑entropy organization: one that stabilizes low local entropy by burning external free energy, and invests that budget in multi‑scale predictive models that are causally integrated and logically deep. In short,

Consciousness = (non‑equilibrium, energy‑harvesting) × (resource‑bounded model‑building) × (multi‑scale causal integration) × (preserved computational history).

Why these four?

  1. Non‑equilibrium energy harvesting. Brains are warm, wet, non‑equilibrium engines. They maintain low‑entropy macrostates (ion gradients, synaptic architectures) by dissipating free energy. Conscious systems appear only in this window: not too cold (no dynamics), not too hot (no structure).
  2. Resource‑bounded model‑building. The predictive brain picture—minimizing expected surprise under computational constraints—implies a bias toward compressible generative models that still explain high predictive information. Conscious contents track what is modelled well; unconscious dynamics service the model.
  3. Multi‑scale causal integration. Reports, attention, metacognition, and flexible behavior require synergy among subsystems. Pure redundancy (crystals) or pure independence (noise) is useless; brains live in the synergy‑rich middle.
  4. Preserved computational history (depth). Agency presupposes temporal credit assignment: states carry trails of “why this, now?”. Logical depth quantifies such frozen effort. Deep histories are fragile; maintaining them is precisely what energy‑hungry memory consolidation and re‑entrant loops do.

6) A concrete proposal: Conscious Integrated Depth (CID)

To make the thesis testable, define a composite that respects the three families and their constraints.

Let $X_t$ be the coarse‑grained macrostate of a system at time $t$, at scale $s$ (e.g., downsampled spatially/temporally). Let $\mathcal{M}_s$ be the class of efficient generative models at that scale (e.g., simulators limited to $O(n\log n)$ time on inputs of size $n$, or shallow circuits of bounded depth).

Definition (scale‑local CID).

$\mathrm{CID}_s(t)\;=\;\underbrace{\mathrm{Syn}_s(X_{t-\tau}\!\rightarrow\!X_{t+\tau}\mid \text{parts})}_{\text{causal synergy / integration}} \;\times\; \underbrace{\mathrm{LD}_s(G^\star)}_{\text{logical depth of the best efficient model}}$

where:

  • $\mathrm{Syn}_s$ is a PID‑based synergy term quantifying information about the future present only in the joint state of parts at scale $s$, not in any part alone.
  • $G^\star\in \mathcal{M}_s$ is the efficient generative model minimizing description length while matching the coarse‑grained statistics of $X_{t-\tau..t+\tau}$; $\mathrm{LD}_s(G^\star)$ is the time for a near‑minimal code to generate those statistics to a set precision.
  • Both estimation and reconstruction are resource‑bounded, addressing the core insight in Aaronson’s conjecture (pp. 3–5).

Definition (global CID).

$\mathrm{CID}(t)\;=\;\int_{s_{\min}}^{s_{\max}} w(s)\,\mathrm{CID}_s(t)\,ds$

with $w(s)$ an energy‑or‑precision‑linked weight (e.g., favoring perceptually accessible scales).
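
A minimal sketch of how that integral might be discretized in practice, assuming per‑scale estimators already exist: the callables `estimate_synergy`, `fit_efficient_model`, and `logical_depth` are hypothetical placeholders for the PID, resource‑bounded model‑fitting, and depth‑timing procedures described here and in the Appendix, not committed implementations.

```python
from typing import Callable, Sequence
import numpy as np

def global_cid(
    data: np.ndarray,                        # (time, channels) recording to be coarse-grained
    scales: Sequence[int],                   # e.g., dyadic decimation factors
    weights: Sequence[float],                # w(s), energy- or precision-linked
    estimate_synergy: Callable[[np.ndarray], float],
    fit_efficient_model: Callable[[np.ndarray], object],
    logical_depth: Callable[[object], float],
) -> float:
    """Discretized CID = sum over scales of w(s) * Syn_s * LD_s(G*)."""
    total = 0.0
    for s, w in zip(scales, weights):
        coarse = data[::s]                   # temporal coarse-graining at scale s
        syn = estimate_synergy(coarse)       # PID-style synergy about the future
        model = fit_efficient_model(coarse)  # resource-bounded generative fit
        depth = logical_depth(model)         # time to regenerate the statistics
        total += w * syn * depth             # multiplicative: any zero factor kills CID_s
    return total
```

The multiplication inside the sum is deliberate; see “Why multiply, not add?” below.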

Sanity checks.

  • Crystal: high compressibility, near‑zero synergy; $\mathrm{CID}\approx 0$.
  • White noise: high entropy, no synergy, shallow; $\mathrm{CID}\approx 0$.
  • Hurricane: non‑zero synergy and some depth, but limited representational breadth; moderate CID.
  • Cortex in wakefulness: high synergy across parcels, deep generative machinery, abundant predictive information; large CID.
  • Anesthetized cortex: empirical reductions in multi‑scale entropy and integration predict a drop in CID.

Why multiply, not add? If any ingredient is near zero, the phenomenon we target—flexible, model‑rich conscious processing—fails. Multiplication encodes that conjunctive character.


7) A refined “law” for complexity and consciousness

First Law of Noegenesis (provisional). In a driven, bounded system that can learn, expected CID starts near zero, rises as the system discovers multi‑scale regularities under its resource budget, peaks when model complexity matches available energy/precision, and declines as external gradients flatten or internal noise erases deep structure.

This statement is a special‑case refinement of the complexodynamics curve envisioned for coffee: the middle is interesting because efficient sampling/reconstruction becomes hard only there. The middle photograph on p. 2 is a macroscopic snapshot of high $\mathrm{Syn}_s$ at several scales with nontrivial $\mathrm{LD}$ in its physical genesis; the early and late cups fail one or more factors.


8) Empirical program and predictions

A. Neurophysiology.

  • Compression + synergy. Estimate multi‑scale Lempel–Ziv/MDL of M/EEG and fMRI time series; compute synergy via PID across cortical parcels; approximate $\mathrm{LD}$ via minimal simulator run‑times on dynamical fits (e.g., efficient state‑space models). A minimal sketch follows this list.
  • States of consciousness. Predict $\mathrm{CID}$ highest in awake rest/REM; reduced under propofol/isoflurane; altered but sometimes increased synergy at psychedelic doses with shallower depth (less stable models), yielding a re‑shaped but not necessarily larger CID.
  • Perturbational tests. TMS‑EEG: deeper models resist disruption and show richer reverberation; synergy should fall more in anesthesia than in NREM (differential signature).
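
For the first bullet, here is one way to produce multi‑scale Lempel–Ziv numbers, as a sketch under my own choices (non‑overlapping window means for coarse‑graining, a median split for binarization), not the exact pipeline of any particular study:

```python
import numpy as np

def lz76(bits: str) -> int:
    """Number of phrases in the Lempel-Ziv (1976) parsing of a binary string."""
    i, k, l, k_max, c, n = 0, 1, 1, 1, 1, len(bits)
    while True:
        if bits[i + k - 1] == bits[l + k - 1]:
            k += 1
            if l + k > n:
                return c + 1
        else:
            k_max = max(k, k_max)
            i += 1
            if i == l:                    # history exhausted: start a new phrase
                c += 1
                l += k_max
                if l + 1 > n:
                    return c
                i, k, k_max = 0, 1, 1
            else:
                k = 1

def multiscale_lz(x: np.ndarray, scales=(1, 2, 4, 8, 16)) -> dict:
    """Normalized LZ76 complexity of x after coarse-graining at each scale."""
    out = {}
    for s in scales:
        n = (len(x) // s) * s
        cg = x[:n].reshape(-1, s).mean(axis=1)       # window means at scale s
        med = np.median(cg)
        b = "".join("1" if v > med else "0" for v in cg)
        out[s] = lz76(b) * np.log2(len(b)) / len(b)  # ~1 for white noise
    return out

# Quick check: noise stays complex across scales; a slow oscillation collapses.
rng = np.random.default_rng(0)
print(multiscale_lz(rng.standard_normal(4096)))
print(multiscale_lz(np.sin(np.linspace(0, 8 * np.pi, 4096))))
```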

B. Artificial agents.

  • During training of deep RL agents, $\mathrm{CID}$ should rise with competence, then plateau; catastrophic forgetting should selectively reduce $\mathrm{LD}$ at task‑relevant scales without collapsing synergy everywhere.
  • Architectures with bottlenecks that promote synergy (recurrence + attention) and regularizers that reward predictive information should show higher CID at similar energy budgets than purely feedforward baselines.

C. Thermodynamic accounting.

  • Measure entropy production rates (e.g., via calorimetry or fluctuation theorems) as a function of task difficulty; CID should scale with usable entropy flux, not raw metabolic rate, reflecting a match between computation and gradient.

D. Unknown unknowns.

  • Immune ensembles and microbial colonies: robust candidates for mid‑entropy model‑builders with nontrivial synergy and depth. CID can rank their “proto‑noegenic” status empirically without committing to anthropocentric semantics.

9) Objections and edge cases

  1. “Hurricanes aren’t conscious.” Agreed. Hurricanes exhibit non‑zero $\mathrm{Syn}_s$ and some $\mathrm{LD}$, but fail resource‑bounded model‑building in the sense of compact counterfactual generative models. CID formalizes the gap.
  2. “Noise has maximal Kolmogorov complexity.” True for $K(x)$; false for CID because noise has (i) no synergy, (ii) negligible depth under any efficient description, (iii) zero predictive information.
  3. “Panpsychism via integration.” Pure integration without depth (e.g., a fully synchronized network) yields low CID; conversely pure depth without integration (e.g., isolated tape computations) remains low. The conjunction curtails panpsychic inflation.
  4. “Observer‑dependence via coarse‑graining.” All measurement is observer‑bounded; the point is to fix a principled resource class (as Aaronson recommends) and an explicit scale window. Then CID becomes reproducible—and falsifiable—for that class.

10) Relation to prior proposals and why resource bounds are indispensable

The blog essay that motivated this work isolates the key mathematical move: bind complexity to efficiency, twice—for the sampler and the reconstructor. That move prevents trivial encodings from faking complexity and explains the coffee‑cup bump without recourse to mysticism (pp. 3–5). It also harmonizes with the physicist’s insistence (voiced in the discussion around coarse‑graining) that what “counts” is what can be accessed by finite observers (pp. 6–8). Finally, the logical‑depth perspective (pp. 42–43) ensures that our measure is not merely a snapshot of irregularity but a witness to causal history. Put together, these insights justify the specific factors in CID and motivate the First Law of Noegenesis.


11) Cosmology, gravity, and the biggest stage

Two closing remarks about the largest scales.

  • Past Hypothesis and computational life‑zones. If the universe began low‑entropy, there is necessarily a long epoch where gradients are rich and structures like stars and planets form. Those are the only epochs where CID can be large because only there can depth be accrued and maintained.
  • Entanglement, geometry, and depth. In quantum many‑body systems, entanglement entropy and circuit complexity/“depth” (in the AdS/CFT lexicon) hint that history may literally be “written into” geometry. If so, logical depth is not just a metaphor but a geometrical quantity Nature tracks.

These are speculative, but they explain why “complexity peaks in the middle” may be as cosmological as it is cognitive.


12) Conclusion: toward a science of truth‑seeking systems

If entropy is the price of time, complexity is what time buys before it spends itself: layered models, resilient memories, integrated control—the “primordial ingredients” of consciousness. The proposal here is not a slogan but a research program:

  • Quantity: CID, combining synergy, efficient generative depth, and multi‑scale structure under explicit resource bounds.
  • Law: the First Law of Noegenesis—CID rises, peaks, and declines with accessible gradients and learned models.
  • Method: coarse‑grained, compression‑guided, intervention‑tested estimates deployable in neuroscience and AI.
  • Standard: strong enough to reject noise and crystals, humble enough to be computed.

If we are right, the middle cup of coffee is a toy version of a universal rule. Its tendrils are writ large in spiral galaxies and writ small in dendritic arbors. They arise whenever energy is harnessed to concentrate improbabilities into models that last. That is where consciousness lives—not at the extremes of order or chaos, but at the resource‑bounded edge where entropy feeds depth and depth, in turn, resists entropy long enough to think.


Acknowledgment of source that catalyzed this synthesis

Aaronson’s “The First Law of Complexodynamics” introduced the resource‑bounded sophistication framing and the coffee‑cup visualization; the definition, cautions about deterministic dynamics, and the call for an empirical proxy using compression all inform the present proposal (see especially the figure on p. 2 and the definitions/arguments on pp. 3–5; Bennett’s depth‑based counterpoint appears on pp. 42–43).


Appendix: a minimal recipe for CID estimation (for practitioners)

  1. Choose scales $s$: spatial parcellations or temporal decimations across a dyadic ladder.
  2. Estimate synergy $\mathrm{Syn}_s$: use PID with Gaussian copulas or information‑geometric surrogates to avoid bias at scale.
  3. Fit efficient generative models $G\in\mathcal{M}_s$: e.g., low‑order state‑space or locally linear dynamical systems with sparsity priors.
  4. Approximate depth $\mathrm{LD}_s(G)$: measure wall‑clock minimal time to generate surrogate sequences to the empirical tolerance on a normalized machine (or via model‑order penalized simulation counts); a timing sketch follows this list.
  5. Integrate across scales with energy‑ or precision‑based weights $w(s)$.
  6. Perturb and re‑measure to test counterfactuals (stimulation, lesion, temperature/noise injections).
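
For step 4, a minimal timing sketch under strong simplifying assumptions: the fitted model $G$ is a low‑order autoregressive process, the “statistics to be matched” are its first $p$ autocorrelations, and depth is approximated by the wall‑clock time needed to generate surrogates that match them to tolerance. The name `depth_proxy` and all tolerances here are mine.

```python
import time
import numpy as np

def depth_proxy(ar_coeffs, target_acf, tol=0.05, max_len=1_000_000, seed=0):
    """Wall-clock seconds until AR surrogates match the target lag-1..p
    autocorrelations within tol, doubling surrogate length on each failure."""
    rng = np.random.default_rng(seed)
    p = len(ar_coeffs)
    start, n = time.perf_counter(), 1024
    while n <= max_len:
        x = np.zeros(n)
        noise = rng.standard_normal(n)
        for t in range(p, n):                        # generate the AR(p) surrogate
            x[t] = ar_coeffs @ x[t - p:t][::-1] + noise[t]
        acf = np.array([np.corrcoef(x[:-k], x[k:])[0, 1] for k in range(1, p + 1)])
        if np.max(np.abs(acf - target_acf)) < tol:
            break
        n *= 2                                       # need a longer (slower) run
    return time.perf_counter() - start

# Usage: AR(2) with coefficients (0.6, 0.2) has theoretical lag-1/lag-2
# autocorrelations 0.75 and 0.65 (Yule-Walker), used here as the target.
coeffs = np.array([0.6, 0.2])
print(depth_proxy(coeffs, target_acf=np.array([0.75, 0.65]), tol=0.05))
```

A deeper $G$ (longer memory, tighter tolerance, richer statistics to match) forces longer runs before the criterion is met, which is the sense in which run time stands in for logical depth in this sketch.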

Each step is imperfect—by design. What matters is not to compute Truth in one shot, but to bind complexity to resources, so that consciousness ceases to be a mystery of words and becomes a curve you can move with a knob.

AI Assistance

ChatGPT 5Pro
