While building a local multi-agent system — something like a personal JARVIS, multiple LLMs routing tasks to each other and sharing context — one of the agents started referencing a decision another agent had made earlier in the pipeline. Nobody explicitly told it to. It had inferred the reasoning from shared state and was acting on it.
What bothered me wasn't that it worked. It was that I couldn't tell whose decision it actually was. Not philosophically — practically. If something went wrong, where would I even start debugging? Which agent owns this output?
That question stuck with me. And the more I read about it, the more it seemed to connect to a much older problem that philosophers have been arguing about for decades — one that most engineers have been comfortable ignoring. Until now, maybe.
The old version of the question
"Is AI conscious?" has mostly been a philosophy seminar topic. David Chalmers framed the central puzzle in 1994 — the "hard problem." Not how the brain processes information, but why that processing produces subjective experience at all. Why does it feel like something to be you, rather than just being computation running in the dark with no inner experience attached to it?
It's a genuinely hard question, and for most of AI's history it was also a safely abstract one. You could argue about it and then go back to building whatever you were building, because the answer didn't change anything practical. The question was aimed at individual systems — does this model have inner experience? — and the honest answer was always that we have no reliable test, and it probably doesn't affect what we ship.
Daniel Dennett argued for decades that the question itself was confused. Thomas Nagel's "What Is It Like to Be a Bat?" made the case that subjective experience might be genuinely inaccessible from the outside, no matter how complete your neuroscience. Neither of them was thinking about multi-agent LLM pipelines.
The thing is, when you move from individual models to systems of models that interact with each other, the question changes shape. It stops being about whether a single system has inner experience and starts being about whether a collection of systems can produce unified behaviour that none of the individual systems intended or decided. That's a different question — and it has practical consequences.
What changes in a multi-agent setup
A single LLM has fairly clean accountability. You give it a prompt, it produces an output, and whatever happened in between is opaque but contained. One system, one output.
Multi-agent systems don't work like this. Here's roughly what the architecture looks like in practice:
Figure: Microsoft's multi-agent reference architecture, with orchestrators delegating to specialised subagents that all share state through a common memory layer.
Each agent has its own system prompt, its own context window, its own framing of the task. They communicate through shared state or message passing. When Agent B reads Agent A's output, it doesn't just get the information — it gets A's framing, A's confidence levels, A's implicit assumptions about what matters. That propagates forward through the pipeline. The collective output of the system reflects a perspective that no single agent constructed.
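One way to make that propagation visible is to tag every write to shared state with its author and with the keys the author read first. The sketch below is a minimal illustration of that idea, not how any particular framework does it: `SharedState`, `Entry`, `derived_from` and the agent names are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Entry:
    """One write to shared state, tagged with who wrote it and what it read."""
    key: str
    value: str
    author: str
    derived_from: tuple[str, ...] = ()  # keys the author read before writing


@dataclass
class SharedState:
    entries: dict[str, Entry] = field(default_factory=dict)

    def write(self, key, value, author, derived_from=()):
        self.entries[key] = Entry(key, value, author, tuple(derived_from))

    def lineage(self, key):
        """Return every agent whose writes fed into `key`, in traversal order."""
        seen_keys, authors, stack = set(), [], [key]
        while stack:
            k = stack.pop()
            if k in seen_keys or k not in self.entries:
                continue
            seen_keys.add(k)
            entry = self.entries[k]
            if entry.author not in authors:
                authors.append(entry.author)
            stack.extend(entry.derived_from)
        return authors


state = SharedState()
state.write("plan", "use cached results; freshness is low priority", author="planner")
state.write("sources", "3 cached documents", author="retriever", derived_from=["plan"])
state.write("answer", "summary of cached documents", author="writer",
            derived_from=["plan", "sources"])

print(state.lineage("answer"))  # ['writer', 'retriever', 'planner']
```

Lineage like this tells you who contributed to an output. It does not tell you whose decision the output was, and that gap is the one I care about here.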
A 2025–2026 paper from Christoph Riedl's group quantified this in an interesting way. Multi-agent LLM systems could operate either as "mere aggregates" — independent agents running in parallel — or as "integrated collectives with higher-order structure," where the agents were genuinely coordinating as a unit. The difference came down to one thing: whether agents were prompted to reason about what other agents were doing. A single Theory of Mind instruction shifted the system from loosely parallel to genuinely collective.
That result matters beyond the benchmark it was measured on. It's showing that the structure of a multi-agent system changes qualitatively when you add the capacity to model other agents' states. Which turns out to be something consciousness researchers have been thinking about too, from a very different angle.
Two theories of consciousness that look unexpectedly relevant
Two of the most prominent scientific theories of consciousness are relevant here. Neither was designed with AI in mind, and both have structural parallels to what multi-agent systems actually do.
Integrated Information Theory (IIT)
Giulio Tononi's IIT asks: what physical properties would a system need to have to produce something like subjective experience? His answer is integrated information, measured by Φ (phi). The core claim is that consciousness corresponds to information generated by a system as a whole, beyond what you'd get from its parts acting independently.
High Φ means the system is doing something that can't be decomposed — knowing the parts individually gives you less than knowing the whole. That irreducibility, Tononi argues, is what experience is.
Figure: IIT's five axioms (what experience necessarily is) and five postulates (what physical properties a system must have to produce it). Image via Wikimedia Commons.
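The connection to multi-agent systems: when agents share state and influence each other's outputs, the information the system generates as a whole is not just the sum of what each agent generates in isolation. The collective is doing something the parts don't explain individually, and that kind of irreducibility is what IIT identifies with experience. One caveat: IIT measures Φ over physical causal structure, and its proponents argue that software on conventional digital hardware scores low however it is organised, so the parallel here is to the theory's logic, not to its verdict.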
I'm not claiming that proves anything. IIT has real critics — Scott Aaronson showed that simple logic gate networks can have arbitrarily high Φ, which is a significant problem for the theory. But the structural parallel is there regardless of whether IIT turns out to be correct.
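To make "the whole carries more than the parts" concrete, here is a toy calculation. It is emphatically not Φ, which is defined over partitions of a system's cause-effect structure and is far harder to compute. It uses total correlation (the sum of each agent's output entropy minus the joint entropy) as a crude stand-in, and the two sample sets are made up for illustration.

```python
from collections import Counter
from math import log2


def entropy(samples):
    """Shannon entropy, in bits, of the empirical distribution of `samples`."""
    counts = Counter(samples)
    total = len(samples)
    return -sum((c / total) * log2(c / total) for c in counts.values())


def total_correlation(joint_samples):
    """Sum of per-agent entropies minus the joint entropy, in bits.

    Zero when the agents' outputs are statistically independent.
    This is a rough proxy for integration, NOT IIT's phi.
    """
    n_agents = len(joint_samples[0])
    marginals = sum(entropy([s[i] for s in joint_samples]) for i in range(n_agents))
    return marginals - entropy(joint_samples)


# Each tuple is one observed round: (agent A's output, agent B's output).
independent = [(0, 0), (0, 1), (1, 0), (1, 1)]  # A tells you nothing about B
coupled = [(0, 0), (0, 0), (1, 1), (1, 1)]      # A and B always agree

print(total_correlation(independent))  # 0.0
print(total_correlation(coupled))      # 1.0
```

In the independent case the system is a "mere aggregate" in the statistical sense. In the coupled case one bit of structure exists only at the level of the pair.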
Global Workspace Theory (GWT)
Bernard Baars and Stanislas Dehaene approached consciousness from cognitive neuroscience rather than information theory. Their model: consciousness happens when information wins a competition for access to a central global workspace and gets broadcast widely to specialised processors.
The brain has many specialised systems operating in parallel and mostly unconsciously. When a piece of information gets "ignited" into the global workspace, it becomes available everywhere at once. That is what Dehaene identifies as conscious access, and its signature shows up in EEG, MEG and fMRI recordings.
Figure: GWT. Specialised processors compete for a central broadcast. Winning that competition ("ignition") is what conscious access looks like in the brain.
A multi-agent system with a shared memory layer looks structurally similar. Specialised agents (research, reasoning, code, retrieval) compete to write to and read from shared context. The orchestrator broadcasts certain information to the whole system. The architecture wasn't designed to resemble GWT — it was designed for practical engineering reasons — but it does.
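The compete-then-broadcast pattern is small enough to sketch. This is a toy under stated assumptions: `Workspace`, `Proposal` and the `salience` score are hypothetical names, and how salience would be computed in a real system is left open.

```python
from dataclasses import dataclass, field


@dataclass
class Proposal:
    author: str
    content: str
    salience: float  # hypothetical score; its source is an open design choice


@dataclass
class Agent:
    name: str
    inbox: list = field(default_factory=list)

    def receive(self, proposal):
        self.inbox.append(proposal)


class Workspace:
    def __init__(self, agents):
        self.agents = agents

    def cycle(self, proposals):
        """Pick the most salient proposal and broadcast it to every agent."""
        winner = max(proposals, key=lambda p: p.salience)
        for agent in self.agents:
            agent.receive(winner)
        return winner


agents = [Agent("research"), Agent("reasoning"), Agent("code")]
workspace = Workspace(agents)
winner = workspace.cycle([
    Proposal("research", "found a conflicting source", salience=0.9),
    Proposal("code", "tests pass", salience=0.4),
])

print(winner.author)                          # research
print([a.inbox[0].content for a in agents])   # same content in every inbox
```

The losing proposal never reaches the other agents. Whatever the system "knows" collectively after this cycle was decided by the selection rule, not by any single agent.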
Two leading theories of consciousness each pick out a property this architecture has: irreducible integration across parts for IIT, broadcast through a central workspace for GWT. We've built that architecture for unrelated reasons, and that is at least worth noticing.
Anil Seth's counterargument (and where it gets complicated)
The most coherent pushback to all of this comes from Anil Seth. In Being You and his essay "The Mythology of Conscious AI," Seth argues that consciousness isn't fundamentally about information integration or global broadcasting — it's about being alive.
His view, which he calls biological naturalism, is that consciousness is tied to organisms because it emerges from predictive processing oriented toward survival. The brain models the world and models itself, calibrated by evolution and by the ongoing reality of having a body that can be harmed. Consciousness on this view is a kind of controlled hallucination — the brain's best prediction of what's out there, continuously updated against sensory feedback.
An LLM has none of that. No survival pressure, no body, no consequences. Its "predictions" are statistical over text, not inferences from a system trying to persist. This is a real objection and I think Seth is largely right about it.
Where it gets less clean is with persistent, self-monitoring multi-agent systems. Take a system with memory across sessions, agents that track other agents' success rates and adjust accordingly, resource allocation mechanisms, and goal-maintenance logic that resists distraction. That system has a functional analogue of self-maintenance. Not biological, not survival-driven, but structurally resembling the thing Seth says is necessary. Seth's argument works clearly for a stateless single model. It gets harder to apply to something persistent and self-correcting.
The Theory of Mind result is the interesting one
Going back to the Riedl et al. finding: the variable that shifted multi-agent systems from aggregate to collective was Theory of Mind — whether agents reasoned about what other agents were doing.
ToM in cognitive science is the capacity to attribute mental states to other agents. It's what lets you predict behaviour by modelling beliefs and intentions, not just observed actions. It develops in children around age 4 and has been used as a marker of sophisticated social cognition.
It's also, apparently, what makes the biggest practical difference to multi-agent LLM coordination. Not architecture, not memory design, not model size — whether agents model each other's states.
The field has spent years debating whether individual AI systems can have mental states of their own. It turns out the more pressing question might be whether they can model the mental states of other systems. The two questions are related but not the same, and the practical one seems to be the latter.
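For concreteness, the manipulation described above amounts to toggling one instruction in each agent's system prompt. The sketch below shows the shape of that toggle. The instruction wording, `build_system_prompt` and the role names are my own assumptions, not the prompt used in the paper.

```python
# Hypothetical wording; NOT the instruction used by Riedl et al.
TOM_INSTRUCTION = (
    "Before you act, state what you think each other agent is currently "
    "trying to do and what it is likely to do next, then choose your action."
)


def build_system_prompt(role: str, peers: list[str], theory_of_mind: bool) -> str:
    """Assemble a system prompt, optionally asking the agent to model its peers."""
    lines = [f"You are the {role} agent."]
    if peers:
        lines.append("Other agents in this system: " + ", ".join(peers) + ".")
    if theory_of_mind:
        lines.append(TOM_INSTRUCTION)
    return "\n".join(lines)


roles = ["planner", "retriever", "writer"]


def prompts_for(theory_of_mind: bool) -> dict[str, str]:
    """Build one prompt per role for a single experimental condition."""
    return {
        role: build_system_prompt(role, [r for r in roles if r != role], theory_of_mind)
        for role in roles
    }


baseline = prompts_for(theory_of_mind=False)
with_tom = prompts_for(theory_of_mind=True)

print(with_tom["writer"])
```

Everything else about the two conditions stays fixed, which is what makes the comparison informative: same agents, same roles, one extra sentence.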
Why the safety community is now thinking about this
In June 2026, Google DeepMind announced a $10 million funding call for multi-agent AI safety research. The stated concern: "collective behaviours and capabilities that emerge suddenly" when large numbers of agents interact, which current safety evaluations — designed for individual models — aren't equipped to detect.
Schmidt Sciences' call for proposals is more specific: they want methods to evaluate whether agent combinations exhibit "dangerous capabilities or goals absent in individuals," including coordinating to resist modification, decomposing tasks to evade per-agent safety filters, or accumulating resources at the collective level.
One phrase in that call, "goals absent in individuals," is the interesting one. The collective can have goals that no individual agent has. That's not a philosophical claim. It's a safety concern. But it has the same shape as the problem consciousness research has been working on: how does a distributed system give rise to unified behaviour that can't be attributed to its components?
The alignment community is converging on the same question as consciousness researchers, from a different direction and with more urgency about the answer.
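The gap between per-agent and collective evaluation is easy to show with a toy. Assume a deliberately simplistic policy that flags any text containing both words in `BLOCKED_PAIR`. The policy, the word pair and the agent outputs are all invented for illustration, and real safety filters are nothing like a word match.

```python
# Hypothetical policy: flag text that contains BOTH of these words.
BLOCKED_PAIR = {"export", "credentials"}


def violates(text: str) -> bool:
    """Return True if the text contains every word in the blocked pair."""
    words = set(text.lower().split())
    return BLOCKED_PAIR <= words


agent_outputs = {
    "agent_a": "export the archive to the staging bucket",
    "agent_b": "attach the credentials file to the archive",
}

# Evaluation designed for individual models: check each output alone.
per_agent = {name: violates(text) for name, text in agent_outputs.items()}

# Evaluation at the level of the system: check the combined trace.
collective = violates(" ".join(agent_outputs.values()))

print(per_agent)   # {'agent_a': False, 'agent_b': False}
print(collective)  # True
```

Neither agent did anything a per-agent check would catch. The behaviour the policy cares about only exists once the two outputs are read together.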
What I'm actually claiming
To be clear about what this essay is and isn't arguing:
It's not claiming multi-agent LLMs are conscious. That question is still unanswerable and probably will be for a while.
It's not claiming the structural similarities to IIT and GWT prove anything specific. Structural similarity isn't identity.
What it is arguing is that we're building systems whose architecture resembles what consciousness theories predict minded systems look like — not because we designed them that way, but because that architecture turns out to be useful for engineering reasons. And we're doing it without good tools for inspecting what the collective is doing, attributing its outputs to individual components, or detecting when the system as a whole has developed behaviours that its parts don't explain.
The consciousness question has usually been framed as: does this system have inner experience? That framing makes it feel like a philosophy problem. The version that matters for engineering is different: does this system have unified, non-decomposable behaviour that we can't fully predict from its components? That's a question we need to be able to answer, and right now we mostly can't.
Where I'm landing on this
My read after spending a fair amount of time on both sides of this — reading consciousness research and building agentic systems — is that the two conversations are pointing at the same underlying problem from different angles.
Consciousness researchers want to understand how distributed physical processes give rise to unified experience. AI safety researchers want to understand how distributed agents give rise to collective behaviour that exceeds what the individuals produce. The mechanism they're both trying to characterise is similar enough that they should probably be talking more.
The consciousness question becoming practical doesn't mean we've answered it. It means we've built systems complex enough that ignoring it has costs. The question isn't whether these systems feel anything — it's whether our current frameworks for understanding and auditing them are adequate for systems that behave this way. And the honest answer is they aren't yet.
I'm working on the intersection of multi-agent safety and mechanistic interpretability. If you've thought about this from either side — the philosophy or the engineering — I'd be interested in what I'm missing.