AI-Native Begins Where Meaning Becomes Structural
Why Most Agent Architectures Plateau and What Comes After
Every generation of software believes it has finally escaped the limits imposed by structure. This is usually followed, a few years later, by the rediscovery that structure was never the enemy - only the kind that couldn't evolve.
We are in such a moment now.
The rise of large language models and agent frameworks has made it possible to build systems that appear intelligent atop almost any data substrate. Documents, logs, warehouses, SaaS exhaust - everything can be embedded, retrieved, summarized, reasoned over. The demos are compelling. The early wins are real.
And yet, embedded in most of these architectures is a quiet, consequential assumption: that meaning can be inferred indefinitely at runtime from data that was never designed for machine reasoning.
This assumption does not cause systems to collapse. It causes something more subtle, and more expensive. It causes them to plateau.
The Model Fixation
Enterprise AI discourse today is overwhelmingly model-centric. Organizations debate foundation models, fine-tuning strategies, agent orchestration patterns. These are legitimate concerns. But they share a deeper premise: that the data layer is largely a solved problem, and that intelligence is something you add on top of existing information architecture.
This premise is convenient. It is also wrong.
The majority of AI initiatives that stall do not do so because models are insufficient. They stall because the systems underneath them were designed for storage, retrieval, and human interpretation - not for reasoning. Intelligence is being asked to emerge from substrates that do not encode what things are, only what they contain.
When that happens, every act of reasoning begins with interpretation. And interpretation, repeated often enough, becomes a tax.
The Plateau Pattern
Most organizations building agentic systems today start from a rational place: we already have the data. It lives across databases, warehouses, document stores, and APIs. With embeddings and sufficiently capable models, semantics can be reconstructed on demand. Why re-architect the foundation when inference seems to work?
In early phases, this approach delivers genuine value. Agents answer questions, summarize reports, stitch together workflows across silos. The system feels intelligent because, in constrained contexts, it is.
But as scope expands - more users, more data, higher stakes - the same systems begin to exhibit a familiar pattern:
- costs rise nonlinearly as interpretation is repeated instead of reused
- reliability degrades at the edges where meaning is implied rather than defined
- governance becomes retrospective rather than structural
- determinism is reintroduced manually through brittle rules and human review
What emerges is not a single breaking point, but a slow accumulation of compensating mechanisms: prompt discipline, schema hints, guardrails, evaluation harnesses, escalation paths. Each is reasonable in isolation. Together, they signal the same underlying truth.
The system is trying to reason over a world whose meaning was never resolved.
A Concrete Example: Preferences vs. Constraints
Consider a client record in wealth management.
In a typical AI-enabled system, client preferences might be stored as text: "Client prefers ESG investments, is concerned about concentration risk, and wants to maintain liquidity for a potential real estate purchase in 18-24 months."
Every downstream use of this information - portfolio construction, trade validation, risk analysis - requires a language model to parse natural language, infer intent, and hope that interpretation aligns with what was meant.
A semantically explicit approach looks different. The same information becomes typed investment constraints: an ESG screening threshold, a concentration limit, and a liquidity reserve with an explicit amount, time horizon, and purpose.
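As a rough sketch only - the names and fields below are hypothetical, not a prescribed schema - those constraints might look like this as typed objects:

```python
from dataclasses import dataclass
from datetime import date

# Hypothetical typed constraints - a sketch, not a prescribed schema.

@dataclass(frozen=True)
class ESGScreen:
    min_esg_score: float              # e.g. exclude holdings scoring below this threshold

@dataclass(frozen=True)
class ConcentrationLimit:
    max_single_position_pct: float    # e.g. no single position above 10% of the portfolio

@dataclass(frozen=True)
class LiquidityReserve:
    amount: float                     # cash to keep available
    currency: str
    needed_by: date                   # e.g. the 18-24 month real estate window
    purpose: str                      # "potential real estate purchase"

@dataclass(frozen=True)
class ClientConstraints:
    esg: ESGScreen
    concentration: ConcentrationLimit
    liquidity: LiquidityReserve
```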
The difference is not aesthetic. It is operational.
Validating a proposed trade against these constraints in the second system requires traversal, not interpretation. The result is deterministic, fast, and explainable. In the first system, the same operation requires inference, introduces latency, and produces probabilistic outputs that must be wrapped in confidence thresholds and fallback logic.
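Continuing the same hypothetical sketch, the check in the second system is a plain traversal of those objects: no model call, no confidence threshold, and every rejection names the rule that produced it.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ProposedTrade:
    symbol: str
    esg_score: float
    resulting_position_pct: float   # position weight after the trade settles
    cash_remaining: float           # liquid cash left after settlement

def violations(trade: ProposedTrade, c: ClientConstraints) -> list[str]:
    """Deterministic constraint check over explicit structure - no inference involved."""
    found = []
    if trade.esg_score < c.esg.min_esg_score:
        found.append(f"ESG score {trade.esg_score} is below the client's threshold {c.esg.min_esg_score}")
    if trade.resulting_position_pct > c.concentration.max_single_position_pct:
        found.append("position would exceed the concentration limit")
    if trade.cash_remaining < c.liquidity.amount:
        found.append(f"would breach the liquidity reserve held for {c.liquidity.purpose}")
    return found
```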
In the first system, intelligence is spent recovering meaning. In the second, it is spent deciding what to do.
The Interpretation Tax
Large language models reason exceptionally well - when meaning is clear. Every step required to establish that meaning introduces overhead and error surface. Disambiguating whether "account" refers to a client account, a custody account, or a ledger account is not free. It is work the system must repeat, again and again.
This interpretation tax has three components.
Computational cost. Each inference consumes tokens, requires model calls, and adds latency. At scale, these costs dominate.
Error propagation. Inference is probabilistic. Chained inferences compound uncertainty. You could model this as multiplication - 95% accuracy per step, five steps, 77% reliability at the end - and that arithmetic feels satisfying in the way that wrong-but-tidy things often do. The actual dynamics are weirder.[^1]
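For the record, the tidy version is just an independence assumption:

$$
P(\text{chain correct}) = \prod_{i=1}^{5} p_i = 0.95^{5} \approx 0.774
$$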
What the research shows is that multi-step reasoning systems don't just get less confident as they go. They drift. Early in a chain, most of the uncertainty is local - the system might be unsure about *this* step. But as steps accumulate, a different kind of uncertainty starts to dominate: uncertainty inherited from upstream, carried forward through dependencies between decisions. The system isn't merely compounding error. It's compounding *distance from the original question*. By step five, you're not just less sure of your answer. You may be answering something else entirely.
Iteration friction. When meaning is inferred rather than structural, debugging becomes archaeological. You are reconstructing why a model interpreted something a certain way instead of inspecting a deterministic state.
Much of what is called "data preparation" in AI projects is, in reality, the manual repayment of this tax - made permanent by architecture.
Why Better Models Don't Eliminate the Problem
A reasonable objection arises here: models are improving. Context windows grow. Tool use becomes more reliable. Structured output improves. Why assume future agents won't simply infer what today's systems cannot?
They will - to a point.
Model progress reduces the interpretation tax, but it does not remove it. Inference remains probabilistic. Costs remain variable. Auditability remains external to the system's logic. And as models grow more capable, the surface area over which ambiguity can hide grows with them.
The question is not whether a sufficiently advanced model can infer meaning from poorly structured data. It can.
The question is whether an organization can afford to do so repeatedly, at scale, under regulatory and operational constraints - while expecting intelligence to compound rather than reset with each interaction.
Inference is powerful. It is not a foundation.
Runtime Semantics vs. Structural Semantics
This is the real divide between AI-enabled and AI-native systems.
Most agent architectures today resolve meaning at runtime. Semantics are reconstructed on demand through retrieval, embeddings, and model reasoning. Intelligence exists, but it is ephemeral - rebuilt each time, sensitive to prompts, context, and subtle shifts in data.
AI-native systems resolve semantics at design time for the domains that matter. Not everything is rigid. Not everything is typed. But the core entities, relationships, constraints, and permissions that intelligence depends on are made explicit and machine-resolvable.
This does not reduce flexibility. It enables accumulation.
When meaning is structural, reasoning can be deterministic where it must be, probabilistic where it should be, auditable by construction rather than inspection, and governed as part of execution rather than layered on afterward.
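One way to picture that split, continuing the earlier hypothetical sketch (the `llm` client and its `complete` method are placeholders, not a real API): hard constraints gate the action deterministically, and the model is consulted only for the genuinely open-ended part.

```python
def review_trade(trade: ProposedTrade, c: ClientConstraints, llm) -> dict:
    # Deterministic where it must be: hard constraints are resolved by traversal.
    hard_violations = violations(trade, c)
    if hard_violations:
        # Auditable by construction: the outcome cites the exact rules that failed.
        return {"approved": False, "reasons": hard_violations}

    # Probabilistic where it should be: the model handles open-ended judgment,
    # such as drafting the rationale an advisor will read.
    rationale = llm.complete(
        f"Explain briefly why adding {trade.symbol} fits this client's stated goals."
    )
    return {"approved": True, "rationale": rationale}
```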
Intelligence stops being something the system occasionally performs and becomes something it can sustain.
Governance as Architecture, Not Oversight
Semantic architecture transforms governance from a monitoring function into an emergent property of the system.
In AI-enabled systems, governance is reactive. Models are deployed, behavior is observed, controls are added to catch problems after they occur. Access control lives at the interface layer. Auditability is achieved through logging.
In AI-native systems, governance is intrinsic. Relationship-level permissions emerge when relationships are explicit in the schema. Compliance constraints become properties of valid data, enforced at write time. Provenance and lineage are encoded structurally, not reconstructed after the fact.
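A minimal sketch of what that can mean mechanically, again with hypothetical names: the write path itself refuses invalid data, and the permission check is derived from an explicit advisory relationship rather than an interface-layer rule.

```python
class ConstraintViolation(Exception):
    """Raised when a write would violate a declared compliance constraint."""

# Explicit relationship in the schema: which advisor advises which client.
ADVISES = {("advisor_17", "client_203")}

def can_write(actor_id: str, client_id: str) -> bool:
    # Relationship-level permission: a property of the graph, not of the UI.
    return (actor_id, client_id) in ADVISES

def write_constraints(actor_id: str, client_id: str, c: ClientConstraints, store: dict) -> None:
    if not can_write(actor_id, client_id):
        raise PermissionError(f"{actor_id} has no advisory relationship with {client_id}")
    # Compliance as a property of valid data, enforced at write time.
    if c.concentration.max_single_position_pct > 25.0:   # illustrative firm-wide policy cap
        raise ConstraintViolation("concentration limit exceeds firm-wide policy")
    store[client_id] = c   # provenance and lineage can be recorded as part of the same write
```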
As regulatory expectations evolve toward explaining not just what a system did but why, this distinction matters. Systems that reason over explicit semantics can provide that explanation structurally. Systems that reason over inferred meaning cannot - they can only explain probabilities.
The Honest Tradeoffs
Semantic architecture is not free.
It requires upfront design investment, slower early iteration, and real migration effort for existing systems. These are non-trivial costs, and pretending otherwise would be dishonest.
The question is not whether these costs exist. It is whether the alternative - continuously paying the interpretation tax while hoping model improvements compensate for architectural ambiguity - offers a better long-term return.
In most enterprise and regulated contexts, it does not.
Where Astraeus Begins
Astraeus is built on a simple, unfashionable premise: intelligence does not emerge from models alone. It emerges when data, execution, and governance share the same semantic substrate.
Most organizations are attempting to graft agents onto architectures optimized for storage and human interpretation. This produces motion, but not momentum.
Astraeus starts one layer lower - not because agents are unimportant, but because without data designed for machine reasoning, agents can never be more reliable than the ambiguity beneath them allows.
And ambiguity, while tolerable in demos, does not scale into systems you can trust.

Sources
- Duan et al., "UProp: Investigating the Uncertainty Propagation of LLMs in Multi-Step Agentic Decision-Making," arXiv:2506.17419 (2025).
- Zhao et al., "Uncertainty Propagation on LLM Agent," Proceedings of ACL 2025.
- Tao et al., "Revisiting Uncertainty Estimation and Calibration of Large Language Models," arXiv:2505.23854 (2025).