Xunzhuo Liu

Building collective intelligence for LLM systems.

INTELLIGENT ROUTING / LLM SYSTEMS

Semantic Routing as Energy Infrastructure

I keep coming back to one question: if intelligence becomes an infrastructure resource, who decides where it should run?

In the old network, routing mostly moved packets. In the AI stack, the thing being moved is semantic work: intent, uncertainty, privacy risk, reasoning demand, memory, tool use, and action. Once traffic starts carrying meaning, the routing layer cannot stay a thin forwarding layer.

This is why tokens alone feel like the wrong primitive to optimize. Tokens are easy to count, but they are not equal. Tokens produced by a small open model, a specialized model, a frontier closed model, or a local edge model differ widely in cost, latency, energy footprint, and risk profile. The deeper question is whether the system spent the right kind of intelligence in the right place.

  • Tokens are not equal. The cost gap between model paths can be orders of magnitude, so token economics has to ask whether each token was spent in the right intelligence tier.
  • Energy is the hidden unit of intelligence. A model is not only capability; it is hardware, power, latency, supply, and operating cost.
  • The durable problem is coordination. The future is not one frontier model serving everything, but a heterogeneous fabric of closed models, open models, tools, verifiers, memory, edge devices, and different generations of hardware.
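To make the first point concrete, here is a minimal sketch of how the same token count can cost very different amounts on different paths. The tier names and prices are made-up placeholders chosen only to illustrate an orders-of-magnitude gap; they are not measurements of any real model or provider.

```python
# Hypothetical prices in USD per million output tokens.
# These numbers are illustrative assumptions, not real pricing.
PRICE_PER_MILLION_TOKENS = {
    "local_edge": 0.02,
    "small_open": 0.20,
    "specialized": 1.00,
    "frontier_closed": 20.00,
}


def cost_usd(tier: str, tokens: int) -> float:
    """Cost of producing `tokens` tokens on the given (hypothetical) tier."""
    return PRICE_PER_MILLION_TOKENS[tier] * tokens / 1_000_000


def cost_ratio(expensive: str, cheap: str) -> float:
    """How many times more one tier costs per token than another."""
    return PRICE_PER_MILLION_TOKENS[expensive] / PRICE_PER_MILLION_TOKENS[cheap]


if __name__ == "__main__":
    for tier in PRICE_PER_MILLION_TOKENS:
        print(f"{tier}: ${cost_usd(tier, 500_000):.4f} for 500k tokens")
    print("frontier vs. edge ratio:", cost_ratio("frontier_closed", "local_edge"))
```

Under these assumed prices, counting tokens alone hides the gap between tiers; the useful question is which tier each token was spent on.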

That is the lens behind Semantic Routing as Energy Infrastructure. To me, semantic routing is the control layer that decides where semantic work should live: when to stay cheap and local, when to escalate, when to retrieve, when to verify, and when to spend expensive intelligence. It is not just model selection. It is resource scheduling for intelligence.
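One way to picture such a control layer is as a small policy function over workload signals. The sketch below is purely illustrative: the `Signals` fields, the tier names, and the `escalate_above` threshold are hypothetical, and it does not reflect how vLLM Semantic Router or any other project actually works.

```python
from dataclasses import dataclass


@dataclass
class Signals:
    """Hypothetical workload signals extracted from one request."""
    contains_private_data: bool
    reasoning_demand: float  # 0.0 (trivial) to 1.0 (hard); assumed to come from a classifier
    needs_fresh_facts: bool
    high_stakes: bool


@dataclass
class Plan:
    """Where the work runs and which extra steps are spent on it."""
    tier: str
    retrieve: bool
    verify: bool


def route(s: Signals, escalate_above: float = 0.7) -> Plan:
    # Privacy boundary comes first: private data stays local regardless of difficulty.
    if s.contains_private_data:
        tier = "local_edge"
    # Escalate only when the estimated reasoning demand justifies the expensive tier.
    elif s.reasoning_demand > escalate_above:
        tier = "frontier"
    # Otherwise stay cheap.
    else:
        tier = "small_open"
    # Retrieval and verification are decided independently of the tier.
    return Plan(tier=tier, retrieve=s.needs_fresh_facts, verify=s.high_stakes)


if __name__ == "__main__":
    print(route(Signals(False, 0.9, True, True)))
    print(route(Signals(True, 0.9, False, False)))
```

The ordering of the checks is itself a policy choice: here the privacy boundary overrides the quality signal, which is one possible trade-off rather than the only sensible one.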

My research direction is to make this layer real and measurable: workload signals, routing memory, policy languages, evaluation, cost-quality frontiers, privacy boundaries, and cross-layer scheduling. vLLM Semantic Router is one concrete step toward an open semantic control plane for AI systems: inspectable, composable, and shared by design.

Work 01

vLLM Semantic Router

Co-Founder

Signal-driven decision routing for mixture-of-modality deployments.

Work 02

Elephant Agent

Creator

Personal-model-first self-evolving AI agent that grows correctable understanding and gets curious at the user's pace.

Work 03

Inferoa

Builder

Inference-native tokenmaxxing agent harness for loop engineering.

Work 04

Envoy Gateway

Steering Committee and Maintainer

Manages Envoy Proxy as a standalone or Kubernetes-based application gateway.

Work 05

Envoy AI Gateway

Maintainer

Manages unified access to generative AI services built on Envoy Gateway.

Role 01

Agentic Intelligence Lab

Chair

Chairing the lab's research and community work on agentic AI, personal AI agents, and system intelligence.

Role 02

Kubernetes AI Gateway WorkGroup

Co-Chair

Leading the community effort to define standards for AI Gateway in the Kubernetes ecosystem.

Role 03

CNCF Ambassador

Fall 2023 Ambassador

Representing and promoting Cloud Native Computing Foundation projects and values globally.

Role 04

Linux Foundation APAC Open Source Evangelist

2024 Program

Advocating for open source adoption and best practices across the Asia-Pacific region.

Role 05

KubeCon Program Committee

KubeCon 2024 Hong Kong

Reviewing and selecting talks for one of the largest cloud-native conferences.