Xunzhuo Liu

01. Semantic Routing

In traditional networks, routing mostly moved packets. In the AI stack, the thing being moved is semantic work: intent, uncertainty, privacy risk, reasoning demand, memory, tool use, and action. Once traffic starts carrying meaning, the routing layer cannot stay a thin forwarding layer.

This is why tokens, taken alone, feel like the wrong primitive to optimize. Tokens are easy to count, but they are not equal. Tokens produced by a small open model, a specialized model, a frontier closed model, or a local edge model carry very different cost, latency, energy, and risk profiles. The deeper question is whether the system spent the right kind of intelligence in the right place.

Tokens are not equal. The cost gap between model paths can be orders of magnitude, so token economics has to ask whether each token was spent in the right intelligence tier.
Energy is the hidden unit of intelligence. A model is not only capability; it is hardware, power, latency, supply, and operating cost.
The durable problem is coordination. The future is not one frontier model serving everything, but a heterogeneous fabric of closed models, open models, tools, verifiers, memory, edge devices, and different generations of hardware.
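To make "tokens are not equal" concrete, here is a minimal sketch that weights token counts by the price of the tier that produced them. The tier names and per-million-token prices are hypothetical placeholders chosen only to show an order-of-magnitude gap; they are not measured or quoted figures.

```python
from dataclasses import dataclass

# Hypothetical per-million-token prices (USD), for illustration only.
# Real prices vary by provider, model, hardware, and region.
PRICE_PER_MTOK = {
    "edge_local": 0.0,        # on-device; marginal cost treated as zero here
    "small_open": 0.10,
    "frontier_closed": 10.0,
}


@dataclass
class Usage:
    tier: str    # which intelligence tier produced the tokens
    tokens: int  # how many tokens it produced


def cost_usd(usages: list[Usage]) -> float:
    """Total cost: tokens times the tier's price per million tokens, summed over tiers."""
    return sum(u.tokens * PRICE_PER_MTOK[u.tier] / 1_000_000 for u in usages)


# The same 2M tokens under two routing outcomes.
all_frontier = [Usage("frontier_closed", 2_000_000)]
mostly_small = [Usage("small_open", 1_800_000), Usage("frontier_closed", 200_000)]

print(f"{cost_usd(all_frontier):.2f}")  # 20.00
print(f"{cost_usd(mostly_small):.2f}")  # 2.18
```

Both outcomes spend the same number of tokens, yet under these assumed prices one costs roughly nine times more; a pure token count cannot see that difference.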

That is the lens behind Semantic Routing as Energy Infrastructure. To me, semantic routing is the control layer that decides where semantic work should live: when to stay cheap and local, when to escalate, when to retrieve, when to verify, and when to spend the expensive intelligence. It is not just model selection. It is resource scheduling for intelligence.
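As a rough illustration of such a control layer, here is a minimal rule-based sketch that turns per-request signals into a small plan of actions. The signal names, the single shared threshold, and the rule that privacy outranks escalation are all assumptions made for illustration; this is not the vLLM Semantic Router API or its actual policy.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    STAY_LOCAL = "stay_local"  # keep the work on a cheap or local tier
    RETRIEVE = "retrieve"      # fetch external or fresh knowledge first
    VERIFY = "verify"          # check the answer before returning it
    ESCALATE = "escalate"      # spend the expensive (frontier) intelligence


@dataclass
class Signals:
    """Hypothetical per-request signals, each scored in [0, 1]."""
    privacy_risk: float      # likelihood the request carries sensitive data
    reasoning_demand: float  # estimated need for multi-step reasoning
    knowledge_gap: float     # estimated need for external knowledge
    stakes: float            # cost of a wrong answer


def route(s: Signals, threshold: float = 0.6) -> list[Action]:
    """Return an ordered plan of actions for one request (hypothetical policy)."""
    plan: list[Action] = []
    if s.knowledge_gap >= threshold:
        plan.append(Action.RETRIEVE)
    # Assumed rule: sensitive work stays local even when it is hard.
    if s.privacy_risk >= threshold:
        plan.append(Action.STAY_LOCAL)
    elif s.reasoning_demand >= threshold:
        plan.append(Action.ESCALATE)
    else:
        plan.append(Action.STAY_LOCAL)
    if s.stakes >= threshold:
        plan.append(Action.VERIFY)
    return plan


hard_public = Signals(privacy_risk=0.1, reasoning_demand=0.9, knowledge_gap=0.2, stakes=0.8)
print([a.value for a in route(hard_public)])  # ['escalate', 'verify']
```

The design point is that the router emits a plan (whether to retrieve, where to run, whether to verify) rather than a single model name; a production router would calibrate or learn these scores and thresholds instead of hard-coding them.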

My research direction is to make this layer real and measurable: workload signals, routing memory, policy languages, evaluation, cost-quality frontiers, privacy boundaries, and cross-layer scheduling. vLLM Semantic Router is one concrete step toward an open semantic control plane for AI systems: inspectable, composable, and shared by design.

02. Selected Works

Work 01

vLLM Semantic Router

Co-Founder

Signal-driven decision routing for mixture-of-modality deployments.

Work 02

Elephant Agent

Creator

Personal-model-first, self-evolving AI agent that grows a correctable understanding of its user and gets curious at the user's pace.

Work 03

Inferoa

Builder

Inference-native tokenmaxxing agent harness for loop engineering.

Work 04

Envoy Gateway

Steering Committee and Maintainer

Manages Envoy Proxy as a standalone or Kubernetes-based application gateway.

Work 05

Envoy AI Gateway

Maintainer

Built on Envoy Gateway; manages unified access to generative AI services.

03. Research Highlights

Research Publication 2026

Fast and Faithful: Real-Time Verification for Long-Document Retrieval-Augmented Generation Systems

SIGIR 2026 Industry Track

Research Publication 2026

Token-Budget-Aware Pool Routing for Cost-Efficient LLM Inference

arXiv Technical Report

Position Paper 2026

vLLM Semantic Router: Signal Driven Decision Routing for Mixture-of-Modality Models

arXiv Technical Report

Vision Paper 2026

The Workload-Router-Pool Architecture for LLM Inference Optimization: A Vision Paper from the vLLM Semantic Router Project

arXiv Technical Report

Research Publication 2025

When to Reason: Semantic Router for vLLM

NeurIPS 2025 Workshop on ML for Systems (MLForSys)

Research Publication 2026

Conflict-Free Policy Languages for Probabilistic ML Predicates: A Framework and Case Study with the Semantic Router DSL

arXiv Technical Report

04. Community Roles

Role 01

Agentic Intelligence Lab

Chair

Chairing the lab's research and community work on agentic AI, personal AI agents, and system intelligence.

Role 02

Kubernetes AI Gateway WorkGroup

Co-Chair

Leading the community effort to define standards for AI gateways in the Kubernetes ecosystem.

Role 03

CNCF Ambassador

Fall 2023 Ambassador

Representing and promoting Cloud Native Computing Foundation projects and values globally.

Role 04

Linux Foundation APAC Open Source Evangelist

2024 Program

Advocating for open source adoption and best practices across the Asia-Pacific region.

Role 05

KubeCon Program Committee

KubeCon 2024 Hong Kong

Reviewing and selecting talks for one of the largest cloud-native conferences.
