vLLM Semantic Router
Signal-driven decision routing for mixture-of-modality deployments.
It's just a good thinking game 🎲
I study intelligent routing and system intelligence for LLMs.
My current work connects semantic routing with the Workload-Router-Pool architecture: using workload signals, policy loops, and coordinated serving pools to optimize quality, cost, latency, privacy, and safety together.
Latest log: The Second Half of LLM Routing
Signal-driven orchestration for model, tool, and policy selection.
Pragmatic infrastructure work across gateways, inference stacks, and control planes.
Writing, evaluation, and community work that reframes routing as a systems problem.
Signal-driven decision routing for mixture-of-modality deployments.
Manages Envoy Proxy as a standalone or Kubernetes-based application gateway.
Manages unified access to generative AI services built on Envoy Gateway.
Cost-efficient and pluggable infrastructure components for GenAI inference.
AI gateway and AI-native API gateway.
Connects, secures, controls, and observes services.
Observability console for Istio with service mesh.
A selection from recent work on semantic routing, agent behavior, and AI infrastructure. More papers are collected on Works.
Leading the community effort to define standards for AI Gateway in the Kubernetes ecosystem.
Representing and promoting Cloud Native Computing Foundation projects and values globally.
Advocating for open source adoption and best practices across the Asia-Pacific region.
Reviewing and selecting talks for one of the largest cloud-native conferences.