distributed local
inference router

3,598 requests served · https://llmesh.example.com/v1

live simulation — requests routed across local AI workers

priority queuing

HIGH

NORM

LOW

Three priority tiers with FIFO within each lane. High-priority jobs are always dispatched first, with owner-affinity scheduling.

model aliases

# one name, many workers
"qwen"  → qwen3-4b-instruct
        → qwen3-14b-instruct

# owner affinity: prefers
# the requester's own GPU

Map one name across multiple models or machines. The scheduler picks the best available worker by affinity and load.

openai compatible

POST /v1/chat/completions
Authorization: Bearer sk-…

{
  "model": "llama-3.2",
  "stream": true
}

Drop-in for the OpenAI API. Works with Claude Code, Open WebUI, and any client that speaks the OpenAI protocol.

distributed localinference router