llmesh

distributed local
inference router

3,598 requests served · https://llmesh.example.com/v1

live simulation — requests routed across local AI workers

priority queuing
HIGH
8
NORM
5
LOW
3

Three priority tiers with FIFO within each lane. High-priority jobs are always dispatched first, with owner-affinity scheduling.

model aliases
# one name, many workers
"qwen"   qwen3-4b-instruct
         qwen3-14b-instruct

# owner affinity: prefers
# the requester's own GPU

Map one name across multiple models or machines. The scheduler picks the best available worker by affinity and load.

openai compatible
POST /v1/chat/completions
Authorization: Bearer sk-…

{
  "model": "llama-3.2",
  "stream": true
}

Drop-in for the OpenAI API. Works with Claude Code, Open WebUI, and any client that speaks the OpenAI protocol.