[Live dashboard: simulated cluster throughput (tok/s, requests/min, active requests) and total tokens served: 512,847,293]

MindRouter

Open-Source LLM Inference Load Balancer

Route inference requests across GPU backends with protocol translation, fair-share scheduling, and full observability. Supports OpenAI, Ollama, and Anthropic APIs.

Multi-Backend
Load Balancing
3 API Protocols
OpenAI · Ollama · Anthropic
GPU Cluster
Management
Fair-Share
Scheduling
Open Source
Apache 2.0

Everything You Need for LLM Inference

MindRouter provides a complete platform for running, managing, and integrating large language models — from a browser-based chat to full OpenAI, Ollama, and Anthropic API compatibility.

Multi-API Support

OpenAI, Ollama & Anthropic APIs

MindRouter speaks three API protocols natively. Use the OpenAI, Ollama, or Anthropic SDK of your choice — MindRouter translates between them and routes to the best available backend.

  • Full /v1/chat/completions OpenAI compatibility
  • Native Ollama API (/api/chat, /api/generate, /api/tags)
  • Anthropic Messages API (/v1/messages)
  • Streaming, tool calling & structured output across all APIs
  • Image, multimodal & embedding support
curl — Chat Completions
curl -X POST https://your-server.example.com/v1/chat/completions \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/gpt-oss-20b",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "Explain quantum computing in 3 sentences."}
    ],
    "temperature": 0.7,
    "max_tokens": 256
  }'
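The Anthropic Messages endpoint takes a different request shape than chat completions. A minimal payload sketch for /v1/messages is shown below; the header names follow Anthropic's public API, and whether MindRouter requires the anthropic-version header is an assumption.

```python
import json

# Anthropic-style Messages request. Note max_tokens is required by the
# Messages API, unlike OpenAI chat completions where it is optional.
payload = {
    "model": "openai/gpt-oss-20b",
    "max_tokens": 256,
    "messages": [
        {"role": "user", "content": "Explain quantum computing in 3 sentences."}
    ],
}
# POST to https://your-server.example.com/v1/messages with headers:
#   x-api-key: YOUR_API_KEY
#   anthropic-version: 2023-06-01
body = json.dumps(payload)
```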
Python Integration

Works with the OpenAI Python SDK

Change two lines of code — base_url and api_key — and your existing OpenAI Python code works instantly with MindRouter. No new libraries to learn.

  • Use openai Python package as-is
  • Async support with AsyncOpenAI
  • Streaming with server-sent events
  • Compatible with LangChain, LlamaIndex, etc.
example.py
from openai import OpenAI

client = OpenAI(
    base_url="https://your-server.example.com/v1",
    api_key="YOUR_API_KEY",
)

response = client.chat.completions.create(
    model="openai/gpt-oss-20b",
    messages=[
        {"role": "user",
         "content": "Write a haiku about GPUs."}
    ],
    stream=True,
)

for chunk in response:
    delta = chunk.choices[0].delta.content
    if delta:  # the final chunk's delta content is None
        print(delta, end="", flush=True)
Built-in Chat

Interactive Chat Interface

No setup required — just log in and start chatting. MindRouter includes a full-featured web chat with markdown rendering, code highlighting, file uploads, conversation history, and model switching.

  • Real-time streaming responses
  • Syntax-highlighted code blocks
  • File & image uploads
  • Persistent conversation history
  • Switch models mid-conversation
  • Thinking / reasoning mode
  • Web search integration
Deploy Your Own
MindRouter Chat
openai/gpt-oss-20b
Write a Python function that finds prime numbers using the Sieve of Eratosthenes.

Here's an efficient implementation:

def sieve(n):
    s = [True]*(n+1)
    s[:2] = False, False
    for i in range(2, int(n**0.5)+1):
        if s[i]:
            s[i*i::i] = [False]*len(s[i*i::i])
    return [i for i, p in enumerate(s) if p]

This runs in O(n log log n) time complexity...

Tool Calling

Native Function & Tool Calling

Build AI agents and agentic workflows. MindRouter supports the OpenAI tool calling protocol, enabling models to invoke functions, query databases, call APIs, and orchestrate complex multi-step tasks.

  • OpenAI-compatible tools parameter
  • Parallel tool calls supported
  • Works with LangChain agents
  • Structured JSON Schema outputs
tool_calling.py
response = client.chat.completions.create(
    model="openai/gpt-oss-20b",
    messages=[{"role": "user",
               "content": "What's the weather in Moscow, ID?"}],
    tools=[{
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get current weather",
            "parameters": {
                "type": "object",
                "properties": {
                    "city": {"type": "string"}
                },
                "required": ["city"]
            }
        }
    }],
)

call = response.choices[0].message.tool_calls[0]
print(call.function.name, call.function.arguments)
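When the model decides to call a tool, the response carries the function name and JSON-encoded arguments; your code executes the function and returns the result in a follow-up tool message. A minimal dispatch sketch (get_weather here is a stand-in implementation, not a MindRouter API):

```python
import json

def get_weather(city: str) -> dict:
    # Stand-in implementation; a real version would query a weather service.
    return {"city": city, "temp_c": 21, "conditions": "clear"}

TOOLS = {"get_weather": get_weather}

def dispatch(name: str, arguments: str) -> str:
    """Run the named tool with JSON-encoded arguments; return a JSON result."""
    result = TOOLS[name](**json.loads(arguments))
    return json.dumps(result)

# In the OpenAI SDK, name and arguments come from
# response.choices[0].message.tool_calls[0].function
tool_result = dispatch("get_weather", '{"city": "Moscow, ID"}')
```

The JSON string returned by dispatch is what you append as a `{"role": "tool", ...}` message before calling the model again.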
Translation Architecture

Universal Protocol Translation

MindRouter's translation layer is the engine behind its flexibility. Rather than locking you into a single API format, it translates seamlessly between OpenAI, Ollama, and Anthropic protocols — on both the client and backend sides.

This means any tool, library, or application that speaks one of these protocols can connect to MindRouter — and MindRouter can route to any backend inference engine, regardless of its native API.

  • OpenAI-compatible — drop-in for any OpenAI SDK or tool
  • Ollama-compatible — works with Ollama clients and libraries
  • Anthropic-compatible — supports Claude SDK and tooling
  • vLLM & Ollama backends — route to any engine transparently
  • Dozens of models — deploy and serve cutting-edge models instantly
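To illustrate the idea (not MindRouter's internal schema), translating an Ollama /api/chat request into the OpenAI chat-completions shape might look like this:

```python
def ollama_to_openai(req: dict) -> dict:
    """Map an Ollama /api/chat request onto the OpenAI chat-completions
    shape. Illustrative sketch only, not MindRouter's internal schema."""
    out = {
        "model": req["model"],
        "messages": req["messages"],
        "stream": req.get("stream", False),
    }
    opts = req.get("options", {})
    if "temperature" in opts:
        out["temperature"] = opts["temperature"]
    if "num_predict" in opts:  # Ollama's max-tokens option
        out["max_tokens"] = opts["num_predict"]
    return out

translated = ollama_to_openai({
    "model": "openai/gpt-oss-20b",
    "messages": [{"role": "user", "content": "hi"}],
    "options": {"temperature": 0.7, "num_predict": 256},
})
```

A canonical intermediate form like this is what lets any client protocol pair with any backend engine.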
Translation Architecture
[Diagram] OpenAI, Ollama, and Anthropic client SDKs → canonical schema → MindRouter (Translate · Route · Balance) → vLLM (high-throughput) and Ollama (flexible serving) backends
Documentation & Blog

Comprehensive Docs & Blog

Everything you need to get started and go deep. The documentation covers API endpoints, authentication, model capabilities, rate limits, and integration guides. The blog features updates, tutorials, and best practices.

MindRouter Documentation
  • API Reference: Chat Completions · Embeddings · Models · Authentication
  • Getting Started: Quick Start Guide · Python SDK · cURL Examples · Rate Limits
  • Integrations: LangChain · LlamaIndex · Continue.dev · Open WebUI
  • Advanced: Tool Calling · Structured Output · Streaming · Multimodal
Institutional AI Sovereignty

Your AI. Your Hardware. Your Control.

Every model served by MindRouter runs on University-owned and University-managed GPUs. No data leaves institutional infrastructure. No third-party cloud providers process your prompts. No external APIs see your research, coursework, or sensitive data.

This isn't just a cost decision — it's a strategic one. Institutional AI sovereignty means full control over which models are deployed, how they behave, and where data flows.

Data Security
Prompts and responses never leave University infrastructure. Full FERPA and research data compliance.
Speed & Latency
On-premise GPUs eliminate network round-trips to distant cloud regions.
Power & Sustainability
Direct visibility into energy consumption per model and per request.
Model Control
Choose exactly which models to deploy, audit their behavior, and update on your own schedule.
Full Observability
Complete audit trail of every request. GPU metrics, token usage, and cost accounting.
Predictable Costs
No per-token API bills. Capital investment in GPUs with transparent, fixed operating costs.
University GPU Cluster
  • Node 1: 4x GPU · gpt-oss-120b · ACTIVE
  • Node 2: 4x GPU (tensor parallel) · qwen3.5-400b · ACTIVE
  • Node 3: 4x GPU · gpt-oss-120b / 20b · ACTIVE
  • Node 4: 2x GPU · Multimodal OCR · ACTIVE
On-premise · No cloud dependencies · Full institutional control
Telemetry & Monitoring

Full Cluster Observability

Monitor every aspect of your inference cluster in real time — from GPU utilization and power draw to token throughput and per-request audit trails.

GPU Telemetry

Utilization, memory, temperature, power draw, and fan speed per GPU with configurable time ranges (1h to 30d)

Live Throughput

Real-time token flow visualization, requests per minute, active request counts, and total tokens served

Audit & Export

Full request audit trail with filtering by user, model, status, and date — exportable as CSV, JSON, or JSON with message content included
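As a sketch of what an exported audit trail enables (the column names here are illustrative, not MindRouter's actual export schema), you can aggregate token usage per user in a few lines:

```python
import csv
import io

# Hypothetical CSV export snippet: one row per request.
export = """user,model,status,total_tokens
alice,openai/gpt-oss-20b,ok,512
bob,openai/gpt-oss-20b,ok,128
alice,qwen3.5-400b,error,0
"""

tokens_by_user: dict[str, int] = {}
for row in csv.DictReader(io.StringIO(export)):
    tokens_by_user[row["user"]] = (
        tokens_by_user.get(row["user"], 0) + int(row["total_tokens"])
    )
```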

Enterprise Infrastructure

Built for Production

MindRouter is designed for reliability, observability, and scale — load balancing across GPU backends with intelligent routing and fair-share scheduling.
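The routing idea can be sketched as least-connections selection over healthy backends (an illustrative sketch, not MindRouter's actual implementation):

```python
from dataclasses import dataclass

@dataclass
class Backend:
    url: str
    healthy: bool          # maintained by periodic health checks
    active_requests: int

def route(backends: list[Backend]) -> Backend:
    """Least-connections routing restricted to healthy backends."""
    healthy = [b for b in backends if b.healthy]
    if not healthy:
        raise RuntimeError("all backends down; trigger failover/alerting")
    return min(healthy, key=lambda b: b.active_requests)

pool = [
    Backend("http://gpu-1:8000", healthy=True, active_requests=4),
    Backend("http://gpu-2:8000", healthy=False, active_requests=0),
    Backend("http://gpu-3:8000", healthy=True, active_requests=2),
]
target = route(pool)  # healthy backend with the fewest in-flight requests
```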

Load Balancing

Intelligent routing across multiple GPU backends with health checks and failover

Fair-Share Scheduling

Weighted quotas, rate limiting, and fair access across users and groups

GPU Monitoring

Real-time GPU utilization, memory, temperature, and power metrics across the cluster

Protocol Translation

Seamless translation between OpenAI, Ollama, and Anthropic API formats on both client and backend sides

Observability

Prometheus metrics, structured logging, request audit trail, and admin dashboards

Authentication

API key management, Azure AD SSO, group-based access control, and user quotas
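The fair-share scheduling idea above can be sketched as picking, among users with pending requests, the one whose usage-to-weight ratio is lowest, i.e. the most under-served relative to their quota (illustrative only, not MindRouter's scheduler):

```python
def pick_next(pending: dict[str, int],
              usage: dict[str, float],
              weight: dict[str, float]) -> str:
    """Among users with pending requests, pick the one whose
    usage-to-weight ratio is lowest (most under-served)."""
    eligible = [u for u, n in pending.items() if n > 0]
    return min(eligible, key=lambda u: usage.get(u, 0.0) / weight[u])

# alice has twice bob's weight, so at equal usage alice is scheduled first
winner = pick_next({"alice": 3, "bob": 1},
                   {"alice": 100.0, "bob": 100.0},
                   {"alice": 2.0, "bob": 1.0})
```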

Documentation

Comprehensive guides covering API endpoints, authentication, model capabilities, rate limits, Python SDK integration, tool calling, structured output, and more.

View Full Documentation

Open Source

MindRouter is open-source software released under the Apache 2.0 license. Clone the repo, deploy on your own infrastructure, and contribute back to the project.

View on GitHub Apache 2.0
terminal
$ git clone https://github.com/ui-insight/MindRouter.git
$ cd MindRouter
$ docker compose up -d --build
# MindRouter is now running at http://localhost:8000

Get in Touch

Questions about MindRouter? Interested in deploying it at your institution? Drop us a line.