VDF AI Router — The Self-Evolving LLM Router for Sovereign AI

Every request to the best model — by quality, cost, latency and energy. On your infrastructure.

SEEMR Learning Engine · Multi-Objective Routing · Energy & CO₂ Aware · 100% On-Prem

Engine: SEEMR (Self-Evolving Model Router), learning from every run

Deployment: Docker on your VMs, Kubernetes, or bare metal; air-gap capable

Outcome: 40–60% inference cost reduction with built-in governance

THE PROBLEM

One Model for Everything Is Expensive

The default enterprise AI stack sends every request to the same flagship model.

Overpaying on Every Request

Flagship models answer trivial questions. Most enterprise prompts don't need the biggest model — but static stacks send them there anyway.

Static Routing Goes Stale

Model quality, price and latency drift weekly. New models ship monthly. Hand-written routing rules are outdated the day they are deployed.

No Governance, No Answers

Why did this request hit that model? No approved-model lists, no audit trail, no energy visibility — and regulators are starting to ask.

LLM spend scales with usage — routing intelligence is the only lever that scales with it.

HOW IT WORKS

One Router. Every Model.

A drop-in decision layer between your apps and all models — born inside the VDF AI platform, now standalone.

1 · Routes per request

Picks the best model for each call across local and cloud catalogs — in milliseconds, before the LLM is invoked.

2 · Learns from outcomes

SEEMR observes every run — quality, latency, failures, energy — and continuously improves its choices.

3 · Enforces your policy

Allow/deny lists, regulated-domain approvals, air-gapped local-only mode. Compliance is built into the routing path.

Request flow: Your Apps (chat · agents · RAG) → VDF AI Router (SEEMR engine) → Local models (Ollama · custom) or Cloud models (GPT · Claude · …). Every request is policy checked, decision logged and energy metered.

Failover-ready: every decision returns an ordered candidate list, not a single bet.

ROUTING PIPELINE

Five Layers. One Decision. Milliseconds.

Every request flows through a policy-first, learning-last routing pipeline.

1 · Policy

Pinned models, regulated-domain approvals, allow/deny lists, local-only enforcement.

2 · Learned Shortlist

Embedding retrieval surfaces models that performed best on similar prompts.

3 · Rules & Filters

Capability match, on-prem preference, latency limits (TTFT, tokens/sec), energy budget.

4 · Objective Scoring

Weighted quality + cost + latency + energy. Modes: eco · balanced · max-quality.

5 · SEEMR Selection

LinUCB contextual bandit picks the winner — and keeps learning from the outcome.

Output — RoutingDecision: selected model + human-readable reason + ordered failover candidates + per-model scores. Fully auditable, every time.
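
The output described above can be pictured as a small record. The sketch below is illustrative only: the field names and the `decide` helper are assumptions made for this example, not the router's actual schema or SDK.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class RoutingDecision:
    """Hypothetical shape of a routing result; field names are illustrative."""
    selected_model: str
    reason: str                 # human-readable explanation
    candidates: list            # ordered failover list, best first
    scores: dict = field(default_factory=dict)  # per-model scores


def decide(scores: dict, max_candidates: int = 5) -> RoutingDecision:
    """Rank models by score and package the result as an auditable record."""
    if not scores:
        raise ValueError("no candidate models to rank")
    ranked = sorted(scores, key=scores.get, reverse=True)[:max_candidates]
    best = ranked[0]
    return RoutingDecision(
        selected_model=best,
        reason=f"{best} had the highest score ({scores[best]:.2f})",
        candidates=ranked,
        scores=dict(scores),
    )


# Scores here are made-up numbers for illustration.
decision = decide({"local/llama-3.1-70b": 0.81, "gpt-4o-mini": 0.77, "mistral-small": 0.64})
print(decision.selected_model)
print(decision.candidates)
```

Keeping the reason and scores on the same record as the selection is what makes each decision reviewable after the fact.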

Policy always wins: learned layers can never override compliance constraints.
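
For intuition, the learned shortlist in layer 2 can be thought of as nearest-neighbor retrieval over past prompts. The following is a minimal sketch under stated assumptions: the two-dimensional embeddings, the history tuples and the model names are toy values, and the real retrieval method may differ.

```python
import math
from collections import defaultdict


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def shortlist(query_vec, history, k_neighbors=3, top_models=2):
    """Rank models by average quality on the k most similar past prompts.

    history holds (embedding, model_name, quality) tuples.
    """
    nearest = sorted(history, key=lambda h: cosine(query_vec, h[0]), reverse=True)
    per_model = defaultdict(list)
    for _, model, quality in nearest[:k_neighbors]:
        per_model[model].append(quality)
    average = {m: sum(q) / len(q) for m, q in per_model.items()}
    return sorted(average, key=average.get, reverse=True)[:top_models]


# Toy history: hypothetical embeddings and quality scores.
history = [
    ([1.0, 0.0], "small-local", 0.90),
    ([0.9, 0.1], "small-local", 0.80),
    ([0.8, 0.2], "flagship-cloud", 0.70),
    ([0.0, 1.0], "flagship-cloud", 0.95),
]
print(shortlist([1.0, 0.05], history))
```

The far-away prompt on which the flagship scored 0.95 does not influence the result, because only similar prompts are consulted.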

LEARNING ENGINE

SEEMR — The Router That Improves Itself

Self-Evolving Model Router: a production contextual-bandit learning system.

LinUCB Contextual Bandit

Per-model linear estimates over rich context (domain, node type, capability, policy). Optimism under uncertainty balances exploring new models vs. exploiting proven ones.
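
A textbook LinUCB arm can be sketched in a few lines. This is a generic illustration of the algorithm, not SEEMR's production code: the two-feature context, the reward values and the model names are assumptions.

```python
import math


class LinUCBArm:
    """One model's linear reward estimate with an upper confidence bound."""

    def __init__(self, dim: int, alpha: float = 1.0):
        self.alpha = alpha  # exploration strength
        # A^-1 starts as the identity (ridge prior); b starts at zero.
        self.A_inv = [[1.0 if i == j else 0.0 for j in range(dim)] for i in range(dim)]
        self.b = [0.0] * dim

    def _matvec(self, v):
        return [sum(row[j] * v[j] for j in range(len(v))) for row in self.A_inv]

    def ucb(self, x):
        """Estimated reward for context x plus an uncertainty bonus."""
        theta = self._matvec(self.b)  # theta = A^-1 b
        a_inv_x = self._matvec(x)
        mean = sum(t * xi for t, xi in zip(theta, x))
        bonus = self.alpha * math.sqrt(sum(xi * v for xi, v in zip(x, a_inv_x)))
        return mean + bonus

    def update(self, x, reward: float):
        """Rank-one Sherman-Morrison update of A^-1, then b += reward * x."""
        a_inv_x = self._matvec(x)
        denom = 1.0 + sum(xi * v for xi, v in zip(x, a_inv_x))
        n = len(x)
        for i in range(n):
            for j in range(n):
                self.A_inv[i][j] -= a_inv_x[i] * a_inv_x[j] / denom
        self.b = [bi + reward * xi for bi, xi in zip(self.b, x)]


def select(arms: dict, x):
    """Pick the arm with the highest upper confidence bound."""
    return max(arms, key=lambda name: arms[name].ucb(x))


arms = {"small-local": LinUCBArm(2), "flagship-cloud": LinUCBArm(2)}
context = [1.0, 0.0]  # hypothetical context features
for _ in range(20):
    arms["small-local"].update(context, 0.9)
    arms["flagship-cloud"].update(context, 0.4)
print(select(arms, context))
```

The bonus term shrinks as an arm sees more traffic in a given context, which is how the bandit shifts from exploring to exploiting.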

Learns From Every Run

Quality scores, latencies, failures and energy from each execution feed back into the bandit. Quality drops trigger autonomous rerouting — no human retuning.

Challenger Routing

A configurable ~2% of traffic also runs a challenger model. Pairwise comparisons escape local optima and keep the champion honest.
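
One simple way to implement a challenger budget is random sampling per request. The sketch below assumes uniform sampling and a plain win tally; the function names and the comparison rule are hypothetical, and the actual selection logic may be more sophisticated.

```python
import random
from typing import Optional


def pick_challenger(champion: str, candidates: list, rate: float = 0.02,
                    rng: Optional[random.Random] = None) -> Optional[str]:
    """Return a challenger for roughly `rate` of requests, otherwise None."""
    rng = rng or random
    others = [m for m in candidates if m != champion]
    if not others or rng.random() >= rate:
        return None
    return rng.choice(others)


def record_pairwise(wins: dict, champion: str, challenger: str,
                    champion_quality: float, challenger_quality: float) -> None:
    """Tally which model gave the better-rated answer to the same prompt."""
    winner = challenger if challenger_quality > champion_quality else champion
    wins[winner] = wins.get(winner, 0) + 1


rng = random.Random(0)  # seeded so the run is repeatable
trials = 100_000
hits = sum(pick_challenger("a", ["a", "b", "c"], rng=rng) is not None
           for _ in range(trials))
print(f"challenger share: {hits / trials:.3%}")
```

Because the challenger runs alongside the champion rather than instead of it, the user-facing answer is unaffected while the comparison data accumulates.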

Hybrid Offline Priors

Bandits can be initialized from offline-trained state and historical traces — smart from day one, not cold-started in production.

Five kinds of learning run in production, including model routing, tool selection and agent selection.

GOVERNANCE

Compliance Is a Routing Rule

Layer 1 of the pipeline — policy decisions can never be overridden by learning.

Regulated Domains

Flag a domain as regulated and only explicitly approved models are ever considered. EU AI Act and sector rules become enforcement, not documentation.

Allow / Deny / Pin

Organization-wide model allowlists and denylists. Pin a specific model per workload when determinism is required.

Air-Gap Mode

Disable external APIs entirely: routing is restricted to local models. Zero bytes leave your network — verifiable by design.
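
Taken together, the regulated-domain, allow/deny/pin and air-gap rules amount to a filter that runs before any scoring. Here is a minimal sketch, assuming a hypothetical `Policy` object and illustrative approval flags; the real configuration format is not shown in this document.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class Model:
    name: str
    is_local: bool
    regulated_approved: bool = False


@dataclass
class Policy:
    """Hypothetical policy object; field names are assumptions."""
    allow: set = field(default_factory=set)  # empty means no allowlist is set
    deny: set = field(default_factory=set)
    local_only: bool = False                 # air-gap mode
    pinned: Optional[str] = None


def eligible(models, policy, regulated_domain=False):
    """Return names of models that pass every policy check, in input order."""
    names = []
    for m in models:
        if policy.pinned is not None and m.name != policy.pinned:
            continue
        if policy.allow and m.name not in policy.allow:
            continue
        if m.name in policy.deny:
            continue
        if policy.local_only and not m.is_local:
            continue
        if regulated_domain and not m.regulated_approved:
            continue
        names.append(m.name)
    return names


# Approval flags below are illustrative, not statements about real models.
catalog = [
    Model("local/llama-3.1-70b", is_local=True, regulated_approved=True),
    Model("gpt-4o-mini", is_local=False, regulated_approved=True),
    Model("mistral-small", is_local=False),
]
print(eligible(catalog, Policy(local_only=True)))
print(eligible(catalog, Policy(deny={"gpt-4o-mini"}), regulated_domain=True))
```

Since later layers only ever see the filtered list, a learned preference cannot bring back a model that policy has excluded.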

Explainable Decisions

Every decision is returned with its reason, candidate list and scores — an audit trail regulators and CISOs can actually read.

GDPR · EU AI Act · DORA-ready — sovereignty is the default, not an add-on.

MULTI-OBJECTIVE ROUTING

Route by What Actually Matters

Quality, cost, latency and energy — weighted to your priorities, per workload.

ECO: Energy-first. Favors efficient, local models.
Quality 20% · Cost 20% · Latency 10% · Energy 50%

BALANCED: The sensible default for mixed workloads.
Quality 40% · Cost 30% · Latency 20% · Energy 10%

MAX QUALITY: Critical tasks where output quality dominates.
Quality 70% · Cost 10% · Latency 15% · Energy 5%
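
The three weight profiles can be applied as a plain weighted sum. In the sketch below the weights come from the profiles above, while the per-model metric values and the convention that every metric is normalized to 0..1 with higher meaning better are assumptions for illustration.

```python
WEIGHTS = {
    "eco":         {"quality": 0.20, "cost": 0.20, "latency": 0.10, "energy": 0.50},
    "balanced":    {"quality": 0.40, "cost": 0.30, "latency": 0.20, "energy": 0.10},
    "max-quality": {"quality": 0.70, "cost": 0.10, "latency": 0.15, "energy": 0.05},
}


def score(metrics: dict, mode: str = "balanced") -> float:
    """Weighted sum of normalized metrics (1.0 = best, e.g. cheapest or fastest)."""
    weights = WEIGHTS[mode]
    return sum(weights[name] * metrics[name] for name in weights)


def best(models: dict, mode: str) -> str:
    """Name of the highest-scoring model under the given mode."""
    return max(models, key=lambda name: score(models[name], mode))


# Hypothetical normalized metrics for two models.
models = {
    "small-local":    {"quality": 0.70, "cost": 0.95, "latency": 0.80, "energy": 0.90},
    "flagship-cloud": {"quality": 0.95, "cost": 0.30, "latency": 0.60, "energy": 0.30},
}
for mode in WEIGHTS:
    print(mode, best(models, mode))
```

With these toy numbers the small local model wins under eco and balanced, and the flagship wins only when quality carries 70% of the weight.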

Live latency intelligence. Rolling p50/p95, time-to-first-token and timeout rates per model feed routing in real time.
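
Rolling percentiles of this kind are typically computed over a fixed-size window of recent calls. A minimal sketch, assuming a nearest-rank percentile and a hypothetical `LatencyWindow` class; the window size and sample data are arbitrary.

```python
from collections import deque


class LatencyWindow:
    """Rolling latency statistics for one model over the last `size` calls."""

    def __init__(self, size: int = 500):
        self.calls = deque(maxlen=size)  # (latency_ms, timed_out) pairs

    def record(self, latency_ms: float, timed_out: bool = False) -> None:
        self.calls.append((latency_ms, timed_out))

    def percentile(self, p: int):
        """Nearest-rank percentile of recorded latencies; None if empty."""
        data = sorted(latency for latency, _ in self.calls)
        if not data:
            return None
        rank = max(1, (p * len(data) + 99) // 100)  # integer ceil(p * n / 100)
        return data[rank - 1]

    def timeout_rate(self) -> float:
        if not self.calls:
            return 0.0
        return sum(1 for _, timed_out in self.calls if timed_out) / len(self.calls)


window = LatencyWindow(size=100)
for ms in range(1, 101):  # synthetic latencies of 1..100 ms
    window.record(ms, timed_out=(ms > 98))
print(window.percentile(50), window.percentile(95), window.timeout_rate())
```

The bounded deque drops the oldest sample automatically, so the statistics track recent behavior rather than a lifetime average.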

Energy & CO₂ per call. Watt-hours and gCO₂e estimated for every request — routing data and ESG reporting in one.
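
A per-call estimate of this kind usually reduces to two multiplications. The sketch below is an assumption about how such an estimate could be formed: the per-model energy coefficient and the grid carbon intensity are placeholder values, not measured figures.

```python
def estimate_footprint(tokens: int, wh_per_1k_tokens: float,
                       grid_gco2e_per_kwh: float):
    """Return (energy in Wh, emissions in gCO2e) for one request."""
    energy_wh = tokens / 1000 * wh_per_1k_tokens
    gco2e = energy_wh / 1000 * grid_gco2e_per_kwh  # Wh -> kWh, then grid intensity
    return energy_wh, gco2e


# Placeholder coefficients for illustration only.
energy_wh, gco2e = estimate_footprint(tokens=2000, wh_per_1k_tokens=0.5,
                                      grid_gco2e_per_kwh=300.0)
print(f"{energy_wh:.2f} Wh, {gco2e:.2f} gCO2e")
```

The same two numbers can feed both the energy term of the routing score and an aggregated ESG report.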

RESILIENCE

No Single Model Can Take You Down

Routing returns an ordered candidate list — resilience is built into every decision.

Automatic Failover

If the chosen model errors or times out, the engine walks the ranked candidate list — up to 5 models — with no re-routing round-trip.
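
Walking the ranked list can be sketched as a simple loop. This is an illustration under assumptions: `call_with_failover` and the `invoke` callback are hypothetical names, and the simulated timeout stands in for a real model call.

```python
from typing import Callable


def call_with_failover(candidates: list, invoke: Callable[[str], str],
                       max_attempts: int = 5):
    """Try candidates in order; return (model_used, answer) on first success."""
    errors = {}
    for model in candidates[:max_attempts]:
        try:
            return model, invoke(model)
        except Exception as exc:  # errors and timeouts both move to the next model
            errors[model] = exc
    raise RuntimeError(f"all candidates failed: {list(errors)}")


def fake_invoke(model: str) -> str:
    """Stand-in for a real model call; the first candidate always times out."""
    if model == "local/llama-3.1-70b":
        raise TimeoutError("local runtime timed out")
    return f"answer from {model}"


ranked = ["local/llama-3.1-70b", "gpt-4o-mini", "claude-3-haiku", "mistral-small"]
used, answer = call_with_failover(ranked, fake_invoke)
print(used, "->", answer)
```

Because the ranked list already travels with the decision, the caller never has to ask the router again when the first choice fails.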

Runtime Health Probes

Local runtimes (e.g. Ollama) are probed continuously. If local is down, routing shifts to permitted cloud models instead of failing.

Graceful Degradation

Capability requirements relax rather than hard-fail; per-node fallback models guarantee an answer path even under strict policy.

Decision = ordered candidates
1 · local/llama-3.1-70b (selected)
2 · gpt-4o-mini (standby)
3 · claude-3-haiku (standby)
4 · mistral-small (standby)

On failure → next candidate, instantly

MODEL REGISTRY

Every Model. One Catalog.

LLMFolio — the model registry that powers the router.

Cloud models

Hundreds of models via OpenRouter — GPT, Claude, Gemini, Llama, Mistral and more — behind one API key and one bill.

Local models

Ollama and custom on-prem deployments registered alongside cloud models — first-class citizens, preferred by routing when available.

Per-model intelligence
Capabilities: analysis · code · embeddings · vision
Cost: $ per 1k tokens, live from catalog
Latency: rolling p50 / p95 / TTFT overlays
Energy: Wh + gCO₂e coefficients per model
Compliance: regulated-approved flag, priority tier

New model on the market? Register it once — SEEMR starts learning where it wins.

WHO IT'S FOR

Built for Companies

Wherever inference runs at scale, routing is the margin.

Enterprise On-Prem AI
  • One governed routing layer shared by every internal AI app — chat, agents, RAG.
  • Sovereignty by default: air-gap mode, approved-model enforcement, full audit trail.
  • Predictable spend: small tasks to small models, flagships only where they earn it.
  • EU AI Act / GDPR alignment built into the request path.
Data Centers & AI Providers
  • Energy is your largest variable cost — energy-aware routing converts watts into margin.
  • Offer differentiated 'smart inference' tiers on top of your GPU fleet.
  • Per-tenant policies, catalogs and audit — multi-tenant governance out of the box.
  • Consultancies: ship a routing practice, not a slide deck — deploy in a day.

Standalone product — or the routing core of the full VDF AI platform.

COMPARISON

Static Gateways Route. SEEMR Learns.

Capability | VDF AI Router | LLM API Gateways | Cloud Router Services | DIY Rules
Self-learning routing (contextual bandit) | ✓ LinUCB, online | ✗ static rules | △ black box
Energy & CO₂-aware decisions | ✓ per-call Wh + gCO₂e
Fully on-prem / air-gap | ✓ by design | △ self-host only | ✗ cloud only
Governance: approved models, audit trail | ✓ policy layer | △ basic ACLs | △ manual
Explainable decisions (reason + scores) | ✓ every request
Ordered failover candidates | ✓ up to 5 models | △ retry lists

Categories shown for orientation; detailed feature comparisons available on request.

THE BUSINESS CASE

Routing Intelligence Pays for Itself

40–60% Inference Cost Reduction: by matching every task to the cheapest capable model

~2% Exploration Budget: challenger traffic keeps routing optimal as models evolve

5 Failover Candidates: per decision, for availability without over-provisioning

Plus the savings nobody measures: no hand-tuned routing rules to maintain, no model-migration projects, and energy reporting your ESG team will ask for anyway.

DEPLOYMENT

Runs Where You Run

From laptop pilot to air-gapped data center — same router, same API.

Containerized

Docker-packaged Python service. Deploy on your VMs, Kubernetes, or bare metal next to your GPUs.

Simple Integration

REST API + Python SDK. Point your apps at the router; it speaks to Ollama locally and OpenRouter for cloud.

Progressive Activation

Every advanced layer is feature-flagged. Start with policy + rules; switch on learning, energy and challengers when ready.
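
Progressive activation can be pictured as a set of boolean flags with conservative defaults. The flag names below are hypothetical, chosen to mirror the layers named above, and do not reflect the product's real settings.

```python
from dataclasses import dataclass, asdict, replace


@dataclass(frozen=True)
class RouterFlags:
    """Hypothetical flag names; the product's real settings may differ."""
    policy: bool = True
    rules: bool = True
    learning: bool = False
    energy: bool = False
    challengers: bool = False


def active_layers(flags: RouterFlags) -> list:
    """Names of the enabled layers, in declaration order."""
    return [name for name, enabled in asdict(flags).items() if enabled]


pilot = RouterFlags()                                    # policy + rules only
production = replace(pilot, learning=True, energy=True)  # switched on later
print(active_layers(pilot))
print(active_layers(production))
```

Starting from the defaults keeps a pilot deterministic, and each learning feature can be enabled on its own once there is enough traffic to learn from.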

Standalone or Platform

Use it as a single routing product — or as the decision core of the full VDF AI platform (Networks · Agents · Chat · Data).

Professional services by SysArt for custom integrations and compliance onboarding.

FAQ

VDF AI Router Questions

What is an LLM router?

An LLM router is a decision layer that sits between your applications and your language models. Instead of sending every request to one hard-coded flagship model, the router picks the best model for each call, across local and cloud catalogs, based on quality, cost, latency, and energy. VDF AI Router makes that decision in milliseconds, before the LLM is invoked, and returns an ordered failover list rather than a single bet.

What is SEEMR?

SEEMR (Self-Evolving Model Router) is the learning engine inside VDF AI Router. It is a production contextual-bandit system (LinUCB) that observes quality scores, latencies, failures, and energy from every execution and continuously improves its routing choices. A configurable ~2% of traffic also runs a challenger model, so the champion is constantly re-validated as models and prices drift, with no human retuning required.

Can VDF AI Router run fully on-prem or air-gapped?

Yes. VDF AI Router is a Docker-packaged Python service that deploys on your VMs, Kubernetes, or bare metal. Air-gap mode disables external APIs entirely and restricts routing to local models, so zero bytes leave your network, verifiable by design. Sovereignty is the default, not an add-on: approved-model enforcement, allow/deny lists, and a full audit trail are built into the routing path.

How much can VDF AI Router reduce inference costs?

Typical deployments see a 40–60% inference cost reduction by matching every task to the cheapest capable model. Flagship models stop answering trivial questions; small tasks go to small models, and expensive flagships are used only where they earn it. Energy- and CO₂-aware routing adds further savings for organizations that run their own GPU fleets.

How does VDF AI Router handle governance and compliance?

Compliance is layer 1 of the routing pipeline, and policy decisions can never be overridden by learning. You can flag a domain as regulated so only explicitly approved models are ever considered, pin models per workload, and enforce organization-wide allow/deny lists. Every decision is returned with its reason, candidate list, and scores: an audit trail regulators and CISOs can actually read. GDPR, EU AI Act, and DORA alignment are built into the request path.

Is VDF AI Router a standalone product or part of the VDF AI platform?

Both. You can use it as a standalone routing product behind a REST API and Python SDK, or as the decision core of the full VDF AI platform (Networks, Agents, Chat, Data). It was born inside the VDF AI platform and is now available standalone: same router, same API, from laptop pilot to air-gapped data center.

Route Smarter. Own Your Stack.

VDF AI Router — the self-evolving routing layer for sovereign AI. Book a demo to see SEEMR routing live, or start a pilot on your own infrastructure today.