
§ Multi-model gateway · OpenRouter-backed · SOC 2 in progress

Every frontier model. One workspace.

One workspace for 50+ AI models. The router picks the right model for each prompt and retries on another model if one goes down, so a single provider outage doesn't fail the request. Send a prompt to several models at once, track spend per user and project, and get one bill at month's end.

For teams shipping AI features who are done juggling provider keys, dashboards, and invoices.

Compare GPT-4o, Claude 4.5 Sonnet, Gemini 2.5 Pro, DeepSeek V3 and 50+ LLMs side-by-side in one workspace.

Models on tap
50+
P50 fan-out
0ms
Free tier
10 total
~/azela · router.console
summarize this 80-page S-1 with citations
intent: research · route: fan-out · 3
  • anthropic/claude-4.5-sonnet · 212ms · 200 OK
  • openai/gpt-4o · 184ms · 200 OK
  • google/gemini-2.5-pro · 298ms · 200 OK
  • mistral/mistral-large-2 · 501ms · 429
  • x-ai/grok-3 · 247ms · 200 OK
  • meta/llama-3.3-405b · 612ms · 200 OK
synthesized · 1.42s · streaming

PRIMARY
claude-4.5
FALLBACK
gpt-4o
CHEAPEST
llama-3.3

§ 01 / PROVIDERS

One key.
Every provider.

One API for OpenAI, Anthropic, Google, DeepSeek, Mistral, Meta, Perplexity, Cohere, xAI, and Qwen.

Live status across every model graph we pull from — refreshed at the edge.

OpenAI · 184ms · 200
Anthropic · 212ms · 200
Google · 298ms · 200
Mistral · 262ms · 200
Meta · 612ms · 200
Perplexity · 401ms · 200
DeepSeek · 344ms · 200
Cohere · 289ms · 200
xAI · 247ms · 200
Qwen · 374ms · 200

§ 02 / ROUTER

Decide which model at request time.

Smart LLM routing
and fallback.

Stop hard-coding model names into your app. The router scores the prompt, picks the cheapest model that can answer, and cascades to a fallback if the primary stalls.

Primitives
intent · route · fallback
Retry policy
1× same · then graph
Auth
X-Internal-Token
Logs
usage_logs · error_logs
router.trace
req_8f2c · live
  1. 01/5
    INGEST
    Prompt arrives at /chat/stream

    Cookie-signed JWT → BFF auth() → backendStream() proxies with X-Internal-Token and X-User-Id.

  2. 02/5
    CLASSIFY
    Smart-router scores intent

    Regex + heuristic gate: research · coding · image · doc-analysis · reasoning · simple. Returns a single intent + a confidence band.

  3. 03/5
    ROUTE
    Primary model selected

    Plan-rank gate, entitlements check, daily/monthly token budgets. The cheapest acceptable model that matches the intent wins.

  4. 04/5
    FALLBACK
    Cascading retry on 5xx

    One retry on the same model, then current_model.fallback_model_id. 401/403 skip the retry — bad keys fail loudly.

  5. 05/5
    STREAM
    SSE tokens piped, not buffered

    start · delta · metadata · error · fallback · done. Every attempt logged to usage_logs; every failure to error_logs.
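The FALLBACK step above can be sketched roughly as follows. This is a minimal illustration, not the actual backend: `Model`, `ProviderError`, and `call_with_fallback` are hypothetical names, the in-memory `log` list stands in for `usage_logs` and `error_logs`, and treating every non-5xx status (not just 401/403) as non-retryable is an assumption.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple


class ProviderError(Exception):
    """Hypothetical error carrying the provider's HTTP status."""

    def __init__(self, status: int) -> None:
        super().__init__(f"provider returned {status}")
        self.status = status


@dataclass
class Model:
    id: str
    fallback_model_id: Optional[str] = None


def call_with_fallback(
    start_id: str,
    models: Dict[str, Model],
    call: Callable[[str], str],
    log: List[Tuple[str, int, int]],
) -> str:
    """Try a model, retry it once on 5xx, then follow its fallback link."""
    current = models.get(start_id)
    visited = set()  # guards against a cycle in the fallback graph
    last_error: Optional[ProviderError] = None
    while current is not None and current.id not in visited:
        visited.add(current.id)
        for attempt in (1, 2):  # first try plus one retry on the same model
            try:
                text = call(current.id)
            except ProviderError as err:
                log.append((current.id, attempt, err.status))
                last_error = err
                if err.status < 500:
                    # 401/403 skip the retry; other non-5xx are assumed to as well
                    raise
            else:
                log.append((current.id, attempt, 200))
                return text
        current = models.get(current.fallback_model_id)
    # Unknown start model or an exhausted graph ends up here.
    raise last_error or ProviderError(503)


models = {"primary": Model("primary", "backup"), "backup": Model("backup")}


def flaky(model_id: str) -> str:
    if model_id == "primary":
        raise ProviderError(502)
    return "answer from backup"


log: List[Tuple[str, int, int]] = []
result = call_with_fallback("primary", models, flaky, log)
print(result, log)
```

Every attempt is appended to the log whether it succeeds or fails, which mirrors the "every attempt logged" behavior described in the STREAM step.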


§ 03 / SWARM

Six specialists. One supervisor.

Multi-agent
AI swarms.

Fan a single brief out to a topology of specialist agents. The supervisor watches each return, resolves conflicts, and writes a single synthesized answer back to the stream.

  • Topology: star · ring · pipeline · custom
  • Concurrency: up to 50 agents (premium)
  • Synthesis: supervisor reconciles, emits one stream
  • Observability: per-agent token + cost ledger
swarm.topology · star · 6 active · 1 supervisor
SUPERVISOR · RESEARCH · CODE · ANALYZE · WRITE · REVIEW · RENDER
DISPATCH
0.08s
PARALLEL
6 agents
SYNTH
1.42s
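A star topology like the one above might look something like this in outline. The sketch assumes each specialist is an async call and that the supervisor simply concatenates the returns. `run_agent` and `supervise` are hypothetical names, and the real conflict resolution is not described on this page, so it is not modeled here.

```python
import asyncio
from typing import List, Tuple

ROLES = ["RESEARCH", "CODE", "ANALYZE", "WRITE", "REVIEW", "RENDER"]


async def run_agent(role: str, brief: str) -> Tuple[str, str]:
    """Stand-in for one specialist; a real agent would call a model here."""
    await asyncio.sleep(0)  # yield control so agents interleave
    return role, f"[{role}] notes on: {brief}"


async def supervise(brief: str, roles: List[str]) -> str:
    """Dispatch the brief to every agent in parallel, then merge the returns."""
    results = await asyncio.gather(*(run_agent(r, brief) for r in roles))
    # gather preserves argument order, so the merged answer is deterministic.
    return "\n".join(text for _, text in results)


answer = asyncio.run(supervise("compare two caching strategies", ROLES))
print(answer)
```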

§ 04 / COMPARE

Same prompt. Different minds.

Compare AI models
side by side.

Multi-LLM Side-by-Side Comparison — fan one prompt to GPT-4o, Claude 4.5 Sonnet, Gemini 2.5, and DeepSeek at once.

Fan a prompt out to two or more models in parallel. We render each stream as it arrives, count tokens, and surface the cost delta so you can pick on evidence, not on vibes.

PROMPT: Explain how a transformer handles long-context retrieval. · fan-out · 2
A · anthropic / claude-4.5-sonnet · 278 tps
tokens
2
cost
$0.0084
status
streaming
B · openai / gpt-4o · 194 tps
tokens
2
cost
$0.0061
status
streaming
DELTA · 0% · cheaper · B
AGREEMENT · 0% · semantic overlap
FIRST TOKEN · 0ms B · 212ms A
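One plausible way to compute the cost delta for a two-model comparison is shown below, using the per-run costs from the panel above as sample inputs. Measuring the saving against the pricier run is an assumption, since the page does not say which baseline it uses, and `cost_delta` is a hypothetical helper.

```python
from typing import Optional, Tuple


def cost_delta(cost_a: float, cost_b: float) -> Tuple[Optional[str], float]:
    """Return the cheaper side and the percent saved versus the pricier run."""
    if cost_a == cost_b:
        return None, 0.0
    cheaper = "A" if cost_a < cost_b else "B"
    high, low = max(cost_a, cost_b), min(cost_a, cost_b)
    return cheaper, round((high - low) / high * 100, 1)


# Costs taken from the A and B panels in the demo.
print(cost_delta(0.0084, 0.0061))  # ('B', 27.4)
```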

§ 05 / SURFACE

The tools the model can touch.

Chat with files, URLs,
and smarter prompts.

Three primitives that turn a chat box into a workspace — ingest a file, ingest a URL, rewrite a prompt. Each one is exposed as a first-class tool the router can call.

FILE CHAT · 01

Ingest any document, ask anything.

PDF · DOCX · CSV · PPTX · MD · TXT. Chunked, embedded, stored in pgvector, cited inline.

› parse(invoice.pdf)
→ 42 chunks · 14.2k tokens
› embed(text-embed-3)
→ ok · 184ms
› ask("what changed YoY?")
→ cited p.12, p.18
Formats
pdf · docx · csv · pptx · md · txt
Limit
200MB / 2k pages per file
Citations
section + page anchor
URL INGEST · 02

Hand it a link. Get back a brief.

Pull HTML, strip chrome, follow citations, render a structured note with sources.

› fetch(arxiv.org/2402.12354)
→ 200 · 8.4kb
› extract(refs)
→ 42 citations
› brief(mode: academic)
→ 680 words · 12 cites
Depth
follow 1 hop · max 20 pages
Modes
fast · deep · academic
Output
outline · brief · markdown
PROMPT ENHANCER · 03

Turn a sketch into a brief.

Rewrites vague prompts using established prompt-engineering primitives — role, constraints, format.

› in: "write me a plan"
› out: act as a senior PM …
use STAR · 600 words
return: markdown · h2/h3
→ +312 tokens · clarity +0.7
Techniques
role · cot · few-shot · constraints
Modes
tighten · expand · technical
Diff
side-by-side vs original

§ 06 / SPEC

A datasheet, not a brochure.

Every feature,
in plain terms.

Autonomous Agent Gateway, Token Usage Analytics, Image Studio, Video Creation, Multi-LLM Side-by-Side Comparison.

The full feature surface, named in the same terms the backend uses. If the system does it, it's listed. If it doesn't, it isn't.

01
Intent-classified routing
regex + heuristic · 7 intents

Every prompt is scored for what it actually needs — research, coding, analysis. The cheapest model that can handle it wins, so you're not paying frontier prices for simple work.
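A regex-plus-heuristic classifier of this kind could be sketched as below. The intent names come from the router trace earlier on this page. The patterns, the tie-breaking rule, and the confidence bands are all hypothetical, chosen only to illustrate the shape of the approach.

```python
import re
from typing import Dict, Tuple

# Hypothetical patterns: the page names the intents but not the rules behind them.
INTENT_PATTERNS = [
    ("coding", re.compile(r"\b(code|function|bug|stack trace|refactor|python|typescript)\b", re.I)),
    ("image", re.compile(r"\b(draw|image|logo|illustration|render)\b", re.I)),
    ("doc-analysis", re.compile(r"\b(pdf|attached|this document|spreadsheet)\b", re.I)),
    ("research", re.compile(r"\b(sources?|citations?|literature)\b", re.I)),
    ("reasoning", re.compile(r"\b(prove|step by step|why does|trade-?offs?)\b", re.I)),
]


def classify(prompt: str) -> Tuple[str, str]:
    """Return a single intent and a confidence band ("high", "medium", "low")."""
    scores: Dict[str, int] = {}
    for intent, pattern in INTENT_PATTERNS:
        hits = len(pattern.findall(prompt))
        if hits:
            scores[intent] = hits
    if not scores:
        return "simple", "low"  # nothing matched: fall back to the cheapest path
    best = max(scores, key=lambda name: scores[name])  # ties go to the earlier pattern
    ranked = sorted(scores.values(), reverse=True)
    runner_up = ranked[1] if len(ranked) > 1 else 0
    band = "high" if scores[best] - runner_up >= 2 else "medium"
    return best, band


print(classify("summarize this 80-page S-1 with citations"))
print(classify("fix the bug in this python function"))
```

A keyword gate like this is cheap enough to run on every request, which is presumably why a regex pass comes before any model-based scoring.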

02
Cascading fallback
1× same · then graph

If a model errors, we retry it once, then automatically switch to a backup. Requests don't fail just because one provider is having a bad day.

03
Streamed SSE, piped
start · delta · done

Responses stream in word by word as the model generates them — no waiting for the full answer to load.
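On the client side, a stream like this can be consumed with a small parser for the standard `text/event-stream` framing. The event names (`start`, `delta`, `done`) are the ones listed on this page. The JSON payload shape, including the `text` and `model` fields, is an assumption made for illustration.

```python
import json
from typing import Any, Dict, List, Tuple

Event = Tuple[str, Dict[str, Any]]


def parse_sse(raw: str) -> List[Event]:
    """Split a text/event-stream body into (event name, JSON payload) pairs."""
    events: List[Event] = []
    for block in raw.split("\n\n"):  # a blank line terminates each event
        name, data_lines = "message", []
        for line in block.splitlines():
            if line.startswith("event:"):
                name = line[len("event:"):].strip()
            elif line.startswith("data:"):
                data_lines.append(line[len("data:"):].strip())
        if data_lines:
            events.append((name, json.loads("\n".join(data_lines))))
    return events


def assemble(events: List[Event]) -> str:
    """Concatenate delta payloads in arrival order (assumes a "text" field)."""
    return "".join(payload["text"] for name, payload in events if name == "delta")


sample = (
    'event: start\ndata: {"model": "openai/gpt-4o"}\n\n'
    'event: delta\ndata: {"text": "Hello"}\n\n'
    'event: delta\ndata: {"text": ", world"}\n\n'
    'event: done\ndata: {}\n\n'
)
events = parse_sse(sample)
print(assemble(events))  # Hello, world
```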

04
Per-user isolation
X-User-Id at the DB layer

Every user's data is isolated at the database layer. No user can ever see another's prompts, files, or history.

05
Token accounting
per user · project · request

Set daily and monthly spend limits per user or project — enforced before each request. Every call and error is logged for a full audit trail.
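A pre-request budget check might be structured along these lines. This is a sketch that assumes budgets are counted in tokens and that an estimate is available before the call. `Budget`, `check_budget`, and `record_usage` are hypothetical names, not the product's API.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class Budget:
    daily_limit: int
    monthly_limit: int
    used_today: int = 0
    used_this_month: int = 0


def check_budget(budget: Budget, estimated_tokens: int) -> Tuple[bool, str]:
    """Decide before the request is sent whether it fits both windows."""
    if budget.used_today + estimated_tokens > budget.daily_limit:
        return False, "daily budget exceeded"
    if budget.used_this_month + estimated_tokens > budget.monthly_limit:
        return False, "monthly budget exceeded"
    return True, "ok"


def record_usage(budget: Budget, tokens: int) -> None:
    """Charge the actual token count once the response has completed."""
    budget.used_today += tokens
    budget.used_this_month += tokens


budget = Budget(daily_limit=1_000, monthly_limit=20_000)
first = check_budget(budget, 800)
record_usage(budget, 800)
second = check_budget(budget, 300)
print(first, second)
```

Checking before the call and charging after it keeps a rejected request from consuming any budget.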

06
Plan-rank model access
free → starter → pro → premium

Premium models gate by plan rank. Feature flags govern compare, council, search, file-chat, output-studio. No surprises at checkout.
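Plan-rank gating combined with cheapest-model selection could be expressed like this. The plan order follows the ladder above. The catalog entries, their minimum plans, their intent tags, and the per-1k prices are invented for the example and are not real pricing.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional

PLAN_RANK = {"free": 0, "starter": 1, "pro": 2, "premium": 3}


@dataclass(frozen=True)
class CatalogEntry:
    model_id: str
    min_plan: str
    intents: FrozenSet[str]
    cost_per_1k: float  # hypothetical price, for illustration only


def pick_model(catalog: List[CatalogEntry], intent: str, plan: str) -> Optional[CatalogEntry]:
    """Return the cheapest model the plan may use for this intent, if any."""
    allowed = [
        m for m in catalog
        if intent in m.intents and PLAN_RANK[m.min_plan] <= PLAN_RANK[plan]
    ]
    return min(allowed, key=lambda m: m.cost_per_1k) if allowed else None


catalog = [
    CatalogEntry("meta/llama-3.3-405b", "free", frozenset({"simple", "coding"}), 0.001),
    CatalogEntry("openai/gpt-4o", "pro", frozenset({"simple", "coding", "research"}), 0.005),
    CatalogEntry("anthropic/claude-4.5-sonnet", "pro", frozenset({"coding", "research"}), 0.006),
]

print(pick_model(catalog, "research", "pro").model_id)  # openai/gpt-4o
print(pick_model(catalog, "research", "free"))          # None
```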

07
Projects + knowledge
chat × files × instructions

Bundle chats, uploaded docs, and a system prompt into a Project. Every new conversation inherits the surface area you've already built.

08
Output Studio
refine · diff · export

Take any response into a structured editor — tone passes, length passes, format passes — then export markdown, html, or docx.

§ 07 / PLANS

One bill.
Every model.

AzelaAI pricing — Free, Pro, and Premium plans for the multi-LLM workspace. From $0 to $39 per month.

−21%
FREE
Kick the tires.
$0

one time · no expiry

  • 10 messages total · lifetime
  • Access to 12 baseline models
  • 1 active project
  • File chat · 25MB / file
  • Community support
Start free
recommended
PRO
Daily driver.
$15 / mo

billed annually

  • +unlimited messages
  • +all 50+ frontier models
  • +Compare · fan-out to 4
  • +Agent Swarm · up to 8 agents
  • +File chat · 200MB / file
  • +Output Studio · refine + diff
Start Pro
PREMIUM
Heavy lifting.
$32 / mo

billed annually

  • +Swarms up to 50 agents
  • +API access · per-key budgets
  • +Deep research · long-form
  • +Priority routing
  • +SOC 2 ready environments
Start Premium
ENTERPRISE
Bring your own everything.
Custom

scoped per deployment

  • +BYO API keys · OpenAI · Anthropic · Azure
  • +Dedicated tenant · SSO · SCIM
  • +Audit logs · DPA · MSA
  • +Custom model graphs
  • +White-glove migration
Talk to us

All plans include token accounting, plan-rank model gating, and per-request logging. The free tier is lifetime: no card, no renewal. Paid plans are billed once per year; cancel anytime, with no pro-rated clawback.

§ 08 / FAQ

The questions
that come up.

Anything else, write us at info@azelaai.com — we read every line.

Yes. AzelaAI lets you run GPT-4o, Claude 4.5 Sonnet, Gemini 2.5, DeepSeek, Mistral, Llama and 50+ other frontier models inside a single workspace. Send one prompt to all of them at once with Compare Mode, switch between them mid-conversation, or let the smart-router pick the best model for the task. One login, one bill, every model.

§ 09 / END OF SHEET

Stop choosing.
Start routing.

Open an account, point a prompt at the router, watch every frontier model take a swing. Free tier needs no card — Pro is ready when the work catches up.

free tier · 10 messages total · no credit card