Live — 7 nodes, 779 models

One API. 779 models.
Zero middlemen.

OpenAI-compatible inference API running on 7 bare-metal DGX nodes. Your data never leaves our servers. No rate surprises. No per-token markups.

Get Your API Key
Read the Docs
779 Models
7/7 Nodes Online
100% Uptime
HTTPS Encrypted
# Drop-in replacement for OpenAI. Change two lines.
from openai import OpenAI

client = OpenAI(
    base_url="https://api.corriente.ai/v1",
    api_key="ck_your_key_here"
)

response = client.chat.completions.create(
    model="auto",  # Smart routing, or pick any of 779 models
    messages=[{"role": "user", "content": "Hello, Corriente"}]
)

print(response.choices[0].message.content)

Built different.

Smart Routing

Set model to "auto" and our Polymorphic Brain picks the best model for each query. Code goes to coding models. Math goes to math models. Reasoning goes to reasoning models. Automatically.

OpenAI Compatible

Change your base_url and api_key. That's it. Works with every OpenAI SDK, LangChain, LlamaIndex, and any tool that speaks the OpenAI API format.

Bare Metal

7 NVIDIA DGX nodes. 128GB unified memory each. No shared cloud instances. No noisy neighbors. Your inference runs on dedicated hardware.

Your Data Stays Here

We don't log prompts. We don't train on your data. We don't sell your queries. Your conversations are yours. Period.

Function Calling

Full tool use and function calling support. Build agents that take actions, call APIs, and interact with external systems.

Conversation Memory

Built-in session management. Create a session, send messages, and the API remembers context across turns. No client-side history management needed.

779 models. One key.

From 0.5B fast models to 236B reasoning powerhouses. We've got the model for the job.

Code

Qwen2.5-Coder 32B, CodeStral, DeepSeek-Coder, CodeLlama, StarCoder

Reasoning

Qwen3 32B, DeepSeek-R1, Mistral Large 123B, Command-R+ 104B

Math

Qwen2.5 72B, Wizard-Math, MathStral, DeepSeek-Math

Vision

LLaVA 34B, Granite Vision, Moondream, MiniCPM-V

Multilingual

Aya-Expanse 32B, Salamandra, NLLB — Spanish, Portuguese, 100+ languages

Fast

Qwen3 30B-A3B (MoE), Gemma3 4B, Phi4-Mini — sub-second responses

Simple pricing.

No per-token billing. No surprise invoices. Pick a plan, use your models.

Trial

Free
250K tokens / 7 days
  • All 779 models
  • Auto routing
  • 10 req/min
  • Community support
Start Free

Brook

$9.99/mo
250K tokens / month
  • All 779 models
  • Auto routing
  • 20 req/min
  • Quantum-safe keys
Get Started

River

$399.99/mo
10M tokens / month
  • Everything in Brook
  • 200 req/min
  • 10 concurrent requests
  • 50K requests/day
  • Direct support line
Upgrade

Current

$999.99/mo
25M tokens / month
  • Everything in River
  • 400 req/min
  • 25 concurrent requests
  • 100K requests/day
Upgrade

Surge

$1,999/mo
50M tokens / month
  • Everything in Current
  • 600 req/min
  • 50 concurrent requests
  • 200K requests/day
Upgrade

Wave

$4,999/mo
150M tokens / month
  • Everything in Surge
  • 1,000 req/min
  • 100 concurrent requests
  • Dedicated support
Upgrade

Tide

$14,999/mo
500M tokens + metered
  • Everything in Wave
  • 2,000 req/min
  • 200 concurrent
  • $0.50/M overage tokens
Talk to Us

Tsunami

$39,999/mo
1B tokens + metered
  • Everything in Tide
  • 5,000 req/min
  • 500 concurrent
  • $0.40/M overage tokens
Talk to Us

Abyss

$69,999/mo
2.5B tokens + metered
  • Everything in Tsunami
  • 10,000 req/min
  • 1,000 concurrent
  • $0.30/M overage tokens
Talk to Us

Ocean

$10M/mo
Metered — $10.00/M tokens
  • Unlimited everything
  • Dedicated nodes
  • Custom model fine-tuning
  • SLA guarantee
  • 24/7 priority support
  • $120M/year partnership
Talk to Us

Quick Start

curl

curl https://api.corriente.ai/v1/chat/completions \
  -H "Authorization: Bearer ck_your_key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "auto",
    "messages": [{"role": "user", "content": "Hello"}]
  }'

Python

pip install openai

from openai import OpenAI

client = OpenAI(base_url="https://api.corriente.ai/v1", api_key="ck_your_key")

# Auto-route to the best model
r = client.chat.completions.create(
    model="auto",
    messages=[{"role": "user", "content": "Hello"}]
)

# Or pick a specific model
r = client.chat.completions.create(model="qwen2.5-coder:32b", messages=[...])

# Use sessions for multi-turn:
# POST /v1/sessions to create, then pass the X-Session-ID header

JavaScript

import OpenAI from 'openai';

const client = new OpenAI({
  baseURL: 'https://api.corriente.ai/v1',
  apiKey: 'ck_your_key'
});

const response = await client.chat.completions.create({
  model: 'auto',
  messages: [{ role: 'user', content: 'Hello' }]
});

Endpoints

GET /health Health check (no auth)
GET /v1/models List all 779 models
POST /v1/chat/completions Chat inference (streaming supported)
POST /v1/completions Legacy text completions
POST /v1/embeddings Text embeddings
POST /v1/sessions Create conversation session
GET /v1/sessions/:id Get session history