Live — 7 nodes, 368 models

One API. 368 models.
Zero middlemen.

OpenAI-compatible inference API running on 7 bare-metal DGX nodes. Your data never leaves our servers. No rate surprises. No per-token markups.

Get Your API Key · Read the Docs
368 Models
7/7 Nodes Online
100% Uptime
HTTPS Encrypted
# Drop-in replacement for OpenAI. Change two lines.
from openai import OpenAI

client = OpenAI(
    base_url="https://api.corriente.ai/v1",
    api_key="ck_your_key_here"
)

response = client.chat.completions.create(
    model="auto",  # Smart routing, or pick any of 368 models
    messages=[{"role": "user", "content": "Hello, Corriente"}]
)
print(response.choices[0].message.content)

Built different.

Smart Routing

Set model to "auto" and our Polymorphic Brain picks the best model for each query. Code goes to coding models. Math goes to math models. Reasoning goes to reasoning models. Automatically.
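
As a quick illustration: the standard OpenAI response object carries a model field, and assuming Corriente populates it with the routed model (a reasonable but unconfirmed assumption), you can see where "auto" sent each query:

from openai import OpenAI

client = OpenAI(base_url="https://api.corriente.ai/v1", api_key="ck_your_key")

# "auto" lets the router pick; response.model should report the served model
# (assumes Corriente fills the standard OpenAI "model" response field).
r = client.chat.completions.create(
    model="auto",
    messages=[{"role": "user", "content": "Prove that sqrt(2) is irrational"}]
)
print(r.model)  # e.g. a math or reasoning model for this prompt
print(r.choices[0].message.content)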

OpenAI Compatible

Change your base_url and api_key. That's it. Works with every OpenAI SDK, LangChain, LlamaIndex, and any tool that speaks the OpenAI API format.
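
For instance, a minimal LangChain sketch, assuming the langchain_openai package and that "auto" is accepted as a model name:

from langchain_openai import ChatOpenAI

# Same two changes as the raw SDK: base_url and api_key.
llm = ChatOpenAI(
    model="auto",
    base_url="https://api.corriente.ai/v1",
    api_key="ck_your_key",
)
print(llm.invoke("Hello, Corriente").content)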

Bare Metal

7 NVIDIA DGX nodes. 128GB unified memory each. No shared cloud instances. No noisy neighbors. Your inference runs on dedicated hardware.

Your Data Stays Here

We don't log prompts. We don't train on your data. We don't sell your queries. Your conversations are yours. Period.

Function Calling

Full tool use and function calling support. Build agents that take actions, call APIs, and interact with external systems.
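
A minimal sketch in the standard OpenAI tools format; the get_weather function here is hypothetical, not a built-in:

from openai import OpenAI

client = OpenAI(base_url="https://api.corriente.ai/v1", api_key="ck_your_key")

# Hypothetical tool described in the standard OpenAI function schema.
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"]
        }
    }
}]

r = client.chat.completions.create(
    model="auto",
    messages=[{"role": "user", "content": "What's the weather in Lisbon?"}],
    tools=tools
)

# If the model chose to call the tool, the structured call is in tool_calls.
for call in r.choices[0].message.tool_calls or []:
    print(call.function.name, call.function.arguments)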

Conversation Memory

Built-in session management. Create a session, send messages, and the API remembers context across turns. No client-side history management needed.
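
A rough sketch of that flow with the requests library, based on the POST /v1/sessions endpoint and the X-Session-ID header shown in the Quick Start; the "id" field in the create response is an assumption, so check the docs for exact shapes:

import requests

BASE = "https://api.corriente.ai/v1"
HEADERS = {"Authorization": "Bearer ck_your_key"}

# 1. Create a session ("id" as the response field name is an assumption).
session_id = requests.post(f"{BASE}/sessions", headers=HEADERS).json()["id"]

# 2. Chat with the X-Session-ID header; the server carries context across turns.
for prompt in ["My name is Ada.", "What is my name?"]:
    r = requests.post(
        f"{BASE}/chat/completions",
        headers={**HEADERS, "X-Session-ID": session_id},
        json={"model": "auto", "messages": [{"role": "user", "content": prompt}]},
    )
    print(r.json()["choices"][0]["message"]["content"])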

368 models. One key.

From 0.5B fast models to 236B reasoning powerhouses. We've got the model for the job.

Code

Qwen2.5-Coder 32B, CodeStral, DeepSeek-Coder, CodeLlama, StarCoder

Reasoning

Qwen3 32B, DeepSeek-R1, Mistral Large 123B, Command-R+ 104B

Math

Qwen2.5 72B, Wizard-Math, MathStral, DeepSeek-Math

Vision

LLaVA 34B, Granite Vision, Moondream, MiniCPM-V

Multilingual

Aya-Expanse 32B, Salamandra, NLLB — Spanish, Portuguese, 100+ languages

Fast

Qwen3 30B-A3B (MoE), Gemma3 4B, Phi4-Mini — sub-second responses

Simple pricing.

No per-token billing. No surprise invoices. Pick a plan, use your models.

Trial

Free
250K tokens / 7 days
  • All 368 models
  • Auto routing
  • 10 req/min
  • Community support
Start Free

Power

$79/mo
10M tokens / month
  • Everything in Trial
  • 200 req/min
  • 10 concurrent requests
  • 50K requests/day
  • Direct support line
Upgrade

Enterprise

Custom
Unlimited
  • Dedicated nodes
  • Custom model fine-tuning
  • SLA guarantee
  • On-prem deployment option
  • 24/7 priority support
Talk to Us

Quick Start

curl

curl https://api.corriente.ai/v1/chat/completions \
  -H "Authorization: Bearer ck_your_key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "auto",
    "messages": [{"role": "user", "content": "Hello"}]
  }'

Python

pip install openai

from openai import OpenAI

client = OpenAI(base_url="https://api.corriente.ai/v1", api_key="ck_your_key")

# Auto-route to best model
r = client.chat.completions.create(
    model="auto",
    messages=[{"role": "user", "content": "Hello"}]
)

# Or pick a specific model
r = client.chat.completions.create(model="qwen2.5-coder:32b", messages=[...])

# Use sessions for multi-turn:
# POST /v1/sessions to create, then pass the X-Session-ID header

JavaScript

import OpenAI from 'openai';

const client = new OpenAI({
  baseURL: 'https://api.corriente.ai/v1',
  apiKey: 'ck_your_key'
});

const response = await client.chat.completions.create({
  model: 'auto',
  messages: [{ role: 'user', content: 'Hello' }]
});

Endpoints

GET /health Health check (no auth)
GET /v1/models List all 368 models
POST /v1/chat/completions Chat inference (streaming supported; see the sketch below)
POST /v1/completions Legacy text completions
POST /v1/embeddings Text embeddings
POST /v1/sessions Create conversation session
GET /v1/sessions/:id Get session history
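
Since /v1/chat/completions supports streaming, here is a minimal streaming sketch with the OpenAI SDK (standard OpenAI streaming semantics assumed):

from openai import OpenAI

client = OpenAI(base_url="https://api.corriente.ai/v1", api_key="ck_your_key")

# stream=True yields deltas as tokens arrive, OpenAI-style.
stream = client.chat.completions.create(
    model="auto",
    messages=[{"role": "user", "content": "Write a haiku about bare metal"}],
    stream=True
)
for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)
print()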