Live — 7 nodes, 368 models

One API. 368 models.
Zero middlemen.

OpenAI-compatible inference API running on 7 bare-metal DGX nodes. Your data never leaves our servers. No rate surprises. No per-token markups.

Get Your API Key · Read the Docs
368 Models
7/7 Nodes Online
100% Uptime
HTTPS Encrypted
# Drop-in replacement for OpenAI. Change two lines.
from openai import OpenAI

client = OpenAI(
    base_url="https://api.corriente.ai/v1",
    api_key="ck_your_key_here"
)

response = client.chat.completions.create(
    model="auto",  # Smart routing, or pick any of 368 models
    messages=[{"role": "user", "content": "Hello, Corriente"}]
)
print(response.choices[0].message.content)

Built different.

Smart Routing

Set model to "auto" and our Polymorphic Brain picks the best model for each query. Code goes to coding models. Math goes to math models. Reasoning goes to reasoning models. Automatically.
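
As a quick illustration: the standard OpenAI response object carries a model field, and assuming Corriente populates it with the routed model (a reasonable but unconfirmed assumption), you can see where "auto" sent each query:

from openai import OpenAI

client = OpenAI(base_url="https://api.corriente.ai/v1", api_key="ck_your_key")

# "auto" lets the router pick; response.model should report the served model
# (assumes Corriente fills the standard OpenAI "model" response field).
r = client.chat.completions.create(
    model="auto",
    messages=[{"role": "user", "content": "Prove that sqrt(2) is irrational"}]
)
print(r.model)  # e.g. a math or reasoning model for this prompt
print(r.choices[0].message.content)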

OpenAI Compatible

Change your base_url and api_key. That's it. Works with every OpenAI SDK, LangChain, LlamaIndex, and any tool that speaks the OpenAI API format.
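
For instance, a minimal LangChain sketch, assuming the langchain_openai package and that "auto" is accepted as a model name:

from langchain_openai import ChatOpenAI

# Same two changes as the raw SDK: base_url and api_key.
llm = ChatOpenAI(
    model="auto",
    base_url="https://api.corriente.ai/v1",
    api_key="ck_your_key",
)
print(llm.invoke("Hello, Corriente").content)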

Bare Metal

7 NVIDIA DGX nodes. 128GB unified memory each. No shared cloud instances. No noisy neighbors. Your inference runs on dedicated hardware.

Your Data Stays Here

We don't log prompts. We don't train on your data. We don't sell your queries. Your conversations are yours. Period.

Function Calling

Full tool use and function calling support. Build agents that take actions, call APIs, and interact with external systems.
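
A minimal sketch in the standard OpenAI tools format; the get_weather function here is hypothetical, not a built-in:

from openai import OpenAI

client = OpenAI(base_url="https://api.corriente.ai/v1", api_key="ck_your_key")

# Hypothetical tool described in the standard OpenAI function schema.
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"]
        }
    }
}]

r = client.chat.completions.create(
    model="auto",
    messages=[{"role": "user", "content": "What's the weather in Lisbon?"}],
    tools=tools
)

# If the model chose to call the tool, the structured call is in tool_calls.
for call in r.choices[0].message.tool_calls or []:
    print(call.function.name, call.function.arguments)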

Conversation Memory

Built-in session management. Create a session, send messages, and the API remembers context across turns. No client-side history management needed.
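
A rough sketch of that flow with the requests library, based on the POST /v1/sessions endpoint and the X-Session-ID header shown in the Quick Start; the "id" field in the create response is an assumption, so check the docs for exact shapes:

import requests

BASE = "https://api.corriente.ai/v1"
HEADERS = {"Authorization": "Bearer ck_your_key"}

# 1. Create a session ("id" as the response field name is an assumption).
session_id = requests.post(f"{BASE}/sessions", headers=HEADERS).json()["id"]

# 2. Chat with the X-Session-ID header; the server carries context across turns.
for prompt in ["My name is Ada.", "What is my name?"]:
    r = requests.post(
        f"{BASE}/chat/completions",
        headers={**HEADERS, "X-Session-ID": session_id},
        json={"model": "auto", "messages": [{"role": "user", "content": prompt}]},
    )
    print(r.json()["choices"][0]["message"]["content"])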

368 models. One key.

From 0.5B fast models to 236B reasoning powerhouses. We've got the model for the job.

Code

Qwen2.5-Coder 32B, CodeStral, DeepSeek-Coder, CodeLlama, StarCoder

Reasoning

Qwen3 32B, DeepSeek-R1, Mistral Large 123B, Command-R+ 104B

Math

Qwen2.5 72B, Wizard-Math, MathStral, DeepSeek-Math

Vision

LLaVA 34B, Granite Vision, Moondream, MiniCPM-V

Multilingual

Aya-Expanse 32B, Salamandra, NLLB — Spanish, Portuguese, 100+ languages

Fast

Qwen3 30B-A3B (MoE), Gemma3 4B, Phi4-Mini — sub-second responses

Simple pricing.

No per-token billing. No surprise invoices. Pick a plan, use your models.

Trial

Free
250K tokens / 7 days
  • All 368 models
  • Auto routing
  • 10 req/min
  • Community support
Start Free

Power

$79/mo
10M tokens / month
  • Everything in Trial
  • 200 req/min
  • 10 concurrent requests
  • 50K requests/day
  • Direct support line
Upgrade

Enterprise

Custom
Unlimited
  • Dedicated nodes
  • Custom model fine-tuning
  • SLA guarantee
  • On-prem deployment option
  • 24/7 priority support
Talk to Us

Quick Start

curl

curl https://api.corriente.ai/v1/chat/completions \
  -H "Authorization: Bearer ck_your_key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "auto",
    "messages": [{"role": "user", "content": "Hello"}]
  }'

Python

pip install openai

from openai import OpenAI

client = OpenAI(base_url="https://api.corriente.ai/v1", api_key="ck_your_key")

# Auto-route to best model
r = client.chat.completions.create(
    model="auto",
    messages=[{"role": "user", "content": "Hello"}]
)

# Or pick a specific model
r = client.chat.completions.create(model="qwen2.5-coder:32b", messages=[...])

# Use sessions for multi-turn:
# POST /v1/sessions to create, then pass the X-Session-ID header

JavaScript

import OpenAI from 'openai';

const client = new OpenAI({
  baseURL: 'https://api.corriente.ai/v1',
  apiKey: 'ck_your_key'
});

const response = await client.chat.completions.create({
  model: 'auto',
  messages: [{ role: 'user', content: 'Hello' }]
});

Endpoints

GET /health Health check (no auth)
GET /v1/models List all 368 models
POST /v1/chat/completions Chat inference (streaming supported; see the sketch below)
POST /v1/completions Legacy text completions
POST /v1/embeddings Text embeddings
POST /v1/sessions Create conversation session
GET /v1/sessions/:id Get session history
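
Since /v1/chat/completions supports streaming, here is a minimal streaming sketch with the OpenAI SDK (standard OpenAI streaming semantics assumed):

from openai import OpenAI

client = OpenAI(base_url="https://api.corriente.ai/v1", api_key="ck_your_key")

# stream=True yields deltas as tokens arrive, OpenAI-style.
stream = client.chat.completions.create(
    model="auto",
    messages=[{"role": "user", "content": "Write a haiku about bare metal"}],
    stream=True
)
for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)
print()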