GPT-5.1 Series (API)

Access the GPT-5.1 model family through EvoLink's unified API gateway. GPT-5.1 offers a 400k-token context window, up to 128k output tokens, and a Sep 30, 2024 knowledge cutoff. Enable streaming, function calling, structured outputs, and prompt caching when supported by your account and endpoint.

Using coding CLIs? Run GPT-5.1 via EvoCode, one API for code agents and CLIs (see the EvoCode docs).

PRICING

| Plan | Context Window | Max Output | Input | Output | Cache Read |
|---|---|---|---|---|---|
| GPT-5.1 | 400K | 128K | $1.00 (-20% vs. $1.25 official) | $8.00 (-20% vs. $10.00 official) | $0.104 (-17% vs. $0.125 official) |
| GPT-5.1 (Beta) | 400K | 128K | $0.325 (-74% vs. $1.25 official) | $2.60 (-74% vs. $10.00 official) | $0.033 (-74% vs. $0.125 official) |

Pricing note: prices are in USD per 1M tokens.

Cache read: the cache-read rate applies to cached prompt tokens (cache hits).

Two ways to run GPT-5.1 — pick the tier that matches your workload.

  • GPT-5.1: the default tier for production reliability and predictable availability.
  • GPT-5.1 (Beta): a lower-cost tier with best-effort availability; pair it with client-side retries for workloads that can tolerate them (see the retry sketch below).
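
The Beta tier trades availability for price, so wrap calls in simple retries. A minimal sketch, assuming the OpenAI Python SDK and a hypothetical gateway base URL (use the endpoint and error classes from your own setup):

```python
import time

from openai import OpenAI, APIError

client = OpenAI(
    api_key="YOUR_EVOLINK_KEY",            # placeholder
    base_url="https://api.evolink.ai/v1",  # hypothetical gateway endpoint
)

def create_with_retries(max_attempts=5, **kwargs):
    """Retry transient failures with exponential backoff."""
    for attempt in range(max_attempts):
        try:
            return client.chat.completions.create(**kwargs)
        except APIError:
            if attempt == max_attempts - 1:
                raise
            time.sleep(2 ** attempt)  # 1s, 2s, 4s, ...

resp = create_with_retries(
    model="gpt-5.1",  # substitute the Beta tier's model ID from your dashboard
    messages=[{"role": "user", "content": "ping"}],
)
print(resp.choices[0].message.content)
```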

Build with GPT-5.1 API — Production-Ready Intelligence

Use the GPT-5.1 API for dependable chat performance, tool-using workflows, and scalable long-context handling. Integrate via Responses or Chat Completions, enable streaming and structured outputs, and pin snapshots for release stability.
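
A minimal first request, assuming the OpenAI-compatible Python SDK and a hypothetical EvoLink base URL:

```python
from openai import OpenAI

# base_url is a hypothetical gateway endpoint; use the one from your dashboard
client = OpenAI(api_key="YOUR_EVOLINK_KEY", base_url="https://api.evolink.ai/v1")

resp = client.chat.completions.create(
    model="gpt-5.1",
    messages=[{"role": "user", "content": "Summarize prompt caching in one sentence."}],
)
print(resp.choices[0].message.content)
```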

What can GPT-5.1 API achieve?

Massive Context Analysis

Handle larger inputs and longer conversation history with GPT-5.1's 400k context window and up to 128k output tokens. This is useful for reviewing repositories, analyzing long documents, or running multi-step research without excessive manual chunking.
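
This means a whole document or repository export can go into a single request instead of a chunking pipeline. A sketch, assuming a hypothetical local file and the same hypothetical gateway URL as above:

```python
from openai import OpenAI

client = OpenAI(api_key="YOUR_EVOLINK_KEY", base_url="https://api.evolink.ai/v1")  # hypothetical URL

# repo_dump.txt is a hypothetical export; anything that fits the 400k-token window works
with open("repo_dump.txt", encoding="utf-8") as f:
    document = f.read()

resp = client.chat.completions.create(
    model="gpt-5.1",
    messages=[
        {"role": "system", "content": "You are a careful code reviewer."},
        {"role": "user", "content": f"Review this repository export:\n\n{document}"},
    ],
)
print(resp.choices[0].message.content)
```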

Advanced Reasoning

For problems that require multi-step thinking—planning, coding assistance, and decision support—use configurable reasoning effort. GPT-5.1 supports none, low, medium, and high effort so you can balance speed, cost, and depth.
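
A sketch of setting effort per request through the Responses API, assuming the gateway passes the reasoning parameter through unchanged:

```python
from openai import OpenAI

client = OpenAI(api_key="YOUR_EVOLINK_KEY", base_url="https://api.evolink.ai/v1")  # hypothetical URL

# low effort favors latency; switch to "high" for harder multi-step problems
resp = client.responses.create(
    model="gpt-5.1",
    reasoning={"effort": "low"},
    input="Plan a three-step rollout for a database schema migration.",
)
print(resp.output_text)
```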

Prompt Caching

Prompt caching is enabled automatically for prompts 1,024 tokens or longer. Reuse stable prefixes (system prompts, policies, few-shot examples) and choose in-memory or 24h retention to reduce repeated processing and improve throughput.
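
A sketch of a cache-friendly request, assuming a recent SDK version that accepts the prompt_cache_retention and prompt_cache_key parameters (older SDKs may need them in extra_body):

```python
from openai import OpenAI

client = OpenAI(api_key="YOUR_EVOLINK_KEY", base_url="https://api.evolink.ai/v1")  # hypothetical URL

# hypothetical policy text; pad the stable prefix past 1,024 tokens so caching applies
STABLE_PREFIX = "You are a support agent. Follow this refund policy:\n" + "policy text " * 300

resp = client.chat.completions.create(
    model="gpt-5.1",
    messages=[
        {"role": "system", "content": STABLE_PREFIX},                               # static prefix first
        {"role": "user", "content": "A customer asks about a refund from March."},  # dynamic data last
    ],
    prompt_cache_retention="24h",          # or "in_memory"
    prompt_cache_key="support-policy-v1",  # groups requests that share this prefix
)

# cached_tokens > 0 on a repeat call with the same prefix confirms a cache hit
details = resp.usage.prompt_tokens_details
print(details.cached_tokens if details else 0)
```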

Why developers choose GPT-5.1 API

GPT-5.1 is a model family with snapshots and aliases, giving you stable production behavior and a clear upgrade path.

Model family design

Use chat-oriented or coding-oriented aliases like gpt-5.1-chat-latest or gpt-5.1-codex when available, while keeping a consistent API surface.

Practical long-context workflows

A 400k context window with up to 128k output tokens keeps tasks coherent and reduces the need for complex chunking pipelines.

API features for production integration

Streaming, function calling, structured outputs, and prompt caching are supported by GPT-5.1, so the model fits real production systems.

How to integrate GPT-5.1 API

Start using GPT-5.1 through EvoLink's unified gateway in three steps.

Step 1 — Get Your API Key

Create an account, generate an API key, and configure your environment variables. Access to specific GPT-5.1 variants can depend on usage tier and organization verification.
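
A sketch of loading credentials from the environment; EVOLINK_API_KEY and EVOLINK_BASE_URL are hypothetical variable names:

```python
import os

from openai import OpenAI

# hypothetical variable names; use whatever fits your deployment
client = OpenAI(
    api_key=os.environ["EVOLINK_API_KEY"],
    base_url=os.environ.get("EVOLINK_BASE_URL", "https://api.evolink.ai/v1"),
)
```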

Step 2 — Configure Your Client

Use your preferred SDK or direct HTTP calls. Set the base URL to your gateway endpoint and choose Responses or Chat Completions. Pass the model alias you want to target (for example, gpt-5.1 or gpt-5.1-chat-latest).
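
For example, the same client can target the Responses endpoint with an alias; a sketch, again assuming a hypothetical base URL and that your gateway exposes the Responses API:

```python
from openai import OpenAI

client = OpenAI(api_key="YOUR_EVOLINK_KEY", base_url="https://api.evolink.ai/v1")  # hypothetical URL

resp = client.responses.create(
    model="gpt-5.1-chat-latest",  # or "gpt-5.1", or a pinned snapshot ID
    input="Say hello.",
)
print(resp.output_text)
```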

Step 3 — Start Building

Send a small test request first, then add streaming, function calling, structured outputs, or caching. Monitor response usage fields like prompt_tokens_details.cached_tokens to validate behavior.
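
A sketch of a smoke test that prints the usage fields mentioned above:

```python
from openai import OpenAI

client = OpenAI(api_key="YOUR_EVOLINK_KEY", base_url="https://api.evolink.ai/v1")  # hypothetical URL

resp = client.chat.completions.create(
    model="gpt-5.1",
    messages=[{"role": "user", "content": "Reply with the single word: ok"}],
)

usage = resp.usage
print("prompt tokens:    ", usage.prompt_tokens)
print("completion tokens:", usage.completion_tokens)
# cached_tokens stays 0 until a prefix of 1,024+ tokens is reused
if usage.prompt_tokens_details:
    print("cached tokens:    ", usage.prompt_tokens_details.cached_tokens)
```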

Core API Capabilities

Technical specifications for GPT-5.1 API

Capacity

Long Context (when available)

GPT-5.1 offers a 400k-token context window and up to 128k output tokens, with a Sep 30, 2024 knowledge cutoff.

Efficiency

Prompt Caching (when supported)

Automatic caching applies to prompts of 1,024 tokens or longer, matched on exact prefixes. Set prompt_cache_retention to in_memory or 24h.

Intelligence

Reasoning-Oriented Variants

Configurable reasoning effort (none, low, medium, high) lets you trade off speed, cost, and depth per request.

Integration

Function / Tool Calling

Define JSON schema tools and route structured calls to your systems across endpoints like Responses and Chat Completions.
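
A sketch with one hypothetical tool, get_order_status, in the Chat Completions tool format:

```python
import json

from openai import OpenAI

client = OpenAI(api_key="YOUR_EVOLINK_KEY", base_url="https://api.evolink.ai/v1")  # hypothetical URL

tools = [{
    "type": "function",
    "function": {
        "name": "get_order_status",  # hypothetical tool
        "description": "Look up the status of an order by its ID.",
        "parameters": {
            "type": "object",
            "properties": {"order_id": {"type": "string"}},
            "required": ["order_id"],
        },
    },
}]

resp = client.chat.completions.create(
    model="gpt-5.1",
    messages=[{"role": "user", "content": "Where is order A-1042?"}],
    tools=tools,
)

# if the model chose to call the tool, route the structured arguments to your system
for call in resp.choices[0].message.tool_calls or []:
    args = json.loads(call.function.arguments)
    print(call.function.name, args["order_id"])
```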

Reliability

Structured Outputs (when available)

Schema-adherent JSON responses are supported by GPT-5.1; confirm endpoint support for structured output formats.
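
A sketch using the Chat Completions json_schema response format; the ticket schema is illustrative:

```python
import json

from openai import OpenAI

client = OpenAI(api_key="YOUR_EVOLINK_KEY", base_url="https://api.evolink.ai/v1")  # hypothetical URL

schema = {
    "type": "object",
    "properties": {
        "severity": {"type": "string", "enum": ["low", "medium", "high"]},
        "summary": {"type": "string"},
    },
    "required": ["severity", "summary"],
    "additionalProperties": False,
}

resp = client.chat.completions.create(
    model="gpt-5.1",
    messages=[{"role": "user", "content": "Triage: checkout returns 500 for all users."}],
    response_format={
        "type": "json_schema",
        "json_schema": {"name": "ticket", "strict": True, "schema": schema},
    },
)

ticket = json.loads(resp.choices[0].message.content)  # schema-adherent JSON
print(ticket["severity"], ticket["summary"])
```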

Performance

Streaming

Stream partial tokens for responsive UIs via supported endpoints such as Responses or Realtime.
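
A sketch of token streaming over Chat Completions:

```python
from openai import OpenAI

client = OpenAI(api_key="YOUR_EVOLINK_KEY", base_url="https://api.evolink.ai/v1")  # hypothetical URL

stream = client.chat.completions.create(
    model="gpt-5.1",
    messages=[{"role": "user", "content": "Write a short haiku about caching."}],
    stream=True,
)

# print partial tokens as they arrive
for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
print()
```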

GPT-5.1 API - FAQ

Everything you need to know about the product and billing.

Q: What are GPT-5.1's context window, max output, and knowledge cutoff?
A: OpenAI's model docs list a 400,000-token context window and up to 128,000 max output tokens for GPT-5.1, with a Sep 30, 2024 knowledge cutoff. Use your dashboard and model docs as the source of truth for your account.

Q: How does prompt caching work?
A: Prompt caching is automatic for prompts 1,024 tokens or longer and only works on exact prefix matches. Set prompt_cache_retention to in_memory or 24h. Cached tokens appear in usage.prompt_tokens_details.cached_tokens, and caches are scoped to your organization.

Q: Which reasoning effort levels does GPT-5.1 support?
A: GPT-5.1 supports reasoning.effort values of none (the default), low, medium, and high. Use lower effort for latency-sensitive tasks and higher effort for deeper multi-step reasoning.

Q: Does GPT-5.1 support streaming, function calling, and structured outputs?
A: Yes. GPT-5.1 supports streaming, function calling, and structured outputs. It is available on endpoints like Responses, Chat Completions, Realtime, Assistants, and Batch, subject to account and endpoint availability.

Q: How do I improve cache hit rates?
A: Cache hits require exact prefix matches. Put static instructions and examples at the start, move dynamic user data to the end, and keep tool definitions identical. You can also use prompt_cache_key to influence routing and improve cache hit rates for shared prefixes.

Q: Do structured outputs guarantee schema-valid JSON?
A: Structured outputs enforce JSON schema adherence, and GPT-5.1 lists structured outputs as supported. Availability can still depend on the endpoint, so confirm support in the model docs for your account.

Q: How do I pin a specific model version?
A: Use snapshot model IDs to pin a specific version and avoid relying on the latest alias if you need strict consistency. The GPT-5.1 model page lists snapshot IDs such as gpt-5.1-2025-11-13.

Q: How does EvoLink pricing compare with OpenAI's official pricing?
A: OpenAI lists GPT-5.1 pricing per 1M tokens for input, cached input, and output (for example: $1.25 / $0.125 / $10.00). Actual prices through EvoLink can vary based on routing, plans, and discounts, so always use the pricing table and your dashboard usage/billing data as the source of truth.