MiniMax-M3 API

Use MiniMax-M3 through EvoLink with one API key on both OpenAI-compatible (/v1/chat/completions) and Anthropic Messages (/v1/messages) endpoints. With ~1M context, deep thinking, multimodal input, and prompt caching, it fits coding agents, repo Q&A, document analysis, and Claude Code-style workflows from $0.49/1M input tokens.

Model Type:

Price:

$0.494 - 0.988(~ 33.6 - 67.2 credits) per 1M input tokens; $1.976 - 3.953(~ 134.4 - 268.8 credits) per 1M output tokens

$0.618 - 1.235(~ 42 - 84 credits) per 1M cache write tokens; $0.099 - 0.197(~ 6.7 - 13.4 credits) per 1M cache read tokens

Context over 512K tokens is billed at 2× the official rate (long-context tier, not discounted). Supports thinking, multimodal input (image/video/PDF) and prompt caching.

Stable managed access for production workloads. Recommended when you need dashboard billing, API key control, and predictable integration behavior.

Use the same API endpoint for all versions. Only the model parameter differs.

PRICING

PLAN	CONTEXT WINDOW	MAX OUTPUT	INPUT	OUTPUT	CACHE WRITE	CACHE READ
MiniMax-M3	1,000,000	131,072	≤524.3K$0.495-20% (33.6 Credits) >524.3K$1.236 (84 Credits)	≤524.3K$1.977-20% (134.4 Credits) >524.3K$4.942 (336 Credits)	≤524.3K$0.618-20% (42 Credits) >524.3K$1.545 (105 Credits)	≤524.3K$0.099-20% (6.7 Credits) >524.3K$0.248 (16.8 Credits)

Pricing Note: Prices show both USD and Credits. Units default to / 1M tokens unless noted separately.

Cache Hit: Price applies to cached prompt tokens.

MiniMax-M3 API

Route MiniMax-M3 through EvoLink for coding agents, repo Q&A, research, and multimodal document analysis with a ~1M context window, deep thinking, and prompt caching. Connect via OpenAI-compatible or Anthropic Messages endpoints, with pricing from $0.49/1M input tokens.

Access and workflow fit

Best fit

Coding agents

Model ID

MiniMax-M3

Access

OpenAI + Anthropic

Context

1M window

Input

$0.49/1M

Built-in

Thinking + multimodal + caching

View pricing Gateway setup for coding CLIs MiniMax-M3 launch details

What can you build with the MiniMax-M3 API?

Coding Agents & Claude Code Workflows

Build coding copilots and agents that handle repo Q&A, code generation, and review. Because MiniMax-M3 exposes a native Anthropic Messages endpoint, it drops into Claude Code-style CLIs and agent frameworks, while deep thinking handles multi-step reasoning in a single API.

Start building

Use-case showcase of MiniMax-M3 API coding

Multimodal Understanding

Feed images, video, and PDF documents directly into MiniMax-M3 alongside text. Use it for visual Q&A, screenshot-to-code, chart and document understanding, and video summarization without wiring a separate vision model into your stack.

Explore multimodal

Use-case showcase of MiniMax-M3 API multimodal

Long-Context Document Processing

Process contracts, reports, codebases, and large knowledge bases without aggressive chunking. The ~1M context window suits structured summaries, extraction pipelines, and comparison tasks, while prompt caching keeps repeated long prefixes affordable.

Process documents

Use-case showcase of MiniMax-M3 API documents

Why teams choose the MiniMax-M3 API

Teams choose MiniMax-M3 on EvoLink when they need long-context multimodal reasoning, dual-protocol access, and predictable token pricing without building a vendor-specific integration.

Dual-endpoint access

Call MiniMax-M3 through the OpenAI-compatible endpoint or the native Anthropic Messages endpoint with one EvoLink key. Existing OpenAI SDK code and Claude Code-style clients both work without rebuilding your integration path.

Predictable production cost

Visible token pricing makes budgeting easier: input from $0.49/1M, output from $1.98/1M, and cache reads from about $0.10/1M for repeated prompts. Context above 512K is billed at a 2× long-context tier.

Thinking, multimodal, and caching

Use ~1M context for large prompts, enable deep thinking for complex reasoning, pass image/video/PDF input directly, and rely on prompt caching to cut the cost of repeated context.

MiniMax-M3 vs MiniMax-M2.5: which model should you use?

Use this as a model selection guide, not a benchmark claim. M2.5 remains useful as a lower-cost MiniMax-family fallback, while M3 is the stronger choice for more demanding agentic and multimodal workloads.

Decision point	MiniMax-M2.5	MiniMax-M3
Model role	Lower-cost MiniMax fallback for text-heavy workloads	Primary MiniMax option for advanced agentic workloads
Best fit	Repo Q&A, document analysis, research, and cost-sensitive text tasks	Coding agents, Claude Code-style CLIs, multimodal reasoning, and full-repo analysis
Context window	204K context	~1M context with a 2x tier above 512K
Input coverage	Text-focused model with web search and prompt caching	Text plus image, video, and PDF input with thinking and caching
Endpoint fit	OpenAI-compatible access	OpenAI-compatible plus native Anthropic Messages access
Cost posture	Use when unit cost matters more than peak capability	Use when stronger reasoning, longer context, or multimodal input justify the upgrade

View MiniMax-M2.5 Read the full comparison Stay on MiniMax-M3

How to integrate the MiniMax-M3 API

Keep your existing OpenAI or Anthropic client, point it to EvoLink, set the model to MiniMax-M3, and use the same route for coding-agent, multimodal, and long-context workflows.

Step 1 — Authenticate

Create an EvoLink API key and set the EvoLink base URL. Use Bearer auth for the OpenAI-compatible endpoint, or x-api-key for the Anthropic Messages endpoint.

Step 2 — Set required fields

Send `model: MiniMax-M3` with your `messages` array. Reuse stable system prompts and prefixes to benefit from prompt caching on repeated workloads.

Step 3 — Tune outputs

Adjust temperature, top_p, max_tokens, and stream as usual. Enable `thinking` for deep reasoning, and attach images, video, or PDF content blocks for multimodal requests.

MiniMax-M3 API features for production teams

Concrete controls and deployment signals instead of a generic model overview

Thinking

Deep thinking mode

Enable thinking for math, logic, and complex multi-step analysis. Reasoning is exposed as a separate field or content block, so you can show or hide the chain of thought in your product.

Context

~1M Context Window

Fit entire codebases, long documents, and multi-turn context into one request before reaching for aggressive chunking or multi-pass orchestration.

Multimodal

Multimodal input

Pass image, video, and PDF inputs alongside text for visual Q&A, document understanding, and video summarization in the same text API.

Compatibility

OpenAI + Anthropic compatible

Connect with the OpenAI SDK via /v1/chat/completions or the Anthropic SDK via /v1/messages by changing the base URL and model name — no integration rebuild required.

Caching

Prompt Caching

Repeated prefixes and system prompts are billed at a lower cache-read rate, which helps recurring agent workflows and high-volume production traffic.

Pricing

Long-context tier pricing

Requests up to 512K context use the base rate; above 512K, tokens are billed at a 2× long-context tier, so cost scales predictably with prompt size.

MiniMax-M3 API FAQs

Everything you need to know about the product and billing.

MiniMax-M3 pricing on EvoLink starts at about $0.49 per 1M input tokens and $1.98 per 1M output tokens. Cache reads start at about $0.10 per 1M tokens, which helps when you reuse long system prompts or stable prefixes. Requests with more than 512K context are billed at a 2× long-context tier.

MiniMax-M3 is a strong fit for coding agents, Claude Code-style CLIs, repo Q&A, multimodal understanding (image, video, PDF), research workflows, and long-document analysis that benefit from ~1M context, deep thinking, and prompt caching.

MiniMax-M3 supports a context window of roughly 1M tokens. Requests up to 512K context are billed at the base rate, and tokens beyond that threshold are billed at a 2× long-context tier.

Yes. MiniMax-M3 accepts image, video, and PDF input alongside text, supports a deep thinking mode for complex reasoning, and supports prompt caching so repeated prefixes are billed at a lower cache-read rate.

Yes. EvoLink exposes MiniMax-M3 on both an OpenAI-compatible endpoint (/v1/chat/completions) and an Anthropic Messages endpoint (/v1/messages). Change the base URL and set the model to MiniMax-M3 to use either the OpenAI SDK or the Anthropic SDK.

Usually yes. Because MiniMax-M3 exposes a native Anthropic Messages endpoint, it fits Claude Code-style CLIs and agent frameworks directly, and the OpenAI-compatible endpoint covers editor tools and internal agents. For adjacent setup patterns, see One Gateway for 3 Coding CLIs and Gateway vs Direct APIs.

Use the model enum `MiniMax-M3` in the request body. EvoLink will route the request to the MiniMax-M3 model through the optimal provider.