MiniMax-M3 API
$0.494 - 0.988(~ 33.6 - 67.2 credits) per 1M input tokens; $1.976 - 3.953(~ 134.4 - 268.8 credits) per 1M output tokens
$0.618 - 1.235(~ 42 - 84 credits) per 1M cache write tokens; $0.099 - 0.197(~ 6.7 - 13.4 credits) per 1M cache read tokens
Context over 512K tokens is billed at 2× the official rate (long-context tier, not discounted). Supports thinking, multimodal input (image/video/PDF) and prompt caching.
Highest stability with guaranteed 99.9% uptime. Recommended for production environments.
Use the same API endpoint for all versions. Only the model parameter differs.
MiniMax-M3 API
Route MiniMax-M3 through EvoLink for coding agents, repo Q&A, research, and multimodal document analysis with a ~1M context window, deep thinking, and prompt caching. Connect via OpenAI-compatible or Anthropic Messages endpoints, with pricing from $0.49/1M input tokens.
Access and workflow fit
Best fit
Coding agents
Model ID
MiniMax-M3
Access
OpenAI + Anthropic
Context
1M window
Input
$0.49/1M
Built-in
Thinking + multimodal + caching

What can you build with the MiniMax-M3 API?
Coding Agents & Claude Code Workflows
Build coding copilots and agents that handle repo Q&A, code generation, and review. Because MiniMax-M3 exposes a native Anthropic Messages endpoint, it drops into Claude Code-style CLIs and agent frameworks, while deep thinking handles multi-step reasoning in a single API.

Multimodal Understanding
Feed images, video, and PDF documents directly into MiniMax-M3 alongside text. Use it for visual Q&A, screenshot-to-code, chart and document understanding, and video summarization without wiring a separate vision model into your stack.

Long-Context Document Processing
Process contracts, reports, codebases, and large knowledge bases without aggressive chunking. The ~1M context window suits structured summaries, extraction pipelines, and comparison tasks, while prompt caching keeps repeated long prefixes affordable.

Why teams choose the MiniMax-M3 API
Teams choose MiniMax-M3 on EvoLink when they need long-context multimodal reasoning, dual-protocol access, and predictable token pricing without building a vendor-specific integration.
Dual-endpoint access
Call MiniMax-M3 through the OpenAI-compatible endpoint or the native Anthropic Messages endpoint with one EvoLink key. Existing OpenAI SDK code and Claude Code-style clients both work without rebuilding your integration path.
Predictable production cost
Visible token pricing makes budgeting easier: input from $0.49/1M, output from $1.98/1M, and cache reads from about $0.10/1M for repeated prompts. Context above 512K is billed at a 2× long-context tier.
Thinking, multimodal, and caching
Use ~1M context for large prompts, enable deep thinking for complex reasoning, pass image/video/PDF input directly, and rely on prompt caching to cut the cost of repeated context.
MiniMax-M3 vs MiniMax-M2.5: which model should you use?
Use this as a model selection guide, not a benchmark claim. M2.5 remains useful as a lower-cost MiniMax-family fallback, while M3 is the stronger choice for more demanding agentic and multimodal workloads.
| Decision point | MiniMax-M2.5 | MiniMax-M3 |
|---|---|---|
| Model role | Lower-cost MiniMax fallback for text-heavy workloads | Primary MiniMax option for advanced agentic workloads |
| Best fit | Repo Q&A, document analysis, research, and cost-sensitive text tasks | Coding agents, Claude Code-style CLIs, multimodal reasoning, and full-repo analysis |
| Context window | 204K context | ~1M context with a 2x tier above 512K |
| Input coverage | Text-focused model with web search and prompt caching | Text plus image, video, and PDF input with thinking and caching |
| Endpoint fit | OpenAI-compatible access | OpenAI-compatible plus native Anthropic Messages access |
| Cost posture | Use when unit cost matters more than peak capability | Use when stronger reasoning, longer context, or multimodal input justify the upgrade |
How to integrate the MiniMax-M3 API
Keep your existing OpenAI or Anthropic client, point it to EvoLink, set the model to MiniMax-M3, and use the same route for coding-agent, multimodal, and long-context workflows.
Step 1 — Authenticate
Create an EvoLink API key and set the EvoLink base URL. Use Bearer auth for the OpenAI-compatible endpoint, or x-api-key for the Anthropic Messages endpoint.
Step 2 — Set required fields
Send `model: MiniMax-M3` with your `messages` array. Reuse stable system prompts and prefixes to benefit from prompt caching on repeated workloads.
Step 3 — Tune outputs
Adjust temperature, top_p, max_tokens, and stream as usual. Enable `thinking` for deep reasoning, and attach images, video, or PDF content blocks for multimodal requests.
MiniMax-M3 API features for production teams
Concrete controls and deployment signals instead of a generic model overview
Deep thinking mode
Enable thinking for math, logic, and complex multi-step analysis. Reasoning is exposed as a separate field or content block, so you can show or hide the chain of thought in your product.
~1M Context Window
Fit entire codebases, long documents, and multi-turn context into one request before reaching for aggressive chunking or multi-pass orchestration.
Multimodal input
Pass image, video, and PDF inputs alongside text for visual Q&A, document understanding, and video summarization in the same text API.
OpenAI + Anthropic compatible
Connect with the OpenAI SDK via /v1/chat/completions or the Anthropic SDK via /v1/messages by changing the base URL and model name — no integration rebuild required.
Prompt Caching
Repeated prefixes and system prompts are billed at a lower cache-read rate, which helps recurring agent workflows and high-volume production traffic.
Long-context tier pricing
Requests up to 512K context use the base rate; above 512K, tokens are billed at a 2× long-context tier, so cost scales predictably with prompt size.
MiniMax-M3 API FAQs
Everything you need to know about the product and billing.