Gemini 3.5 Flash API
$1.399(~ 95.1 credits) per 1M input tokens; $8.387(~ 570.3 credits) per 1M output tokens
$0.141(~ 9.6 credits) per 1M cache read tokens; $1.399(~ 95.1 credits) per 1M audio tokens
Google Search grounding charged separately per query.
Highest stability with guaranteed 99.9% uptime. Recommended for production environments.
Use the same API endpoint for all versions. Only the model parameter differs.
Production-ready Flash model for agentic workflows and coding
Gemini 3.5 Flash is generally available and stable for scaled production use. Built for agentic workflows, coding agents, sub-agent deployment, and long-horizon tasks, it delivers frontier-level intelligence at Flash-tier cost with 1M context, built-in reasoning, and full tool support.
Page keyword
Gemini 3.5 Flash API
Request model ID
gemini-3.5-flash

Best use cases for Gemini 3.5 Flash API
Coding Agents and Multi-Step Development Loops
Gemini 3.5 Flash excels at coding tasks — code generation, debugging, refactoring, and test writing — at Flash-tier speed. Use it as the default model in coding agent loops where each iteration consumes tokens and latency matters.

Agentic Workflows and Sub-Agent Deployment
Built for parallel agentic execution loops: function calling, structured outputs, code execution, and search grounding. Deploy it as a sub-agent in multi-agent systems where speed and cost per call determine total workflow economics.

Long-Horizon Tasks and Document Processing
With 1M input context and 65K output tokens, Gemini 3.5 Flash handles long-horizon tasks that span many steps — legal document review, codebase analysis, research synthesis, and PDF-heavy workflows — without context truncation.

Production Inference at Flash-Tier Cost
Generally available and stable for scaled production use. Context caching, batch API, and unified multimodal pricing make it the default high-throughput model for teams that need reasoning quality without Pro-tier cost.

Why use EvoLink for Gemini 3.5 Flash API
EvoLink makes Gemini 3.5 Flash easy to slot into existing stacks: one gateway, OpenAI-compatible requests, and clean routing across the Gemini family from Flash Lite to Pro.
Keep OpenAI-Style Workflows While Using Gemini
Teams already built around the OpenAI SDK can add Gemini 3.5 Flash without rebuilding their request layer, auth flow, or fallback logic from scratch.
Use Flash as the Balanced Stage in a Multi-Model Stack
Route balanced multimodal and reasoning workloads to Flash, send retry-friendly batch jobs down to Flash Lite, and escalate the hardest reasoning to Pro on the same gateway.
Lower Migration Cost Than Vendor-Specific Integrations
One API key, OpenAI-compatible and native Gemini request formats, plus caching and batch support make it easier to operate Gemini alongside the rest of your model catalog.
How to use Gemini 3.5 Flash API
Use this page as an access overview: pick your request format, use the request model ID, and move detailed request examples to docs.
Step 1 - Choose the Request Format
Gemini 3.5 Flash can be called through OpenAI-compatible requests or the native Gemini API, which makes it easier to fit into existing stacks without rebuilding your whole integration path.
Step 2 - Use the Current Request Model ID
Use the exact request model ID "gemini-3.5-flash" when sending production traffic. That keeps the page keyword focused on Gemini 3.5 Flash API while still matching the route you actually call.
Step 3 - Pick the Right Workloads Here
Use Flash for multimodal reasoning, audio/video understanding, planning, and balanced agentic workflows. Send cheap high-volume jobs down to Flash Lite and the hardest reasoning up to Pro. For exact request bodies, parameters, and endpoint examples, continue to docs.
Gemini 3.5 Flash API Features and Limits
Core capabilities and limits for planning production integrations
1,048,576 Input Tokens
Up to 1,048,576 input tokens and 65,535 output tokens.
Multimodal Inputs
Text, image, video, audio, and PDF inputs with text output — all share unified pricing.
Reasoning + Structured Outputs
Built-in reasoning and structured outputs supported for reliable, machine-readable results.
Function Calling + Tools
Function calling, code execution, and Google Search grounding are supported.
Caching + Batch
Context caching and Batch API supported for repeated or large-scale workloads.
Unified Audio/Video Pricing
Use the live pricing table above to verify the current EvoLink pay-as-you-go rate for this route.
Gemini 3.5 Flash vs Other Gemini Models
Compare positioning, context, reasoning style, and tooling across the Gemini family to pick the right route for your workload
| Model | Best for | Context window | Reasoning style | Tooling & streaming |
|---|---|---|---|---|
| Gemini 3.5 Flash | Agentic workflows, coding agents, long-horizon tasks | 1M input / 65K output | Built-in reasoning at Flash speed | Function calling, code execution, structured outputs, caching, batch |
| Gemini 3 Flash Preview | General fast workloads, previous-gen Flash baseline | 1M input / 65K output | Standard Flash reasoning | Function calling, structured outputs, caching |
| Gemini 3.1 Pro | Hardest reasoning, complex analysis, frontier tasks | 1M input / 65K output | Deepest reasoning with thinking tokens | Full tool suite, code execution, search grounding |
| Gemini 3.1 Flash Lite | High-volume batch, low-cost extraction, simple tasks | 1M input / 65K output | Lightweight, no deep reasoning | Function calling, structured outputs, caching, batch |
Gemini 3.5 Flash API FAQs
Everything you need to know about the product and billing.
Gemini API Models on EvoLink
Gemini 3.5 Flash is the balanced multimodal route in the Gemini family. Drop down to Gemini 3.1 Flash Lite for cheap high-throughput jobs, or move up to Gemini 3.1 Pro for frontier reasoning. All models share the same API format — switch with one parameter.