
Gemini 3.5 Flash API Is Now Available: Model ID, Pricing, and Production Notes

Update (May 20, 2026): Google has updated its official Gemini API documentation. Gemini 3.5 Flash is now listed as generally available and stable for scaled production use. The model ID isgemini-3.5-flash. This page has been updated from a release-watch format to reflect confirmed availability.
TL;DR
- Gemini 3.5 Flash is now generally available (GA) and marked stable for production use.
- Model ID:
gemini-3.5-flash— confirmed in Google's official Gemini API docs. - Pricing: $1.50 input / $9.00 output per 1M tokens (standard tier), with context caching and batch discounts available.
- Context: 1,048,576 input tokens / 65,536 output tokens.
- Key strengths: agentic workflows, coding agents, sub-agent deployment, long-horizon tasks.
- Not a preview — production teams can route traffic to it with confidence.
What Changed Since May 18
On May 18, 2026, this page reported that Google's official Gemini API docs did not list Gemini 3.5 Flash. Here is what has changed:
| Item | May 18 status | Current status (May 20) |
|---|---|---|
| Official release | Not confirmed | Generally available, stable |
| Model ID | Not confirmed | gemini-3.5-flash |
| Pricing | Not confirmed | $1.50 input / $9.00 output per 1M tokens |
| Context window | Not confirmed | 1M input / 65K output |
| Tool calling | Not confirmed | Function calling, structured outputs, code execution supported |
| Context caching | Not confirmed | Supported |
| Batch API | Not confirmed | Supported |
| Production status | Not available | Stable for scaled production use |
Confirmed Capabilities
Google's official documentation describes Gemini 3.5 Flash as a model built for real-world tasks with frontier-level intelligence at Flash-tier speed and cost. The key capabilities confirmed:
Agentic Workflows
Gemini 3.5 Flash is optimized for agentic workflows, parallel execution loops, and sub-agent deployment. Function calling, structured outputs, and code execution are all supported natively.
Coding Tasks
The model handles code generation, debugging, refactoring, and test writing at Flash-tier speed, making it a strong candidate for coding agent loops where each iteration consumes tokens.
Long-Horizon Tasks
With 1M input context, it can process full codebases, multi-document analysis, legal review, research synthesis, and PDF-heavy workflows without context truncation.
Multimodal Input
Text, image, video, audio, and PDF inputs are supported with unified pricing — no separate surcharges for audio or video input.
Pricing Overview
| Tier | Input (per 1M tokens) | Output (per 1M tokens) |
|---|---|---|
| Standard | $1.50 | $9.00 |
| Context caching | Reduced input cost | Same output |
| Batch API | Additional discounts | Additional discounts |
What This Means for Production Teams
You Can Route Production Traffic Now
Gemini 3.5 Flash is not a preview or experimental model. Google marks it as stable for scaled production use. This means you can plan production routing, cost budgets, and SLAs around it.
Evaluate It for Agent and Coding Workloads
Google explicitly positions this model for agentic workflows and coding tasks. If you are running coding agents, multi-step automation, or tool-orchestrated pipelines, this is a model worth benchmarking against your current default.
Compare Within the Gemini Family
| Model | Best for | Cost profile |
|---|---|---|
| Gemini 3.5 Flash | Agentic workflows, coding, long-horizon tasks | Flash-tier |
| Gemini 3.1 Pro | Hardest reasoning, complex analysis | Higher cost |
| Gemini 3.1 Flash Lite | High-volume batch, simple extraction | Lowest cost |
Use EvoLink for Unified Access
EvoLink provides OpenAI-compatible access to Gemini 3.5 Flash alongside other models. One API key, one billing system, and configurable routing between Flash, Pro, and models from other providers.
Updated Evaluation Checklist
Now that the model is available, here is what to verify on your actual workloads:
| Dimension | What to measure |
|---|---|
| Latency | Time to first token and full completion on your prompt distribution |
| Quality | Task success rate, schema adherence, hallucination rate |
| Cost | Token cost including retries, fallbacks, and context caching savings |
| Tool use | Function calling reliability, structured output accuracy |
| Agent loops | Cost per full agent session (see cost example) |
| Fallback rate | How often Flash must escalate to Pro |
Related Articles
- Gemini 3.5 Flash API — Full Product Page
- Gemini API Models on EvoLink
- Gemini 3.5 Pro API Release Watch
- Gemini 3.5 Pro vs Flash Release Watch
Official Sources
- Gemini 3.5 Flash model docs
- What's new in Gemini 3.5 Flash
- Gemini API pricing
- Gemini API release notes
FAQ
Is Gemini 3.5 Flash now available in the API?
gemini-3.5-flash.What is the model ID for Gemini 3.5 Flash?
gemini-3.5-flash. Use this exact string in API requests.What is Gemini 3.5 Flash pricing?
Is Gemini 3.5 Flash production-ready?
Yes. Google marks it as stable for scaled production use. It is not a preview or experimental model.
What is Gemini 3.5 Flash best for?
According to Google's official documentation, it is optimized for agentic workflows, coding agents, sub-agent deployment, long-horizon tasks, and cost-efficient production inference with 1M context.
How does Gemini 3.5 Flash compare to Gemini 3 Flash?
Gemini 3.5 Flash is the current-generation Flash model with frontier-level intelligence, stronger agentic and coding performance, and built-in reasoning. Gemini 3 Flash is the previous generation.


