Gemini Omni coming soonLearn more
Gemini 3.5 Flash API Is Now Available: Model ID, Pricing, and Production Notes
Release Watch

Gemini 3.5 Flash API Is Now Available: Model ID, Pricing, and Production Notes

EvoLink Team
EvoLink Team
Product Team
May 18, 2026
Updated on May 20, 2026
6 min read
Update (May 20, 2026): Google has updated its official Gemini API documentation. Gemini 3.5 Flash is now listed as generally available and stable for scaled production use. The model ID is gemini-3.5-flash. This page has been updated from a release-watch format to reflect confirmed availability.
For the full Gemini 3.5 Flash API page with pricing, code examples, and use cases, see Gemini 3.5 Flash API on EvoLink.

TL;DR

  • Gemini 3.5 Flash is now generally available (GA) and marked stable for production use.
  • Model ID: gemini-3.5-flash — confirmed in Google's official Gemini API docs.
  • Pricing: $1.50 input / $9.00 output per 1M tokens (standard tier), with context caching and batch discounts available.
  • Context: 1,048,576 input tokens / 65,536 output tokens.
  • Key strengths: agentic workflows, coding agents, sub-agent deployment, long-horizon tasks.
  • Not a preview — production teams can route traffic to it with confidence.

What Changed Since May 18

On May 18, 2026, this page reported that Google's official Gemini API docs did not list Gemini 3.5 Flash. Here is what has changed:

ItemMay 18 statusCurrent status (May 20)
Official releaseNot confirmedGenerally available, stable
Model IDNot confirmedgemini-3.5-flash
PricingNot confirmed$1.50 input / $9.00 output per 1M tokens
Context windowNot confirmed1M input / 65K output
Tool callingNot confirmedFunction calling, structured outputs, code execution supported
Context cachingNot confirmedSupported
Batch APINot confirmedSupported
Production statusNot availableStable for scaled production use

Confirmed Capabilities

Google's official documentation describes Gemini 3.5 Flash as a model built for real-world tasks with frontier-level intelligence at Flash-tier speed and cost. The key capabilities confirmed:

Agentic Workflows

Gemini 3.5 Flash is optimized for agentic workflows, parallel execution loops, and sub-agent deployment. Function calling, structured outputs, and code execution are all supported natively.

Coding Tasks

The model handles code generation, debugging, refactoring, and test writing at Flash-tier speed, making it a strong candidate for coding agent loops where each iteration consumes tokens.

Long-Horizon Tasks

With 1M input context, it can process full codebases, multi-document analysis, legal review, research synthesis, and PDF-heavy workflows without context truncation.

Multimodal Input

Text, image, video, audio, and PDF inputs are supported with unified pricing — no separate surcharges for audio or video input.

Pricing Overview

TierInput (per 1M tokens)Output (per 1M tokens)
Standard$1.50$9.00
Context cachingReduced input costSame output
Batch APIAdditional discountsAdditional discounts
For detailed EvoLink pricing with credit conversion, see the Gemini 3.5 Flash API page.

What This Means for Production Teams

You Can Route Production Traffic Now

Gemini 3.5 Flash is not a preview or experimental model. Google marks it as stable for scaled production use. This means you can plan production routing, cost budgets, and SLAs around it.

Evaluate It for Agent and Coding Workloads

Google explicitly positions this model for agentic workflows and coding tasks. If you are running coding agents, multi-step automation, or tool-orchestrated pipelines, this is a model worth benchmarking against your current default.

Compare Within the Gemini Family

ModelBest forCost profile
Gemini 3.5 FlashAgentic workflows, coding, long-horizon tasksFlash-tier
Gemini 3.1 ProHardest reasoning, complex analysisHigher cost
Gemini 3.1 Flash LiteHigh-volume batch, simple extractionLowest cost

EvoLink provides OpenAI-compatible access to Gemini 3.5 Flash alongside other models. One API key, one billing system, and configurable routing between Flash, Pro, and models from other providers.

Updated Evaluation Checklist

Now that the model is available, here is what to verify on your actual workloads:

DimensionWhat to measure
LatencyTime to first token and full completion on your prompt distribution
QualityTask success rate, schema adherence, hallucination rate
CostToken cost including retries, fallbacks, and context caching savings
Tool useFunction calling reliability, structured output accuracy
Agent loopsCost per full agent session (see cost example)
Fallback rateHow often Flash must escalate to Pro

Official Sources

FAQ

Is Gemini 3.5 Flash now available in the API?

Yes. As of May 2026, Google's official Gemini API documentation lists Gemini 3.5 Flash as generally available and stable for scaled production use. The model ID is gemini-3.5-flash.

What is the model ID for Gemini 3.5 Flash?

The confirmed model ID is gemini-3.5-flash. Use this exact string in API requests.

What is Gemini 3.5 Flash pricing?

Standard pricing is $1.50 per 1M input tokens and $9.00 per 1M output tokens. Context caching and Batch API provide additional cost savings. See the Gemini 3.5 Flash API page for EvoLink-specific pricing.

Is Gemini 3.5 Flash production-ready?

Yes. Google marks it as stable for scaled production use. It is not a preview or experimental model.

What is Gemini 3.5 Flash best for?

According to Google's official documentation, it is optimized for agentic workflows, coding agents, sub-agent deployment, long-horizon tasks, and cost-efficient production inference with 1M context.

How does Gemini 3.5 Flash compare to Gemini 3 Flash?

Gemini 3.5 Flash is the current-generation Flash model with frontier-level intelligence, stronger agentic and coding performance, and built-in reasoning. Gemini 3 Flash is the previous generation.

Yes. EvoLink provides access via OpenAI-compatible requests and native Gemini API format. See the Gemini 3.5 Flash API page for details.

Ready to Reduce Your AI Costs by 89%?

Start using EvoLink today and experience the power of intelligent API routing.