HappyHorse 1.0 is now liveTry it now

Gemini 3.5 Flash API

Gemini 3.5 Flash is Google's production-ready Flash model built for agentic workflows, coding agents, and long-horizon tasks. It combines frontier-level intelligence with Flash-tier speed and cost. Access it on EvoLink with OpenAI-compatible or native Gemini requests; model ID is gemini-3.5-flash.
Price: 

$1.399(~ 95.1 credits) per 1M input tokens; $8.387(~ 570.3 credits) per 1M output tokens

$0.141(~ 9.6 credits) per 1M cache read tokens; $1.399(~ 95.1 credits) per 1M audio tokens

Google Search grounding charged separately per query.

Highest stability with guaranteed 99.9% uptime. Recommended for production environments.

Use the same API endpoint for all versions. Only the model parameter differs.

Production-ready Flash model for agentic workflows and coding

Gemini 3.5 Flash is generally available and stable for scaled production use. Built for agentic workflows, coding agents, sub-agent deployment, and long-horizon tasks, it delivers frontier-level intelligence at Flash-tier cost with 1M context, built-in reasoning, and full tool support.

Page keyword

Gemini 3.5 Flash API

Request model ID

gemini-3.5-flash

Gemini 3.5 Flash API

Best use cases for Gemini 3.5 Flash API

Coding Agents and Multi-Step Development Loops

Gemini 3.5 Flash excels at coding tasks — code generation, debugging, refactoring, and test writing — at Flash-tier speed. Use it as the default model in coding agent loops where each iteration consumes tokens and latency matters.

Coding agents

Agentic Workflows and Sub-Agent Deployment

Built for parallel agentic execution loops: function calling, structured outputs, code execution, and search grounding. Deploy it as a sub-agent in multi-agent systems where speed and cost per call determine total workflow economics.

Agentic workflows

Long-Horizon Tasks and Document Processing

With 1M input context and 65K output tokens, Gemini 3.5 Flash handles long-horizon tasks that span many steps — legal document review, codebase analysis, research synthesis, and PDF-heavy workflows — without context truncation.

Long-horizon tasks

Production Inference at Flash-Tier Cost

Generally available and stable for scaled production use. Context caching, batch API, and unified multimodal pricing make it the default high-throughput model for teams that need reasoning quality without Pro-tier cost.

Production inference

Why use EvoLink for Gemini 3.5 Flash API

EvoLink makes Gemini 3.5 Flash easy to slot into existing stacks: one gateway, OpenAI-compatible requests, and clean routing across the Gemini family from Flash Lite to Pro.

Keep OpenAI-Style Workflows While Using Gemini

Teams already built around the OpenAI SDK can add Gemini 3.5 Flash without rebuilding their request layer, auth flow, or fallback logic from scratch.

Use Flash as the Balanced Stage in a Multi-Model Stack

Route balanced multimodal and reasoning workloads to Flash, send retry-friendly batch jobs down to Flash Lite, and escalate the hardest reasoning to Pro on the same gateway.

Lower Migration Cost Than Vendor-Specific Integrations

One API key, OpenAI-compatible and native Gemini request formats, plus caching and batch support make it easier to operate Gemini alongside the rest of your model catalog.

How to use Gemini 3.5 Flash API

Use this page as an access overview: pick your request format, use the request model ID, and move detailed request examples to docs.

1

Step 1 - Choose the Request Format

Gemini 3.5 Flash can be called through OpenAI-compatible requests or the native Gemini API, which makes it easier to fit into existing stacks without rebuilding your whole integration path.

2

Step 2 - Use the Current Request Model ID

Use the exact request model ID "gemini-3.5-flash" when sending production traffic. That keeps the page keyword focused on Gemini 3.5 Flash API while still matching the route you actually call.

3

Step 3 - Pick the Right Workloads Here

Use Flash for multimodal reasoning, audio/video understanding, planning, and balanced agentic workflows. Send cheap high-volume jobs down to Flash Lite and the hardest reasoning up to Pro. For exact request bodies, parameters, and endpoint examples, continue to docs.

Gemini 3.5 Flash API Features and Limits

Core capabilities and limits for planning production integrations

Context

1,048,576 Input Tokens

Up to 1,048,576 input tokens and 65,535 output tokens.

Multimodal

Multimodal Inputs

Text, image, video, audio, and PDF inputs with text output — all share unified pricing.

Reasoning

Reasoning + Structured Outputs

Built-in reasoning and structured outputs supported for reliable, machine-readable results.

Tools

Function Calling + Tools

Function calling, code execution, and Google Search grounding are supported.

Scale

Caching + Batch

Context caching and Batch API supported for repeated or large-scale workloads.

Pricing

Unified Audio/Video Pricing

Use the live pricing table above to verify the current EvoLink pay-as-you-go rate for this route.

Gemini 3.5 Flash vs Other Gemini Models

Compare positioning, context, reasoning style, and tooling across the Gemini family to pick the right route for your workload

ModelBest forContext windowReasoning styleTooling & streaming
Gemini 3.5 FlashAgentic workflows, coding agents, long-horizon tasks1M input / 65K outputBuilt-in reasoning at Flash speedFunction calling, code execution, structured outputs, caching, batch
Gemini 3 Flash PreviewGeneral fast workloads, previous-gen Flash baseline1M input / 65K outputStandard Flash reasoningFunction calling, structured outputs, caching
Gemini 3.1 ProHardest reasoning, complex analysis, frontier tasks1M input / 65K outputDeepest reasoning with thinking tokensFull tool suite, code execution, search grounding
Gemini 3.1 Flash LiteHigh-volume batch, low-cost extraction, simple tasks1M input / 65K outputLightweight, no deep reasoningFunction calling, structured outputs, caching, batch

Gemini 3.5 Flash API FAQs

Everything you need to know about the product and billing.

Yes. Google lists Gemini 3.5 Flash as generally available and stable for scaled production use. It is not a preview or experimental model — you can route production traffic to it with confidence.
Gemini 3.5 Flash is the current-gen Flash model with frontier-level intelligence, stronger agentic and coding performance, built-in reasoning output, and unified pricing across text/image/video/audio. Gemini 3 Flash is the previous-generation Flash. Flash Lite is the lower-cost route for retry-friendly high-volume tasks.
Yes. EvoLink supports OpenAI-compatible requests at POST /v1/chat/completions, and it also supports Google Native API requests at POST /v1beta/models/gemini-3.5-flash:{method}.
Gemini 3.5 Flash supports up to 1,048,576 input tokens and 65,535 output tokens, which makes it suitable for long documents, multimodal context, and multi-step agentic pipelines.
Yes. Gemini 3.5 Flash supports text, image, video, audio, and PDF input with text output. Audio and video inputs share the same per-token price as text, which makes multimodal workloads cost-predictable.
Use the exact model identifier "gemini-3.5-flash" in API requests. This page targets the Gemini 3.5 Flash API route, and the request model ID matches the page slug.
Choose Flash for balanced multimodal reasoning workloads — audio/video understanding, agentic planning, and decision steps that need reasoning at moderate cost. Drop down to Flash Lite for high-volume retry-friendly batch jobs, and escalate to Pro for the hardest reasoning.
Gemini 3.5 Flash is best for multimodal reasoning over audio, video, and image inputs, agentic workflows, structured planning, and balanced production traffic where reasoning quality and cost both matter.
Image generation, audio generation, and Live API are not supported on this text-output Flash model. For image generation use Nano Banana / Gemini 3 Flash Image routes instead.

Gemini API Models on EvoLink

Gemini 3.5 Flash is the balanced multimodal route in the Gemini family. Drop down to Gemini 3.1 Flash Lite for cheap high-throughput jobs, or move up to Gemini 3.1 Pro for frontier reasoning. All models share the same API format — switch with one parameter.