Gemini 3.5 Flash Pricing Guide: Token Costs, Workload Examples, and Production Budgeting

EvoLink Team

Product Team

May 20, 2026

Last verified: May 20, 2026. Pricing data below is based on official Google model documentation and EvoLink platform data reviewed on that date.

Gemini 3.5 Flash is Google's stable, cost-efficient model for high-volume production workloads. But "cost-efficient" is relative: at $1.50 input and $9.00 output per 1M tokens, it sits between budget options like Gemini 3 Flash Preview and premium models like Gemini 3.1 Pro. This guide breaks down each pricing dimension and shows what representative production workloads cost.

TL;DR

Input: $1.50 per 1M tokens
Output: $9.00 per 1M tokens
Cache hit: $0.15 per 1M tokens (90% savings on cached input)
Audio/Video input: $1.50 per 1M tokens (same as text)
Context caching, Batch API, and Google Search grounding are supported
Output tokens cost 6x more than input tokens, so trimming output length is the first optimization to try on generation-heavy workloads

Complete pricing table

Token type	Price per 1M tokens	Notes
Text input	$1.50	Standard text prompt tokens
Text output	$9.00	Generated response tokens
Cache hit (input)	$0.15	90% discount vs standard input; storage costs $1.00/hour
Audio input	$1.50	Processed audio tokens
Video input	$1.50	Processed video frame tokens
Image input	$1.50	Processed image tokens
PDF input	$1.50	Processed document tokens

Batch and Flex pricing

Google also offers discounted tiers for non-urgent workloads and a premium Priority tier for latency-sensitive ones:

Pricing tier	Input / 1M	Output / 1M	Use case
Standard	$1.50	$9.00	Real-time requests
Batch	$0.75	$4.50	Asynchronous bulk processing
Flex	$0.75	$4.50	Flexible delivery timing
Priority	$2.70	$16.20	Guaranteed low-latency

Batch and Flex pricing are 50% below standard rates; Priority costs 80% more than standard.
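To compare tiers for a specific request shape, the table can be turned into a small lookup. This is a minimal sketch: the rates are copied from the table above, while the function name and the tier keys are illustrative choices, not part of any official SDK.

```python
# Per-1M-token rates in USD as (input, output), copied from the tier table.
TIER_RATES = {
    "standard": (1.50, 9.00),
    "batch": (0.75, 4.50),
    "flex": (0.75, 4.50),
    "priority": (2.70, 16.20),
}


def request_cost(input_tokens: int, output_tokens: int, tier: str = "standard") -> float:
    """Return the USD cost of one request at the given pricing tier."""
    input_rate, output_rate = TIER_RATES[tier]
    return (input_tokens * input_rate + output_tokens * output_rate) / 1_000_000


if __name__ == "__main__":
    # Example request shape: 10,000 input tokens and 3,000 output tokens.
    for tier in TIER_RATES:
        print(f"{tier:>8}: ${request_cost(10_000, 3_000, tier):.4f}")
```

For this request shape the standard tier comes to about $0.042, Batch and Flex to half of that, and Priority to 1.8x.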

Key observations

Output tokens cost 6x more than input tokens. This is the single most important cost lever.
Cache hits reduce input cost by 90% — but factor in $1.00/hour cache storage cost.
Batch/Flex pricing halves both input and output costs for non-urgent workloads.
All multimodal inputs (audio, video, image, PDF) are priced at the same rate as text input.

How Gemini 3.5 Flash compares on price

Model	Input / 1M	Output / 1M	Cache hit / 1M	Context
Gemini 3.1 Flash Lite Preview	$0.25	$1.50	$0.025	1M
Gemini 3 Flash Preview	$0.50	$3.00	$0.05	1M
Claude Haiku 4.5	$1.00	$5.00	$0.10	200K
Gemini 3.5 Flash	$1.50	$9.00	$0.15	1M
Gemini 3.1 Pro	$2.00	$12.00	—	1M
Claude Sonnet 4.6	$3.00	$15.00	$0.30	200K

Gemini 3.5 Flash is positioned as the mid-tier Flash model — more capable and stable than preview Flash models, but significantly cheaper than Pro-tier or Sonnet-tier models.

Workload cost examples

Example 1: Classification pipeline

High-volume classification with short prompts and short responses.

Daily volume: 100,000 requests
Average input: 500 tokens per request
Average output: 50 tokens per request
Daily input tokens: 50M
Daily output tokens: 5M

Cost component	Calculation	Daily	Monthly
Input	50M × $1.50/1M	$75.00	$2,250
Output	5M × $9.00/1M	$45.00	$1,350
Total		$120.00	$3,600

With context caching (80% of input tokens cached):

Cost component	Calculation	Daily	Monthly
Input (uncached 20%)	10M × $1.50/1M	$15.00	$450
Input (cached 80%)	40M × $0.15/1M	$6.00	$180
Output	5M × $9.00/1M	$45.00	$1,350
Total with caching		$66.00	$1,980

Caching saves 45% in this scenario.
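The two tables above can be reproduced with a short calculation. The sketch below assumes that a fixed fraction of input tokens is billed at the cache-hit rate and leaves cache storage fees out; `daily_cost` and `cached_fraction` are illustrative names.

```python
INPUT_RATE = 1.50   # USD per 1M standard input tokens
CACHED_RATE = 0.15  # USD per 1M cache-hit input tokens
OUTPUT_RATE = 9.00  # USD per 1M output tokens


def daily_cost(input_tokens: float, output_tokens: float, cached_fraction: float = 0.0) -> float:
    """Daily USD cost when cached_fraction of input is billed at the cache-hit rate.

    Cache storage fees are not included.
    """
    cached = input_tokens * cached_fraction
    uncached = input_tokens - cached
    total = uncached * INPUT_RATE + cached * CACHED_RATE + output_tokens * OUTPUT_RATE
    return total / 1_000_000


# Example 1: 50M input tokens and 5M output tokens per day.
baseline = daily_cost(50_000_000, 5_000_000)
with_cache = daily_cost(50_000_000, 5_000_000, cached_fraction=0.8)
savings = 1 - with_cache / baseline
print(f"${baseline:.2f} -> ${with_cache:.2f} per day ({savings:.0%} saved)")
```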

Example 2: Coding agent

Agent workflow with moderate input (code context) and heavy output (generated code).

Daily volume: 5,000 agent sessions
Average input: 10,000 tokens per session
Average output: 3,000 tokens per session
Daily input tokens: 50M
Daily output tokens: 15M

Cost component	Calculation	Daily	Monthly
Input	50M × $1.50/1M	$75.00	$2,250
Output	15M × $9.00/1M	$135.00	$4,050
Total		$210.00	$6,300

Output dominates at 64% of total cost. Reducing average output length by 20% cuts output spend by 20%, saving $810/month.

Example 3: Long-context document analysis

Processing large documents with summarization output.

Daily volume: 500 documents
Average input: 100,000 tokens per document
Average output: 2,000 tokens per document
Daily input tokens: 50M
Daily output tokens: 1M

Cost component	Calculation	Daily	Monthly
Input	50M × $1.50/1M	$75.00	$2,250
Output	1M × $9.00/1M	$9.00	$270
Total		$84.00	$2,520

For long-context, input-heavy workloads, context caching is critical. If 60% of document context is shared (common headers, templates, instructions):

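Cost component	Calculation	Daily	Monthly
Input (uncached 40%)	20M × $1.50/1M	$30.00	$900
Input (cached 60%)	30M × $0.15/1M	$4.50	$135
Output	1M × $9.00/1M	$9.00	$270
Total with caching		$43.50	$1,305

Caching saves 48% in this scenario, before cache storage fees.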

Example 4: Multimodal pipeline (video + audio)

Processing video content with audio for content understanding.

Daily volume: 1,000 videos
Average video input: 20,000 tokens per video
Average audio input: 5,000 tokens per video
Average text input: 1,000 tokens per video
Average output: 500 tokens per video
Daily video tokens: 20M
Daily audio tokens: 5M
Daily text tokens: 1M
Daily output tokens: 500K

Cost component	Calculation	Daily	Monthly
Video input	20M × $1.50/1M	$30.00	$900
Audio input	5M × $1.50/1M	$7.50	$225
Text input	1M × $1.50/1M	$1.50	$45
Output	0.5M × $9.00/1M	$4.50	$135
Total		$43.50	$1,305

Multimodal pricing is straightforward — all input types share the same rate.
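Because every input modality is billed at the same rate, a multimodal estimate reduces to summing input tokens across types. A minimal sketch using the Example 4 figures; the function and parameter names are illustrative.

```python
INPUT_RATE = 1.50   # USD per 1M input tokens, any modality
OUTPUT_RATE = 9.00  # USD per 1M output tokens


def multimodal_daily_cost(items_per_day: int, input_tokens_by_type: dict, output_tokens: int) -> float:
    """Daily USD cost for items whose inputs mix several modalities."""
    input_per_item = sum(input_tokens_by_type.values())
    per_item = (input_per_item * INPUT_RATE + output_tokens * OUTPUT_RATE) / 1_000_000
    return items_per_day * per_item


# Example 4: 1,000 videos per day, 500 output tokens each.
daily = multimodal_daily_cost(
    1_000,
    {"video": 20_000, "audio": 5_000, "text": 1_000},
    output_tokens=500,
)
print(f"${daily:.2f} per day, ${daily * 30:,.2f} per month")
```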

Cost optimization strategies

1. Use context caching aggressively

Context caching reduces input token costs by 90% (cache hits at $0.15 vs $1.50 per 1M tokens), but note that Google charges $1.00/hour for cache storage. Caching is most cost-effective when cached content is reused frequently within each storage hour. Invest in caching for:

System prompts and instructions
Few-shot examples
Shared document context across requests
Repeated tool definitions and schemas
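Whether a cache pays for itself depends on how often it is read. The sketch below estimates the break-even point under one loudly stated assumption: the $1.00/hour storage fee is charged per 1M cached tokens. The guide does not spell out the unit, so check it against current pricing before relying on the result. All names are illustrative.

```python
INPUT_RATE = 1.50             # USD per 1M standard input tokens
CACHED_RATE = 0.15            # USD per 1M cache-hit input tokens
STORAGE_RATE_PER_HOUR = 1.00  # ASSUMPTION: USD per 1M cached tokens per hour


def net_hourly_saving(cached_tokens: int, reads_per_hour: float) -> float:
    """USD saved per hour by caching a prefix, after storage fees (assumed unit)."""
    saved = cached_tokens * reads_per_hour * (INPUT_RATE - CACHED_RATE) / 1_000_000
    storage = cached_tokens * STORAGE_RATE_PER_HOUR / 1_000_000
    return saved - storage


def break_even_reads_per_hour() -> float:
    """Reads per hour at which discount savings equal the assumed storage fee."""
    return STORAGE_RATE_PER_HOUR / (INPUT_RATE - CACHED_RATE)


print(f"break-even: {break_even_reads_per_hour():.2f} reads per hour")
print(f"4,000-token prefix, 100 reads/hour: ${net_hourly_saving(4_000, 100):.3f} saved per hour")
```

Under that assumption the break-even does not depend on prefix size: a cached prefix needs roughly one read per hour to cover its storage.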

2. Optimize output length

Output tokens cost 6x more than input. Strategies:

Set max_tokens to the minimum needed for your task
Use structured output schemas to constrain response format
For classification, use enum-style outputs instead of explanations
For extraction, return only the extracted fields
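The payoff from shorter outputs is easy to estimate up front. A minimal sketch with hypothetical workload numbers; `monthly_output_savings` is an illustrative helper.

```python
OUTPUT_RATE = 9.00  # USD per 1M output tokens


def monthly_output_savings(daily_output_tokens: float, reduction: float, days: int = 30) -> float:
    """USD saved per month by cutting output length by `reduction` (0.25 = 25%)."""
    return daily_output_tokens * reduction * OUTPUT_RATE / 1_000_000 * days


# Hypothetical workload: 10M output tokens per day, responses trimmed by 25%.
saved = monthly_output_savings(10_000_000, 0.25)
print(f"${saved:,.2f} saved per month")
```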

3. Use Batch API for non-urgent workloads

Batch API halves both input and output rates ($0.75/$4.50 per 1M tokens) for workloads that can tolerate higher latency. Use it for:

Nightly data processing
Bulk classification
Document analysis pipelines
Evaluation and testing

4. Route by workload tier

Not every request needs Gemini 3.5 Flash. Route simpler tasks to cheaper models:

Workload complexity	Recommended model	Why
Simple classification	Gemini 3.1 Flash Lite Preview ($0.25/$1.50)	6x cheaper input, 6x cheaper output
Standard extraction	Gemini 3 Flash Preview ($0.50/$3.00)	3x cheaper, good enough for simple tasks
Agent sub-steps	Gemini 3.5 Flash ($1.50/$9.00)	GA stability, better reasoning
Complex reasoning	Gemini 3.1 Pro ($2.00/$12.00)	Higher quality for hard tasks

5. Monitor cost per successful task, not just token cost

A cheaper model that requires 3 retries can cost more than a more expensive model that succeeds on the first attempt. Track:

Token cost per request
Retry rate
Fallback rate
Cost per successful task (including retries and fallbacks)
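One way to fold retries into a single number is expected cost per success. The sketch assumes attempts are independent and that failures are retried until one succeeds; the per-attempt costs and success rates in the example are hypothetical.

```python
def cost_per_success(cost_per_attempt: float, success_rate: float) -> float:
    """Expected USD per successful task when failures are retried until success.

    With independent attempts, the expected number of attempts is 1 / success_rate.
    """
    if not 0 < success_rate <= 1:
        raise ValueError("success_rate must be in (0, 1]")
    return cost_per_attempt / success_rate


# Hypothetical comparison: a cheap model that often fails vs a pricier one that rarely does.
cheap = cost_per_success(0.001, 0.30)
strong = cost_per_success(0.003, 0.95)
print(f"cheap model: ${cheap:.5f} per success, stronger model: ${strong:.5f} per success")
```

In this made-up case the cheaper model ends up slightly more expensive per successful task, which is the pattern the metric is meant to surface.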

Hidden cost factors

Retries

If 10% of requests fail validation and must be retried, add at least 10% to your token budget (about 11% if retries fail at the same rate). For agent workflows with multi-step chains, retry costs compound across steps.

Fallback to stronger models

If Gemini 3.5 Flash cannot handle 5% of requests and you fall back to Gemini 3.1 Pro, budget Pro-tier pricing for those requests on top of the failed Flash attempt.
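A fallback budget can be sketched as the primary cost plus the fallback cost weighted by how often it triggers. This assumes the first attempt on Gemini 3.5 Flash is still billed and that the fallback request has the same token shape; the function names are illustrative.

```python
FLASH_RATES = (1.50, 9.00)  # Gemini 3.5 Flash: (input, output) USD per 1M tokens
PRO_RATES = (2.00, 12.00)   # Gemini 3.1 Pro: (input, output) USD per 1M tokens


def request_cost(input_tokens: int, output_tokens: int, rates: tuple) -> float:
    """USD cost of one request at the given (input, output) rates."""
    input_rate, output_rate = rates
    return (input_tokens * input_rate + output_tokens * output_rate) / 1_000_000


def expected_cost_with_fallback(input_tokens: int, output_tokens: int, fallback_rate: float) -> float:
    """Expected USD per request when a share of requests is re-run on the Pro tier."""
    primary = request_cost(input_tokens, output_tokens, FLASH_RATES)
    fallback = request_cost(input_tokens, output_tokens, PRO_RATES)
    return primary + fallback_rate * fallback


expected = expected_cost_with_fallback(10_000, 3_000, fallback_rate=0.05)
print(f"${expected:.4f} per request with 5% fallback")
```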

Context growth in agent loops

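Agent workflows often accumulate context across steps, and each step resends the accumulated context as input. A 5-step agent loop with growing context can use 2-3x more input tokens than an estimate that multiplies the initial prompt size by the number of steps. Budget for context growth, not just the first request.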
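Context growth is simple to model if each step resends everything accumulated so far. The sketch below compares that total with a flat estimate of steps times the initial prompt; the prompt size and per-step growth are hypothetical, and the model assumes no caching or context truncation.

```python
INPUT_RATE = 1.50  # USD per 1M input tokens


def agent_loop_input_tokens(initial_prompt: int, growth_per_step: int, steps: int) -> int:
    """Total input tokens when every step resends the whole accumulated context."""
    return sum(initial_prompt + k * growth_per_step for k in range(steps))


# Hypothetical loop: 2,000-token prompt, 1,500 tokens added per step, 5 steps.
total = agent_loop_input_tokens(2_000, 1_500, 5)
flat = 2_000 * 5
print(f"{total:,} input tokens vs flat estimate {flat:,} ({total / flat:.1f}x)")
print(f"input cost per session: ${total * INPUT_RATE / 1_000_000:.4f}")
```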

Rate limit overhead

If you hit rate limits and need to queue or retry requests, the latency cost translates to engineering time and user experience impact — not just token spend.

FAQ

What is the cheapest way to use Gemini 3.5 Flash?

Enable context caching for repeated prompts, constrain output length with structured schemas, use Batch API for non-urgent work, and route simple tasks to cheaper Flash models.

Is Gemini 3.5 Flash cheaper than Claude Haiku 4.5?

No. Claude Haiku 4.5 is cheaper on both input ($1.00 vs $1.50) and output ($5.00 vs $9.00) per 1M tokens. But Gemini 3.5 Flash offers 1M context (vs 200K) and native audio and video input, which Haiku does not support.

How much does context caching save?

Cache hits cost $0.15 per 1M tokens vs $1.50 for standard input — a 90% reduction. For workloads with shared system prompts or repeated context, caching can reduce total costs by 30-50%.

Is Gemini 3.5 Flash cheaper than Gemini 3.1 Pro?

Yes. Gemini 3.5 Flash is 25% cheaper on input ($1.50 vs $2.00) and 25% cheaper on output ($9.00 vs $12.00) compared to Gemini 3.1 Pro.

How do I estimate my monthly cost?

Calculate: [(daily input tokens × $1.50/1M) + (daily output tokens × $9.00/1M)] × 30. Then subtract savings from context caching and add overhead for retries and fallbacks.
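The estimate can be wrapped in one function. This is a rough sketch: `cached_fraction` and `overhead` are illustrative knobs for the caching and retry/fallback adjustments, and cache storage fees are left out.

```python
INPUT_RATE = 1.50   # USD per 1M standard input tokens
CACHED_RATE = 0.15  # USD per 1M cache-hit input tokens
OUTPUT_RATE = 9.00  # USD per 1M output tokens


def monthly_estimate(
    daily_input_tokens: float,
    daily_output_tokens: float,
    cached_fraction: float = 0.0,
    overhead: float = 0.0,
    days: int = 30,
) -> float:
    """Monthly USD estimate: daily token cost, times days, plus retry/fallback overhead."""
    cached = daily_input_tokens * cached_fraction
    uncached = daily_input_tokens - cached
    daily = (
        uncached * INPUT_RATE + cached * CACHED_RATE + daily_output_tokens * OUTPUT_RATE
    ) / 1_000_000
    return daily * days * (1 + overhead)


# Example 2 (coding agent): 50M input and 15M output tokens per day.
base = monthly_estimate(50_000_000, 15_000_000)
padded = monthly_estimate(50_000_000, 15_000_000, overhead=0.10)
print(f"${base:,.2f} per month, ${padded:,.2f} with 10% retry overhead")
```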

Budget Your Gemini 3.5 Flash Workloads on EvoLink

EvoLink provides a unified API with usage monitoring and cost tracking across all Gemini models. Compare costs, set budget alerts, and route between Flash tiers from one integration.
