Gemini 3.5 Flash Pricing Guide: Token Costs, Workload Examples, and Production Budgeting

EvoLink Team
Product Team
May 20, 2026
Last verified: May 20, 2026. Pricing data below is based on official Google model documentation and EvoLink platform data reviewed on that date.
Gemini 3.5 Flash is Google's stable, cost-efficient model for high-volume production workloads. But "cost-efficient" is relative — at $1.50/$9.00 per 1M tokens, it sits between budget options like Gemini 3 Flash Preview and premium models like Gemini 3.1 Pro. This guide breaks down every pricing dimension and shows what real production workloads actually cost.

TL;DR

  • Input: $1.50 per 1M tokens
  • Output: $9.00 per 1M tokens
  • Cache hit: $0.15 per 1M tokens (90% savings on cached input)
  • Audio/Video input: $1.50 per 1M tokens (same as text)
  • Context caching, Batch API, and Google Search grounding are supported
  • The biggest cost driver is output tokens, not input — optimize output length first

Complete pricing table

| Token type | Price per 1M tokens | Notes |
| --- | --- | --- |
| Text input | $1.50 | Standard text prompt tokens |
| Text output | $9.00 | Generated response tokens |
| Cache hit (input) | $0.15 | 90% discount vs standard input; storage costs $1.00/hour |
| Audio input | $1.50 | Processed audio tokens |
| Video input | $1.50 | Processed video frame tokens |
| Image input | $1.50 | Processed image tokens |
| PDF input | $1.50 | Processed document tokens |

Batch and Flex pricing

Google also offers discounted pricing for non-urgent workloads:

| Pricing tier | Input / 1M | Output / 1M | Use case |
| --- | --- | --- | --- |
| Standard | $1.50 | $9.00 | Real-time requests |
| Batch | $0.75 | $4.50 | Asynchronous bulk processing |
| Flex | $0.75 | $4.50 | Flexible delivery timing |
| Priority | $2.70 | $16.20 | Guaranteed low-latency |

Batch and Flex pricing offer a 50% discount over standard rates.
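
The tier table can be folded into a small lookup for quick estimates. The following is a minimal Python sketch that uses only the rates listed above; `TIER_RATES` and `request_cost` are illustrative names, not part of any Google or EvoLink SDK, and the 50M/5M volume is a made-up daily workload.

```python
# Per-1M-token rates in USD (input, output), copied from the tier table above.
TIER_RATES = {
    "standard": (1.50, 9.00),
    "batch": (0.75, 4.50),
    "flex": (0.75, 4.50),
    "priority": (2.70, 16.20),
}


def request_cost(input_tokens: int, output_tokens: int, tier: str = "standard") -> float:
    """Return the USD token cost of a workload at the given pricing tier."""
    input_rate, output_rate = TIER_RATES[tier]
    return (input_tokens * input_rate + output_tokens * output_rate) / 1_000_000


# Price the same hypothetical daily volume (50M input, 5M output) at every tier.
for tier in TIER_RATES:
    print(f"{tier:>8}: ${request_cost(50_000_000, 5_000_000, tier):,.2f}/day")
```

At this volume the same workload costs $120 per day at Standard, $60 at Batch or Flex, and $216 at Priority.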

Key observations

  • Output tokens cost 6x more than input tokens. This is the single most important cost lever.
  • Cache hits reduce input cost by 90% — but factor in $1.00/hour cache storage cost.
  • Batch/Flex pricing halves both input and output costs for non-urgent workloads.
  • All multimodal inputs (audio, video, image, PDF) are priced at the same rate as text input.

How Gemini 3.5 Flash compares on price

| Model | Input / 1M | Output / 1M | Cache hit / 1M | Context |
| --- | --- | --- | --- | --- |
| Gemini 3.1 Flash Lite Preview | $0.25 | $1.50 | $0.025 | 1M |
| Gemini 3 Flash Preview | $0.50 | $3.00 | $0.05 | 1M |
| Claude Haiku 4.5 | $1.00 | $5.00 | $0.10 | 200K |
| Gemini 3.5 Flash | $1.50 | $9.00 | $0.15 | 1M |
| Gemini 3.1 Pro | $2.00 | $12.00 | | 1M |
| Claude Sonnet 4.6 | $3.00 | $15.00 | $0.30 | 200K |

Gemini 3.5 Flash is positioned as the mid-tier Flash model: more capable and stable than preview Flash models, but significantly cheaper than Pro-tier or Sonnet-tier models.

Workload cost examples

Example 1: Classification pipeline

High-volume classification with short prompts and short responses.

  • Daily volume: 100,000 requests
  • Average input: 500 tokens per request
  • Average output: 50 tokens per request
  • Daily input tokens: 50M
  • Daily output tokens: 5M

| Cost component | Calculation | Daily | Monthly |
| --- | --- | --- | --- |
| Input | 50M × $1.50/1M | $75.00 | $2,250 |
| Output | 5M × $9.00/1M | $45.00 | $1,350 |
| Total | | $120.00 | $3,600 |

With context caching (80% of input tokens cached):

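| Cost component | Calculation | Daily | Monthly |
| --- | --- | --- | --- |
| Input (uncached 20%) | 10M × $1.50/1M | $15.00 | $450 |
| Input (cached 80%) | 40M × $0.15/1M | $6.00 | $180 |
| Output | 5M × $9.00/1M | $45.00 | $1,350 |
| Total with caching | | $66.00 | $1,980 |

Caching saves 45% in this scenario, before cache storage fees.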
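
The caching arithmetic generalizes to any cached share of input. Here is a short Python sketch of that calculation; `daily_cost` and `cached_fraction` are names chosen for illustration, and cache storage fees are deliberately left out.

```python
INPUT_RATE = 1.50      # USD per 1M standard input tokens
CACHE_HIT_RATE = 0.15  # USD per 1M cached input tokens
OUTPUT_RATE = 9.00     # USD per 1M output tokens


def daily_cost(input_tokens, output_tokens, cached_fraction=0.0):
    """Daily USD token cost when `cached_fraction` of input tokens are cache hits.

    Cache storage fees are not included.
    """
    if not 0.0 <= cached_fraction <= 1.0:
        raise ValueError("cached_fraction must be between 0 and 1")
    cached = input_tokens * cached_fraction
    uncached = input_tokens - cached
    return (uncached * INPUT_RATE + cached * CACHE_HIT_RATE
            + output_tokens * OUTPUT_RATE) / 1_000_000


# Example 1 volumes: 50M input and 5M output tokens per day, 80% of input cached.
baseline = daily_cost(50_000_000, 5_000_000)
with_cache = daily_cost(50_000_000, 5_000_000, cached_fraction=0.8)
savings = 1 - with_cache / baseline
print(f"${baseline:.2f} -> ${with_cache:.2f} per day ({savings:.0%} saved)")
```

Changing `cached_fraction` shows how quickly the benefit shrinks when less of the prompt is shared across requests.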

Example 2: Coding agent

Agent workflow with moderate input (code context) and heavy output (generated code).

  • Daily volume: 5,000 agent sessions
  • Average input: 10,000 tokens per session
  • Average output: 3,000 tokens per session
  • Daily input tokens: 50M
  • Daily output tokens: 15M

| Cost component | Calculation | Daily | Monthly |
| --- | --- | --- | --- |
| Input | 50M × $1.50/1M | $75.00 | $2,250 |
| Output | 15M × $9.00/1M | $135.00 | $4,050 |
| Total | | $210.00 | $6,300 |

Output dominates at 64% of total cost. Reducing average output length by 20% saves $810/month (20% of the $4,050 output bill).
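
To see how a cut in output length maps to monthly spend, a small sensitivity check helps. This Python sketch uses the Example 2 volumes and standard rates; the function names are illustrative.

```python
INPUT_RATE = 1.50    # USD per 1M input tokens
OUTPUT_RATE = 9.00   # USD per 1M output tokens
DAYS_PER_MONTH = 30


def monthly_cost(daily_input_tokens, daily_output_tokens):
    """Monthly USD token cost at standard rates."""
    daily = (daily_input_tokens * INPUT_RATE
             + daily_output_tokens * OUTPUT_RATE) / 1_000_000
    return daily * DAYS_PER_MONTH


def savings_from_shorter_output(daily_input_tokens, daily_output_tokens, reduction):
    """Monthly USD saved when average output length drops by `reduction` (0 to 1)."""
    before = monthly_cost(daily_input_tokens, daily_output_tokens)
    after = monthly_cost(daily_input_tokens, daily_output_tokens * (1 - reduction))
    return before - after


# Example 2: 50M input and 15M output tokens per day, output trimmed by 20%.
saved = savings_from_shorter_output(50_000_000, 15_000_000, reduction=0.20)
output_share = monthly_cost(0, 15_000_000) / monthly_cost(50_000_000, 15_000_000)
print(f"Output share of cost: {output_share:.0%}, monthly saving: ${saved:,.2f}")
```

Because only the output line shrinks, the saving is 20% of the output bill rather than 20% of the total.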

Example 3: Long-context document analysis

Processing large documents with summarization output.

  • Daily volume: 500 documents
  • Average input: 100,000 tokens per document
  • Average output: 2,000 tokens per document
  • Daily input tokens: 50M
  • Daily output tokens: 1M

| Cost component | Calculation | Daily | Monthly |
| --- | --- | --- | --- |
| Input | 50M × $1.50/1M | $75.00 | $2,250 |
| Output | 1M × $9.00/1M | $9.00 | $270 |
| Total | | $84.00 | $2,520 |

For long-context, input-heavy workloads, context caching is critical. If 60% of document context is shared (common headers, templates, instructions):

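| Cost component | Calculation | Daily | Monthly |
| --- | --- | --- | --- |
| Input (uncached 40%) | 20M × $1.50/1M | $30.00 | $900 |
| Input (cached 60%) | 30M × $0.15/1M | $4.50 | $135 |
| Output | 1M × $9.00/1M | $9.00 | $270 |
| Total with caching | | $43.50 | $1,305 |

Caching saves 48%, before cache storage fees.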

Example 4: Multimodal pipeline (video + audio)

Processing video content with audio for content understanding.

  • Daily volume: 1,000 videos
  • Average video input: 20,000 tokens per video
  • Average audio input: 5,000 tokens per video
  • Average text input: 1,000 tokens per video
  • Average output: 500 tokens per video
  • Daily video tokens: 20M
  • Daily audio tokens: 5M
  • Daily text tokens: 1M
  • Daily output tokens: 500K

| Cost component | Calculation | Daily | Monthly |
| --- | --- | --- | --- |
| Video input | 20M × $1.50/1M | $30.00 | $900 |
| Audio input | 5M × $1.50/1M | $7.50 | $225 |
| Text input | 1M × $1.50/1M | $1.50 | $45 |
| Output | 0.5M × $9.00/1M | $4.50 | $135 |
| Total | | $43.50 | $1,305 |

Multimodal pricing is straightforward — all input types share the same rate.

Cost optimization strategies

1. Use context caching aggressively

Context caching reduces input token costs by 90% (cache hits at $0.15 vs $1.50 per 1M tokens), but note that Google charges $1.00/hour for cache storage. Caching is most cost-effective when cached content is reused frequently within each storage hour. Invest in caching for:
  • System prompts and instructions
  • Few-shot examples
  • Shared document context across requests
  • Repeated tool definitions and schemas
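
Whether caching pays off depends on how many cached tokens are actually served per storage hour. The sketch below estimates a break-even point in Python. It assumes the $1.00/hour storage figure quoted in this guide is a flat hourly charge, which is a simplification; check the official price list for the exact billing unit before relying on the numbers.

```python
INPUT_RATE = 1.50        # USD per 1M standard input tokens
CACHE_HIT_RATE = 0.15    # USD per 1M cached input tokens
STORAGE_PER_HOUR = 1.00  # USD per hour as quoted in this guide (assumed flat)


def break_even_tokens_per_hour(storage_per_hour=STORAGE_PER_HOUR):
    """Cached tokens that must be served per hour before caching pays for itself."""
    saving_per_million = INPUT_RATE - CACHE_HIT_RATE  # $1.35 saved per 1M cache hits
    return storage_per_hour / saving_per_million * 1_000_000


def net_hourly_saving(cached_tokens_served_per_hour, storage_per_hour=STORAGE_PER_HOUR):
    """Hourly USD saving from cache hits after subtracting the storage charge."""
    gross = cached_tokens_served_per_hour * (INPUT_RATE - CACHE_HIT_RATE) / 1_000_000
    return gross - storage_per_hour


print(f"Break-even: {break_even_tokens_per_hour():,.0f} cached tokens per hour")
print(f"Net saving at 2M cached tokens/hour: ${net_hourly_saving(2_000_000):.2f}")
```

Under these assumptions, a 2,000-token system prompt needs roughly 370 cache-hit requests per hour to cover its storage cost.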

2. Optimize output length

Output tokens cost 6x more than input. Strategies:

  • Set max_tokens to the minimum needed for your task
  • Use structured output schemas to constrain response format
  • For classification, use enum-style outputs instead of explanations
  • For extraction, return only the extracted fields
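
As an illustration of enum-style output, the sketch below defines a generic JSON Schema that limits a classifier's response to one short field, plus a parser that rejects anything else. The label set is hypothetical, and how a schema is passed to the model depends on the SDK you use, so that step is not shown.

```python
import json

# Hypothetical label set for a support-ticket classifier.
LABELS = ["billing", "technical", "account", "other"]

# Generic JSON Schema that restricts the response to a single enum field.
CLASSIFICATION_SCHEMA = {
    "type": "object",
    "properties": {"label": {"type": "string", "enum": LABELS}},
    "required": ["label"],
    "additionalProperties": False,
}


def parse_label(raw_response: str) -> str:
    """Parse a model response and reject anything outside the schema."""
    data = json.loads(raw_response)  # invalid JSON raises a ValueError subclass
    if not isinstance(data, dict) or set(data) != {"label"}:
        raise ValueError("response must contain exactly one 'label' field")
    if data["label"] not in LABELS:
        raise ValueError(f"unexpected label: {data['label']!r}")
    return data["label"]


print(parse_label('{"label": "billing"}'))
```

A response shaped like this is a handful of tokens, compared with the dozens a free-text explanation would use.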

3. Use Batch API for non-urgent workloads

Batch API pricing is half the standard rate ($0.75 input / $4.50 output per 1M tokens) for workloads that can tolerate higher latency. Use it for:

  • Nightly data processing
  • Bulk classification
  • Document analysis pipelines
  • Evaluation and testing

4. Route by workload tier

Not every request needs Gemini 3.5 Flash. Route simpler tasks to cheaper models:

| Workload complexity | Recommended model | Why |
| --- | --- | --- |
| Simple classification | Gemini 3.1 Flash Lite Preview ($0.25/$1.50) | 6x cheaper input, 6x cheaper output |
| Standard extraction | Gemini 3 Flash Preview ($0.50/$3.00) | 3x cheaper, good enough for simple tasks |
| Agent sub-steps | Gemini 3.5 Flash ($1.50/$9.00) | GA stability, better reasoning |
| Complex reasoning | Gemini 3.1 Pro ($2.00/$12.00) | Higher quality for hard tasks |
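
A routing table like this can be expressed as a simple lookup. The following Python sketch is one possible shape; the workload keys and the `route` and `estimate` helpers are invented for illustration, and the model names are display names from the table rather than API identifiers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelTier:
    name: str
    input_rate: float   # USD per 1M input tokens
    output_rate: float  # USD per 1M output tokens


# Rates and model names copied from the routing table above.
ROUTES = {
    "simple_classification": ModelTier("Gemini 3.1 Flash Lite Preview", 0.25, 1.50),
    "standard_extraction": ModelTier("Gemini 3 Flash Preview", 0.50, 3.00),
    "agent_substep": ModelTier("Gemini 3.5 Flash", 1.50, 9.00),
    "complex_reasoning": ModelTier("Gemini 3.1 Pro", 2.00, 12.00),
}


def route(workload: str) -> ModelTier:
    """Pick a model for a workload label; unknown labels get the mid-tier default."""
    return ROUTES.get(workload, ROUTES["agent_substep"])


def estimate(workload: str, input_tokens: int, output_tokens: int) -> float:
    """USD token cost of the workload on the model it routes to."""
    tier = route(workload)
    return (input_tokens * tier.input_rate + output_tokens * tier.output_rate) / 1_000_000


print(route("simple_classification").name)
print(f"${estimate('simple_classification', 50_000_000, 5_000_000):.2f}")
```

Defaulting unknown workloads to the mid-tier model is a design choice in this sketch; a stricter router could raise an error instead.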

5. Monitor cost per successful task, not just token cost

A cheaper model that requires 3 retries can cost more than a more expensive model that succeeds on the first attempt. Track:

  • Token cost per request
  • Retry rate
  • Fallback rate
  • Cost per successful task (including retries and fallbacks)
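
Cost per successful task can be estimated from the per-attempt cost and a measured success rate. The sketch below assumes attempts are independent and that a task is retried until it succeeds, so the expected number of attempts is 1 divided by the success rate. The token counts and success rates are hypothetical.

```python
def attempt_cost(input_tokens, output_tokens, input_rate, output_rate):
    """USD cost of one attempt at the given per-1M-token rates."""
    return (input_tokens * input_rate + output_tokens * output_rate) / 1_000_000


def cost_per_success(cost_per_attempt: float, success_rate: float) -> float:
    """Expected USD cost per successful task, retrying until one attempt succeeds.

    Assumes independent attempts, so expected attempts = 1 / success_rate.
    """
    if not 0.0 < success_rate <= 1.0:
        raise ValueError("success_rate must be in (0, 1]")
    return cost_per_attempt / success_rate


# Hypothetical task: 10,000 input and 3,000 output tokens per attempt.
cheap = attempt_cost(10_000, 3_000, 0.50, 3.00)   # Gemini 3 Flash Preview rates
strong = attempt_cost(10_000, 3_000, 1.50, 9.00)  # Gemini 3.5 Flash rates

# Hypothetical success rates; measure your own before routing on them.
cheap_effective = cost_per_success(cheap, success_rate=0.30)
strong_effective = cost_per_success(strong, success_rate=0.95)
print(f"cheap: ${cheap_effective:.4f}  strong: ${strong_effective:.4f} per success")
```

With these made-up rates the cheaper model ends up costing more per successful task, even though each attempt costs a third as much.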

Hidden cost factors

Retries

If 10% of requests fail validation and require retrying, add 10% to your token budget. For agent workflows with multi-step chains, retry costs compound across steps.

Fallback to stronger models

If Gemini 3.5 Flash cannot handle 5% of requests and you fall back to Gemini 3.1 Pro, factor in the Pro-tier pricing for those requests.

Context growth in agent loops

Agent workflows often accumulate context across steps. A 5-step agent loop with growing context can use 2-3x more input tokens than the initial prompt. Budget for context growth, not just the first request.
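
A rough way to budget for context growth is to model each step as resending the initial prompt plus everything accumulated so far. The Python sketch below does that under a simple assumption: context grows by a fixed number of tokens per step. The 10,000 and 4,000 token figures are hypothetical.

```python
def step_inputs(initial_tokens: int, growth_per_step: int, steps: int) -> list[int]:
    """Input tokens billed at each step when prior context is resent every step."""
    return [initial_tokens + k * growth_per_step for k in range(steps)]


def loop_input_cost(initial_tokens, growth_per_step, steps, input_rate=1.50):
    """USD input cost for the whole loop at `input_rate` per 1M tokens."""
    total_tokens = sum(step_inputs(initial_tokens, growth_per_step, steps))
    return total_tokens * input_rate / 1_000_000


# Hypothetical loop: 10,000-token prompt, 4,000 tokens of new context per step.
inputs = step_inputs(10_000, 4_000, 5)
print(inputs, sum(inputs))
print(f"${loop_input_cost(10_000, 4_000, 5):.3f} input cost per loop")
```

In this scenario the final step sends 2.6x the initial prompt, and the loop as a whole bills 90,000 input tokens rather than the 50,000 a fixed-size prompt would.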

Rate limit overhead

If you hit rate limits and need to queue or retry requests, the latency cost translates to engineering time and user experience impact — not just token spend.

FAQ

What is the cheapest way to use Gemini 3.5 Flash?

Enable context caching for repeated prompts, constrain output length with structured schemas, use Batch API for non-urgent work, and route simple tasks to cheaper Flash models.

Is Gemini 3.5 Flash cheaper than Claude Haiku 4.5?

No. Claude Haiku 4.5 is cheaper on both input ($1.00 vs $1.50) and output ($5.00 vs $9.00) per 1M tokens. But Gemini 3.5 Flash offers a 1M-token context window (vs 200K) and native audio and video inputs, which Haiku does not support.

How much does context caching save?

Cache hits cost $0.15 per 1M tokens vs $1.50 for standard input — a 90% reduction. For workloads with shared system prompts or repeated context, caching can reduce total costs by 30-50%.

Is Gemini 3.5 Flash cheaper than Gemini 3.1 Pro?

Yes. Gemini 3.5 Flash is 25% cheaper on input ($1.50 vs $2.00) and 25% cheaper on output ($9.00 vs $12.00) compared to Gemini 3.1 Pro.

How do I estimate my monthly cost?

Calculate: [(daily input tokens × $1.50/1M) + (daily output tokens × $9.00/1M)] × 30. Then subtract savings from context caching and add overhead for retries and fallbacks.
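
The estimate can be wrapped in one function. This is a minimal Python sketch at standard rates; `cached_fraction` and `retry_rate` are illustrative parameters, retries are modeled as a flat percentage of extra tokens, and cache storage and fallback costs are not included.

```python
def monthly_estimate(daily_input_tokens, daily_output_tokens,
                     cached_fraction=0.0, retry_rate=0.0, days=30):
    """Monthly USD estimate at standard Gemini 3.5 Flash rates.

    cached_fraction: share of input tokens served as cache hits (storage excluded).
    retry_rate: share of requests re-sent once, billed as proportional extra tokens.
    """
    cached = daily_input_tokens * cached_fraction
    uncached = daily_input_tokens - cached
    daily = (uncached * 1.50 + cached * 0.15 + daily_output_tokens * 9.00) / 1_000_000
    return daily * (1 + retry_rate) * days


# 50M input and 15M output tokens per day, with and without a 10% retry rate.
print(f"${monthly_estimate(50_000_000, 15_000_000):,.2f}")
print(f"${monthly_estimate(50_000_000, 15_000_000, retry_rate=0.10):,.2f}")
```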

EvoLink provides a unified API with usage monitoring and cost tracking across all Gemini models. Compare costs, set budget alerts, and route between Flash tiers from one integration.
