Gemini Omni Flash API

Gemini Omni Flash API on EvoLink — video generation and video editing through one API key, async task workflow, and callback support.

Model Type:

Text to Video, Image to Video, Reference to Video, Video Edit

Price:

$1.275 (~86.7 credits) per 1M input tokens
$14.875 (~1,011.5 credits) per 1M video output tokens
$7.650 (~520.2 credits) per 1M other output tokens

Token-based billing. Actual cost follows the usage object returned by the API.

Highest stability with guaranteed 99.9% uptime. Recommended for production environments.

Use the same video endpoint for all modes. Only the model parameter differs.

Prompt*

Output is 720p with audio. Duration resets to Auto; drag the slider to send a fixed 3-10s duration.

Aspect Ratio

Choose landscape, portrait, or Auto to let the provider select the output ratio.

Duration

Auto lets the provider decide the output duration (estimated as 10s). Choose 3-10s to send a fixed duration.

Gemini Omni Flash API on EvoLink

Use Gemini Omni Flash on EvoLink for text-to-video, image-to-video, reference-to-video, and video editing through one unified video API. Public discussion often frames Gemini Omni as a video counterpart to Nano Banana because it brings multimodal video creation and conversational editing into short-form workflows. On EvoLink, the practical value is API access: EvoLink model IDs, async task workflow, callback support, token-based usage visibility, and the same API key used for Veo, Seedance, Kling, and other video models.

Billing Rules

• Gemini Omni Flash is billed by token usage. The task returns a credits_reserved estimate on creation and settles from the actual usage tokens once the task completes.
• Text input: counted from the prompt tokens.
• Video input: 5,792 tokens per second of input video.
• Video output: 5,792 tokens per second of 720p video (audio included).
• Video edit: the output follows the input video, so this mode does not accept duration or aspect_ratio.
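
The billing rules above reduce to simple arithmetic, sketched here in Python. The per-1M rates and the 5,792 tokens-per-second figure are copied from this page and should be treated as a snapshot. The function name `estimate_task` and its parameters are illustrative, and the 1,000 other-output tokens in the usage line mirror the page's price estimator, not a guaranteed value.

```python
# Rates and token counts are copied from this page; treat them as a snapshot.
USD_PER_1M = {"input": 1.275, "video_output": 14.875, "other_output": 7.650}
CREDITS_PER_1M = {"input": 86.7, "video_output": 1011.5, "other_output": 520.2}
VIDEO_TOKENS_PER_SECOND = 5_792  # applies to input video and to 720p output video


def estimate_task(output_seconds, prompt_tokens=0, input_video_seconds=0,
                  other_output_tokens=0):
    """Return (usd, credits) for one task as a pre-bill estimate."""
    tokens = {
        "input": prompt_tokens + input_video_seconds * VIDEO_TOKENS_PER_SECOND,
        "video_output": output_seconds * VIDEO_TOKENS_PER_SECOND,
        "other_output": other_output_tokens,
    }
    usd = sum(tokens[m] * USD_PER_1M[m] for m in tokens) / 1_000_000
    credits = sum(tokens[m] * CREDITS_PER_1M[m] for m in tokens) / 1_000_000
    return usd, credits


# A 10-second clip with no prompt tokens and 1,000 other-output tokens.
usd, credits = estimate_task(output_seconds=10, other_output_tokens=1_000)
print(f"~${usd:.3f} / {credits:.3f} credits")
```

With these inputs the result rounds to about $0.869 and 59.106 credits, the same figures the page's price estimator shows for an Auto task. The settled charge still follows the usage object returned by the API.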

Pricing

Mode	Billed content	Meter	Price (per 1K tokens)	Credits (per 1K tokens)
Text to Video	Output video	Video output tokens	$0.015	1.0115
Text to Video	Input text / image / video	Input tokens	$0.0013	0.0867
Text to Video	Thinking / text output	Other output tokens	$0.0077	0.5202

If the primary route is down, EvoLink automatically switches to the next cheapest available route, which is how it targets 99.9% uptime at the best possible price.

Price estimate (gemini-omni-flash)

Example estimate for an Auto-duration task (estimated as 10s) with an empty prompt. Figures are pre-bill estimates. Actual charges follow the upstream usage tokens returned by the model.

Source	Price	Credits
EvoLink	~$0.869	59.106
Official (EvoLink saves ~15%)	~$1.023	69.537

Tokens per task	Count
Video output	57,920
Text input	0
Other output	1,000

What can you build with Gemini Omni API?

Chat-Based Video Editing

Generate a clip with Gemini Omni, then refine it in conversation — "make the lighting warmer", "replace the red car". The workflow is designed for iterative edits while preserving the surrounding scene, subject identity, and motion as much as the selected route supports.

Object Replacement and Scene Rewrite

Swap an object in frame, remove an unwanted element, or rewrite a scene while preserving identity and motion. Useful for ad creative iteration and product variant rendering without external editing tools.

Reference Image Workflow

Pass a reference image and Gemini Omni anchors character identity, lighting, and color across the generated video. Combine with chat-based editing to refine specific shots without losing visual consistency.

Audio-Capable Video Generation

Gemini Omni Flash routes can return short video outputs with audio where supported by the selected mode, reducing the need to stitch a separate TTS or sound-design pipeline into first-pass generation.

How Gemini Omni Compares — All models on one EvoLink API key

Gemini Omni is most interesting for workflow rather than raw fidelity alone: multimodal inputs, conversational editing, and a practical EvoLink route for testing it beside Veo, Seedance, and Kling with one API key.

Chat-Native Editing Workflow

Gemini Omni is positioned around conversational video editing, while Veo 3.1 and Seedance 2.0 are usually evaluated first as generation routes. For multi-turn refinement, this is the workflow difference to test.

Long-Context Character Consistency

Gemini Omni is reported to benefit from Gemini context and world knowledge for continuity across multi-input and edit-heavy workflows. Treat this as a behavior to evaluate in your own storyboard or short-video pipeline.

No Google Cloud Project — Same Async Pattern as Veo and Seedance

No GCP setup, no Vertex billing, no separate region approval. If you already run video generation through EvoLink, adding Gemini Omni is a one-parameter change — same request shape, same task lifecycle as Veo 3.1, Seedance 2.0, and Kling.

Gemini Omni vs Veo 3.1 vs Seedance 2.0 — Side-by-side comparison

Three models commonly shortlisted for production video workflows in 2026. All three are accessible through one EvoLink API key.

Feature	Gemini Omni	Veo 3.1	Seedance 2.0
EvoLink price	Token-based	From $0.50/s	From $0.092/s
Quality	720p	720p / 1080p, 4K upscaling where available	480p / 720p / 1080p
Native audio	Yes	Yes	Yes
Reference control	Text + image + chat edit	Text + image	Text + image + video + audio
Video length	3-10s / Auto	Short clips with Extend for longer scenes where supported	4–15s
Editing	Conversational editing workflow	Generation-first	V2V mode
Best for	Short-form editing and multi-input workflows	Cinematic baseline	Multimodal reference production

How to Integrate Gemini Omni API

Three steps to your first Gemini Omni video task. Same integration pattern as Veo 3.1, Seedance 2.0, and Kling 3.0.

Step 1 — Get Your API Key

Step 2 — Submit Generation Task

POST to /v1/videos/generations with one of the Gemini Omni Flash model names and your prompt. For generation modes, add duration (an integer from 3 to 10, or auto). Add image_urls for image-to-video or reference-to-video, video_urls for video edit, and callback_url for completion notification. The API processes the task asynchronously and returns a task ID.
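
A minimal sketch of this submission step in Python, using only the standard library. The endpoint path, field names, and Bearer header come from this page. The API host is not stated here, so `BASE_URL` is a placeholder you need to replace, and the `Content-Type` header and the helper names `build_generation_request` and `submit` are assumptions for illustration.

```python
import json
import os
import urllib.request

# Placeholder host: this page does not state the API base URL, so set it yourself.
BASE_URL = os.environ.get("EVOLINK_BASE_URL", "https://api.example.com")


def build_generation_request(api_key, prompt, duration="auto", callback_url=None):
    """Build (but do not send) a text-to-video task request."""
    payload = {
        "model": "gemini-omni-flash-text-to-video",
        "prompt": prompt,
        "duration": duration,  # an integer from 3 to 10, or "auto"
    }
    if callback_url is not None:
        payload["callback_url"] = callback_url
    return urllib.request.Request(
        BASE_URL + "/v1/videos/generations",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",  # assumed: the request body is JSON
        },
        method="POST",
    )


def submit(request):
    """Send the request and return the task id from the JSON response."""
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)["id"]


# Building the request has no side effects; call submit(req) to actually send it.
req = build_generation_request("YOUR_API_KEY", "A paper boat drifting down a rainy street")
print(req.get_method(), req.full_url)
```

Keeping request construction separate from sending makes the payload easy to inspect or log before any credits are reserved.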

Step 3 — Retrieve Video Result

Use the task ID to poll the status endpoint, or wait for the callback_url webhook. When status reaches completed, you receive a download URL for the generated MP4. Links are valid for 24 hours.
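
The polling half of this step can be sketched as a small loop. This page does not give the status endpoint path, so the sketch takes a caller-supplied `fetch_task` function (hypothetical) that returns the task as a dict. The status values `processing`, `completed`, and `failed` are the ones mentioned on this page; the interval and timeout defaults are arbitrary choices.

```python
import time


def wait_for_task(fetch_task, task_id, interval_s=5.0, timeout_s=600.0,
                  sleep=time.sleep):
    """Poll until the task is completed or failed, then return the final task dict.

    fetch_task(task_id) must return the task as a dict with a "status" key.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        task = fetch_task(task_id)
        if task["status"] in ("completed", "failed"):
            return task
        if time.monotonic() >= deadline:
            raise TimeoutError(
                f"task {task_id} still {task['status']} after {timeout_s}s"
            )
        sleep(interval_s)


# Offline demonstration with a fake fetcher that completes on the third poll.
_responses = iter([
    {"status": "processing"},
    {"status": "processing"},
    {"status": "completed"},
])
final = wait_for_task(lambda _id: next(_responses), "task-video-xxxxxxxx",
                      sleep=lambda _s: None)
print(final["status"])  # completed
```

Because result links expire, download or copy the MP4 into your own storage as soon as the loop returns a completed task.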

Gemini Omni API Capabilities

Technical specifications for production video workflows.

Editing

Chat-Based Video Editing

Multi-turn refinement in a conversational workflow, with scene continuity depending on the selected route and input quality.

Output

720p, 3-10s / Auto Clips

720p output with configurable 3-10 second or Auto clips for generation modes. Auto is estimated as 10 seconds. Video edit accepts one MP4 input up to 10 seconds.

Modes

Text-to-Video and Image-to-Video

T2V from prompts and I2V with reference image input. Chat editing applies to outputs of either mode.

Audio

Audio-Capable Video Output

Short video outputs can include audio where supported by the selected Gemini Omni Flash route.

Consistency

Long-Context Character Consistency

Designed for stronger continuity across multi-input and edit-heavy workflows; validate consistency on your own production prompts.

Workflow

Async API with Task ID and Callback

Submit a task, receive an ID, poll status or configure a callback_url. Same lifecycle as other EvoLink video models.

Cost Example — Gemini Omni pricing estimates

Scenario	Estimate
100 × 3-10s/Auto clips for a social media batch	Use current Pricing tab rates
1,000 × 3-10s/Auto clips per month at production scale	Use current Pricing tab rates
1 generation + 3 edits in a multi-turn workflow	Use current Pricing tab rates

Use the Pricing tab above for current token-based rates. Select the workflow by changing the model parameter.
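
As a rough worked example, the three scenarios can be bounded with the rates quoted on this page. This sketch counts video tokens only and ignores prompt and other-output tokens, so real charges will be somewhat higher. It also assumes an edited clip has the same length as its input, based on the statement that edit output follows the input video. The function names are illustrative.

```python
VIDEO_TOKENS_PER_SECOND = 5_792      # per second of input or 720p output video
USD_PER_1M_INPUT = 1.275             # snapshot of the rate quoted on this page
USD_PER_1M_VIDEO_OUTPUT = 14.875     # snapshot of the rate quoted on this page


def generation_usd(seconds):
    """Video-output cost of one generated clip (prompt tokens ignored)."""
    return seconds * VIDEO_TOKENS_PER_SECOND * USD_PER_1M_VIDEO_OUTPUT / 1_000_000


def edit_usd(seconds):
    """One video edit: the input clip is metered as input, the result as output."""
    input_usd = seconds * VIDEO_TOKENS_PER_SECOND * USD_PER_1M_INPUT / 1_000_000
    return input_usd + generation_usd(seconds)


# Batch scenarios: lower bound at 3s per clip, upper bound at 10s per clip.
for clips in (100, 1_000):
    low, high = clips * generation_usd(3), clips * generation_usd(10)
    print(f"{clips} clips: ${low:.2f} to ${high:.2f}")

# Multi-turn scenario: one 10s generation followed by three 10s edits.
workflow = generation_usd(10) + 3 * edit_usd(10)
print(f"1 generation + 3 edits at 10s: ${workflow:.2f}")
```

These are planning figures only; the settled amount comes from the usage tokens returned for each completed task.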

Gemini Omni API Frequently Asked Questions

Everything you need to know about the product and billing.

Gemini Omni is Google's multimodal video model family announced at Google I/O 2026, with Omni Flash discussed as a short-form video route for text, image, video, and audio inputs. Compared with Veo 3.1, Gemini Omni is more interesting for conversational editing and multi-input workflows, while Veo remains a strong cinematic generation baseline.

Billing follows the usage tokens returned by the API, with separate token meters for input, video output, and other output. Check the Pricing table above for current rates.

No Google Cloud project is required. EvoLink provides access via one API key, with no Vertex billing and no separate region approval. Authentication is the same as for Veo 3.1 and Seedance 2.0 on EvoLink.

Four modes are available: gemini-omni-flash-text-to-video, gemini-omni-flash-image-to-video, gemini-omni-flash-reference-to-video, and gemini-omni-flash-video-edit. All share the same async video API endpoint.
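
One way to keep the four model IDs and their mode-specific fields straight is a small payload builder, sketched below. The model IDs, `image_urls`, `video_urls`, and the rule that video edit rejects `duration` and `aspect_ratio` come from this page. The short mode keys, the function name `build_payload`, and the client-side check that image modes need at least one image URL are assumptions for illustration.

```python
MODEL_IDS = {
    "text-to-video": "gemini-omni-flash-text-to-video",
    "image-to-video": "gemini-omni-flash-image-to-video",
    "reference-to-video": "gemini-omni-flash-reference-to-video",
    "video-edit": "gemini-omni-flash-video-edit",
}


def build_payload(mode, prompt, image_urls=None, video_urls=None,
                  duration=None, aspect_ratio=None):
    """Return a request body for the shared video endpoint."""
    if mode not in MODEL_IDS:
        raise ValueError(f"unknown mode: {mode!r}")
    payload = {"model": MODEL_IDS[mode], "prompt": prompt}

    if mode == "video-edit":
        # The output follows the input video, so these fields are not accepted.
        if duration is not None or aspect_ratio is not None:
            raise ValueError("video edit does not accept duration or aspect_ratio")
        # The page states that video edit accepts one MP4 input.
        if not video_urls or len(video_urls) != 1:
            raise ValueError("video edit takes exactly one input video URL")
        payload["video_urls"] = list(video_urls)
        return payload

    if mode in ("image-to-video", "reference-to-video"):
        if not image_urls:  # assumed client-side check
            raise ValueError(f"{mode} needs at least one image URL")
        payload["image_urls"] = list(image_urls)
    if duration is not None:
        payload["duration"] = duration
    if aspect_ratio is not None:
        payload["aspect_ratio"] = aspect_ratio
    return payload


print(build_payload("video-edit", "Make the lighting warmer",
                    video_urls=["https://your-domain.com/clip.mp4"]))
```

Rejecting invalid combinations before the request is sent avoids spending a round trip on a task that the API would refuse.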

Callbacks are supported. Pass a callback_url (HTTPS) when submitting the task and EvoLink can POST task updates to your endpoint when the task reaches a terminal state. Polling the task status endpoint also works if you do not provide a callback URL.
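
A callback receiver can be sketched with Python's standard `http.server`. This page does not document the callback body, so the sketch assumes it is a JSON task object with `id` and `status` keys, like the task creation response. The 204 reply and the names `parse_callback`, `CallbackHandler`, and `serve` are also assumptions.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def parse_callback(body):
    """Return (task_id, status), assuming the body mirrors the task object."""
    task = json.loads(body)
    return task["id"], task["status"]


class CallbackHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        try:
            task_id, status = parse_callback(self.rfile.read(length))
        except (ValueError, KeyError, TypeError):
            # Malformed JSON or an unexpected shape: reject the request.
            self.send_response(400)
            self.end_headers()
            return
        print(f"task {task_id} reached status {status}")
        self.send_response(204)  # assumed: any 2xx acknowledges the callback
        self.end_headers()


def serve(port=8080):
    """Run the receiver. Put it behind an HTTPS-terminating proxy,
    because callback_url must be an HTTPS address."""
    HTTPServer(("", port), CallbackHandler).serve_forever()


print(parse_callback(b'{"id": "task-video-xxxxxxxx", "status": "completed"}'))
```

Calling `serve()` starts the listener. In production, treat the callback as a trigger and fetch the final task state yourself before acting on it.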

Failed tasks return a failed status with an error reason. For application-level retry, inspect the error, keep the original parameters for debugging, and resubmit only when the input or transient failure mode is clear.

Conversational editing of an existing video is supported, and it is one of Gemini Omni's main workflow differences. Use a natural-language edit instruction and validate how well the selected route preserves the surrounding scene, subject identity, and motion across iterations.

Generation modes support configurable 3-10 second or Auto clips. Auto is estimated as 10 seconds for reservation. Video edit accepts one MP4 input up to 10 seconds. For longer narratives, chain multiple clips using long-context character consistency.

Reference images are supported. Pass a reference image URL and Gemini Omni uses it as an identity anchor for the generated video.

Seedance 2.0 has strong benchmark and multimodal reference signals, while Veo 3.1 remains a strong cinematic generation baseline with advanced Flow and extension workflows. Gemini Omni is different because developers are evaluating it for conversational editing, multi-input generation, and short-form iteration.

Other Google models are available on the same key. EvoLink exposes Gemini Omni, Veo 3.1, Nano Banana 2, and the rest of the Gemini family through a single API key. Switch by changing the model parameter.

All Gemini Video API Models

EvoLink provides unified access to Google's video and media model family through a single API key. All models share the same EvoLink API endpoint. Switch models with one parameter.

API Reference

Authentication

All APIs require Bearer Token authentication.

Header

Authorization: Bearer YOUR_API_KEY

POST /v1/videos/generations

Create Gemini Omni Flash Video Task

Text to Video uses the unified EvoLink video generation endpoint. Select the mode by changing the model parameter.

Asynchronous processing returns a task ID. Use it to query the task status, or provide callback_url for completion notifications.

Generated outputs should be stored in your own system when result URLs are time-limited.

Request Parameters

model (string, required, default: gemini-omni-flash-text-to-video)

Gemini Omni Flash model name. Fixed to gemini-omni-flash-text-to-video for text-to-video generation.

Example: gemini-omni-flash-text-to-video

prompt (string, required)

Natural-language instruction describing the requested video.

Example: Create a cinematic product video with smooth camera motion and natural audio ambience

aspect_ratio (string, optional, default: 16:9)

Output aspect ratio. Use auto to let the provider choose.

Value	Description
16:9	Landscape video
9:16	Portrait video
auto	Let the provider choose the output ratio

Example: 16:9

duration (integer or string, optional, default: 10 if omitted)

Output video duration in seconds. The Playground sends auto by default.

Value	Description
3-10	Any integer from 3 to 10 seconds. If omitted, the API default is 10 seconds.
auto	Let the provider decide the output duration. The Playground sends auto by default and estimates it as 10 seconds.

Notes

• Use auto to let the model decide the duration; reservations estimate auto as 10 seconds.
• Duration affects the estimated reservation; completed tasks are billed from API usage tokens.

Example: auto
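
The duration rules can be checked on the client before a task is submitted. The sketch below encodes only what this section states: an integer from 3 to 10, or auto, with auto and an omitted value both estimated as 10 seconds for the reservation. The function names are illustrative, not part of the API.

```python
def validate_duration(value):
    """Return value unchanged if it is "auto" or an integer from 3 to 10."""
    if value == "auto":
        return value
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(value, bool) or not isinstance(value, int):
        raise ValueError(f"duration must be an integer or 'auto', got {value!r}")
    if not 3 <= value <= 10:
        raise ValueError(f"duration must be between 3 and 10 seconds, got {value}")
    return value


def reservation_seconds(value=None):
    """Seconds used for the reservation estimate: 10 for "auto" or when omitted."""
    if value is None or value == "auto":
        return 10
    return validate_duration(value)


print(reservation_seconds("auto"), reservation_seconds(4))
```

The reservation is only an estimate, so a clip generated with auto may settle at a different token count than the 10-second reservation implies.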

callback_url (string, optional)

Optional HTTPS address that EvoLink calls after the task completes.

Notes

• Use polling if no callback_url is provided.
• Store outputs promptly when result URLs are time-limited.

Example: https://your-domain.com/webhooks/video-task-completed

Request Example

{
  "model": "gemini-omni-flash-text-to-video",
  "prompt": "Create a cinematic product video with smooth camera motion and natural audio ambience",
  "aspect_ratio": "16:9",
  "duration": "auto",
  "callback_url": "https://your-domain.com/webhooks/video-task-completed"
}

Response Example

{
  "id": "task-video-xxxxxxxx",
  "model": "gemini-omni-flash-text-to-video",
  "object": "video.generation.task",
  "status": "processing",
  "progress": 0,
  "task_info": {
    "estimated_time": 60,
    "can_cancel": false,
    "video_duration": 10
  },
  "usage": {
    "credits_reserved": 59.1089,
    "billing_rule": "per_token"
  },
  "type": "video",
  "created": 1782940800
}
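
A short sketch of reading the fields you usually need from this response. The field names are taken from the response example above; the `CreatedTask` dataclass and `parse_created_task` helper are illustrative, and the sample string is a trimmed copy of that example.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class CreatedTask:
    task_id: str
    status: str
    credits_reserved: float
    video_duration: int


def parse_created_task(raw):
    """Extract the task id, status, and reservation from a creation response."""
    data = json.loads(raw)
    return CreatedTask(
        task_id=data["id"],
        status=data["status"],
        credits_reserved=data["usage"]["credits_reserved"],
        video_duration=data["task_info"]["video_duration"],
    )


# Trimmed copy of the response example on this page.
example = """
{
  "id": "task-video-xxxxxxxx",
  "status": "processing",
  "task_info": {"estimated_time": 60, "can_cancel": false, "video_duration": 10},
  "usage": {"credits_reserved": 59.1089, "billing_rule": "per_token"}
}
"""
task = parse_created_task(example)
print(task.task_id, task.status, task.credits_reserved)
```

Persist the task ID together with the reserved credits so the reservation can be compared with the settled usage once the task completes.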

Billing Rules

Gemini Omni Flash is billed by token usage. The task returns a credits_reserved estimate on creation and settles from the actual usage tokens once the task completes. Token counts by type:

• Text input: counted from the prompt tokens.
• Video output: 5,792 tokens per second of 720p video (audio included).

Duration only affects the reservation estimate; Auto is estimated as 10 seconds.