Veo 3.1 API

Leverage Google DeepMind's Veo 3.1 model. Create 1080p videos with dialogue and SFX. Choose Fast for speed or Pro for maximum quality.

Veo 3.1 API — Production-ready video with synced audio

Integrate Google's latest generative video model. Produce 4–8s clips with synchronized speech and ambient sound. Supports vertical formats, reference images, and rapid prompt iteration.

What can you build with Veo 3.1 API?

Instant social media content

Automate the creation of 9:16 Shorts and Reels. The Veo 3.1 API renders vertical video directly, which suits automated content pipelines.

Precision control with references

Maintain character and style consistency. Pass reference images or start/end frames via the API to guide the video generation process accurately.

Synchronized audio soundscapes

Generate video and audio in a single pass. The model creates dialogue, Foley, and soundtracks that match the visual action frame-by-frame.

Why developers choose Veo 3.1 API

Veo 3.1 offers two variants: Fast for speed and cost efficiency, Pro for maximum visual fidelity. Both include native audio generation.

Two variants for different needs

Fast variant for rapid iteration and cost efficiency. Pro variant for maximum quality and complex scenes.

Cost-effective scaling

Lower compute costs per second make it feasible to run thousands of iterations for A/B testing ads or personalizing user content.

Production-ready outputs

Delivers 720p for drafts and 1080p for final export, with built-in watermarking to ensure safety and compliance.

How to integrate Veo 3.1

A simple API workflow to generate video with audio from text or images.

Step 1 — Choose variant & configure

Select Fast or Pro variant. Set your desired duration (4s, 6s, 8s), aspect ratio, and resolution (720p/1080p).

Step 2 — Send prompt & references

Submit your text prompt along with optional reference images for style control or specific start/end frames for transitions.

Step 3 — Retrieve video + audio

Receive the MP4 output with fully embedded, synchronized audio ready for immediate playback or publishing.

Key Capabilities

Advanced features available via the Veo 3.1 API endpoint

Audio

Native Audio Generation

Creates speech, music, and sound effects that are temporally aligned with video actions.

Flexibility

Fast & Pro Variants

Choose Fast for speed and cost efficiency, or Pro for maximum visual quality.

Control

Visual Control

Use image-to-video or start/end frame inputs to dictate flow and composition.

Resolution

Flexible Resolutions

Switch between 720p for speed and 1080p for quality without changing the model.

Physics

Physics Simulation

Updated world model handles fluid dynamics, lighting, and collisions with high realism.

Trust

SynthID Watermarking

Imperceptible watermarking embedded by default for responsible AI content usage.

Veo 3.1 API Variants Comparison

Compare Fast and Pro variants

Model | Duration | Resolution | Price | Strength
--- | --- | --- | --- | ---
Veo 3.1 Fast | 4/6/8s | 720p / 1080p | ~$0.15/sec (EvoLink) | Lowest latency; Native Audio; Up to 3 reference images; Ideal for rapid iteration.
Veo 3.1 Pro | 4/6/8s | 720p / 1080p | Premium pricing | Maximum visual fidelity; Complex physics; First/last frame mode; Best for final assets.
Sora (Pro) | 10–15s | Up to 1080p | ~$0.20/10s (Standard) | Longer native duration; strong prompt adherence; competitive physics.
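
As a rough illustration of the pricing column, the sketch below multiplies the approximate Fast rate of $0.15 per second by clip duration and clip count. It assumes the per-second figure scales linearly with duration; actual billing may follow a different rule (the response example elsewhere on this page reports per-call credits), so treat the result as a ballpark only. The function name is hypothetical.

```python
# Approximate Veo 3.1 Fast price per second, taken from the comparison table.
FAST_PRICE_PER_SECOND_USD = 0.15

# Clip durations listed for both variants.
ALLOWED_DURATIONS_S = (4, 6, 8)


def estimate_fast_cost(duration_s: int, clips: int = 1) -> float:
    """Ballpark USD cost for `clips` Fast clips of `duration_s` seconds each.

    Assumes linear per-second pricing, which is an assumption of this sketch.
    """
    if duration_s not in ALLOWED_DURATIONS_S:
        raise ValueError(f"duration must be one of {ALLOWED_DURATIONS_S}")
    if clips < 1:
        raise ValueError("clips must be at least 1")
    return round(FAST_PRICE_PER_SECOND_USD * duration_s * clips, 2)
```

Under these assumptions, one 8-second Fast clip comes to about $1.20, and a batch of 1,000 four-second clips to about $600.
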

Frequently Asked Questions

Everything you need to know about the product and billing.

The Veo 3.1 API provides programmatic access to Google's video generation model. It offers two variants: Fast for speed and cost efficiency, Pro for maximum visual fidelity. Both support 1080p resolution and native audio.
The Fast variant prioritizes speed and lower cost, which suits rapid iteration. The Pro variant offers higher visual quality and better handling of complex scenes, which suits final production assets.
Yes. Both Fast and Pro variants generate native audio (including dialogue, ambience, and music) that matches the video content in a single generation pass.
Absolutely. The API supports reference images to guide the visual style. Fast supports up to 3 images, and Pro supports up to 2 images for first/last frame mode.
The API outputs MP4 files in 720p or 1080p resolution. You can choose between 16:9 (landscape) or 9:16 (vertical) aspect ratios, with durations of 4, 6, or 8 seconds.
Yes, Veo 3.1 is designed for commercial workflows, including advertising and social media automation. It includes SynthID watermarking to ensure transparency and compliance.
POST /v1/videos/generations

Create Video

The Veo 3.1 Fast Lite (veo3.1-fast) model supports text-to-video, first-frame image-to-video, and other modes.

Processing is asynchronous: use the returned task ID to check the task status.

Generated video links are valid for 24 hours, so save the files promptly.
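
Because the links expire, one option is to copy each finished video to your own storage as soon as its URL is known. The sketch below is a minimal example using only the Python standard library. The `save_video` helper is hypothetical, and the caller is assumed to already hold the video URL, since this page does not show which response field carries it.

```python
import shutil
import urllib.request
from pathlib import Path


def save_video(video_url: str, destination: str, timeout: float = 60.0) -> Path:
    """Stream a generated video to local disk and return the final path.

    The data is written to a temporary ".part" file first, so an interrupted
    download never leaves a truncated file under the final name.
    """
    target = Path(destination)
    target.parent.mkdir(parents=True, exist_ok=True)
    partial = target.with_name(target.name + ".part")
    with urllib.request.urlopen(video_url, timeout=timeout) as response, \
            open(partial, "wb") as out:
        shutil.copyfileobj(response, out)
    partial.replace(target)  # rename only after the download completed
    return target
```

A call such as `save_video(url, "videos/cat-piano.mp4")` would then keep a durable copy beyond the 24-hour window.
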

Request Parameters

model (string, required, default: veo3.1-fast)

Video generation model name.

Example: veo3.1-fast

prompt (string, required)

Prompt describing what kind of video to generate.

Notes
  • Limited to 2000 tokens

Example: A cat playing piano

aspect_ratio (string, optional, default: auto)

Video aspect ratio. When set to auto, image-to-video selects the ratio from the input image, and text-to-video selects it from the prompt content.

Value | Description
--- | ---
auto | Automatic selection based on input
16:9 | Landscape video
9:16 | Portrait video

Example: auto

image_urls (array, optional)

List of reference image URLs for image-to-video generation.

Notes
  • 1 image for first-frame video generation
  • 2 images for first-and-last-frame video generation
  • Up to 3 images for reference image to video
  • Max size: 10MB per image
  • Formats: .jpg, .jpeg, .png, .webp
  • URLs must be directly viewable by the server

Example: http://example.com/image1.jpg

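
The image limits above can be checked locally before a request is sent. The following is a minimal sketch, not part of the API: `check_image_urls` is a hypothetical helper, and it judges the format by the file extension in the URL path, which is only a heuristic. The 10MB size limit and server reachability cannot be verified without fetching the image, so they are not checked here.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

# Limits copied from the image_urls notes.
ALLOWED_SUFFIXES = {".jpg", ".jpeg", ".png", ".webp"}
MAX_IMAGES = 3


def check_image_urls(urls):
    """Return a list of problems; an empty list means the local checks passed."""
    problems = []
    if len(urls) > MAX_IMAGES:
        problems.append(f"too many images: {len(urls)} > {MAX_IMAGES}")
    for url in urls:
        parsed = urlparse(url)
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            problems.append(f"not an absolute http(s) URL: {url}")
            continue
        # Heuristic: infer the format from the extension in the URL path.
        suffix = PurePosixPath(parsed.path).suffix.lower()
        if suffix not in ALLOWED_SUFFIXES:
            problems.append(f"unsupported format {suffix or '(none)'}: {url}")
    return problems
```

For the documented example, `check_image_urls(["http://example.com/image1.jpg"])` returns an empty list.
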
generation_type (string, optional)

Video generation mode. If omitted, the mode is chosen from the number of images supplied.

Value | Description
--- | ---
TEXT | Text to video
FIRST&LAST | First and last frame to video (1-2 images)
REFERENCE | Reference image to video (up to 3 images, 16:9 only)

Example: TEXT

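
The interaction between generation_type, image count, and aspect ratio can be sketched as a small client-side check. The default mapping used here (no images gives TEXT, one or two give FIRST&LAST, three give REFERENCE) is an assumption inferred from the table, as is the requirement that TEXT takes no images and REFERENCE takes at least one; the server's actual defaulting rule may differ. The function name is hypothetical.

```python
# (min, max) image counts per mode. The maxima come from the table;
# the minima are assumptions of this sketch.
IMAGE_LIMITS = {
    "TEXT": (0, 0),
    "FIRST&LAST": (1, 2),
    "REFERENCE": (1, 3),
}


def resolve_generation_type(image_count, generation_type=None, aspect_ratio="auto"):
    """Pick or validate a generation mode for the given number of images."""
    if generation_type is None:
        # Assumed default mapping based on image count.
        if image_count == 0:
            generation_type = "TEXT"
        elif image_count <= 2:
            generation_type = "FIRST&LAST"
        else:
            generation_type = "REFERENCE"
    if generation_type not in IMAGE_LIMITS:
        raise ValueError(f"unknown generation_type: {generation_type}")
    low, high = IMAGE_LIMITS[generation_type]
    if not low <= image_count <= high:
        raise ValueError(
            f"{generation_type} expects {low}-{high} images, got {image_count}"
        )
    # The table marks REFERENCE as 16:9 only, so reject an explicit 9:16.
    if generation_type == "REFERENCE" and aspect_ratio == "9:16":
        raise ValueError("REFERENCE mode supports 16:9 only")
    return generation_type
```

Passing an explicit mode, for example `resolve_generation_type(2, "REFERENCE", "16:9")`, skips the assumed default and only validates the combination.
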
enhance_prompt (boolean, optional, default: true)

Whether to automatically translate the prompt to English. When enabled, non-English prompts will be automatically translated to English for better generation results.

Example: true

callback_url (string, optional)

HTTPS callback URL that is called when the task finishes.

Notes
  • Triggered on completion, failure, or cancellation
  • Sent after billing confirmation
  • HTTPS only, no internal IPs
  • Max length: 2048 chars
  • Timeout: 10s, Max 3 retries

Example: https://your-domain.com/webhooks/video-task-completed
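
The callback constraints above lend themselves to a quick pre-flight check. This is a minimal sketch with a hypothetical helper name. It only recognizes internal addresses when the host is written as a literal IP; a DNS name that resolves to an internal address (or the name `localhost`) is not detected, because no resolution is performed.

```python
import ipaddress
from urllib.parse import urlparse

MAX_CALLBACK_URL_LENGTH = 2048  # from the callback_url notes


def check_callback_url(url):
    """Return a list of problems; an empty list means the local checks passed."""
    problems = []
    if len(url) > MAX_CALLBACK_URL_LENGTH:
        problems.append(f"longer than {MAX_CALLBACK_URL_LENGTH} characters")
    parsed = urlparse(url)
    if parsed.scheme != "https":
        problems.append("scheme must be https")
    host = parsed.hostname
    if not host:
        problems.append("missing host")
        return problems
    try:
        ip = ipaddress.ip_address(host)
    except ValueError:
        ip = None  # host is a DNS name; it is not resolved in this sketch
    if ip is not None and (ip.is_private or ip.is_loopback or ip.is_link_local):
        problems.append("internal IP addresses are not allowed")
    return problems
```

The documented example URL passes all of these checks, while `http://` URLs and hosts such as `192.168.1.10` are flagged.
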

Request Example

{
  "model": "veo3.1-fast",
  "prompt": "A cat playing piano",
  "aspect_ratio": "16:9"
}
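
To show how the request body above might be sent, here is a minimal sketch using only the Python standard library. Two things in it are assumptions, since this page gives only the path /v1/videos/generations: the base URL `https://api.example.com` is a placeholder, and the Bearer-token Authorization header is a guess at the auth scheme. The helper names are hypothetical.

```python
import json
import urllib.request

# Placeholder base URL (assumption); replace with the provider's real host.
BASE_URL = "https://api.example.com"


def build_generation_request(api_key, prompt, model="veo3.1-fast",
                             aspect_ratio="auto", **optional):
    """Build a POST request for /v1/videos/generations without sending it.

    `optional` may carry the other documented fields: image_urls,
    generation_type, enhance_prompt, callback_url.
    """
    payload = {"model": model, "prompt": prompt, "aspect_ratio": aspect_ratio}
    payload.update({k: v for k, v in optional.items() if v is not None})
    return urllib.request.Request(
        BASE_URL + "/v1/videos/generations",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",  # assumed auth scheme
        },
        method="POST",
    )


def submit(request, timeout=30):
    """Send the request and return the decoded JSON task object."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

Separating request construction from sending keeps the payload easy to inspect or log before any credits are reserved.
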

Response Example

{
  "created": 1757169743,
  "id": "task-unified-1757169743-7cvnl5zw",
  "model": "veo3.1-fast",
  "object": "video.generation.task",
  "progress": 0,
  "status": "pending",
  "task_info": {
    "can_cancel": true,
    "estimated_time": 180,
    "video_duration": 8
  },
  "type": "video",
  "usage": {
    "billing_rule": "per_call",
    "credits_reserved": 60,
    "user_group": "default"
  }
}
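
A client will usually want only a few fields from this task object. The sketch below parses the response into a small dataclass; the class and function names are hypothetical, and reading `estimated_time` and `video_duration` as seconds is an assumption based on the values shown. Only the "pending" status appears on this page, so no other status values are modeled.

```python
import json
from dataclasses import dataclass


@dataclass
class VideoTask:
    task_id: str
    status: str
    progress: int
    can_cancel: bool
    estimated_time_s: int   # assumed to be seconds
    video_duration_s: int   # assumed to be seconds
    credits_reserved: int


def parse_task(body: str) -> VideoTask:
    """Extract the commonly needed fields from a create-video response."""
    data = json.loads(body)
    info = data.get("task_info", {})
    usage = data.get("usage", {})
    return VideoTask(
        task_id=data["id"],
        status=data["status"],
        progress=data.get("progress", 0),
        can_cancel=info.get("can_cancel", False),
        estimated_time_s=info.get("estimated_time", 0),
        video_duration_s=info.get("video_duration", 0),
        credits_reserved=usage.get("credits_reserved", 0),
    )
```

Keeping the task ID from this object is what allows the task to be tracked later, since the video itself is produced asynchronously.
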