Kling 3.0 API
Kling 3.0 video model with text-to-video, image-to-video, and custom element creation. Supports 3-15 second videos with per-second billing.
Upload a first-frame or end-frame image to generate a video.
First frame image. At least one of first/end frame is required.
Supported formats: JPG, JPEG, PNG
Maximum file size: 10MB; Maximum files: 1
End-frame image (optional)
Billing Rules
- Price shown is per second
- Duration range: 3-15 seconds
- Total = price/second × duration
Pricing
| Model | Mode | Quality | Sound | Price |
|---|---|---|---|---|
| Kling 3.0 Image to Video | Video Generation | 720p | Off | $0.0750/second (5.4 Credits) |
| Kling 3.0 Image to Video | Video Generation | 720p | On | $0.1125/second (8.1 Credits) |
| Kling 3.0 Image to Video | Video Generation | 1080p | Off | $0.1000/second (7.1982 Credits) |
| Kling 3.0 Image to Video | Video Generation | 1080p | On | $0.1500/second (10.8 Credits) |
If the cheapest option is down, we automatically use the next cheapest available one, ensuring 99.9% uptime at the best possible price.
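As a worked example of the billing rule (total = price/second × duration), the sketch below looks up the per-second rate from the pricing table and multiplies it by the duration. The function name `estimate_cost` and the `RATES` layout are illustrative choices, not part of the API.

```python
from decimal import Decimal

# Per-second rates copied from the pricing table: (USD, credits).
RATES = {
    ("720p", "off"): (Decimal("0.0750"), Decimal("5.4")),
    ("720p", "on"): (Decimal("0.1125"), Decimal("8.1")),
    ("1080p", "off"): (Decimal("0.1000"), Decimal("7.1982")),
    ("1080p", "on"): (Decimal("0.1500"), Decimal("10.8")),
}


def estimate_cost(duration, quality="720p", sound="off"):
    """Return (usd, credits) for one video; duration is in whole seconds."""
    if isinstance(duration, bool) or not isinstance(duration, int):
        raise ValueError("duration must be an integer number of seconds")
    if not 3 <= duration <= 15:
        raise ValueError("duration must be between 3 and 15 seconds")
    if (quality, sound) not in RATES:
        raise ValueError(f"unknown quality/sound combination: {quality}, {sound}")
    usd_per_second, credits_per_second = RATES[(quality, sound)]
    return usd_per_second * duration, credits_per_second * duration


usd, credits = estimate_cost(10, quality="1080p", sound="on")
print(f"10 seconds at 1080p with sound: ${usd} ({credits} credits)")
```

Using `Decimal` keeps the totals exact, which matters when many short clips are summed into one invoice.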
Kling 3.0 API for video creation
Build with the Kling 3.0 model. Generate videos from text or images through one unified API with 3-15 second output support.

What can you build with the Kling 3.0 API?
Text-to-video creation
Generate videos directly from text prompts with Kling 3.0. Describe scenes, actions, and styles in natural language and let the model produce 3-15 second clips ready for marketing, social media, or creative projects.

Image-driven video generation
Use images to guide video generation. Kling 3.0 supports image-to-video mode, giving teams precise control over visual style, character consistency, and scene composition.

Multi-shot and sound effects
Create complex multi-shot videos with scene transitions and add AI-generated sound effects. Kling 3.0 supports customizable shot sequences and audio generation for professional-quality output.

Why teams choose Kling 3.0
Kling 3.0 provides text-to-video and image-to-video modes in a single model family with competitive pricing.
Two specialized modes
Text and image modes cover the core video creation workflow.
3.0 architecture
Built on Kling's 3.0 foundation for quality video generation.
Flexible 3-15s output
Generate videos from 3 to 15 seconds with per-second billing.
How to integrate the Kling 3.0 API
From input to production-ready video in three steps.
Choose your mode
Select text-to-video or image-to-video based on your workflow needs.
Submit a generation task
Send your request with prompts or images. Track the async task until results are ready.
Review and iterate
Download results, compare variations, and reuse the same structure for fast iteration across campaigns.
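Tracking the async task usually comes down to a polling loop. The sketch below is transport-agnostic: `fetch_status` is a hypothetical callable you supply that queries a task by ID and returns a dict. The `status` field and the values `completed`, `failed`, and `cancelled` are assumptions made for illustration, so adjust them to the response your account actually returns.

```python
import time

# Assumed terminal status names; adjust to the values your API actually returns.
TERMINAL_STATES = {"completed", "failed", "cancelled"}


def wait_for_task(task_id, fetch_status, interval=5.0, timeout=600.0, sleep=time.sleep):
    """Poll fetch_status(task_id) until the task reaches a terminal state."""
    deadline = time.monotonic() + timeout
    while True:
        task = fetch_status(task_id)
        if task.get("status") in TERMINAL_STATES:
            return task
        if time.monotonic() >= deadline:
            raise TimeoutError(f"task {task_id} still running after {timeout} seconds")
        sleep(interval)


# Demo with a stand-in for the real status query.
_fake_responses = iter([{"status": "processing"}, {"status": "completed"}])
final = wait_for_task("task-123", lambda _id: next(_fake_responses), sleep=lambda _s: None)
print(final["status"])
```

Passing `sleep` as a parameter keeps the loop testable without real delays.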
Core capabilities of the Kling 3.0 API
Video AI with two specialized modes
Text-to-video generation
Generate videos purely from text descriptions. Kling 3.0 interprets natural language prompts to produce dynamic video content without requiring any visual input.
Image-to-video transformation
Transform static images into dynamic videos. Provide reference images and let Kling 3.0 animate them with natural motion and scene dynamics.
Multi-shot support
Create complex multi-shot videos with customizable scene transitions, per-shot prompts, and duration control for professional video production.
Sound effects
Add AI-generated sound effects to your videos. Toggle sound on or off based on your needs, with transparent pricing for audio generation.
Per-second billing
Pay only for what you generate with per-second billing. Videos range from 3 to 15 seconds, giving teams precise cost control for every project.
720p & 1080p quality
Choose between standard 720p and high-quality 1080p output resolution to balance quality and cost for your specific use case.
API Reference
Authentication
All APIs require Bearer Token authentication.
Authorization: Bearer YOUR_API_KEY
Create Video
/v1/videos/generations
Kling 3.0 Image to Video (kling-v3-image-to-video) transforms static images into dynamic videos using the 3.0 model. Supports first frame, end frame, subject control, multi-shot, and sound effects.
Processing is asynchronous; use the returned task ID to query status.
Generated video links are valid for 24 hours, so save them promptly.
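A minimal sketch of preparing the create-video request with Python's standard library. The path `/v1/videos/generations` and the Bearer header come from this page; the base URL `https://api.example.com`, the use of POST with a JSON body, and the helper name `build_create_request` are assumptions for illustration.

```python
import json
import urllib.request

# Placeholder base URL (assumption): substitute the host your provider gives you.
API_BASE = "https://api.example.com"


def build_create_request(api_key, payload, base_url=API_BASE):
    """Build, but do not send, a request for the create-video endpoint."""
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/videos/generations",
        data=json.dumps(payload).encode("utf-8"),
        method="POST",  # assumed; the method is not stated on this page
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )


request = build_create_request(
    "YOUR_API_KEY",
    {
        "model": "kling-v3-image-to-video",
        "image_start": "https://example.com/first-frame.jpg",
        "duration": 5,
    },
)
print(request.get_method(), request.full_url)
```

Sending it is a matter of passing `request` to `urllib.request.urlopen`; keep the task ID from the response so you can query status afterwards.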
Important Notes
- At least one of image_start (first frame) or image_end (end frame) is required.
- Image requirements: JPG/JPEG/PNG, ≤ 10MB, width/height ≥ 300px, aspect ratio 1:2.5 ~ 2.5:1.
- Video duration: 3-15 seconds, billed per second.
- Pricing varies by quality and sound: 720p+off = 1.0x, 720p+on = 1.5x, 1080p+off = 1.333x, 1080p+on = 2.0x.
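The image requirements above can be checked locally before a URL is submitted. The sketch below takes the file name, size, and pixel dimensions as plain arguments, so it needs no imaging library. The helper name `image_problems` is hypothetical, and reading "10MB" as 10 × 1024 × 1024 bytes is an assumption.

```python
import os

ALLOWED_EXTENSIONS = {".jpg", ".jpeg", ".png"}
MAX_BYTES = 10 * 1024 * 1024  # assumption: "10MB" read as 10 MiB
MIN_SIDE_PX = 300


def image_problems(filename, size_bytes, width, height):
    """Return a list of requirement violations; an empty list means the image passes."""
    problems = []
    extension = os.path.splitext(filename)[1].lower()
    if extension not in ALLOWED_EXTENSIONS:
        problems.append("format must be JPG, JPEG, or PNG")
    if size_bytes > MAX_BYTES:
        problems.append("file must be 10MB or smaller")
    if width < MIN_SIDE_PX or height < MIN_SIDE_PX:
        problems.append("width and height must each be at least 300px")
    # Aspect ratio (width:height) must lie between 1:2.5 and 2.5:1.
    # Integer form of 0.4 <= width / height <= 2.5 avoids float rounding.
    if not (5 * width >= 2 * height and 2 * width <= 5 * height):
        problems.append("aspect ratio must be between 1:2.5 and 2.5:1")
    return problems


print(image_problems("first-frame.jpg", 2_000_000, 1280, 720))
print(image_problems("banner.png", 2_000_000, 3000, 600))
```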
Request Parameters
model (string, required, default: kling-v3-image-to-video)
Video generation model name.
Example: kling-v3-image-to-video

prompt (string, optional)
Text prompt describing what kind of motion and video to generate.
Notes
- Max 2500 characters
- Optional for image-to-video
Example: A gentle breeze moves through the scene, creating subtle motion and life.

image_start (string, optional)
First-frame image URL. At least one of image_start or image_end must be provided.
Notes
- JPG/JPEG/PNG format
- Max size: 10MB
- Width/height ≥ 300px, aspect ratio 1:2.5 ~ 2.5:1
Example: https://example.com/first-frame.jpg

image_end (string, optional)
End-frame image URL. At least one of image_start or image_end must be provided.
Notes
- Optional
- Same format requirements as image_start
Example: https://example.com/end-frame.jpg

duration (integer, optional, default: 5)
Specifies the generated video duration in seconds.
Notes
- Range: 3-15 seconds (integer)
- Base price: 5.4 credits per second
- Minimum billing: 3 seconds
Example: 5

aspect_ratio (string, optional)
Video aspect ratio. When a first-frame image is provided, this can be omitted (auto-adapts to image ratio).
| Value | Description |
|---|---|
| 16:9 | Landscape video |
| 9:16 | Portrait video |
| 1:1 | Square video |
Example: 16:9

quality (string, optional, default: 720p)
Video resolution quality. Affects billing multiplier.
| Value | Description |
|---|---|
| 720p | Standard 720P (1.0x base) |
| 1080p | High quality 1080P (1.333x base) |
Example: 720p

sound (string, optional, default: off)
Sound effect control. Affects billing multiplier.
| Value | Description |
|---|---|
| off | No sound effects (1.0x) |
| on | Generate sound effects (1.5x) |
Notes
- Combined multiplier: 720p+off=1.0x, 720p+on=1.5x, 1080p+off=1.333x, 1080p+on=2.0x
Example: off

callback_url (string, optional)
HTTPS callback address after task completion.
Notes
- Triggered on completion, failure, or cancellation
- HTTPS only, no internal IPs
- Max length: 2048 chars
- Timeout: 10s, Max 3 retries
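The callback constraints in the notes above (HTTPS only, no internal IPs, at most 2048 characters) can be checked before a request is sent. This is a sketch under stated assumptions: the helper name `callback_url_problems` is hypothetical, and only literal IP addresses are inspected, so a DNS name that resolves to an internal address is not caught here.

```python
import ipaddress
from urllib.parse import urlsplit

MAX_CALLBACK_URL_LENGTH = 2048


def callback_url_problems(url):
    """Return a list of violations of the callback_url notes; empty means OK."""
    problems = []
    if len(url) > MAX_CALLBACK_URL_LENGTH:
        problems.append("URL is longer than 2048 characters")
    try:
        parts = urlsplit(url)
    except ValueError:
        return problems + ["URL could not be parsed"]
    if parts.scheme != "https":
        problems.append("URL must use HTTPS")
    host = parts.hostname
    if not host:
        return problems + ["URL has no host"]
    try:
        address = ipaddress.ip_address(host)
    except ValueError:
        address = None  # host is a DNS name; it is not resolved in this sketch
    if address is not None and (
        address.is_private or address.is_loopback or address.is_link_local
    ):
        problems.append("URL must not point at an internal IP address")
    return problems


print(callback_url_problems("https://your-domain.com/webhooks/video-task-completed"))
print(callback_url_problems("http://10.0.0.5/hook"))
```

Because the callback times out after 10 seconds and is retried up to 3 times, a receiving endpoint is best kept fast and safe to call more than once for the same task.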
Example: https://your-domain.com/webhooks/video-task-completed

model_params.multi_shot (boolean, optional, default: false)
Enable multi-shot mode for generating videos with multiple camera angles or scenes.
Notes
- When enabled, shot_type and multi_prompt become relevant
Example: true

model_params.shot_type (string, optional)
Shot type for multi-shot mode. Required when multi_shot is true.
| Value | Description |
|---|---|
| customize | Custom per-shot prompts and durations |
| intelligence | AI auto-plans shots based on prompt |
Notes
- Only effective when multi_shot=true
Example: customize

model_params.multi_prompt (array, optional)
Per-shot prompt array. Required when multi_shot=true and shot_type=customize. Each item defines a shot segment.
Notes
- Format: [{index: number, prompt: string, duration: string}, ...]
- Max 6 shots
- Total duration of all shots should match the requested duration
- When used, top-level prompt can be empty
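The shot-count and total-duration notes above can be verified with a few lines of code. In this sketch the helper name `multi_prompt_problems` is hypothetical, and per-shot durations are treated as strings holding whole seconds, matching the format note.

```python
MAX_SHOTS = 6


def multi_prompt_problems(multi_prompt, duration):
    """Check a multi_prompt list against the requested total duration in seconds."""
    problems = []
    if not multi_prompt:
        problems.append("multi_prompt must contain at least one shot")
    if len(multi_prompt) > MAX_SHOTS:
        problems.append("multi_prompt allows at most 6 shots")
    # Durations arrive as strings such as "5"; convert before summing.
    total = sum(int(shot["duration"]) for shot in multi_prompt)
    if total != duration:
        problems.append(
            f"shot durations add up to {total} seconds, but duration is {duration}"
        )
    return problems


shots = [
    {"index": 1, "prompt": "Scene one", "duration": "5"},
    {"index": 2, "prompt": "Scene two", "duration": "5"},
]
print(multi_prompt_problems(shots, duration=10))
print(multi_prompt_problems(shots, duration=5))
```

Two 5-second shots need a top-level duration of 10, so the second call reports a mismatch against the default of 5.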
Example: [{"index": 1, "prompt": "Scene one", "duration": "5"}, {"index": 2, "prompt": "Scene two", "duration": "5"}]

model_params.element_list (array, optional)
Subject library list for referencing pre-trained subjects in the video.
Notes
- Format: [{element_id: long}, ...]
- Max 3 subjects
- Reference subjects in prompt using <<<element_N>>> placeholder
Example: [{"element_id": 123456}]

negative_prompt (string, optional)
Negative prompt describing what you don't want in the video.
Notes
- Max 2500 characters
- Optional
Example: blurry, watermark, text, low quality

model_params.watermark_info (object, optional)
Watermark configuration for the generated video.
Notes
- Format: {enabled: boolean}
Example: {"enabled": false}
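Putting the parameters together, a request body might look like the sketch below, with a small pre-flight check for the documented limits. Nesting the dotted `model_params.*` names inside a `model_params` object is an assumption based on the naming, and `payload_problems` is a hypothetical helper, not part of the API.

```python
import json


def payload_problems(payload):
    """Check a request body against the limits listed in Request Parameters."""
    problems = []
    if not (payload.get("image_start") or payload.get("image_end")):
        problems.append("provide image_start or image_end")
    for field in ("prompt", "negative_prompt"):
        if len(payload.get(field) or "") > 2500:
            problems.append(f"{field} exceeds 2500 characters")
    duration = payload.get("duration", 5)
    if isinstance(duration, bool) or not isinstance(duration, int) or not 3 <= duration <= 15:
        problems.append("duration must be an integer from 3 to 15")
    if payload.get("quality", "720p") not in {"720p", "1080p"}:
        problems.append("quality must be 720p or 1080p")
    if payload.get("sound", "off") not in {"on", "off"}:
        problems.append("sound must be on or off")
    params = payload.get("model_params", {})  # assumed nesting of model_params.*
    if params.get("multi_shot") and params.get("shot_type") not in {"customize", "intelligence"}:
        problems.append("shot_type is required when multi_shot is true")
    if len(params.get("element_list", [])) > 3:
        problems.append("element_list allows at most 3 subjects")
    return problems


payload = {
    "model": "kling-v3-image-to-video",
    "prompt": "A gentle breeze moves through the scene, creating subtle motion and life.",
    "image_start": "https://example.com/first-frame.jpg",
    "duration": 5,
    "quality": "720p",
    "sound": "off",
    "negative_prompt": "blurry, watermark, text, low quality",
    "model_params": {"watermark_info": {"enabled": False}},
}
print(payload_problems(payload))
print(json.dumps(payload, indent=2))
```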