
Wan 2.7 API Guide: Text-to-Video, Image-to-Video, Reference Video & Video Edit — Complete Integration Handbook

TL;DR
- Wan 2.7 is four models in one endpoint. Text-to-video, image-to-video (with first/last frame control), multi-character reference video (with voice cloning), and instruction-based video editing all go through POST /v1/videos/generations.
- Pricing on EvoLink: $0.086/sec at 720p, $0.144/sec at 1080p. A 10-second 720p clip costs $0.86. No subscriptions.
- Model IDs: wan2.7-text-to-video, wan2.7-image-to-video, wan2.7-reference-video, wan2.7-video-edit.
- Async workflow. Every request returns a task ID immediately. Poll GET /v1/tasks/{task_id} for status. Video URLs expire in 24 hours.
- What Wan 2.7 adds over Wan 2.6 on EvoLink: Video editing through the Wan 2.7 route, first-and-last-frame control in I2V, and multi-character reference video with voice cloning.
- Failed tasks are not billed for reference-video and video-edit modes.
Table of contents
- Quick start: your first Wan 2.7 video in 60 seconds
- Choose the right model ID
- Mode 1: Text-to-video
- Mode 2: Image-to-video with frame control
- Mode 3: Reference video with voice cloning
- Mode 4: Video editing
- Pricing and cost math
- Async workflow and task management
- Error handling and common status codes
- Production patterns and guardrails
- Migration from Wan 2.6 to Wan 2.7
- Parameter reference cheat sheet
- FAQ
1. Quick start: your first Wan 2.7 video in 60 seconds
Step 1: Generate a video
curl -X POST https://api.evolink.ai/v1/videos/generations \
-H "Authorization: Bearer YOUR_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "wan2.7-text-to-video",
"prompt": "A drone shot over a misty mountain lake at sunrise, slow camera push forward, cinematic color grading",
"quality": "720p",
"aspect_ratio": "16:9",
"duration": 5
}'
The response returns a task ID immediately:
{
"id": "task-unified-1757169743-7cvnl5zw",
"status": "pending",
"created": 1757169743
}
Step 2: Poll for the result
curl https://api.evolink.ai/v1/tasks/task-unified-1757169743-7cvnl5zw \
-H "Authorization: Bearer YOUR_API_KEY"
When status is "completed", the response includes a results array with the video URL. Download it within 24 hours, because the link expires.
Step 3: That's it
Change the model parameter to switch between the four modes below.
2. Choose the right model ID
| Model ID | Mode | Best for | Duration |
|---|---|---|---|
| wan2.7-text-to-video | Text → Video | Ad creatives, social clips, script-first generation | 2-15 sec |
| wan2.7-image-to-video | Image → Video | Product animations, storyboard-to-video, first/last frame control | 2-15 sec |
| wan2.7-reference-video | Reference → Video | Brand spokesperson, multi-character series, voice cloning | 2-15 sec (image-only refs), 2-10 sec (with video refs) |
| wan2.7-video-edit | Video → Edited Video | Style transfer, background swap, clothing change, colorization | 2-10 sec |
All four modes use POST /v1/videos/generations. The model parameter is the only thing that changes.
3. Mode 1: Text-to-video
What it does
Generates a video from a text prompt. Supports optional driving audio for lip-sync or music-synced output. Auto-generates background music when no audio is provided.
Key parameters
| Parameter | Required | Default | Description |
|---|---|---|---|
| model | Yes | — | wan2.7-text-to-video |
| prompt | Yes | — | Scene description, up to 5000 characters |
| negative_prompt | No | — | What to exclude, up to 500 characters |
| audio_urls | No | — | Array with 1 driving audio URL (wav/mp3, 2-30 sec, max 15MB) |
| quality | No | 720p | 720p or 1080p |
| aspect_ratio | No | 16:9 | 16:9, 9:16, 1:1, 4:3, 3:4 |
| duration | No | 5 | 2-15 seconds (integer) |
| seed | No | random | 1-2147483647 for reproducible output |
| prompt_extend | No | false | LLM-powered prompt rewriting (set true for short prompts) |
| callback_url | No | — | HTTPS URL for task completion webhook |
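The limits in the table above can be checked client-side before a request is sent. The following is a minimal sketch, not an official client: the function name `build_t2v_payload` and the choice to raise `ValueError` are assumptions made for illustration, while the field names and ranges come from the table.

```python
from typing import Any, Optional

ASPECT_RATIOS = {"16:9", "9:16", "1:1", "4:3", "3:4"}


def build_t2v_payload(
    prompt: str,
    duration: int = 5,
    quality: str = "720p",
    aspect_ratio: str = "16:9",
    negative_prompt: Optional[str] = None,
    seed: Optional[int] = None,
) -> dict[str, Any]:
    """Hypothetical helper: validate against the documented limits, return a request body."""
    if not prompt or len(prompt) > 5000:
        raise ValueError("prompt must be 1-5000 characters")
    if negative_prompt is not None and len(negative_prompt) > 500:
        raise ValueError("negative_prompt must be at most 500 characters")
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(duration, bool) or not isinstance(duration, int):
        raise ValueError("duration must be an integer")
    if not 2 <= duration <= 15:
        raise ValueError("duration must be between 2 and 15 seconds")
    if quality not in {"720p", "1080p"}:
        raise ValueError("quality must be 720p or 1080p")
    if aspect_ratio not in ASPECT_RATIOS:
        raise ValueError("unsupported aspect_ratio")
    if seed is not None and not 1 <= seed <= 2147483647:
        raise ValueError("seed must be between 1 and 2147483647")

    payload: dict[str, Any] = {
        "model": "wan2.7-text-to-video",
        "prompt": prompt,
        "duration": duration,
        "quality": quality,
        "aspect_ratio": aspect_ratio,
    }
    # Optional fields are only included when the caller set them.
    if negative_prompt is not None:
        payload["negative_prompt"] = negative_prompt
    if seed is not None:
        payload["seed"] = seed
    return payload
```

Rejecting an out-of-range value locally avoids a round trip that would end in a 400 response.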
Multi-shot narrative
Control shot structure directly in the prompt:
{
"model": "wan2.7-text-to-video",
"prompt": "A tense detective story. Shot 1 [0-3s] wide angle: rainy night street, neon lights. Shot 2 [3-6s] medium: detective enters old building. Shot 3 [6-9s] close-up: detective's determined eyes. Shot 4 [9-12s] medium: cautious advance through dim corridor. Shot 5 [12-15s] close-up: discovers key clue.",
"aspect_ratio": "16:9",
"duration": 15
}
With driving audio
{
"model": "wan2.7-text-to-video",
"prompt": "A cartoon general in golden armor on a horse, reciting a classical poem",
"audio_urls": ["https://your-cdn.com/recital.mp3"],
"duration": 10
}
If the audio is longer than duration, only the first N seconds are used (N = duration). If shorter, the remaining video portion is silent.
4. Mode 2: Image-to-video with frame control
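What it does
Animates a still image into video, with optional control over the first and last frames, and can also continue an existing clip.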
Three generation modes
| generation_mode | Inputs | Use case |
|---|---|---|
| first_frame | image_start (+ optional audio_urls) | Animate a product photo or character illustration |
| first_last_frame | image_start + image_end (+ optional audio_urls) | Define start and end states, model fills the motion |
| video_continuation | video_urls[0] (+ optional image_end) | Extend an existing clip, optionally specify the ending frame |
If generation_mode is omitted, the server infers it from the provided media.
Valid input combinations
- image_start only
- image_start + audio_urls
- image_start + image_end
- image_start + image_end + audio_urls
- video_urls (continuation)
- video_urls + image_end (continuation with ending frame)
Any other combination will be rejected.
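Because only six media combinations are accepted, a pre-flight check can catch the rest before submission. This is a sketch under the assumption that a field counts as "provided" when it is present and non-empty; the function name `is_valid_i2v_combination` is hypothetical.

```python
# The six accepted combinations listed above, expressed as sets of field names.
VALID_I2V_COMBINATIONS = {
    frozenset({"image_start"}),
    frozenset({"image_start", "audio_urls"}),
    frozenset({"image_start", "image_end"}),
    frozenset({"image_start", "image_end", "audio_urls"}),
    frozenset({"video_urls"}),
    frozenset({"video_urls", "image_end"}),
}

MEDIA_FIELDS = ("image_start", "image_end", "video_urls", "audio_urls")


def is_valid_i2v_combination(payload: dict) -> bool:
    """Return True if the non-empty media fields form an accepted combination."""
    present = frozenset(name for name in MEDIA_FIELDS if payload.get(name))
    return present in VALID_I2V_COMBINATIONS
```

For example, `video_urls` together with `audio_urls` is not in the set, so a continuation request with driving audio fails this check.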
Example: First-and-last-frame
curl -X POST https://api.evolink.ai/v1/videos/generations \
-H "Authorization: Bearer YOUR_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "wan2.7-image-to-video",
"generation_mode": "first_last_frame",
"prompt": "A product bottle rotating 360 degrees with soft studio lighting",
"image_start": "https://your-cdn.com/bottle-front.jpg",
"image_end": "https://your-cdn.com/bottle-back.jpg",
"quality": "1080p",
"duration": 5
}'
Example: Video continuation
{
"model": "wan2.7-image-to-video",
"generation_mode": "video_continuation",
"prompt": "The scene continues with the character walking toward the sunset",
"video_urls": ["https://your-cdn.com/previous-clip.mp4"],
"image_end": "https://your-cdn.com/sunset-ending.jpg",
"duration": 5
}
5. Mode 3: Reference video with voice cloning
What it does
Generates new video scenes while preserving the appearance of characters from reference images or videos — and optionally cloning their voice from a short audio sample. This is how you build multi-character video series where each person looks and sounds consistent across episodes.
Key constraints
- image_urls + video_urls combined: max 5 items total
- image_start and voice audio do not count toward this 5-item limit
- Duration: 2-15 sec (image-only references), 2-10 sec (when video references are included)
- Billing: input video duration + output video duration. Failed tasks are free.
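The reference-count and duration constraints above can be encoded as a small check. The sketch below is illustrative only: `check_reference_payload` is a hypothetical name, and returning a list of problem strings (rather than raising) is a design choice made here so a caller can report every issue at once.

```python
def check_reference_payload(payload: dict) -> list[str]:
    """Return a list of constraint violations; an empty list means the payload passes."""
    problems: list[str] = []
    images = payload.get("image_urls") or []
    videos = payload.get("video_urls") or []

    # image_start and voice audio are deliberately ignored: they do not count.
    if len(images) + len(videos) > 5:
        problems.append("image_urls + video_urls must total at most 5 items")

    # The duration ceiling drops from 15 to 10 seconds once a video reference is present.
    max_duration = 10 if videos else 15
    duration = payload.get("duration")
    if duration is not None and not 2 <= duration <= max_duration:
        problems.append(f"duration must be between 2 and {max_duration} seconds")

    return problems
```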
Character indexing in prompts
Reference characters by their position in the input arrays:
- English: Image 1, Image 2, Video 1, Video 2
- Chinese: 图1, 图2, 视频1, 视频2
Image 1 and Video 1 can coexist.
Voice cloning: two methods
voice_bindings (recommended)
Precise key-value mapping between character references and voice audio:
{
"model": "wan2.7-reference-video",
"prompt": "Image 1 holds Image 2 and says: 'What lovely sunshine today'",
"image_urls": [
"https://your-cdn.com/girl.jpg",
"https://your-cdn.com/toy.png"
],
"model_params": {
"voice_bindings": {
"image1": "https://your-cdn.com/girl-voice.mp3"
}
},
"duration": 10
}
audio_urls (legacy positional)
Voice audio is matched by position to the entries in image_urls / video_urls. Works but less explicit. Use voice_bindings for new integrations.
Example: Multi-character brand video
{
"model": "wan2.7-reference-video",
"prompt": "Image 1 and Image 2 are having a conversation in a modern office. Image 1 explains the product while Image 2 takes notes. The scene is professional and well-lit.",
"image_urls": [
"https://your-cdn.com/spokesperson-a.jpg",
"https://your-cdn.com/spokesperson-b.jpg"
],
"image_start": "https://your-cdn.com/office-wide-shot.jpg",
"model_params": {
"voice_bindings": {
"image1": "https://your-cdn.com/voice-a.mp3",
"image2": "https://your-cdn.com/voice-b.mp3"
}
},
"quality": "1080p",
"duration": 10
}
Multi-grid storyboard
For single-image references with multiple panels (e.g., a 3×3 grid of character poses):
{
"model": "wan2.7-reference-video",
"prompt": "Reference image. 3D cartoon style. 1. Wide shot of fantasy forest. 2. Boy parts the vines. 3. Robot scans ahead. 4. Close-up of map. 5. Boy's excited face. 6. They leap over roots.",
"image_urls": ["https://your-cdn.com/storyboard-grid.png"],
"duration": 15
}
6. Mode 4: Video editing
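What it does
Edits an existing video from a natural-language instruction, optionally guided by reference images.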
Key parameters
| Parameter | Required | Default | Description |
|---|---|---|---|
| model | Yes | — | wan2.7-video-edit |
| prompt | Yes | — | Natural language edit instruction |
| video_urls | Yes | — | Array with exactly 1 source video (mp4/mov, 2-10 sec) |
| image_urls | No | — | Up to 4 reference images for style/content guidance |
| keep_original_sound | No | false | true preserves original audio; false lets model handle audio |
| duration | No | 0 | 0 = keep original length; explicit values: 2-10 sec |
| quality | No | 720p | 720p or 1080p |
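A small builder can enforce the "exactly one source video" and "up to four reference images" rules from the table. This is a sketch: `build_video_edit_payload` is a hypothetical helper, and taking a single `video_url` argument is simply one way to make a second source video impossible to pass.

```python
from typing import Any, Sequence


def build_video_edit_payload(
    prompt: str,
    video_url: str,
    reference_images: Sequence[str] = (),
    keep_original_sound: bool = False,
    duration: int = 0,
) -> dict[str, Any]:
    """Hypothetical helper: build a wan2.7-video-edit request body."""
    if not prompt:
        raise ValueError("prompt is required")
    if len(reference_images) > 4:
        raise ValueError("at most 4 reference images are allowed")
    # 0 keeps the source length; any explicit value must be 2-10 seconds.
    if duration != 0 and not 2 <= duration <= 10:
        raise ValueError("duration must be 0 or between 2 and 10 seconds")

    payload: dict[str, Any] = {
        "model": "wan2.7-video-edit",
        "prompt": prompt,
        "video_urls": [video_url],  # always a one-element array
        "keep_original_sound": keep_original_sound,
        "duration": duration,
    }
    if reference_images:
        payload["image_urls"] = list(reference_images)
    return payload
```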
Example: Instruction-only style change
curl -X POST https://api.evolink.ai/v1/videos/generations \
-H "Authorization: Bearer YOUR_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "wan2.7-video-edit",
"prompt": "Convert the entire scene to a vintage film look with warm color grading and film grain",
"video_urls": ["https://your-cdn.com/source-clip.mp4"],
"keep_original_sound": true,
"duration": 0
}'
Example: Reference-guided clothing replacement
{
"model": "wan2.7-video-edit",
"prompt": "Replace the girl's outfit with the clothes from the reference image",
"video_urls": ["https://your-cdn.com/source.mp4"],
"image_urls": ["https://your-cdn.com/target-outfit.png"]
}
What you can edit
- Style transfer: "convert to anime style", "apply watercolor painting effect"
- Background swap: "change background to a rain-soaked Tokyo street at night"
- Object/clothing change: "swap the jacket to red", "replace the hat with a crown"
- Colorization: "convert this black-and-white footage to color"
- Lighting: "shift lighting to golden hour"
7. Pricing and cost math
EvoLink Wan 2.7 pricing
| Quality | Cost per second | 10-second clip |
|---|---|---|
| 720p | $0.086 | $0.86 |
| 1080p | $0.144 (1.67× of 720p) | $1.44 |
No subscriptions, no minimum commitments. You pay only for successfully generated video.
Cost comparison with other providers (as listed on provider pages, May 2026)
| Provider | Per-second rate | 10-sec 720p cost |
|---|---|---|
| EvoLink | $0.086/sec | $0.86 |
| Together AI | $0.10/sec | $1.00 |
| Segmind (720p clip) | ~$0.063/sec (based on $0.625/10sec) | $0.625 |
| Segmind (1080p clip) | ~$0.094/sec (based on $0.9375/10sec) | $0.9375 |
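Special billing for reference-video and video-edit
Reference-video tasks are billed on input video duration plus output video duration. Failed tasks in reference-video and video-edit modes are not billed.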
Budget estimation formula
Monthly cost = (avg_duration × cost_per_second × daily_volume × 30)
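The formula can be worked through in a few lines. This sketch uses the per-second rates from the pricing table and `Decimal` to avoid floating-point drift in money math; the function names are illustrative, and it does not model the extra input-duration billing for reference-video.

```python
from decimal import Decimal

# Per-second rates from the EvoLink pricing table above.
RATE_PER_SECOND = {"720p": Decimal("0.086"), "1080p": Decimal("0.144")}


def clip_cost(duration_s: int, quality: str = "720p") -> Decimal:
    """Cost of one clip: duration x cost_per_second."""
    return RATE_PER_SECOND[quality] * Decimal(duration_s)


def monthly_cost(avg_duration_s: int, daily_volume: int,
                 quality: str = "720p", days: int = 30) -> Decimal:
    """Monthly cost = avg_duration x cost_per_second x daily_volume x 30."""
    return clip_cost(avg_duration_s, quality) * daily_volume * days
```

With these rates, 100 five-second 720p clips per day comes to 5 × 0.086 × 100 × 30 = $1,290 per month, and a single 10-second 1080p clip comes to $1.44.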
8. Async workflow and task management
Every Wan 2.7 request follows the same async pattern:
POST /v1/videos/generations → returns task id + status "pending"
GET /v1/tasks/{task_id} → poll until status is "completed" or "failed"
Download video URL from results array within 24 hours
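The two HTTP calls in this pattern can be sketched with the Python standard library. The endpoint paths, base URL, and headers are taken from the curl examples in this guide; the function names are hypothetical, and the shape of the JSON response beyond `id`, `status`, and `results` is not assumed.

```python
import json
import urllib.request

BASE_URL = "https://api.evolink.ai"


def build_submit_request(api_key: str, payload: dict) -> urllib.request.Request:
    """POST /v1/videos/generations with a JSON body."""
    return urllib.request.Request(
        f"{BASE_URL}/v1/videos/generations",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def build_status_request(api_key: str, task_id: str) -> urllib.request.Request:
    """GET /v1/tasks/{task_id} to read the current task status."""
    return urllib.request.Request(
        f"{BASE_URL}/v1/tasks/{task_id}",
        headers={"Authorization": f"Bearer {api_key}"},
        method="GET",
    )


def send(request: urllib.request.Request, timeout: float = 30.0) -> dict:
    """Send a prepared request and decode the JSON response body."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

A caller would run `send(build_submit_request(key, payload))`, store the returned `id`, and later pass it to `build_status_request`. Keeping request construction separate from sending makes the former easy to unit-test without network access.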
Task lifecycle
| Status | Meaning |
|---|---|
| pending | Task accepted, waiting in queue |
| processing | Task is actively generating |
| completed | Video is ready, URL available in results array |
| failed | Generation failed (check error message) |
Callback URL (webhook)
Set callback_url in your request. EvoLink will POST to this URL when the task completes, fails, or is cancelled. The callback fires after billing confirmation.
Production best practices
- Persist the task ID immediately after submission. If your service crashes, you can recover.
- Use exponential backoff when polling. Start at 5 seconds, cap at 30 seconds.
- Download and archive results immediately. Video URLs expire in 24 hours.
- Make submissions idempotent. Hash request payloads and deduplicate to prevent double-billing from retry storms.
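The backoff advice above (start at 5 seconds, cap at 30) might look like the following. It is a sketch under stated assumptions: the doubling factor and the 300-second overall timeout are choices made here, not documented values, and `fetch_status` is a caller-supplied function that returns the task JSON as a dict.

```python
import time
from typing import Callable, Iterator


def backoff_delays(start: float = 5.0, cap: float = 30.0,
                   factor: float = 2.0) -> Iterator[float]:
    """Yield an endless sequence of delays that grows by `factor` up to `cap`."""
    delay = start
    while True:
        yield min(delay, cap)
        delay = min(delay * factor, cap)


def poll_until_done(fetch_status: Callable[[], dict],
                    timeout_s: float = 300.0,
                    sleep: Callable[[float], None] = time.sleep) -> dict:
    """Poll until the task is completed or failed, or raise TimeoutError."""
    waited = 0.0
    for delay in backoff_delays():
        task = fetch_status()
        if task.get("status") in ("completed", "failed"):
            return task
        if waited + delay > timeout_s:
            raise TimeoutError("task did not finish within the polling budget")
        sleep(delay)
        waited += delay
    raise AssertionError("unreachable: backoff_delays is infinite")
```

Passing `sleep` as a parameter lets tests substitute a recorder instead of actually waiting.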
9. Error handling and common status codes
| HTTP Code | Error Code | Meaning | Action |
|---|---|---|---|
| 400 | invalid_request | Bad parameters | Check model ID, prompt length, duration range, media URLs |
| 401 | unauthorized | Invalid or expired token | Refresh your API key |
| 402 | insufficient_quota | Not enough credits | Top up your account |
| 403 | model_access_denied | Token lacks model access | Check API key permissions |
| 429 | rate_limit_exceeded | Too many requests | Back off and retry with exponential delay |
| 500 | internal_error | Server error | Retry after 30 seconds; if persistent, contact support |
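One way to act on this table in client code is a lookup that separates retryable failures from ones that need a fix on the caller's side. In this sketch, treating only 429 and 500 as retryable is an interpretation of the "Action" column, and `classify_error` is a hypothetical name.

```python
# HTTP status -> (error code from the table, whether a retry can help)
ERROR_TABLE = {
    400: ("invalid_request", False),
    401: ("unauthorized", False),
    402: ("insufficient_quota", False),
    403: ("model_access_denied", False),
    429: ("rate_limit_exceeded", True),
    500: ("internal_error", True),
}


def classify_error(status_code: int) -> tuple[str, bool]:
    """Return (error_code, retryable); unknown statuses are not retried."""
    return ERROR_TABLE.get(status_code, ("unknown", False))
```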
Common mistakes
- Using the wrong model ID spelling. It's wan2.7-text-to-video, not wan-2.7-text-to-video or wan27-t2v. A stale model ID returns a clean 404 with no helpful error.
- Sending invalid media combinations in I2V mode. Check the valid input combinations listed under Mode 2.
- Not downloading results in time. Video URLs expire in 24 hours. Build automatic download into your pipeline.
10. Production patterns and guardrails
Budget guardrails
1. Cap maximum duration server-side (e.g., 10 seconds for social content)
2. Default to 720p unless the use case specifically requires 1080p
3. Track spend by user, feature, and model ID
4. Separate reference-video budgeting (input+output billing) from T2V/I2V
5. Set per-user daily limits before scaling traffic
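Guardrails 1, 2, and 5 can be combined in a small admission check that runs before a job is queued. This is an in-memory sketch only: the class name `DailyBudget`, the use of `PermissionError`, and counting budget in requested seconds are all assumptions, and a real deployment would persist the counters and reset them on a schedule. It targets text-to-video and image-to-video payloads, where an omitted duration defaults to 5.

```python
from collections import defaultdict


class DailyBudget:
    """Track requested seconds per user and apply duration and quality defaults."""

    def __init__(self, max_seconds_per_user: int, max_clip_seconds: int = 10):
        self.max_seconds_per_user = max_seconds_per_user
        self.max_clip_seconds = max_clip_seconds
        self._used: dict[str, int] = defaultdict(int)  # user_id -> seconds today

    def admit(self, user_id: str, payload: dict) -> dict:
        """Return a guarded copy of the payload, or raise if the user is over budget."""
        guarded = dict(payload)
        # Guardrail 1: cap duration server-side (5 is the documented default).
        guarded["duration"] = min(guarded.get("duration", 5), self.max_clip_seconds)
        # Guardrail 2: stay on 720p unless 1080p was requested explicitly.
        guarded.setdefault("quality", "720p")
        # Guardrail 5: enforce the per-user daily limit.
        if self._used[user_id] + guarded["duration"] > self.max_seconds_per_user:
            raise PermissionError(f"daily limit reached for {user_id}")
        self._used[user_id] += guarded["duration"]
        return guarded

    def reset(self) -> None:
        """Clear all counters; intended to be called once per day."""
        self._used.clear()
```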
Reliability patterns
- Retry with idempotency key. Hash your request payload and check for existing tasks before resubmitting.
- Timeout handling. If a task hasn't completed after 5 minutes, mark it for manual review rather than resubmitting blindly.
- Fallback strategy. Consider falling back to Wan 2.6 or Wan 2.5 if Wan 2.7 returns persistent errors on a specific mode.
- Asset validation. Validate image dimensions, video duration, and audio format before submission. Bad assets cause failures that look like model quality issues.
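The idempotency pattern can be sketched by hashing a canonical form of the request body and remembering which task each hash produced. This is a client-side technique assumed here for illustration; the guide does not describe a server-side idempotency header, and `SubmissionLedger` is a hypothetical in-memory stand-in for a database table.

```python
import hashlib
import json
from typing import Callable


def idempotency_key(payload: dict) -> str:
    """SHA-256 of the payload serialized with sorted keys, so key order is irrelevant."""
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"),
                           ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class SubmissionLedger:
    """Remember the task ID returned for each distinct payload."""

    def __init__(self) -> None:
        self._tasks: dict[str, str] = {}

    def submit_once(self, payload: dict, submit: Callable[[dict], str]) -> str:
        """Call `submit` only if this exact payload has not been submitted before."""
        key = idempotency_key(payload)
        if key not in self._tasks:
            self._tasks[key] = submit(payload)
        return self._tasks[key]
```

Note that two intentionally identical requests (for example, the same prompt with a random seed) would be deduplicated too, so a caller who wants both should add a distinguishing field before hashing.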
Queue architecture
For production systems generating more than 100 videos/day:
User request → validation → job queue → Wan 2.7 API → result handler → CDN archive → notify user
Never call the API directly from user-facing request handlers. Always go through a background job system.
11. Migration from Wan 2.6 to Wan 2.7
What stays the same
- API endpoint: POST /v1/videos/generations
- Authentication: same API key and Bearer token
- Async pattern: same task ID / polling / callback flow
- EvoLink billing: same account and credit system
What changes
wan2.7-t2v-2026-04-25).
| Aspect | Wan 2.6 | Wan 2.7 |
|---|---|---|
| Model IDs | wan2.6-text-to-video, wan2.6-image-to-video, wan2.6-reference-video | wan2.7-text-to-video, wan2.7-image-to-video, wan2.7-reference-video, wan2.7-video-edit |
| I2V frame control | First frame only (image_start) | First AND last frame (image_start + image_end) |
| I2V generation modes | Implicit | Explicit generation_mode parameter (first_frame, first_last_frame, video_continuation) |
| Reference video | Single reference, no voice | Up to 5 refs, voice cloning via voice_bindings |
| Video editing | Not available | New: wan2.7-video-edit |
| Multi-shot T2V | Supported | Supported (same prompt syntax) |
Step-by-step migration
- Change model parameter. Replace wan2.6-text-to-video with wan2.7-text-to-video (same for other modes).
- Test with existing prompts. Wan 2.7 handles the same prompt format. No rewriting needed.
- Adopt new features gradually. Add generation_mode, image_end, voice_bindings, or video-edit as your workflow requires.
- Keep Wan 2.6 as fallback. Both versions run in parallel on EvoLink. You don't have to migrate everything at once.
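The first migration step is a prefix swap, which can be sketched as below. The helper name `migrate_model_id` is hypothetical. It deliberately leaves any ID unchanged when no matching Wan 2.7 ID appears in this guide, so unknown or variant IDs keep routing to their existing model.

```python
# The four Wan 2.7 model IDs listed in this guide.
WAN27_MODELS = {
    "wan2.7-text-to-video",
    "wan2.7-image-to-video",
    "wan2.7-reference-video",
    "wan2.7-video-edit",
}


def migrate_model_id(model_id: str) -> str:
    """Map a wan2.6-* ID to its wan2.7-* counterpart when one exists."""
    prefix = "wan2.6-"
    if not model_id.startswith(prefix):
        return model_id
    candidate = "wan2.7-" + model_id[len(prefix):]
    # Fall back to the original ID if there is no known 2.7 equivalent.
    return candidate if candidate in WAN27_MODELS else model_id
```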
12. Parameter reference cheat sheet
Shared parameters (all modes)
| Parameter | Type | Description |
|---|---|---|
| model | string | Required. One of the four model IDs |
| prompt | string | Required. Up to 5000 characters |
| quality | string | 720p (default) or 1080p |
| callback_url | string | HTTPS webhook for task completion |
Text-to-video specific
| Parameter | Type | Description |
|---|---|---|
| negative_prompt | string | Up to 500 characters |
| audio_urls | array | 1 driving audio (wav/mp3, 2-30 sec, max 15MB) |
| aspect_ratio | string | 16:9, 9:16, 1:1, 4:3, 3:4 |
| duration | number | 2-15 seconds |
| seed | integer | 1-2147483647 |
| prompt_extend | boolean | LLM prompt rewriting (default false) |
Image-to-video specific
| Parameter | Type | Description |
|---|---|---|
| generation_mode | string | first_frame, first_last_frame, video_continuation |
| image_start | string | First frame image URL |
| image_end | string | Last frame image URL |
| video_urls | array | Source video for continuation |
| audio_urls | array | Driving audio (not for video_continuation) |
| duration | number | 2-15 seconds |
Reference video specific
| Parameter | Type | Description |
|---|---|---|
| image_urls | array | Reference images (counted toward 5-item limit) |
| video_urls | array | Reference videos (counted toward 5-item limit) |
| image_start | string | Starting frame (not counted toward limit) |
| model_params.voice_bindings | object | Map of reference key to voice audio URL |
| audio_urls | array | Legacy voice binding (positional) |
| duration | number | 2-15 sec (image-only) or 2-10 sec (with video refs) |
Video edit specific
| Parameter | Type | Description |
|---|---|---|
| video_urls | array | Exactly 1 source video |
| image_urls | array | Up to 4 reference images |
| keep_original_sound | boolean | true preserves original audio |
| duration | number | 0 = original length; explicit: 2-10 sec |
13. FAQ
How much does Wan 2.7 cost on EvoLink?
$0.086/sec at 720p, $0.144/sec at 1080p. A 10-second 720p clip costs $0.86. No subscriptions or minimum commitments.
What is the difference between Wan 2.7 and Wan 2.6?
On EvoLink, Wan 2.7 exposes video editing, multi-character reference video with voice cloning, and first-and-last-frame control in I2V mode. Wan 2.6 remains useful for cinematic storytelling and Flash variants for faster iteration. Both run in parallel on EvoLink.
Does Wan 2.7 generate audio automatically?
Yes. When you omit audio_urls, the model auto-generates background music or sound effects matching the visual content.
Are failed tasks billed?
For reference-video and video-edit modes, failed tasks are explicitly not billed. For text-to-video and image-to-video, billing is based on actual generated video duration.
Can I use Wan 2.7 for NSFW content?
No. Such requests are rejected with an invalid_content error.
What audio formats are supported for voice cloning?
wav and mp3. Duration should be 1-10 seconds for voice cloning, 2-30 seconds for driving audio. Maximum file size is 15MB.
How do I handle video URL expiration?
Video URLs expire after 24 hours. Build an automatic download-and-archive step into your pipeline immediately after task completion. Store the final asset in your own CDN or object storage.
Can I migrate from Wan 2.6 without downtime?
Yes. Change the model parameter from wan2.6-* to wan2.7-*. The endpoint, authentication, and async pattern are identical. Both versions run in parallel, so you can migrate route by route.
Next steps
- Try the playground: Wan 2.7 model page
- Compare Wan models: Wan API family collection
- Full pricing breakdown: Wan API pricing guide
- Wan 2.6 production patterns: Wan 2.6 API guide
- Wan 2.5 review: Wan 2.5 API review


