Wan 2.7 API Guide: Text-to-Video, Image-to-Video, Reference Video & Video Edit — Complete Integration Handbook

EvoLink Team
Product Team
May 22, 2026
18 min read
This guide covers all four Wan 2.7 video modes, the parameters that matter in production, code examples you can paste into a terminal, cost math, error handling, and a migration path from Wan 2.6. It is written for developers and engineers who need to ship, not just experiment.
For the product overview and playground, visit the Wan 2.7 model page. For the family-level comparison, visit the Wan API family collection. For the pricing breakdown across the full Wan lineup, visit the Wan API pricing guide.

TL;DR

  • Wan 2.7 is four models in one endpoint. Text-to-video, image-to-video (with first/last frame control), multi-character reference video (with voice cloning), and instruction-based video editing — all through POST /v1/videos/generations.
  • Pricing on EvoLink: $0.086/sec at 720p, $0.144/sec at 1080p. A 10-second 720p clip costs $0.86. No subscriptions.
  • Model IDs: wan2.7-text-to-video, wan2.7-image-to-video, wan2.7-reference-video, wan2.7-video-edit.
  • Async workflow. Every request returns a task ID immediately. Poll GET /v1/tasks/{task_id} for status. Video URLs expire in 24 hours.
  • What Wan 2.7 adds over Wan 2.6 on EvoLink: Video editing through the Wan 2.7 route, first-and-last-frame control in I2V, and multi-character reference video with voice cloning.
  • Failed tasks are not billed for reference-video and video-edit modes.

Table of contents

  1. Quick start: your first Wan 2.7 video in 60 seconds
  2. Choose the right model ID
  3. Mode 1: Text-to-video
  4. Mode 2: Image-to-video with frame control
  5. Mode 3: Reference video with voice cloning
  6. Mode 4: Video editing
  7. Pricing and cost math
  8. Async workflow and task management
  9. Error handling and common status codes
  10. Production patterns and guardrails
  11. Migration from Wan 2.6 to Wan 2.7
  12. Parameter reference cheat sheet
  13. FAQ

1. Quick start: your first Wan 2.7 video in 60 seconds

Prerequisites: An EvoLink account and an API key from the dashboard.

Step 1: Generate a video

curl -X POST https://api.evolink.ai/v1/videos/generations \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "wan2.7-text-to-video",
    "prompt": "A drone shot over a misty mountain lake at sunrise, slow camera push forward, cinematic color grading",
    "quality": "720p",
    "aspect_ratio": "16:9",
    "duration": 5
  }'
Response:
{
  "id": "task-unified-1757169743-7cvnl5zw",
  "status": "pending",
  "created": 1757169743
}

Step 2: Poll for the result

curl https://api.evolink.ai/v1/tasks/task-unified-1757169743-7cvnl5zw \
  -H "Authorization: Bearer YOUR_API_KEY"
When status is "completed", the response includes a results array with the video URL. Download it within 24 hours — the link expires.

Step 3: That's it

You just generated a video for ~$0.43 (5 seconds × $0.086/sec). Change the model parameter to switch between the four modes below.
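
The same submission can be made from Python. This is a minimal sketch using only the standard library. The helper names `build_generation_request` and `submit` are illustrative, not part of any EvoLink SDK, and the response is assumed to have the shape shown in Step 1.

```python
import json
import urllib.request

API_BASE = "https://api.evolink.ai/v1"


def build_generation_request(api_key: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) the POST request from the quick start."""
    return urllib.request.Request(
        url=f"{API_BASE}/videos/generations",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def submit(api_key: str, payload: dict) -> str:
    """Send the request and return the task ID (performs a network call)."""
    request = build_generation_request(api_key, payload)
    with urllib.request.urlopen(request, timeout=30) as response:
        body = json.load(response)
    return body["id"]  # assumes the response shape shown in Step 1


payload = {
    "model": "wan2.7-text-to-video",
    "prompt": "A drone shot over a misty mountain lake at sunrise",
    "quality": "720p",
    "aspect_ratio": "16:9",
    "duration": 5,
}
request = build_generation_request("YOUR_API_KEY", payload)
print(request.get_method(), request.full_url)
```

Keeping request construction separate from sending makes the payload easy to inspect or unit-test without spending credits.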

2. Choose the right model ID

| Model ID | Mode | Best for | Duration |
| --- | --- | --- | --- |
| wan2.7-text-to-video | Text → Video | Ad creatives, social clips, script-first generation | 2-15 sec |
| wan2.7-image-to-video | Image → Video | Product animations, storyboard-to-video, first/last frame control | 2-15 sec |
| wan2.7-reference-video | Reference → Video | Brand spokesperson, multi-character series, voice cloning | 2-15 sec (image-only refs), 2-10 sec (with video refs) |
| wan2.7-video-edit | Video → Edited Video | Style transfer, background swap, clothing change, colorization | 2-10 sec |

All four use the same endpoint: POST /v1/videos/generations. The model parameter is the only thing that changes.

3. Mode 1: Text-to-video

What it does

Generates a video from a text prompt. Supports optional driving audio for lip-sync or music-synced output. Auto-generates background music when no audio is provided.

Key parameters

| Parameter | Required | Default | Description |
| --- | --- | --- | --- |
| model | Yes |  | wan2.7-text-to-video |
| prompt | Yes |  | Scene description, up to 5000 characters |
| negative_prompt | No |  | What to exclude, up to 500 characters |
| audio_urls | No |  | Array with 1 driving audio URL (wav/mp3, 2-30 sec, max 15MB) |
| quality | No | 720p | 720p or 1080p |
| aspect_ratio | No | 16:9 | 16:9, 9:16, 1:1, 4:3, 3:4 |
| duration | No | 5 | 2-15 seconds (integer) |
| seed | No | random | 1-2147483647 for reproducible output |
| prompt_extend | No | false | LLM-powered prompt rewriting (set true for short prompts) |
| callback_url | No |  | HTTPS URL for task completion webhook |
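
Checking these limits client-side avoids a round trip that ends in a 400. The sketch below encodes only the ranges listed in the table. The function name `validate_t2v_payload` and its error strings are illustrative, and the server may enforce additional rules not covered here.

```python
ALLOWED_ASPECT_RATIOS = {"16:9", "9:16", "1:1", "4:3", "3:4"}
ALLOWED_QUALITIES = {"720p", "1080p"}


def validate_t2v_payload(payload: dict) -> list[str]:
    """Return a list of problems; an empty list means the payload passes these checks."""
    problems = []

    prompt = payload.get("prompt", "")
    if not prompt:
        problems.append("prompt is required")
    elif len(prompt) > 5000:
        problems.append("prompt exceeds 5000 characters")

    if len(payload.get("negative_prompt", "")) > 500:
        problems.append("negative_prompt exceeds 500 characters")

    if payload.get("quality", "720p") not in ALLOWED_QUALITIES:
        problems.append("quality must be 720p or 1080p")

    if payload.get("aspect_ratio", "16:9") not in ALLOWED_ASPECT_RATIOS:
        problems.append("aspect_ratio is not one of the supported values")

    duration = payload.get("duration", 5)
    # bool is a subclass of int in Python, so reject it explicitly
    if not isinstance(duration, int) or isinstance(duration, bool) or not 2 <= duration <= 15:
        problems.append("duration must be an integer from 2 to 15")

    seed = payload.get("seed")
    if seed is not None and not (isinstance(seed, int) and 1 <= seed <= 2147483647):
        problems.append("seed must be an integer from 1 to 2147483647")

    if len(payload.get("audio_urls", [])) > 1:
        problems.append("audio_urls accepts at most 1 driving audio URL")

    return problems


print(validate_t2v_payload({"prompt": "A misty lake at sunrise", "duration": 20}))
```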

Multi-shot narrative

Control shot structure directly in the prompt:

{
  "model": "wan2.7-text-to-video",
  "prompt": "A tense detective story. Shot 1 [0-3s] wide angle: rainy night street, neon lights. Shot 2 [3-6s] medium: detective enters old building. Shot 3 [6-9s] close-up: detective's determined eyes. Shot 4 [9-12s] medium: cautious advance through dim corridor. Shot 5 [12-15s] close-up: discovers key clue.",
  "aspect_ratio": "16:9",
  "duration": 15
}

With driving audio

{
  "model": "wan2.7-text-to-video",
  "prompt": "A cartoon general in golden armor on a horse, reciting a classical poem",
  "audio_urls": ["https://your-cdn.com/recital.mp3"],
  "duration": 10
}
Audio truncation rules: if audio is longer than duration, only the first N seconds are used. If shorter, the remaining video portion is silent.
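
As a worked example of the truncation rule, the helper below computes how much of the driving audio is used and how much of the clip ends up silent. The function name `audio_coverage` is illustrative, and the arithmetic is simply a restatement of the rule above.

```python
def audio_coverage(audio_seconds: float, duration: int) -> tuple[float, float]:
    """Return (seconds of audio used, seconds of video left silent)."""
    used = min(audio_seconds, duration)        # audio beyond the clip length is dropped
    silent = max(duration - audio_seconds, 0)  # a shorter audio leaves a silent tail
    return used, silent


print(audio_coverage(30, 10))  # 30-second audio on a 10-second clip
print(audio_coverage(4, 10))   # 4-second audio on a 10-second clip
```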

4. Mode 2: Image-to-video with frame control

What it does

Generates video from one or two keyframe images. This is the mode that gives you first-and-last-frame control — define both endpoints and the model infers the motion trajectory in between.

Three generation modes

| generation_mode | Inputs | Use case |
| --- | --- | --- |
| first_frame | image_start (+ optional audio_urls) | Animate a product photo or character illustration |
| first_last_frame | image_start + image_end (+ optional audio_urls) | Define start and end states, model fills the motion |
| video_continuation | video_urls[0] (+ optional image_end) | Extend an existing clip, optionally specify the ending frame |

When generation_mode is omitted, the server infers it from the provided media.

Valid input combinations

  1. image_start only
  2. image_start + audio_urls
  3. image_start + image_end
  4. image_start + image_end + audio_urls
  5. video_urls (continuation)
  6. video_urls + image_end (continuation with ending frame)

Any other combination will be rejected.
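
The six combinations can be checked before submission. The sketch below maps a payload to the generation mode it corresponds to and raises on anything else. It mirrors the list above, but the server's actual inference logic is not documented here, so treat `infer_generation_mode` as an illustrative client-side check rather than a copy of the server's behavior.

```python
def infer_generation_mode(payload: dict) -> str:
    """Map the provided media fields to a generation_mode, or raise ValueError."""
    has_start = bool(payload.get("image_start"))
    has_end = bool(payload.get("image_end"))
    has_video = bool(payload.get("video_urls"))
    has_audio = bool(payload.get("audio_urls"))

    # Combinations 5 and 6: a source video, optionally with an ending frame
    if has_video and not has_start and not has_audio:
        return "video_continuation"

    # Combinations 1 to 4: a first frame, optionally with a last frame and/or audio
    if has_start and not has_video:
        return "first_last_frame" if has_end else "first_frame"

    raise ValueError("unsupported media combination for wan2.7-image-to-video")


print(infer_generation_mode({"image_start": "https://your-cdn.com/bottle-front.jpg"}))
```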

Example: First-and-last-frame

curl -X POST https://api.evolink.ai/v1/videos/generations \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "wan2.7-image-to-video",
    "generation_mode": "first_last_frame",
    "prompt": "A product bottle rotating 360 degrees with soft studio lighting",
    "image_start": "https://your-cdn.com/bottle-front.jpg",
    "image_end": "https://your-cdn.com/bottle-back.jpg",
    "quality": "1080p",
    "duration": 5
  }'

Example: Video continuation

{
  "model": "wan2.7-image-to-video",
  "generation_mode": "video_continuation",
  "prompt": "The scene continues with the character walking toward the sunset",
  "video_urls": ["https://your-cdn.com/previous-clip.mp4"],
  "image_end": "https://your-cdn.com/sunset-ending.jpg",
  "duration": 5
}

5. Mode 3: Reference video with voice cloning

What it does

Generates new video scenes while preserving the appearance of characters from reference images or videos — and optionally cloning their voice from a short audio sample. This is how you build multi-character video series where each person looks and sounds consistent across episodes.

Key constraints

  • image_urls + video_urls combined: max 5 items total
  • image_start and voice audio do not count toward this 5-item limit
  • Duration: 2-15 sec (image-only references), 2-10 sec (when video references are included)
  • Billing: input video duration + output video duration. Failed tasks are free.
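
These constraints are easy to enforce before a request goes out. The following is a minimal sketch under the rules listed above. The name `check_reference_payload` is illustrative, and because the default duration for this mode is not stated here, duration is only checked when it is provided.

```python
def check_reference_payload(payload: dict) -> list[str]:
    """Return a list of constraint violations for a wan2.7-reference-video payload."""
    problems = []
    images = payload.get("image_urls", [])
    videos = payload.get("video_urls", [])

    # image_start and voice audio are deliberately excluded from this count
    if len(images) + len(videos) > 5:
        problems.append("image_urls + video_urls combined must not exceed 5 items")

    # The upper duration bound drops from 15 to 10 once any video reference is present
    max_duration = 10 if videos else 15
    duration = payload.get("duration")
    if duration is not None and not 2 <= duration <= max_duration:
        problems.append(f"duration must be between 2 and {max_duration} seconds")

    return problems


print(check_reference_payload({
    "image_urls": ["https://your-cdn.com/girl.jpg"],
    "video_urls": ["https://your-cdn.com/ref.mp4"],
    "duration": 12,
}))
```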

Character indexing in prompts

Reference characters by their position in the input arrays:

  • English: Image 1, Image 2, Video 1, Video 2
  • Chinese: 图1, 图2, 视频1, 视频2
Images and videos are counted independently — Image 1 and Video 1 can coexist.

Voice cloning: two methods

Method 1: voice_bindings (recommended)

Precise key-value mapping between character references and voice audio:

{
  "model": "wan2.7-reference-video",
  "prompt": "Image 1 holds Image 2 and says: 'What lovely sunshine today'",
  "image_urls": [
    "https://your-cdn.com/girl.jpg",
    "https://your-cdn.com/toy.png"
  ],
  "model_params": {
    "voice_bindings": {
      "image1": "https://your-cdn.com/girl-voice.mp3"
    }
  },
  "duration": 10
}
Method 2: audio_urls (legacy positional)
Audio clips aligned by position to image_urls / video_urls. Works but less explicit. Use voice_bindings for new integrations.

Example: Multi-character brand video

{
  "model": "wan2.7-reference-video",
  "prompt": "Image 1 and Image 2 are having a conversation in a modern office. Image 1 explains the product while Image 2 takes notes. The scene is professional and well-lit.",
  "image_urls": [
    "https://your-cdn.com/spokesperson-a.jpg",
    "https://your-cdn.com/spokesperson-b.jpg"
  ],
  "image_start": "https://your-cdn.com/office-wide-shot.jpg",
  "model_params": {
    "voice_bindings": {
      "image1": "https://your-cdn.com/voice-a.mp3",
      "image2": "https://your-cdn.com/voice-b.mp3"
    }
  },
  "quality": "1080p",
  "duration": 10
}

Multi-grid storyboard

For single-image references with multiple panels (e.g., a 3×3 grid of character poses):

{
  "model": "wan2.7-reference-video",
  "prompt": "Reference image. 3D cartoon style. 1. Wide shot of fantasy forest. 2. Boy parts the vines. 3. Robot scans ahead. 4. Close-up of map. 5. Boy's excited face. 6. They leap over roots.",
  "image_urls": ["https://your-cdn.com/storyboard-grid.png"],
  "duration": 15
}

6. Mode 4: Video editing

What it does

Takes an existing video and applies text-guided edits — style transfer, background replacement, clothing changes, colorization, old footage restoration — without re-generating from scratch. On EvoLink's current Wan routes, video editing is exposed through Wan 2.7.

Key parameters

| Parameter | Required | Default | Description |
| --- | --- | --- | --- |
| model | Yes |  | wan2.7-video-edit |
| prompt | Yes |  | Natural language edit instruction |
| video_urls | Yes |  | Array with exactly 1 source video (mp4/mov, 2-10 sec) |
| image_urls | No |  | Up to 4 reference images for style/content guidance |
| keep_original_sound | No | false | true preserves original audio; false lets model handle audio |
| duration | No | 0 | 0 = keep original length; explicit values: 2-10 sec |
| quality | No | 720p | 720p or 1080p |

Billing: input video duration + output video duration. Failed tasks are free.
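
The one non-obvious rule in this table is that `duration` accepts either 0 or a value from 2 to 10. A small client-side check, sketched below with an illustrative function name, covers that rule together with the array-size limits. It does not inspect the source file's format or length.

```python
def validate_video_edit_payload(payload: dict) -> list[str]:
    """Return a list of problems for a wan2.7-video-edit payload."""
    problems = []

    if not payload.get("prompt"):
        problems.append("prompt is required")

    if len(payload.get("video_urls", [])) != 1:
        problems.append("video_urls must contain exactly 1 source video")

    if len(payload.get("image_urls", [])) > 4:
        problems.append("image_urls accepts at most 4 reference images")

    # 0 means "keep the original length"; any other value must fall in 2-10
    duration = payload.get("duration", 0)
    if duration != 0 and not 2 <= duration <= 10:
        problems.append("duration must be 0 or between 2 and 10 seconds")

    return problems


print(validate_video_edit_payload({
    "prompt": "Convert the scene to a vintage film look",
    "video_urls": ["https://your-cdn.com/source-clip.mp4"],
    "duration": 1,
}))
```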

Example: Instruction-only style change

curl -X POST https://api.evolink.ai/v1/videos/generations \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "wan2.7-video-edit",
    "prompt": "Convert the entire scene to a vintage film look with warm color grading and film grain",
    "video_urls": ["https://your-cdn.com/source-clip.mp4"],
    "keep_original_sound": true,
    "duration": 0
  }'

Example: Reference-guided clothing replacement

{
  "model": "wan2.7-video-edit",
  "prompt": "Replace the girl's outfit with the clothes from the reference image",
  "video_urls": ["https://your-cdn.com/source.mp4"],
  "image_urls": ["https://your-cdn.com/target-outfit.png"]
}

What you can edit

  • Style transfer: "convert to anime style", "apply watercolor painting effect"
  • Background swap: "change background to a rain-soaked Tokyo street at night"
  • Object/clothing change: "swap the jacket to red", "replace the hat with a crown"
  • Colorization: "convert this black-and-white footage to color"
  • Lighting: "shift lighting to golden hour"

7. Pricing and cost math

| Quality | Cost per second | 10-second clip |
| --- | --- | --- |
| 720p | $0.086 | $0.86 |
| 1080p | $0.144 (1.67× of 720p) | $1.44 |

No subscriptions, no minimum commitments. You pay only for successfully generated video.

Cost comparison with other providers (as listed on provider pages, May 2026)

| Provider | Per-second rate | 10-sec clip cost (720p unless noted) |
| --- | --- | --- |
| EvoLink | $0.086/sec | $0.86 |
| Together AI | $0.10/sec | $1.00 |
| Segmind (720p clip) | ~$0.063/sec (based on $0.625/10sec) | $0.625 |
| Segmind (1080p clip) | ~$0.094/sec (based on $0.9375/10sec) | $0.9375 |

Special billing for reference-video and video-edit

These two modes are billed on input video duration + output video duration. If you pass a 5-second reference video and generate a 10-second output, you're billed for 15 seconds. Failed tasks are not billed.
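
The input-plus-output rule can be sketched as follows. One assumption is worth flagging: the billed seconds are priced at the per-second rate of the requested output quality. The text above does not confirm this, so verify it against your invoice. `Decimal` keeps the money arithmetic exact.

```python
from decimal import Decimal

# Per-second rates from the pricing table
RATE_PER_SECOND = {"720p": Decimal("0.086"), "1080p": Decimal("0.144")}


def billed_cost(input_video_seconds: int, output_seconds: int, quality: str = "720p") -> Decimal:
    """Estimate the cost of a reference-video or video-edit task.

    Assumes (not confirmed by the source) that input and output seconds are
    both charged at the output quality's per-second rate.
    """
    billed_seconds = input_video_seconds + output_seconds
    return billed_seconds * RATE_PER_SECOND[quality]


# 5-second reference video + 10-second output = 15 billed seconds
print(billed_cost(5, 10))
```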

Budget estimation formula

Monthly cost = (avg_duration × cost_per_second × daily_volume × 30)
Example: 100 clips/day × 8 seconds × $0.086/sec × 30 days = $2,064/month at 720p.
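
The formula translates directly into a small helper, sketched here with an illustrative function name. The rate is passed as a string so that `Decimal` represents it exactly.

```python
from decimal import Decimal


def monthly_cost(avg_duration_sec: int, cost_per_second: str, daily_volume: int, days: int = 30) -> Decimal:
    """Monthly cost = avg_duration × cost_per_second × daily_volume × days."""
    return Decimal(avg_duration_sec) * Decimal(cost_per_second) * daily_volume * days


# 100 clips/day, 8 seconds each, at the 720p and 1080p rates
print(monthly_cost(8, "0.086", 100))
print(monthly_cost(8, "0.144", 100))
```

For reference-video and video-edit workloads, use the billed duration (input plus output seconds) as `avg_duration_sec`.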

8. Async workflow and task management

Every Wan 2.7 request follows the same async pattern:

  1. POST /v1/videos/generations → returns task id + status "pending"
  2. GET /v1/tasks/{task_id} → poll until status is "completed" or "failed"
  3. Download video URL from results array within 24 hours

Task lifecycle

| Status | Meaning |
| --- | --- |
| pending | Task accepted, waiting in queue |
| processing | Task is actively generating |
| completed | Video is ready, URL available in results array |
| failed | Generation failed (check error message) |

Callback URL (webhook)

Instead of polling, provide a callback_url in your request. EvoLink will POST to this URL when the task completes, fails, or is cancelled. The callback fires after billing confirmation.

Production best practices

  1. Persist the task ID immediately after submission. If your service crashes, you can recover.
  2. Use exponential backoff when polling. Start at 5 seconds, cap at 30 seconds.
  3. Download and archive results immediately. Video URLs expire in 24 hours.
  4. Make submissions idempotent. Hash request payloads and deduplicate to prevent double-billing from retry storms.
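
The polling advice (start at 5 seconds, cap at 30) can be sketched as below. The doubling factor and the 300-second overall limit are assumptions chosen for illustration, and `fetch_task` stands in for whatever function performs the GET /v1/tasks/{task_id} call in your code.

```python
import time
from typing import Callable, Iterator


def backoff_delays(start: float = 5.0, cap: float = 30.0, factor: float = 2.0) -> Iterator[float]:
    """Yield an endless sequence of delays: start, start*factor, ... capped at cap."""
    delay = start
    while True:
        yield min(delay, cap)
        delay = min(delay * factor, cap)


def poll_until_done(
    fetch_task: Callable[[], dict],
    sleep: Callable[[float], None] = time.sleep,
    max_wait: float = 300.0,
) -> dict:
    """Poll until the task is completed or failed, or raise TimeoutError."""
    waited = 0.0
    for delay in backoff_delays():
        task = fetch_task()
        if task["status"] in ("completed", "failed"):
            return task
        if waited + delay > max_wait:
            raise TimeoutError("task did not finish within max_wait seconds")
        sleep(delay)
        waited += delay


# Simulated task that completes on the third poll (no network, no real sleeping)
statuses = iter(["pending", "processing", "completed"])
slept = []
result = poll_until_done(lambda: {"status": next(statuses)}, sleep=slept.append)
print(result["status"], slept)
```

Injecting `sleep` and `fetch_task` keeps the loop testable without waiting or calling the API.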

9. Error handling and common status codes

| HTTP Code | Error Code | Meaning | Action |
| --- | --- | --- | --- |
| 400 | invalid_request | Bad parameters | Check model ID, prompt length, duration range, media URLs |
| 401 | unauthorized | Invalid or expired token | Refresh your API key |
| 402 | insufficient_quota | Not enough credits | Top up your account |
| 403 | model_access_denied | Token lacks model access | Check API key permissions |
| 429 | rate_limit_exceeded | Too many requests | Back off and retry with exponential delay |
| 500 | internal_error | Server error | Retry after 30 seconds; if persistent, contact support |
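
In client code, the key decision is whether a failed request is worth retrying. The sketch below encodes the table as a lookup and treats only 429 and 500 as retryable. Treating unlisted status codes as non-retryable is an assumption, not documented behavior.

```python
# (error code, retryable, suggested action), taken from the status code table
ERROR_ACTIONS = {
    400: ("invalid_request", False, "check model ID, prompt length, duration range, media URLs"),
    401: ("unauthorized", False, "refresh the API key"),
    402: ("insufficient_quota", False, "top up the account"),
    403: ("model_access_denied", False, "check API key permissions"),
    429: ("rate_limit_exceeded", True, "back off and retry with exponential delay"),
    500: ("internal_error", True, "retry after 30 seconds; contact support if persistent"),
}


def should_retry(http_status: int) -> bool:
    """Return True only for errors that may clear up without changing the request."""
    entry = ERROR_ACTIONS.get(http_status)
    return entry[1] if entry else False  # unlisted codes: do not retry (assumption)


def describe(http_status: int) -> str:
    """Return a short log-friendly description of the error and next step."""
    entry = ERROR_ACTIONS.get(http_status)
    if entry is None:
        return f"{http_status}: unlisted status, inspect the response body"
    code, _, action = entry
    return f"{http_status} {code}: {action}"


print(describe(429))
print(should_retry(402))
```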

Common mistakes

  • Using the wrong model ID spelling. It's wan2.7-text-to-video, not wan-2.7-text-to-video or wan27-t2v. A stale model ID returns a clean 404 with no helpful error.
  • Sending invalid media combinations in I2V mode. Check the valid input combinations list in section 4.
  • Not downloading results in time. Video URLs expire in 24 hours. Build automatic download into your pipeline.

10. Production patterns and guardrails

Budget guardrails

  1. Cap maximum duration server-side (e.g., 10 seconds for social content)
  2. Default to 720p unless the use case specifically requires 1080p
  3. Track spend by user, feature, and model ID
  4. Separate reference-video budgeting (input+output billing) from T2V/I2V
  5. Set per-user daily limits before scaling traffic
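
The first two guardrails can be applied in one place, just before a payload leaves your backend. This is a sketch with illustrative names and defaults. It assumes the T2V/I2V default duration of 5 seconds, so it would need adjusting for video-edit, where 0 means "keep original length".

```python
def apply_guardrails(payload: dict, max_duration: int = 10, allow_1080p: bool = False) -> dict:
    """Return a copy of the payload with duration capped and quality defaulted."""
    guarded = dict(payload)  # never mutate the caller's payload

    # Guardrail 1: cap duration server-side
    guarded["duration"] = min(payload.get("duration", 5), max_duration)

    # Guardrail 2: force 720p unless the caller is explicitly allowed 1080p
    if allow_1080p:
        guarded.setdefault("quality", "720p")
    else:
        guarded["quality"] = "720p"

    return guarded


print(apply_guardrails({"prompt": "Product teaser", "duration": 15, "quality": "1080p"}))
```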

Reliability patterns

  • Retry with idempotency key. Hash your request payload and check for existing tasks before resubmitting.
  • Timeout handling. If a task hasn't completed after 5 minutes, mark it for manual review rather than resubmitting blindly.
  • Fallback strategy. Consider falling back to Wan 2.6 or Wan 2.5 if Wan 2.7 returns persistent errors on a specific mode.
  • Asset validation. Validate image dimensions, video duration, and audio format before submission. Bad assets cause failures that look like model quality issues.
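
The payload-hashing idea can be sketched as below. The key is derived client-side. Nothing here implies the API accepts an idempotency header. The in-memory dictionary is a stand-in for a persistent store such as a database table, and `SubmissionLedger` is an illustrative name.

```python
import hashlib
import json
from typing import Callable


def idempotency_key(payload: dict) -> str:
    """Hash a canonical JSON form of the payload so key order does not matter."""
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class SubmissionLedger:
    """Remember which payloads were already submitted and reuse their task IDs."""

    def __init__(self) -> None:
        self._tasks: dict[str, str] = {}  # idempotency key -> task ID

    def submit_once(self, payload: dict, submit: Callable[[dict], str]) -> str:
        key = idempotency_key(payload)
        if key not in self._tasks:
            self._tasks[key] = submit(payload)  # only the first call reaches the API
        return self._tasks[key]


# Demonstration with a fake submit function instead of a real API call
calls = []


def fake_submit(payload: dict) -> str:
    calls.append(payload)
    return f"task-{len(calls)}"


ledger = SubmissionLedger()
first = ledger.submit_once({"model": "wan2.7-text-to-video", "prompt": "x"}, fake_submit)
second = ledger.submit_once({"prompt": "x", "model": "wan2.7-text-to-video"}, fake_submit)
print(first, second, len(calls))
```

Note that two intentionally identical requests hash to the same key. If users may legitimately resubmit the same prompt, include a request-scoped value such as a user action ID in the hashed payload.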

Queue architecture

For production systems generating more than 100 videos/day:

User request → validation → job queue → Wan 2.7 API → result handler → CDN archive → notify user

Never call the API directly from user-facing request handlers. Always go through a background job system.


11. Migration from Wan 2.6 to Wan 2.7

What stays the same

  • API endpoint: POST /v1/videos/generations
  • Authentication: same API key and Bearer token
  • Async pattern: same task ID / polling / callback flow
  • EvoLink billing: same account and credit system

What changes

The IDs below are EvoLink route model IDs, not raw DashScope / Alibaba Cloud model names. If you use Alibaba's API directly, model names follow a different convention (e.g., wan2.7-t2v-2026-04-25).

| Aspect | Wan 2.6 | Wan 2.7 |
| --- | --- | --- |
| Model IDs | wan2.6-text-to-video, wan2.6-image-to-video, wan2.6-reference-video | wan2.7-text-to-video, wan2.7-image-to-video, wan2.7-reference-video, wan2.7-video-edit |
| I2V frame control | First frame only (image_start) | First AND last frame (image_start + image_end) |
| I2V generation modes | Implicit | Explicit generation_mode parameter (first_frame, first_last_frame, video_continuation) |
| Reference video | Single reference, no voice | Up to 5 refs, voice cloning via voice_bindings |
| Video editing | Not available | New: wan2.7-video-edit |
| Multi-shot T2V | Supported | Supported (same prompt syntax) |

Step-by-step migration

  1. Change model parameter. Replace wan2.6-text-to-video with wan2.7-text-to-video (same for other modes).
  2. Test with existing prompts. Wan 2.7 handles the same prompt format. No rewriting needed.
  3. Adopt new features gradually. Add generation_mode, image_end, voice_bindings, or video-edit as your workflow requires.
  4. Keep Wan 2.6 as fallback. Both versions run in parallel on EvoLink. You don't have to migrate everything at once.
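
Step 1 amounts to a prefix swap on the model ID. A minimal sketch follows, with an illustrative function name. It only rewrites IDs that have a known Wan 2.7 counterpart and raises on anything else, so an unmapped Wan 2.6 ID is never silently sent to a route that may not exist.

```python
WAN27_MODEL_IDS = {
    "wan2.7-text-to-video",
    "wan2.7-image-to-video",
    "wan2.7-reference-video",
    "wan2.7-video-edit",
}


def migrate_model_id(payload: dict) -> dict:
    """Return a copy of the payload with a wan2.6-* model ID swapped to wan2.7-*."""
    model = payload.get("model", "")
    if not model.startswith("wan2.6-"):
        return dict(payload)  # not a Wan 2.6 payload, leave it unchanged

    candidate = "wan2.7-" + model[len("wan2.6-"):]
    if candidate not in WAN27_MODEL_IDS:
        raise ValueError(f"no Wan 2.7 counterpart known for {model!r}")
    return {**payload, "model": candidate}


print(migrate_model_id({"model": "wan2.6-text-to-video", "prompt": "A rainy street", "duration": 5}))
```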

12. Parameter reference cheat sheet

Shared parameters (all modes)

| Parameter | Type | Description |
| --- | --- | --- |
| model | string | Required. One of the four model IDs |
| prompt | string | Required. Up to 5000 characters |
| quality | string | 720p (default) or 1080p |
| callback_url | string | HTTPS webhook for task completion |

Text-to-video specific

| Parameter | Type | Description |
| --- | --- | --- |
| negative_prompt | string | Up to 500 characters |
| audio_urls | array | 1 driving audio (wav/mp3, 2-30 sec, max 15MB) |
| aspect_ratio | string | 16:9, 9:16, 1:1, 4:3, 3:4 |
| duration | number | 2-15 seconds |
| seed | integer | 1-2147483647 |
| prompt_extend | boolean | LLM prompt rewriting (default false) |

Image-to-video specific

| Parameter | Type | Description |
| --- | --- | --- |
| generation_mode | string | first_frame, first_last_frame, video_continuation |
| image_start | string | First frame image URL |
| image_end | string | Last frame image URL |
| video_urls | array | Source video for continuation |
| audio_urls | array | Driving audio (not for video_continuation) |
| duration | number | 2-15 seconds |

Reference video specific

| Parameter | Type | Description |
| --- | --- | --- |
| image_urls | array | Reference images (counted toward 5-item limit) |
| video_urls | array | Reference videos (counted toward 5-item limit) |
| image_start | string | Starting frame (not counted toward limit) |
| model_params.voice_bindings | object | Map of reference key to voice audio URL |
| audio_urls | array | Legacy voice binding (positional) |
| duration | number | 2-15 sec (image-only) or 2-10 sec (with video refs) |

Video edit specific

| Parameter | Type | Description |
| --- | --- | --- |
| video_urls | array | Exactly 1 source video |
| image_urls | array | Up to 4 reference images |
| keep_original_sound | boolean | true preserves original audio |
| duration | number | 0 = original length; explicit: 2-10 sec |

13. FAQ

How much does the Wan 2.7 API cost on EvoLink?

$0.086/sec at 720p, $0.144/sec at 1080p. A 10-second 720p clip costs $0.86. No subscriptions or minimum commitments.

What is the difference between Wan 2.7 and Wan 2.6?

On EvoLink, Wan 2.7 exposes video editing, multi-character reference video with voice cloning, and first-and-last-frame control in I2V mode. Wan 2.6 remains useful for cinematic storytelling and Flash variants for faster iteration. Both run in parallel on EvoLink.

Does Wan 2.7 generate audio automatically?

In text-to-video mode, yes — if you don't provide audio_urls, the model auto-generates background music or sound effects matching the visual content.

Are failed tasks billed?

For reference-video and video-edit modes, failed tasks are explicitly not billed. For text-to-video and image-to-video, billing is based on actual generated video duration.

Can I use Wan 2.7 for NSFW content?

No. The model will reject prompts that violate content policies. If your prompt is rejected, you'll receive an invalid_content error.

What audio formats are supported for voice cloning?

wav and mp3. Duration should be 1-10 seconds for voice cloning, 2-30 seconds for driving audio. Maximum file size is 15MB.

How do I handle video URL expiration?

Video URLs expire after 24 hours. Build an automatic download-and-archive step into your pipeline immediately after task completion. Store the final asset in your own CDN or object storage.

Can I migrate from Wan 2.6 without downtime?

Yes. Change the model parameter from wan2.6-* to wan2.7-*. The endpoint, authentication, and async pattern are identical. Both versions run in parallel, so you can migrate route by route.
