Seed Audio 1.0 API

Build AI audio generation features with Doubao Seed Audio 1.0 through EvoLink's unified API gateway. Model ID doubao-seed-audio-1-0, per-second billing, up to 120s output.

Price: $0.0012 (~0.08 credits) per second

Highest stability with guaranteed 99.9% uptime. Recommended for production environments.

Use the same API endpoint for all versions. Only the model parameter differs.

Reference type. Reference Audio and Reference Image are mutually exclusive.

Seed Audio 1.0 API for AI Audio Generation

Build creator tools, voice agents, audio drama workflows, and short-video production features with Doubao Seed Audio 1.0 through EvoLink's unified API gateway.

Pricing

Doubao Seed Audio 1.0
Audio Generation (per second): $0.0012 / second (0.08 Credits)

If a channel is down, EvoLink automatically switches to the next cheapest available one, ensuring 99.9% uptime at the best possible price.

What Can You Build with Seed Audio 1.0?

Creator Tools and Audio Workflows

Seed Audio 1.0 is prompt-based AI audio generation, not just text-to-speech. Generate narration, voiceover, and sound design from a single prompt, and use reference audio to keep a consistent voice across an entire production. Ideal for podcast tooling, audiobook pipelines, and short-video content workflows where speech, music, and ambience need to be produced together.

Voice Agents and AI Companions

Give voice agents, assistants, and AI companions an expressive, controllable voice. Adjust speed, pitch, and volume to match each interaction, and pass reference audio to anchor a recurring character voice. Output is returned through the same EvoLink gateway you already use for other models, so you manage usage and cost from one place.

Audio Drama, Games, and Interactive Stories

Compose multi-character dialogue, emotion, and non-verbal expression directly in the prompt to drive audio drama, game scenes, and interactive narratives. Long-form consistency makes it suitable for audiobooks, audio dramas, and episodic content where the same characters must sound consistent across many generations.

Why Use Seed Audio 1.0 via EvoLink?

Seed Audio 1.0 is already live on EvoLink, so you can integrate a new audio model early through one unified gateway.

Fast Model Adoption

Seed Audio 1.0 is live on EvoLink today. Use model ID doubao-seed-audio-1-0 with your existing EvoLink API key to start integrating a new AI audio generation model early — no separate account, contract, or onboarding for a single provider.

Cost Visibility by Output Duration

Seed Audio 1.0 is billed by generated audio duration, charged per second of output. That makes batch workloads easy to estimate before you run them. Check the EvoLink console for the latest unit price, and monitor real usage from the same dashboard as your other models.
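As a rough illustration of estimating a batch before running it, the sketch below multiplies total output seconds by the unit prices quoted in the pricing section. The function name is hypothetical, the prices may change, and the sketch assumes strictly linear billing, which this page does not spell out in detail.

```python
from decimal import Decimal

USD_PER_SECOND = Decimal("0.0012")    # from the pricing section; may change
CREDITS_PER_SECOND = Decimal("0.08")  # from the pricing section; may change


def estimate_cost(total_seconds: float) -> dict:
    """Return a linear cost estimate for a given amount of generated audio."""
    seconds = Decimal(str(total_seconds))
    return {
        "usd": seconds * USD_PER_SECOND,
        "credits": seconds * CREDITS_PER_SECOND,
    }


# Example: 500 clips of 30 seconds each.
batch = estimate_cost(500 * 30)
print(f"${batch['usd']:.2f} / {batch['credits']:.0f} credits")
```

Decimal arithmetic is used here only to avoid floating-point noise in the printed totals.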

Unified Gateway for Audio Models

Access Seed Audio 1.0 alongside other audio models through one EvoLink API. Compare options, manage keys and usage in one place, and route or fall back across models without rewiring your integration for each provider.

How to Integrate Seed Audio 1.0

Three steps to call Doubao Seed Audio 1.0 through EvoLink.

1. Create an EvoLink API Key

Sign up on EvoLink and generate an API key from the console. The same key gives you access to Seed Audio 1.0 and the other models on the gateway, and lets you set usage limits and monitor consumption from one dashboard.

2. Use Model ID doubao-seed-audio-1-0

Point your request at model ID doubao-seed-audio-1-0. Provide your text prompt (up to 1.5k characters) and optional reference audio, then set output options such as format, sample rate, speed, pitch, and volume.

3. Submit an Async Task and Retrieve Audio

Seed Audio 1.0 uses an asynchronous task model: submit the generation request, receive a task ID, then poll the task status endpoint to retrieve the finished audio (up to 120s). Stream, download, or embed the result directly in your product.
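The submit-then-poll flow can be sketched as a small polling loop. This is a minimal sketch under stated assumptions: the path of the task status endpoint is not given on this page, so the HTTP call is passed in as a hypothetical `fetch_status` callable, and "processing" is treated as the only in-progress status because it is the only one shown in the response examples.

```python
import time
from typing import Callable


def wait_for_task(
    fetch_status: Callable[[str], dict],
    task_id: str,
    interval: float = 2.0,
    timeout: float = 300.0,
) -> dict:
    """Poll until the task leaves the "processing" state or the timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        task = fetch_status(task_id)
        # ASSUMPTION: any status other than "processing" is terminal.
        if task.get("status") != "processing":
            return task
        if time.monotonic() >= deadline:
            raise TimeoutError(f"task {task_id} still processing after {timeout}s")
        time.sleep(interval)


# Stand-in for a real HTTP call to the task status endpoint (not shown here).
_responses = iter([{"status": "processing"}, {"status": "completed", "progress": 100}])
result = wait_for_task(lambda task_id: next(_responses), "task-unified-xxxxxxxx", interval=0.01)
print(result["status"])
```

Injecting the status call keeps the loop testable without network access; a callback_url is the alternative when polling is undesirable.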

Capabilities and Limits

The concrete facts you need before integrating Seed Audio 1.0.

Generation

Prompt-Based Audio Generation

Seed Audio 1.0 generates audio from a prompt, optionally guided by reference audio. It goes beyond plain TTS: multi-character dialogue, emotion, and non-verbal expression can be written directly into the prompt.

Input

Reference Audio Support

Provide up to 3 reference audio clips per request, each no longer than 30 seconds, via base64 or URL, to guide timbre and delivery. Reference image and reference audio cannot be supplied in the same request.

Limits

Output Limit up to 120s

Each request synthesizes up to 120 seconds of audio. Text input is capped at 1.5k characters, so long-form content needs to be split into segments and submitted as a batch.
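One way to prepare long-form text for the input cap is to pack whole sentences into segments greedily. The sketch below assumes "1.5k characters" means 1,500 Unicode characters (the page does not say how characters are counted) and uses a hypothetical helper name. It does not check the 120-second output limit, which depends on the speech actually generated.

```python
import re

MAX_PROMPT_CHARS = 1500  # ASSUMPTION: "1.5k characters" counted as Unicode characters


def split_into_segments(text: str, limit: int = MAX_PROMPT_CHARS) -> list[str]:
    """Greedily pack sentences into segments of at most `limit` characters."""
    # Split after Latin punctuation followed by whitespace, or right after CJK punctuation.
    raw = re.split(r"(?<=[.!?])\s+|(?<=[。!?])", text)
    sentences = [s.strip() for s in raw if s.strip()]
    segments: list[str] = []
    current = ""
    for sentence in sentences:
        # Hard-split any single sentence that exceeds the limit on its own.
        pieces = [sentence[i:i + limit] for i in range(0, len(sentence), limit)]
        for piece in pieces:
            # Sentences inside one segment are re-joined with a single space.
            candidate = f"{current} {piece}".strip()
            if len(candidate) <= limit:
                current = candidate
            else:
                segments.append(current)
                current = piece
    if current:
        segments.append(current)
    return segments


chapter = "A" * 1000 + ". " + "B" * 1000 + "."
print([len(segment) for segment in split_into_segments(chapter)])
```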

Formats

Flexible Output Formats

Export audio as wav (default), mp3, pcm, or ogg_opus, so you can match your downstream pipeline without extra transcoding. Explicit and implicit watermarking are supported.

Quality

Selectable Sample Rates

Choose a 48 kHz, 24 kHz (default), 16 kHz, or 8 kHz sample rate to balance fidelity and file size for web delivery, production, or real-time processing.
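To make the fidelity versus file size trade-off concrete, the sketch below computes the size of uncompressed PCM audio at each sample rate. The bit depth and channel count are not stated on this page, so 16-bit mono is an assumption here, and compressed formats such as mp3 or ogg_opus will be smaller.

```python
def pcm_size_bytes(
    seconds: float,
    sample_rate: int = 24000,
    bytes_per_sample: int = 2,  # ASSUMPTION: 16-bit samples
    channels: int = 1,          # ASSUMPTION: mono output
) -> int:
    """Size of raw PCM audio: seconds x samples per second x bytes per sample x channels."""
    return int(seconds * sample_rate * bytes_per_sample * channels)


# Size of a maximum-length (120 s) clip at each supported sample rate.
for rate in (8000, 16000, 24000, 48000):
    print(rate, pcm_size_bytes(120, rate))
```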

Control

Languages and Delivery Controls

Supports Chinese and English, including delivery in mainstream Chinese regional accents (pure dialects are not supported). Adjust speed, pitch, and volume per request. SSML is not supported.

Frequently Asked Questions about Seed Audio 1.0

Everything you need to know about the product and billing.

What is Seed Audio 1.0?
Seed Audio 1.0 (Doubao-Seed-Audio 1.0) is ByteDance's prompt-based AI audio generation model. From a text prompt, optionally guided by reference audio, it can generate speech, multi-character dialogue, and audio with emotion and non-verbal expression. It is broader than traditional text-to-speech and is designed for AI audio generation use cases.

Is Seed Audio 1.0 available on EvoLink?
Yes. Seed Audio 1.0 is live on EvoLink and can be accessed through EvoLink's unified API gateway with your existing API key, alongside the other models on the platform.

Which model ID should I use?
Use the model ID doubao-seed-audio-1-0 in your request when calling Seed Audio 1.0 through EvoLink.

How is Seed Audio 1.0 billed?
Seed Audio 1.0 is billed by generated audio duration, charged per second of output, which makes batch workloads straightforward to estimate. Pricing can change, so check the latest unit price in the EvoLink console and pricing page before you scale.

What are the input and output limits?
Text input is up to 1.5k characters. You can provide up to 3 reference audio clips, each no longer than 30 seconds, via base64 or URL. A single request synthesizes up to 120 seconds of audio. Output formats are wav (default), mp3, pcm, and ogg_opus, with sample rates of 48 kHz, 24 kHz (default), 16 kHz, and 8 kHz. Reference image and reference audio cannot be supplied at the same time. Other limits may vary, so check the latest EvoLink console and official docs.

Is Seed Audio 1.0 just text-to-speech?
No. While it can synthesize speech from text, Seed Audio 1.0 is prompt-based AI audio generation. You can compose multi-character dialogue, emotion, and non-verbal expression in the prompt and guide output with reference audio, which goes well beyond a single-voice text-to-speech engine.

Does Seed Audio 1.0 support SSML?
No. SSML is not supported. Delivery is controlled through prompt instructions and request parameters such as speed, pitch, and volume.
POST /v1/audios/generations

Generate Audio

Create an audio generation task from a text prompt, optionally guided by reference voices or a reference image.

Asynchronous processing mode: use the returned task ID to query the task status and retrieve the result.

Result audio URLs are CDN-hosted and long-lived. Billed per output second (up to 120s).
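A request to this endpoint could be assembled with the Python standard library roughly as follows. This is a sketch under assumptions: the gateway's base URL and its authentication scheme are not stated on this page, so the base URL is a required argument (the value used below is a placeholder) and bearer-token auth is assumed. The request is only built here, not sent.

```python
import json
import urllib.request


def build_generation_request(base_url: str, api_key: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) a POST request to /v1/audios/generations."""
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/v1/audios/generations",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",  # ASSUMPTION: bearer-token auth
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_generation_request(
    "https://api.example.com",  # placeholder; use the base URL from the official docs
    "YOUR_API_KEY",
    {"model": "doubao-seed-audio-1-0", "prompt": "Hello from Seed Audio."},
)
print(req.get_method(), req.full_url)
# To send it: with urllib.request.urlopen(req) as resp: task = json.load(resp)
```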

Three Generation Modes

  • Text-to-speech: pass only prompt to generate audio directly from the prompt.
  • Voice cloning: prompt + audio_references. Reference a voice ID or reference audio, and use @音频N in the prompt to refer to the N-th item.
  • Image-guided: prompt + image_urls. Generate audio guided by a reference image.

⚠️ audio_references and image_urls are mutually exclusive — use one or the other.
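The mode selection and the mutual-exclusion rule can be checked client-side before a request is sent. The helper below is a hypothetical sketch; the mode labels it returns are descriptive strings for this example, not values the API expects.

```python
def detect_mode(payload: dict) -> str:
    """Classify a request payload into one of the three generation modes."""
    has_audio = bool(payload.get("audio_references"))
    has_image = bool(payload.get("image_urls"))
    if has_audio and has_image:
        raise ValueError("audio_references and image_urls are mutually exclusive")
    if has_audio:
        return "voice-cloning"
    if has_image:
        return "image-guided"
    return "text-to-speech"


print(detect_mode({"prompt": "Hello"}))
print(detect_mode({"prompt": "@音频1 Hello", "audio_references": ["zh_female_example_id"]}))
```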

Request Parameters

model (string, required, default: doubao-seed-audio-1-0)

Audio generation model name.

Value                    Description
doubao-seed-audio-1-0    Doubao Seed Audio 1.0 multimodal audio generation

Example: doubao-seed-audio-1-0

prompt (string, required)

The text content to synthesize, or a prompt describing the audio. Use @音频N to reference the N-th item of audio_references.

Notes
  • Limited to 1.5k characters
Example: @音频1 Hi there! @音频2 How's your day going?

audio_references (array, optional)

Reference voices. Each item is a voice ID or a reference audio URL (items starting with 'http' are treated as URLs, otherwise as voice IDs). Order maps to @音频1 / @音频2 in the prompt.

Notes
  • Up to 3 items; mutually exclusive with image_urls
  • Voice IDs look like 'zh_female_xxx'
  • Reference audio: each ≤ 30s / ≤ 10MB, wav/mp3/pcm/ogg_opus
Example: ["zh_female_example_id", "https://your-bucket.com/ref-voice.mp3"]

See Preset Voice IDs in the left sidebar for curated voices and the full catalog link.
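The rule for telling URLs from voice IDs, and the way list order maps to prompt placeholders, can be sketched as below. The function name and the returned dictionary keys are hypothetical; only the 'http' prefix rule, the 3-item cap, and the @音频N numbering come from the parameter description.

```python
MAX_REFERENCES = 3  # "Up to 3 items" per the notes above


def map_references(audio_references: list[str]) -> list[dict]:
    """Pair each reference with its prompt placeholder and its interpreted kind."""
    if len(audio_references) > MAX_REFERENCES:
        raise ValueError(f"at most {MAX_REFERENCES} audio_references are allowed")
    mapped = []
    for position, item in enumerate(audio_references, start=1):
        # Items starting with 'http' are treated as URLs, everything else as voice IDs.
        kind = "url" if item.startswith("http") else "voice_id"
        mapped.append({"placeholder": f"@音频{position}", "kind": kind, "value": item})
    return mapped


for entry in map_references(["zh_female_example_id", "https://your-bucket.com/ref-voice.mp3"]):
    print(entry["placeholder"], entry["kind"], entry["value"])
```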

image_urls (array, optional)

Reference image URL to drive audio generation.

Notes
  • Currently at most 1 image; mutually exclusive with audio_references
  • ≤ 10MB, jpeg/png/webp
Example: ["https://your-bucket.com/scene.jpg"]

speech_rate (number, optional, default: 1.0)

Speech speed multiplier.

Notes
  • Range: 0.5 to 2.0 (1.0 = normal, 2.0 = double speed, 0.5 = half speed)
  • Accepts up to two decimal places
Example: 1.2

loudness_rate (number, optional, default: 1.0)

Loudness multiplier.

Notes
  • Range: 0.5 to 2.0 (1.0 = normal)
  • Accepts up to two decimal places
Example: 1.0

pitch_rate (integer, optional, default: 0)

Pitch adjustment in semitones.

Notes
  • Range: -12 to 12 (0 = no change)
Example: 0

format (string, optional, default: wav)

Output audio format.

Value       Description
wav         WAV
mp3         MP3
pcm         PCM
ogg_opus    OGG Opus

Example: mp3

sample_rate (integer, optional, default: 24000)

Output sample rate in Hz.

Value    Description
8000     8 kHz
16000    16 kHz
24000    24 kHz
48000    48 kHz

Example: 24000

callback_url (string, optional)

HTTPS callback address after task completion.

Notes
  • Triggered on completion, failure, or cancellation
  • Sent after billing confirmation
  • HTTPS only, no internal IPs
  • Max length: 2048 chars
Example: https://your-domain.com/webhooks/audio-task-completed
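The callback_url constraints listed above can be pre-checked locally. The following is a hypothetical helper, not the gateway's own validation: it only inspects literal IP addresses in the URL and does not resolve hostnames, so a hostname that points to an internal address would still pass this check.

```python
import ipaddress
from urllib.parse import urlsplit

MAX_CALLBACK_URL_LENGTH = 2048


def validate_callback_url(url: str) -> None:
    """Raise ValueError if the URL breaks the documented callback_url constraints."""
    if len(url) > MAX_CALLBACK_URL_LENGTH:
        raise ValueError("callback_url exceeds 2048 characters")
    parts = urlsplit(url)
    if parts.scheme != "https":
        raise ValueError("callback_url must use HTTPS")
    if not parts.hostname:
        raise ValueError("callback_url has no host")
    try:
        address = ipaddress.ip_address(parts.hostname)
    except ValueError:
        return  # a hostname rather than a literal IP; DNS is not checked here
    if address.is_private or address.is_loopback or address.is_link_local:
        raise ValueError("callback_url must not point to an internal IP")


validate_callback_url("https://your-domain.com/webhooks/audio-task-completed")
print("callback_url looks valid")
```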

Request Example — Text-to-Speech

{
  "model": "doubao-seed-audio-1-0",
  "prompt": "欢迎使用语音合成服务,今天天气真不错。",
  "format": "mp3",
  "speech_rate": 1.2
}
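Before submitting a payload like the one above, the documented ranges and allowed values can be checked client-side. This is a minimal sketch with a hypothetical function name; it mirrors the limits listed under Request Parameters and assumes the prompt limit is 1,500 characters.

```python
ALLOWED_FORMATS = {"wav", "mp3", "pcm", "ogg_opus"}
ALLOWED_SAMPLE_RATES = {8000, 16000, 24000, 48000}


def validate_options(payload: dict) -> list[str]:
    """Return a list of human-readable problems; an empty list means no issue found."""
    errors = []
    if len(payload.get("prompt", "")) > 1500:  # ASSUMPTION: 1.5k = 1,500 characters
        errors.append("prompt exceeds 1.5k characters")
    for name in ("speech_rate", "loudness_rate"):
        if not 0.5 <= payload.get(name, 1.0) <= 2.0:
            errors.append(f"{name} must be between 0.5 and 2.0")
    pitch = payload.get("pitch_rate", 0)
    if not isinstance(pitch, int) or not -12 <= pitch <= 12:
        errors.append("pitch_rate must be an integer between -12 and 12")
    if payload.get("format", "wav") not in ALLOWED_FORMATS:
        errors.append("format must be one of wav, mp3, pcm, ogg_opus")
    if payload.get("sample_rate", 24000) not in ALLOWED_SAMPLE_RATES:
        errors.append("sample_rate must be 8000, 16000, 24000, or 48000")
    return errors


request_body = {
    "model": "doubao-seed-audio-1-0",
    "prompt": "欢迎使用语音合成服务,今天天气真不错。",
    "format": "mp3",
    "speech_rate": 1.2,
}
print(validate_options(request_body))
```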

Request Example — Voice Cloning (multi-voice)

{
  "model": "doubao-seed-audio-1-0",
  "prompt": "@音频1 Hi there! @音频2 How's your day going?",
  "audio_references": [
    "zh_female_example_id",
    "https://your-bucket.com/ref-voice.mp3"
  ]
}
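A multi-voice prompt like the one above can be assembled from a list of dialogue turns. The helper below is a hypothetical sketch: it prefixes each line with the @音频N placeholder for its speaker and rejects indexes that have no matching audio_references item.

```python
def build_dialogue_prompt(turns: list[tuple[int, str]], reference_count: int) -> str:
    """Join (speaker_index, text) turns into a prompt using @音频N placeholders.

    speaker_index is 1-based and refers to the position in audio_references.
    """
    parts = []
    for index, text in turns:
        if not 1 <= index <= reference_count:
            raise ValueError(f"@音频{index} has no matching audio_references item")
        parts.append(f"@音频{index} {text}")
    return " ".join(parts)


prompt = build_dialogue_prompt(
    [(1, "Hi there!"), (2, "How's your day going?")],
    reference_count=2,
)
print(prompt)
```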

Response Example

Submit (task created):

{
  "id": "task-unified-xxxxxxxx",
  "object": "audio.generation.task",
  "model": "doubao-seed-audio-1-0",
  "type": "audio",
  "status": "processing",
  "progress": 0,
  "task_info": { "can_cancel": false, "estimated_time": 15 }
}

Query (completed):

{
  "id": "task-unified-1782491238-7b6bmmv2",
  "object": "audio.generation.task",
  "model": "doubao-seed-audio-1-0",
  "type": "audio",
  "status": "completed",
  "progress": 100,
  "created": 1782491238,
  "duration": 41,
  "results": ["https://files.evolink.ai/.../seed-audio-xxx.wav"],
  "result_data": [
    {
      "audio_url": "https://files.evolink.ai/.../seed-audio-xxx.wav",
      "duration": 10.18,
      "format": "wav"
    }
  ],
  "task_info": { "can_cancel": false },
  "usage": { "credits_used": 0.88, "original_duration": 10.18 }
}
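Reading a completed task can be sketched as follows, using only fields that appear in the response example above. The function name is hypothetical. The final lines record an observation rather than documented behavior: in this one example, 0.88 credits equals 11 seconds at 0.08 credits per second, which would be consistent with the 10.18-second clip being rounded up to whole seconds.

```python
import math

CREDITS_PER_SECOND = 0.08  # from the pricing section; may change


def summarize_completed_task(task: dict) -> dict:
    """Extract audio URLs, total duration, and credits from a completed task."""
    if task.get("status") != "completed":
        raise ValueError(f"task is not completed: {task.get('status')!r}")
    clips = task.get("result_data", [])
    return {
        "urls": [clip["audio_url"] for clip in clips],
        "total_seconds": sum(clip["duration"] for clip in clips),
        "credits_used": task.get("usage", {}).get("credits_used"),
    }


sample = {
    "status": "completed",
    "result_data": [
        {"audio_url": "https://files.evolink.ai/.../seed-audio-xxx.wav",
         "duration": 10.18, "format": "wav"}
    ],
    "usage": {"credits_used": 0.88, "original_duration": 10.18},
}
summary = summarize_completed_task(sample)
# INFERENCE ONLY: compare billed credits with duration rounded up to whole seconds.
rounded_up = round(math.ceil(summary["total_seconds"]) * CREDITS_PER_SECOND, 2)
print(summary["credits_used"], rounded_up)
```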