Seed Audio 1.0 API

Build AI audio generation features with Doubao Seed Audio 1.0 through EvoLink's unified API gateway. Model ID doubao-seed-audio-1-0, per-second billing, up to 120s output.

Price: $0.0012 (~0.08 credits) per second

Highest stability with guaranteed 99.9% uptime. Recommended for production environments.

Use the same API endpoint for all versions. Only the model parameter differs.

Seed Audio 1.0 API for AI Audio Generation

Build creator tools, voice agents, audio drama workflows, and short-video production features with Doubao Seed Audio 1.0 through EvoLink's unified API gateway.

Pricing

Model	Mode	Price
Doubao Seed Audio 1.0	Audio Generation (per second)	$0.0012/second (0.08 credits)

If the current upstream provider is down, EvoLink automatically switches to the next cheapest available one to maintain 99.9% uptime at the best possible price.

What Can You Build with Seed Audio 1.0?

Creator Tools and Audio Workflows

Seed Audio 1.0 is prompt-based AI audio generation, not just text-to-speech. Generate narration, voiceover, and sound design from a single prompt, and use reference audio to keep a consistent voice across an entire production. Ideal for podcast tooling, audiobook pipelines, and short-video content workflows where speech, music, and ambience need to be produced together.

Voice Agents and AI Companions

Give voice agents, assistants, and AI companions an expressive, controllable voice. Adjust speed, pitch, and volume to match each interaction, and pass reference audio to anchor a recurring character voice. Output streams back through the same EvoLink gateway you already use for other models, so you manage usage and cost from one place.

Audio Drama, Games, and Interactive Stories

Compose multi-character dialogue, emotion, and non-verbal expression directly in the prompt to drive audio drama, game scenes, and interactive narratives. Long-form consistency makes it suitable for audiobooks, audio dramas, and episodic content where the same characters must sound consistent across many generations.

Why Use Seed Audio 1.0 via EvoLink?

Seed Audio 1.0 is already live on EvoLink, so you can integrate a new audio model early through one unified gateway.

Fast Model Adoption

Seed Audio 1.0 is live on EvoLink today. Use model ID doubao-seed-audio-1-0 with your existing EvoLink API key to start integrating a new AI audio generation model early — no separate account, contract, or onboarding for a single provider.

Cost Visibility by Output Duration

Seed Audio 1.0 is billed by generated audio duration, charged per second of output. That makes batch workloads easy to estimate before you run them. Check the EvoLink console for the latest unit price, and monitor real usage from the same dashboard as your other models.
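
As a rough illustration of estimating a batch before running it, the sketch below multiplies clip durations by the unit prices quoted on this page ($0.0012 and 0.08 credits per second). The `estimate_cost` helper and its `round_up` flag are hypothetical. Rounding each clip up to a whole second is an assumption, inferred only from the sample response in the API Reference, where a 10.18-second clip shows 0.88 credits used (11 x 0.08). Confirm current prices and rounding behavior in the EvoLink console.

```python
from decimal import Decimal
import math

# Unit prices quoted on this page; they can change, so confirm in the console.
USD_PER_SECOND = Decimal("0.0012")
CREDITS_PER_SECOND = Decimal("0.08")


def estimate_cost(durations_s, round_up=True):
    """Estimate (usd, credits) for a batch of clip durations in seconds.

    round_up=True bills each clip as a whole number of seconds. This is an
    assumption inferred from the sample response, not a documented rule.
    """
    total_usd = Decimal("0")
    total_credits = Decimal("0")
    for duration in durations_s:
        billed = Decimal(math.ceil(duration)) if round_up else Decimal(str(duration))
        total_usd += billed * USD_PER_SECOND
        total_credits += billed * CREDITS_PER_SECOND
    return total_usd, total_credits


usd, credits = estimate_cost([10.18])
print(usd, credits)
```

Decimal arithmetic is used here so that small per-second prices do not accumulate floating-point error across large batches.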

Unified Gateway for Audio Models

Access Seed Audio 1.0 alongside other audio models through one EvoLink API. Compare options, manage keys and usage in one place, and route or fall back across models without rewiring your integration for each provider.

How to Integrate Seed Audio 1.0

Three steps to call Doubao Seed Audio 1.0 through EvoLink.

Create an EvoLink API Key

Sign up on EvoLink and generate an API key from the console. The same key gives you access to Seed Audio 1.0 and the other models on the gateway, and lets you set usage limits and monitor consumption from one dashboard.

Use Model ID doubao-seed-audio-1-0

Point your request at model ID doubao-seed-audio-1-0. Provide your text prompt (up to 1.5k characters) and optional reference audio, then set output options such as format, sample rate, speed, pitch, and volume.

Submit an Async Task and Retrieve Audio

Seed Audio 1.0 uses an asynchronous task model: submit the generation request, receive a task ID, then poll the task status endpoint to retrieve the finished audio (up to 120s). Stream, download, or embed the result directly in your product.
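
The submit-then-poll flow can be sketched as a small loop. This is a minimal sketch under stated assumptions: `poll_task` is a hypothetical helper, `fetch_status` stands in for whatever call you make against the task status endpoint (its path is not given on this page), and the failure state names are assumed. Only "processing" and "completed" appear in the sample responses.

```python
import time

# "processing" and "completed" appear in the sample responses on this page.
# The failure state names below are assumptions for illustration.
FAILED_STATES = {"failed", "cancelled"}


def poll_task(fetch_status, task_id, interval_s=2.0, timeout_s=300.0, sleep=time.sleep):
    """Poll until the task completes, fails, or the timeout elapses.

    fetch_status: callable(task_id) -> dict holding at least a "status" key,
    for example a GET against the task status endpoint.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        task = fetch_status(task_id)
        status = task.get("status")
        if status == "completed":
            return task
        if status in FAILED_STATES:
            raise RuntimeError(f"task {task_id} ended with status {status!r}")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"task {task_id} not finished after {timeout_s}s")
        sleep(interval_s)
```

Passing the status call in as a function keeps the loop independent of any HTTP client and makes it easy to exercise with canned responses. If you supply a callback_url, polling can serve as a fallback rather than the primary path.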

Capabilities and Limits

The concrete facts you need before integrating Seed Audio 1.0.

Generation

Prompt-Based Audio Generation

Seed Audio 1.0 generates audio from a prompt, optionally guided by reference audio. It goes beyond plain TTS: multi-character dialogue, emotion, and non-verbal expression can be written directly into the prompt.

Input

Reference Audio Support

Provide up to 3 reference audio clips per request, each no longer than 30 seconds, via base64 or URL, to guide timbre and delivery. Reference image and reference audio cannot be supplied in the same request.

Limits

Output Limit up to 120s

Each request synthesizes up to 120 seconds of audio. Text input is capped at 1.5k characters, so long-form content needs to be split into segments and submitted as a batch of requests.
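
One way to prepare long-form text for the character cap is to pack whole sentences into chunks. The sketch below is illustrative only: `split_text` is a hypothetical helper, "1.5k" is read as 1,500, and the cap is assumed to count Python string characters. It also joins sentences with a single space, which you may want to drop for Chinese text.

```python
import re

MAX_CHARS = 1500  # "1.5k characters" read as 1,500; an assumption


def split_text(text, max_chars=MAX_CHARS):
    """Greedily pack whole sentences into chunks of at most max_chars characters."""
    # Split after English or Chinese sentence-ending punctuation.
    sentences = [s for s in re.split(r"(?<=[.!?。！？])\s*", text) if s]
    chunks, current = [], ""
    for sentence in sentences:
        # A single sentence longer than the cap is hard-split.
        while len(sentence) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:max_chars])
            sentence = sentence[max_chars:]
        joined_len = len(current) + len(sentence) + (1 if current else 0)
        if joined_len > max_chars:
            chunks.append(current)
            current = sentence
        else:
            current = f"{current} {sentence}" if current else sentence
    if current:
        chunks.append(current)
    return chunks


print(split_text("One. Two. Three.", max_chars=9))
```

Each chunk can then be sent as its own request. Note that the 120-second output limit applies per request as well, so a chunk that fits the character cap may still need to be shortened.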

Formats

Flexible Output Formats

Export audio as wav (default), mp3, pcm, or ogg_opus, so you can match your downstream pipeline without extra transcoding. Explicit and implicit watermarking are supported.

Quality

Selectable Sample Rates

Choose a 48 kHz, 24 kHz (default), 16 kHz, or 8 kHz sample rate to balance fidelity and file size for web delivery, production, or real-time processing.

Control

Languages and Delivery Controls

Supports Chinese and English, including mainstream regional Chinese accents (pure dialects are not supported). Adjust speed, pitch, and volume per request. SSML is not supported.

Frequently Asked Questions about Seed Audio 1.0

Everything you need to know about the product and billing.

What is Seed Audio 1.0?

Seed Audio 1.0 (Doubao-Seed-Audio 1.0) is ByteDance's prompt-based AI audio generation model. From a text prompt, optionally guided by reference audio, it can generate speech, multi-character dialogue, and audio with emotion and non-verbal expression. It is broader than traditional text-to-speech and is designed for AI audio generation use cases.

Is Seed Audio 1.0 available on EvoLink?

Yes. Seed Audio 1.0 is live on EvoLink and can be accessed through EvoLink's unified API gateway with your existing API key, alongside the other models on the platform.

Which model ID should I use?

Use the model ID doubao-seed-audio-1-0 in your request when calling Seed Audio 1.0 through EvoLink.

How is Seed Audio 1.0 billed?

Seed Audio 1.0 is billed by generated audio duration, charged per second of output, which makes batch workloads straightforward to estimate. Pricing can change, so check the latest unit price in the EvoLink console and pricing page before you scale.

What are the input and output limits?

Text input is up to 1.5k characters. You can provide up to 3 reference audio clips, each no longer than 30 seconds, via base64 or URL. A single request synthesizes up to 120 seconds of audio. Output formats are wav (default), mp3, pcm, and ogg_opus, with sample rates of 48 kHz, 24 kHz (default), 16 kHz, and 8 kHz. Reference image and reference audio cannot be supplied at the same time. Other limits may vary, so check the latest EvoLink console and official docs.

Is Seed Audio 1.0 just text-to-speech?

No. While it can synthesize speech from text, Seed Audio 1.0 is prompt-based AI audio generation. You can compose multi-character dialogue, emotion, and non-verbal expression in the prompt and guide output with reference audio, which goes well beyond a single-voice text-to-speech engine.

Does Seed Audio 1.0 support SSML?

No. SSML is not supported. Delivery is controlled through prompt instructions and request parameters such as speed, pitch, and volume.

API Reference

Authentication

All APIs require Bearer Token authentication.

Header

Authorization: Bearer YOUR_API_KEY
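
For illustration, the header can be attached to a request for the generation endpoint as shown below. This is a sketch using only the Python standard library. The base URL is a placeholder (the API host is not stated on this page), `build_generation_request` is a hypothetical helper, and the JSON content type is assumed from the JSON request examples.

```python
import json
import urllib.request

# Placeholder host: substitute the API base URL from your EvoLink console or docs.
BASE_URL = "https://api.example.invalid"


def build_generation_request(api_key, payload, base_url=BASE_URL):
    """Return an unsent request for POST /v1/audios/generations."""
    return urllib.request.Request(
        url=f"{base_url}/v1/audios/generations",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            # Assumed from the JSON request examples on this page.
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_generation_request(
    "YOUR_API_KEY",
    {"model": "doubao-seed-audio-1-0", "prompt": "Hello"},
)
# To send it: urllib.request.urlopen(req), then parse the JSON body for the task ID.
```

Building the request separately from sending it keeps the API key handling in one place and lets you inspect the headers before any network call is made.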

POST /v1/audios/generations

Generate Audio

Create an audio generation task from a text prompt, optionally guided by reference voices or a reference image.

Processing is asynchronous: use the returned task ID to query the task status and retrieve the result.

Result audio URLs are CDN-hosted and long-lived. Billed per output second (up to 120s).

Three Generation Modes

Text-to-speech: pass only prompt to generate audio directly from the prompt.

Voice cloning: prompt + audio_references. Reference a voice ID or reference audio. Use @音频N in the prompt to reference the N-th item.

Image-guided: prompt + image_urls. Generate audio guided by a reference image.

⚠️ audio_references and image_urls are mutually exclusive — use one or the other.
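
The mode rules above can be checked on the client before a request is sent. The following is a minimal sketch, not the gateway's own validation: `build_payload` is a hypothetical helper, and the 1,500-character reading of "1.5k" is an assumption.

```python
MAX_PROMPT_CHARS = 1500  # "1.5k characters" read as 1,500; an assumption


def build_payload(prompt, audio_references=None, image_urls=None, **options):
    """Build a request body and enforce the documented mode rules."""
    if not prompt:
        raise ValueError("prompt is required")
    if len(prompt) > MAX_PROMPT_CHARS:
        raise ValueError("prompt exceeds the 1.5k character limit")
    if audio_references and image_urls:
        raise ValueError("audio_references and image_urls are mutually exclusive")
    if audio_references and len(audio_references) > 3:
        raise ValueError("audio_references accepts at most 3 items")
    if image_urls and len(image_urls) > 1:
        raise ValueError("image_urls currently accepts at most 1 image")

    # Extra keyword options (format, speech_rate, ...) are passed through as-is.
    payload = {"model": "doubao-seed-audio-1-0", "prompt": prompt, **options}
    if audio_references:
        payload["audio_references"] = list(audio_references)
    if image_urls:
        payload["image_urls"] = list(image_urls)
    return payload


print(build_payload("Hello there", format="mp3"))
```

Which of the three modes a request uses is implied by which optional field is present, so no separate mode parameter is needed in this sketch.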

Request Parameters

model (string, required, default: doubao-seed-audio-1-0)

Audio generation model name.

Value	Description
doubao-seed-audio-1-0	Doubao Seed Audio 1.0 multimodal audio generation

Example: doubao-seed-audio-1-0

prompt (string, required)

The text content to synthesize, or a prompt describing the audio. Use @音频N (音频 is Chinese for "audio") to reference the N-th item of audio_references.

Notes

Limited to 1.5k characters

Example: @音频1 Hi there! @音频2 How's your day going?

audio_references (array, optional)

Reference voices. Each item is a voice ID or a reference audio URL (items starting with 'http' are treated as URLs, otherwise as voice IDs). Order maps to @音频1 / @音频2 in the prompt.

Notes

Up to 3 items; mutually exclusive with image_urls
Voice IDs look like 'zh_female_xxx'
Reference audio: each ≤ 30s / ≤ 10MB, wav/mp3/pcm/ogg_opus

Example: ["zh_female_example_id", "https://your-bucket.com/ref-voice.mp3"]

See Preset Voice IDs in the left sidebar for curated voices and the full catalog link.
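
To make the reference rules concrete, the sketch below labels each item using the documented 'http' prefix rule and checks that every @音频N marker in a prompt points at an existing item. Both helper names are hypothetical, and the check mirrors only what is described above, not the gateway's actual parsing.

```python
import re

# Matches markers such as @音频1 and captures the index.
MENTION = re.compile(r"@音频(\d+)")


def classify_references(audio_references):
    """Label each item 'url' or 'voice_id' using the documented 'http' prefix rule."""
    return [
        ("url" if item.startswith("http") else "voice_id", item)
        for item in audio_references
    ]


def unresolved_mentions(prompt, audio_references):
    """Return @音频N indices used in the prompt that have no matching reference."""
    used = {int(n) for n in MENTION.findall(prompt)}
    return sorted(n for n in used if not 1 <= n <= len(audio_references))


refs = ["zh_female_example_id", "https://your-bucket.com/ref-voice.mp3"]
print(classify_references(refs))
print(unresolved_mentions("@音频1 Hi there! @音频2 How's your day going?", refs))
```

Indices are 1-based, matching the @音频1 / @音频2 convention in the prompt.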

image_urls (array, optional)

Reference image URL to drive audio generation.

Notes

Currently at most 1 image; mutually exclusive with audio_references
≤ 10MB, jpeg/png/webp

Example: ["https://your-bucket.com/scene.jpg"]

speech_rate (number, optional, default: 1.0)

Speech speed multiplier.

Notes

Range: 0.5 to 2.0 (1.0 = normal, 2.0 = double speed, 0.5 = half speed)
Accepts two decimals

Example: 1.2

loudness_rate (number, optional, default: 1.0)

Loudness multiplier.

Notes

Range: 0.5 to 2.0 (1.0 = normal)
Accepts two decimals

Example: 1.0

pitch_rate (integer, optional, default: 0)

Pitch adjustment in semitones.

Notes

Range: -12 to 12 (0 = no change)

Example: 0

format (string, optional, default: wav)

Output audio format.

Value	Description
wav	WAV
mp3	MP3
pcm	PCM
ogg_opus	OGG Opus

Example: mp3

sample_rate (integer, optional, default: 24000)

Output sample rate in Hz.

Value	Description
8000	8 kHz
16000	16 kHz
24000	24 kHz
48000	48 kHz

Example: 24000
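
The delivery and output parameters described so far (speech_rate, loudness_rate, pitch_rate, format, sample_rate) have small, fixed ranges, so they are easy to check before submitting. The sketch below collects problems instead of raising, and `validate_options` is a hypothetical helper that reflects only the ranges listed on this page.

```python
FORMATS = {"wav", "mp3", "pcm", "ogg_opus"}
SAMPLE_RATES = {8000, 16000, 24000, 48000}


def validate_options(speech_rate=1.0, loudness_rate=1.0, pitch_rate=0,
                     format="wav", sample_rate=24000):
    """Return a list of problems; an empty list means every option is in range."""
    problems = []
    if not 0.5 <= speech_rate <= 2.0:
        problems.append("speech_rate must be between 0.5 and 2.0")
    if not 0.5 <= loudness_rate <= 2.0:
        problems.append("loudness_rate must be between 0.5 and 2.0")
    if not isinstance(pitch_rate, int) or not -12 <= pitch_rate <= 12:
        problems.append("pitch_rate must be an integer between -12 and 12")
    if format not in FORMATS:
        problems.append(f"format must be one of {sorted(FORMATS)}")
    if sample_rate not in SAMPLE_RATES:
        problems.append(f"sample_rate must be one of {sorted(SAMPLE_RATES)}")
    return problems


print(validate_options(speech_rate=1.2, format="mp3"))
```

The defaults in the signature match the documented defaults, so calling it with no arguments validates the request the gateway would see if you omitted every option.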

callback_url (string, optional)

HTTPS callback address after task completion.

Notes

Triggered on completion, failure, or cancellation
Sent after billing confirmation
HTTPS only, no internal IPs
Max length: 2048 chars

Example: https://your-domain.com/webhooks/audio-task-completed
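
The callback_url constraints can be approximated locally before a request is sent. This is a sketch under assumptions: `callback_url_problems` is a hypothetical helper, "internal IPs" is interpreted as private, loopback, or link-local literal addresses, and hostnames are not resolved, so a hostname that points at an internal address would pass this check.

```python
import ipaddress
from urllib.parse import urlsplit


def callback_url_problems(url):
    """Return a list of ways the URL violates the documented callback rules."""
    problems = []
    if len(url) > 2048:
        problems.append("longer than 2048 characters")
    parts = urlsplit(url)
    if parts.scheme != "https":
        problems.append("scheme must be https")
    host = parts.hostname or ""
    if not host:
        problems.append("missing host")
        return problems
    try:
        ip = ipaddress.ip_address(host)
    except ValueError:
        ip = None  # a hostname; DNS resolution is not checked in this sketch
    if ip is not None and (ip.is_private or ip.is_loopback or ip.is_link_local):
        problems.append("internal IP addresses are not allowed")
    return problems


print(callback_url_problems("https://your-domain.com/webhooks/audio-task-completed"))
```

The gateway remains the authority on what it accepts. A local check like this mainly catches configuration mistakes such as a plain http URL from a development environment.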

Request Example — Text-to-Speech

{
  "model": "doubao-seed-audio-1-0",
  "prompt": "欢迎使用语音合成服务，今天天气真不错。",
  "format": "mp3",
  "speech_rate": 1.2
}

Request Example — Voice Cloning (multi-voice)

{
  "model": "doubao-seed-audio-1-0",
  "prompt": "@音频1 Hi there! @音频2 How's your day going?",
  "audio_references": [
    "zh_female_example_id",
    "https://your-bucket.com/ref-voice.mp3"
  ]
}

Response Example

Submit (task created):

{
  "id": "task-unified-xxxxxxxx",
  "object": "audio.generation.task",
  "model": "doubao-seed-audio-1-0",
  "type": "audio",
  "status": "processing",
  "progress": 0,
  "task_info": { "can_cancel": false, "estimated_time": 15 }
}

Query (completed):

{
  "id": "task-unified-1782491238-7b6bmmv2",
  "object": "audio.generation.task",
  "model": "doubao-seed-audio-1-0",
  "type": "audio",
  "status": "completed",
  "progress": 100,
  "created": 1782491238,
  "duration": 41,
  "results": ["https://files.evolink.ai/.../seed-audio-xxx.wav"],
  "result_data": [
    {
      "audio_url": "https://files.evolink.ai/.../seed-audio-xxx.wav",
      "duration": 10.18,
      "format": "wav"
    }
  ],
  "task_info": { "can_cancel": false },
  "usage": { "credits_used": 0.88, "original_duration": 10.18 }
}
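
Once a task reports completed, the audio location and metadata can be read from result_data. The sketch below uses only field names that appear in the sample response above, and `extract_audio` is a hypothetical helper. Whether every completed task carries all of these fields is an assumption, so optional lookups are used for everything except audio_url.

```python
def extract_audio(task):
    """Return (audio_url, duration_seconds, format) tuples from a completed task."""
    if task.get("status") != "completed":
        raise ValueError(f"task is not completed: {task.get('status')!r}")
    return [
        (item["audio_url"], item.get("duration"), item.get("format"))
        for item in task.get("result_data", [])
    ]


# Trimmed copy of the sample "Query (completed)" response.
sample = {
    "id": "task-unified-1782491238-7b6bmmv2",
    "status": "completed",
    "results": ["https://files.evolink.ai/.../seed-audio-xxx.wav"],
    "result_data": [
        {
            "audio_url": "https://files.evolink.ai/.../seed-audio-xxx.wav",
            "duration": 10.18,
            "format": "wav",
        }
    ],
    "usage": {"credits_used": 0.88, "original_duration": 10.18},
}

for url, duration, fmt in extract_audio(sample):
    print(url, duration, fmt)
```

Reading result_data rather than results gives you the per-clip duration and format alongside the URL, which is useful when reconciling usage against billed credits.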