Seedance 2.0 Support and Multimodal Reference-to-Video Workflow

Mar 25, 2026 · Update

1. Seedance 2.0 model support

This model is currently available only to enterprise users.

Seedance multimodal reference-to-video supports reference images (0-9), reference videos (0-3), reference audio clips (0-3), and optional text prompts.
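The count limits above can be checked before a request is assembled. The following is a minimal sketch; the function and constant names are illustrative and not part of any Seedance API.

```python
from typing import List

# Limits taken from the text above; the names are illustrative only.
MAX_IMAGES = 9
MAX_VIDEOS = 3
MAX_AUDIO = 3


def check_reference_counts(images: int, videos: int, audio: int) -> List[str]:
    """Return a list of problems; an empty list means all counts are in range."""
    problems = []
    for label, count, limit in (
        ("reference images", images, MAX_IMAGES),
        ("reference videos", videos, MAX_VIDEOS),
        ("reference audio clips", audio, MAX_AUDIO),
    ):
        if not 0 <= count <= limit:
            problems.append(f"{label}: got {count}, allowed 0-{limit}")
    return problems
```

Returning a list of messages rather than raising on the first failure lets a caller report every violation at once.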

2. Prompt specifications

Chinese and English prompts are supported. We recommend keeping Chinese prompts under 500 characters and English prompts under 1,000 words. Overly long prompts dilute the instructions: the model may drop details or attend only to the main points, so elements can be missing from the generated video.
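A rough pre-check for the recommended prompt lengths might look like the sketch below. How the platform itself counts characters and words is not stated, so the counting rules here are assumptions: Chinese characters are counted as code points in the basic CJK range, and English words are counted as whitespace-separated tokens in the remaining text.

```python
import re
from typing import List

MAX_CHINESE_CHARS = 500    # "under 500 characters"
MAX_ENGLISH_WORDS = 1000   # "under 1,000 words"

# Assumption: a "Chinese character" is any code point in U+4E00..U+9FFF.
_CJK = re.compile(r"[\u4e00-\u9fff]")


def prompt_length_warnings(prompt: str) -> List[str]:
    """Return warnings for a prompt that reaches either recommended limit."""
    warnings = []
    cjk_count = len(_CJK.findall(prompt))
    # Replace CJK characters with spaces so they are not counted as words.
    word_count = len(_CJK.sub(" ", prompt).split())
    if cjk_count >= MAX_CHINESE_CHARS:
        warnings.append(f"{cjk_count} Chinese characters; keep under {MAX_CHINESE_CHARS}")
    if word_count >= MAX_ENGLISH_WORDS:
        warnings.append(f"{word_count} words; keep under {MAX_ENGLISH_WORDS}")
    return warnings
```

Because the limits are recommendations, the function returns warnings instead of rejecting the prompt.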

3. Single-image input requirements

Format: jpeg, png, webp, bmp, tiff, gif.

Aspect ratio (width / height): (0.4, 2.5).

Width and height (px): (300, 6000).

Size: each image must be under 30 MB. Request body size must not exceed 64 MB. Do not use Base64 encoding for large files.

Image count: 1 image for image-to-video with a first frame; 2 images for image-to-video with first and last frames; 1-9 images for multimodal reference-to-video with Seedance 2.0 and Seedance 2.0 fast.
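The image limits above can be expressed as a small validator. This is a sketch under stated assumptions: the parentheses in "(0.4, 2.5)" and "(300, 6000)" are read as exclusive bounds, 1 MB is taken as 1024 x 1024 bytes, and the caller already knows the image's format, dimensions, and file size. The helper names are hypothetical. A second helper shows why Base64 is discouraged for large files: standard Base64 emits 4 bytes for every 3 input bytes.

```python
from typing import List

IMAGE_FORMATS = {"jpeg", "png", "webp", "bmp", "tiff", "gif"}
MAX_IMAGE_BYTES = 30 * 1024 * 1024  # assumption: 1 MB = 1024 * 1024 bytes


def check_image(fmt: str, width: int, height: int, size_bytes: int) -> List[str]:
    """Return a list of problems; an empty list means the image looks acceptable."""
    problems = []
    if fmt.lower() not in IMAGE_FORMATS:
        problems.append(f"unsupported format: {fmt}")
    # Assumption: "(300, 6000)" and "(0.4, 2.5)" are exclusive bounds.
    if not (300 < width < 6000 and 300 < height < 6000):
        problems.append(f"width/height out of range: {width}x{height}")
    if height > 0 and not 0.4 < width / height < 2.5:
        problems.append(f"aspect ratio out of range: {width / height:.3f}")
    if size_bytes >= MAX_IMAGE_BYTES:
        problems.append(f"file too large: {size_bytes} bytes")
    return problems


def base64_size(raw_bytes: int) -> int:
    """Byte length after standard padded Base64 encoding (4 bytes per 3 input bytes)."""
    return 4 * ((raw_bytes + 2) // 3)
```

For instance, base64_size(30 * 1024 * 1024) is 41943040 bytes, or 40 MB, so two near-limit images sent inline as Base64 would already exceed the 64 MB request body limit.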

4. Single-video input requirements

Video format: mp4, mov.

Resolution: 480p, 720p.

Duration: each video must be 2-15 seconds. Up to 3 reference videos can be provided, and total duration across all videos must not exceed 15 seconds.

Aspect ratio (width / height): [0.4, 2.5].

Width and height (px): [300, 6000].

Frame pixels (width x height): [409600, 927408]. For example, 640 x 640 = 409600 meets the minimum value, and 834 x 1112 = 927408 meets the maximum value.

Size: each video must be under 50 MB.

Frame rate (FPS): [24, 60].
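The video limits combine per-file checks with a limit across the whole set of reference videos. The sketch below assumes the square brackets denote inclusive bounds, that 1 MB is 1024 x 1024 bytes, and that the caller has already probed each file for its metadata. The function names are hypothetical. The 480p/720p resolution rule is not checked because the text does not define it in pixels.

```python
from typing import List, Sequence

VIDEO_FORMATS = {"mp4", "mov"}
MIN_FRAME_PIXELS = 409_600   # e.g. 640 x 640
MAX_FRAME_PIXELS = 927_408   # e.g. 834 x 1112
MAX_VIDEO_BYTES = 50 * 1024 * 1024  # assumption: 1 MB = 1024 * 1024 bytes


def check_video(fmt: str, width: int, height: int,
                duration_s: float, fps: float, size_bytes: int) -> List[str]:
    """Return per-file problems; an empty list means the video looks acceptable."""
    problems = []
    if fmt.lower() not in VIDEO_FORMATS:
        problems.append(f"unsupported format: {fmt}")
    if not (300 <= width <= 6000 and 300 <= height <= 6000):
        problems.append(f"width/height out of range: {width}x{height}")
    if height > 0 and not 0.4 <= width / height <= 2.5:
        problems.append(f"aspect ratio out of range: {width / height:.3f}")
    if not MIN_FRAME_PIXELS <= width * height <= MAX_FRAME_PIXELS:
        problems.append(f"frame pixels out of range: {width * height}")
    if not 2 <= duration_s <= 15:
        problems.append(f"duration out of range: {duration_s}s")
    if not 24 <= fps <= 60:
        problems.append(f"frame rate out of range: {fps}")
    if size_bytes >= MAX_VIDEO_BYTES:
        problems.append(f"file too large: {size_bytes} bytes")
    return problems


def check_video_set(durations_s: Sequence[float]) -> List[str]:
    """Check the limits that apply across all reference videos together."""
    problems = []
    if len(durations_s) > 3:
        problems.append(f"too many reference videos: {len(durations_s)}")
    if sum(durations_s) > 15:
        problems.append(f"total duration exceeds 15s: {sum(durations_s)}s")
    return problems
```

The frame-pixel range is the tightest of these constraints: 1280 x 720 (921600 pixels) fits, while 1920 x 1080 (2073600 pixels) does not, even though both satisfy the width, height, and aspect-ratio ranges.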

5. Single-audio input requirements

Format: wav, mp3.

Duration: each audio clip must be 2-15 seconds. Up to 3 reference audio clips can be provided, and total duration across all audio clips must not exceed 15 seconds.

Size: each audio clip must be under 15 MB. Request body size must not exceed 64 MB.
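The audio limits follow the same pattern as the video limits: per-clip checks plus a count and a total-duration limit across all clips. The sketch below is illustrative; the AudioClip type and function name are hypothetical, the duration bounds are treated as inclusive, and 1 MB is taken as 1024 x 1024 bytes.

```python
from dataclasses import dataclass
from typing import List

AUDIO_FORMATS = {"wav", "mp3"}
MAX_AUDIO_BYTES = 15 * 1024 * 1024  # assumption: 1 MB = 1024 * 1024 bytes


@dataclass
class AudioClip:
    fmt: str           # file format, e.g. "wav"
    duration_s: float  # clip length in seconds
    size_bytes: int    # file size in bytes


def check_audio_clips(clips: List[AudioClip]) -> List[str]:
    """Return a list of problems; an empty list means the clips look acceptable."""
    problems = []
    if len(clips) > 3:
        problems.append(f"too many reference audio clips: {len(clips)}")
    for index, clip in enumerate(clips):
        if clip.fmt.lower() not in AUDIO_FORMATS:
            problems.append(f"clip {index}: unsupported format {clip.fmt}")
        if not 2 <= clip.duration_s <= 15:
            problems.append(f"clip {index}: duration out of range ({clip.duration_s}s)")
        if clip.size_bytes >= MAX_AUDIO_BYTES:
            problems.append(f"clip {index}: file too large ({clip.size_bytes} bytes)")
    total = sum(clip.duration_s for clip in clips)
    if total > 15:
        problems.append(f"total duration exceeds 15s: {total}s")
    return problems
```

Note that two clips can each be valid on their own and still fail together, for example two 8-second clips totalling 16 seconds.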

6. Run compliance detection first

If any input image, video, or audio clip contains people, click the Compliance Check button above the node before generating.

7. Generate after detection succeeds

A green mark means detection succeeded. You can then proceed to generation.

8. Batch asset detection workflow

If you have many people-related images, videos, or audio clips, run batch asset detection in advance to avoid copyright issues.

9. Enter the Asset Library from the homepage

10. Create an asset group

11. Add assets and wait for them to be stored

The "Stored" status means the asset passed detection. Uploaded assets can be images, videos, or audio.

Image requirements: jpeg, png, webp, bmp, tiff, gif, heic/heif; aspect ratio (width / height): (0.4, 2.5); width and height (px): (300, 6000); each image must be under 30 MB.

Video requirements: mp4, mov; resolution: 480p, 720p; each video must be 2-15 seconds; aspect ratio (width / height): [0.4, 2.5]; width and height (px): [300, 6000]; frame pixels (width x height): [409600, 927408]; each video must be under 50 MB; frame rate (FPS): [24, 60].

Audio requirements: wav, mp3; each audio clip must be 2-15 seconds; each audio clip must be under 15 MB.