Stable Audio Open 1.0

Introduction:

Stable Audio Open 1.0 is an AI model that combines an autoencoder, T5-based text embeddings, and a transformer-based diffusion model to generate up to 47 seconds of stereo audio from text prompts. It is intended for research and experimentation with the current capabilities of generative audio models. The model was trained on data from Freesound and the Free Music Archive (FMA), which provide varied, openly licensed audio.
Target Users:
The model is suited to music producers, audio engineers, researchers, and any individual or team interested in AI music generation. Artists can use it to experiment and create new musical works, and researchers can use it as a basis for studying and improving generative AI models.
Usage Scenario Examples:

Music producers use this model to generate new background music based on text prompts.
Researchers use this model to analyze and advance the state of the art in generative AI models.
Audio engineers use this model to explore the generation of sound effects under different text prompts.

Features:

Generate up to 47 seconds of stereo audio.
Audio is generated at a 44.1 kHz sampling rate.
Music and audio generation based on text prompts.
The autoencoder compresses the waveform to a manageable sequence length.
Text conditioning based on T5 text embeddings.
The diffusion model operates in the latent space of the autoencoder.
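
To give a sense of why the autoencoder's compression matters, the short sketch below works out how many samples a maximum-length clip contains and roughly how long the compressed sequence would be. The sampling rate and duration come from the feature list above; the downsampling ratio of 2048 is an assumption not stated in this description and should be checked against the model configuration.

```python
SAMPLE_RATE_HZ = 44_100     # from the feature list
MAX_SECONDS = 47            # from the feature list
CHANNELS = 2                # stereo
DOWNSAMPLING_RATIO = 2048   # ASSUMPTION: not stated in this description


def latent_length(seconds, sample_rate=SAMPLE_RATE_HZ, ratio=DOWNSAMPLING_RATIO):
    """Approximate number of compressed time steps for a clip of `seconds`."""
    return (seconds * sample_rate) // ratio


samples_per_channel = SAMPLE_RATE_HZ * MAX_SECONDS
total_samples = samples_per_channel * CHANNELS
latent_steps = latent_length(MAX_SECONDS)

print(samples_per_channel)  # 2072700 samples per channel
print(total_samples)        # 4145400 samples across both channels
print(latent_steps)         # 1012 time steps under the assumed ratio
```

Under that assumed ratio, the diffusion model would work on a sequence of about a thousand steps rather than about two million raw samples per channel.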

Steps for Use:

Download and install the required stable-audio-tools library.
Download the pre-trained model using the code examples provided.
Set the text and timing conditions: the prompt, the start time, and the total length of the audio.
Call the model to generate audio by conditional diffusion.
Rearrange the generated audio, peak-normalize it, clip it, convert it to int16 format, and save it to a file.
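
The steps above can be sketched in Python roughly as follows. This is a minimal sketch that follows the usage pattern documented for stable-audio-tools (installable with pip install stable-audio-tools), not a definitive implementation. The helper names build_conditioning and generate_clip, the step count, the sampler settings, and the output path are illustrative assumptions. Downloading the weights may also require a Hugging Face account that has accepted the model license.

```python
MAX_SECONDS = 47  # upper limit stated in this description


def build_conditioning(prompt, seconds_total, seconds_start=0):
    """Step 3: text and timing conditions for one clip (hypothetical helper)."""
    if not 0 < seconds_total <= MAX_SECONDS:
        raise ValueError(f"seconds_total must be in (0, {MAX_SECONDS}]")
    return [{
        "prompt": prompt,
        "seconds_start": seconds_start,
        "seconds_total": seconds_total,
    }]


def generate_clip(prompt, seconds_total=30, out_path="output.wav"):
    """Steps 2, 4 and 5: load the model, generate audio, save it as int16."""
    # Heavy dependencies are imported here so build_conditioning can be
    # used and tested without them installed.
    import torch
    import torchaudio
    from einops import rearrange
    from stable_audio_tools import get_pretrained_model
    from stable_audio_tools.inference.generation import generate_diffusion_cond

    device = "cuda" if torch.cuda.is_available() else "cpu"

    # Step 2: download the pre-trained model and its configuration.
    model, model_config = get_pretrained_model("stabilityai/stable-audio-open-1.0")
    model = model.to(device)

    # Step 4: conditional diffusion. Sampler settings are assumed defaults.
    output = generate_diffusion_cond(
        model,
        steps=100,
        cfg_scale=7,
        conditioning=build_conditioning(prompt, seconds_total),
        sample_size=model_config["sample_size"],
        sigma_min=0.3,
        sigma_max=500,
        sampler_type="dpmpp-3m-sde",
        device=device,
    )

    # Step 5: merge the batch into one sequence, peak-normalize, clip,
    # convert to int16, and save.
    output = rearrange(output, "b d n -> d (b n)")
    output = (
        output.to(torch.float32)
        .div(torch.max(torch.abs(output)))
        .clamp(-1, 1)
        .mul(32767)
        .to(torch.int16)
        .cpu()
    )
    torchaudio.save(out_path, output, model_config["sample_rate"])
    return out_path
```

Calling generate_clip("128 BPM tech house drum loop") would download the weights on first use and write the result to output.wav; a GPU is strongly recommended for reasonable generation times.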

Tags: AI music generation, audio processing