seed-tts-eval
A test set for evaluating a model's zero-shot speech generation capability
Tags: AI audio tools, AI speech processing tools
Introduction:
seed-tts-eval is a test set for evaluating a model's zero-shot speech generation ability. It provides an objective, cross-domain evaluation set containing samples drawn from English and Mandarin public corpora, used to measure a model's performance on various objective metrics: 1,000 samples from the Common Voice dataset and 2,000 samples from the DiDiSpeech-2 dataset.
Target Audience:
Speech synthesis researchers and developers, who can use seed-tts-eval to evaluate and improve their speech synthesis systems.
Usage Scenario Examples:
- Researchers use seed-tts-eval to evaluate the performance of new speech synthesis models
- Developers use the test set to compare the output quality of different speech synthesis techniques
- Educational institutions use the test set as instructional material for teaching speech synthesis
Features of the tool:
- Uses samples from the Common Voice and DiDiSpeech-2 datasets for evaluation
- Uses Word Error Rate (WER) and Speaker Similarity (SIM) as evaluation metrics
- Uses Whisper-large-v3 and Paraformer-zh as the automatic speech recognition engines for English and Mandarin, respectively
- Evaluates speaker similarity with the WavLM-large model
- Provides a download link for the test set
- Supports evaluation of zero-shot text-to-speech (TTS) and voice conversion (VC) tasks
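The two metrics above are simple to state: WER is the word-level edit distance between an ASR transcript and the reference text, normalized by reference length, and SIM is typically the cosine similarity between speaker embeddings. A minimal, illustrative Python sketch of both follows; this is not the repository's actual evaluation code, which runs the full ASR and WavLM-large embedding pipelines.

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word Error Rate: word-level Levenshtein distance / reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    # Dynamic-programming edit-distance table over words
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + sub)  # substitution
    return d[len(ref)][len(hyp)] / max(len(ref), 1)


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """SIM-style score: cosine similarity between two speaker embeddings."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sum(x * x for x in a) ** 0.5
    norm_b = sum(x * x for x in b) ** 0.5
    return dot / (norm_a * norm_b)
```

For example, `wer("a b c d", "a b x d")` is 0.25 (one substitution over four reference words), and identical embeddings give a cosine similarity of 1.0.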
Steps for Use:
- Visit the seed-tts-eval GitHub page
- Read the README to learn how to install dependencies and use the test set
- Download the test-set samples you need
- Evaluate your model's performance using the provided evaluation code
- Optimize your speech synthesis model based on the evaluation results
Tool Tags: speech synthesis, automatic speech recognition