seed-tts-eval
A test set for evaluating a model's zero-shot speech generation capability
Tags: AI audio tools, AI speech processing tools
Introduction:
seed-tts-eval is a test set for evaluating a model's zero-shot speech generation ability. It provides an objective, cross-domain evaluation set containing samples drawn from English and Mandarin public corpora, used to measure a model's performance on various objective metrics: 1,000 samples from the Common Voice dataset and 2,000 samples from the DiDiSpeech-2 dataset.
Target Audience:
Speech synthesis researchers and developers, who can use seed-tts-eval to evaluate and improve their speech synthesis systems.
Usage Scenario Examples:
- Researchers use seed-tts-eval to evaluate the performance of new speech synthesis models
- Developers use the test set to compare the output quality of different speech synthesis techniques
- Educational institutions use the test set as instructional material for teaching speech synthesis
Features of the tool:
- Uses samples from the Common Voice and DiDiSpeech-2 datasets for evaluation
- Uses Word Error Rate (WER) and Speaker Similarity (SIM) as evaluation metrics
- Uses Whisper-large-v3 and Paraformer-zh as the automatic speech recognition engines for English and Mandarin, respectively
- Evaluates speaker similarity with the WavLM-large model
- Provides a download link for the test set
- Supports evaluation of zero-shot text-to-speech (TTS) and voice conversion (VC) tasks
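The two metrics above are simple to state: WER is the word-level edit distance between an ASR transcript and the reference text, normalized by reference length, and SIM is typically the cosine similarity between speaker embeddings. A minimal, illustrative Python sketch of both follows; this is not the repository's actual evaluation code, which runs the full ASR and WavLM-large embedding pipelines.

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word Error Rate: word-level Levenshtein distance / reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    # Dynamic-programming edit-distance table over words
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + sub)  # substitution
    return d[len(ref)][len(hyp)] / max(len(ref), 1)


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """SIM-style score: cosine similarity between two speaker embeddings."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sum(x * x for x in a) ** 0.5
    norm_b = sum(x * x for x in b) ** 0.5
    return dot / (norm_a * norm_b)
```

For example, `wer("a b c d", "a b x d")` is 0.25 (one substitution over four reference words), and identical embeddings give a cosine similarity of 1.0.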
Steps for Use:
- Visit the seed-tts-eval GitHub page
- Read the README to learn how to install dependencies and use the test set
- Download the test-set samples you need
- Evaluate your model's performance using the provided evaluation code
- Optimize your speech synthesis model based on the evaluation results
Tool Tags: speech synthesis, automatic speech recognition