MOSS-TTS-Nano Demo

State-of-the-art text-to-speech demo for multilingual voice cloning.

Built with MOSS-TTS-Nano.

Generation uses the selected demo prompt speech as the voice-cloning reference.
Generation Options
- 0 keeps the current default behavior for a limit.
- Set Max TTS Batch Size to 1 to force split chunks to run one by one.
- Buffered generation preserves chunk order and decodes codec sub-batches no larger than the current TTS batch size.
- Realtime Streaming Decode preserves output order and uses the smallest active chunk-group width among auto batching, Max TTS Batch Size, and Max Codec Batch Size.
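The "smallest active width" rule above can be sketched as follows. This is a minimal illustration, not the demo's actual code; the function name and the fallback of 1 when no limit is active are assumptions.

```python
def effective_group_width(auto_batch: int, max_tts_batch: int, max_codec_batch: int) -> int:
    """Return the smallest active chunk-group width.

    A value of 0 means that limit keeps its default and does not constrain
    the width (assumption, mirroring the "0 keeps the default" rule above).
    """
    active = [w for w in (auto_batch, max_tts_batch, max_codec_batch) if w > 0]
    # With no active limit, fall back to decoding one chunk at a time (assumption).
    return min(active) if active else 1
```

For example, with auto batching at 8 and Max TTS Batch Size at 1, the width is 1, which matches the one-by-one behavior described above.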
This app is CPU-only. The CPU Threads setting maps to torch.set_num_threads for each request.
WeTextProcessing and normalize_tts_text can be toggled independently for each request. WeTextProcessing is preloaded at startup, so enabling it does not add graph-build latency to the first request.
WeTextProcessing supports Chinese (zh) and English (en) normalization.
Checkpoint: OpenMOSS-Team/MOSS-TTS-Nano
Audio Tokenizer: OpenMOSS-Team/MOSS-Audio-Tokenizer-Nano