Audio comparison demos of reconstruction quality and similarity in short-form and long-form settings.
Comparison of original audio with samples reconstructed by different codecs and tokenization methods at comparable bitrates (bps).
| Samples | Original | TASTE-S (ours) | Text-only | TASTE | TaDiCodec | Encodec (1500bps) | DM-Codec (1000bps) | Mimi (1000bps) | SpeechTokenizer (500bps) | BigCodec (1040bps) | WavTokenizer (480bps) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Sample 1 | |||||||||||
| Sample 2 | |||||||||||
| Sample 3 | | | | | | | | | | | |
Comparison of long-form speech reconstruction between TASTE-S and TASTE.
| Samples | Original | TASTE-S (ours; w/ built-in ASR) | TASTE-S (ours; w/ external ASR) | TASTE (w/ external ASR) |
|---|---|---|---|---|
| Sample 1 | Duration: 173.6 s | Duration: 173.9 s | Duration: 175.3 s | Duration: 183.1 s |