Tonal Contour Generation for Isarn Speech Synthesis Using Deep Learning and Sampling-Based F0 Representation

Pongsathon Janyoi and Pusadee Seresangtakul

Speech samples accompanying the submission. All synthetic utterances are generated from the same spectral parameters; only the F0 contours differ.
This page contains the following samples:
  1. Natural speech.
  2. Frame-based RNN : F0 values are generated frame by frame.
  3. DCT-based RNN : F0 contours are represented by DCT coefficients and generated syllable by syllable.
  4. SAMP-based RNN : Proposed model.
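
To make the difference between the syllable-level representations concrete, here is a minimal sketch of how one syllable's F0 contour could be reduced to a short fixed-length vector, either by truncated DCT coefficients or by sampling the contour. This page does not describe the exact procedure, so everything here is an assumption made for illustration: the function names `dct_coefficients` and `sample_contour`, the 1/N scaling of the DCT, and the use of equally spaced sample points with linear interpolation.

```python
import math


def dct_coefficients(contour, n_coeffs):
    """Return the first n_coeffs DCT-II coefficients of one syllable's F0 contour.

    Assumption: coefficients are scaled by 1/N, so coefficient 0 is the mean F0.
    """
    n = len(contour)
    return [
        sum(f0 * math.cos(math.pi * k * (2 * t + 1) / (2 * n))
            for t, f0 in enumerate(contour)) / n
        for k in range(n_coeffs)
    ]


def sample_contour(contour, n_points):
    """Return n_points F0 values taken at equally spaced positions of the contour.

    Assumption: positions between frames are filled by linear interpolation.
    """
    if n_points < 2:
        raise ValueError("n_points must be at least 2")
    n = len(contour)
    samples = []
    for i in range(n_points):
        pos = i * (n - 1) / (n_points - 1)   # fractional frame index
        lo = int(math.floor(pos))
        hi = min(lo + 1, n - 1)
        frac = pos - lo
        samples.append((1 - frac) * contour[lo] + frac * contour[hi])
    return samples


# Hypothetical frame-level F0 values (Hz) for a single rising syllable.
syllable_f0 = [100, 110, 120, 130, 140]
dct_vector = dct_coefficients(syllable_f0, 3)
samp_vector = sample_contour(syllable_f0, 3)
```

In both cases the model would predict one short vector per syllable instead of one value per frame; the sampled vector stays in the F0 domain, whereas the DCT vector has to be inverted to recover a contour.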
Sample no.    Natural speech    Frame-based RNN    DCT-based RNN    SAMP-based RNN
1
2
3
4
5
6
7
8
9
10
11