Tonal Contour Generation for Isarn Speech Synthesis Using Deep Learning and Sampling-Based F₀ Representation

Pongsathon Janyoi and Pusadee Seresangtakul

Speech samples supporting the submission. All synthetic utterances were generated using the same spectral parameters; only the F0 contours differ.
This page contains the following samples:

Natural speech.
Frame-based RNN : F0 values are generated frame-by-frame.
DCT-based RNN : F0 contours are represented by DCT coefficients and generated syllable-by-syllable.
SAMP-based RNN : Proposed model.
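
To make the difference between the syllable-level representations concrete, the following is a minimal sketch and not the authors' implementation. It encodes one syllable's F0 contour in two ways: as truncated DCT coefficients, and as a fixed number of sampled points. The function names, the number of coefficients, the number of sample points, and the use of equally spaced linear-interpolation sampling are all assumptions made for illustration; this page does not specify them.

```python
import math


def dct_coefficients(contour, n_coef):
    """Orthonormal DCT-II of a syllable F0 contour, truncated to n_coef terms."""
    n = len(contour)
    coefs = []
    for k in range(n_coef):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        total = sum(
            x * math.cos(math.pi * (i + 0.5) * k / n)
            for i, x in enumerate(contour)
        )
        coefs.append(scale * total)
    return coefs


def dct_reconstruct(coefs, n):
    """Inverse transform: rebuild n frames from (possibly truncated) coefficients."""
    frames = []
    for i in range(n):
        value = 0.0
        for k, c in enumerate(coefs):
            scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
            value += scale * c * math.cos(math.pi * (i + 0.5) * k / n)
        frames.append(value)
    return frames


def sample_contour(contour, n_points):
    """Assumed sampling scheme: n_points equally spaced positions across the
    syllable, with linear interpolation between neighbouring frames."""
    n = len(contour)
    if n_points < 2 or n < 2:
        return [contour[0]] * max(n_points, 1)
    samples = []
    for j in range(n_points):
        pos = j * (n - 1) / (n_points - 1)
        lo = int(math.floor(pos))
        hi = min(lo + 1, n - 1)
        frac = pos - lo
        samples.append((1.0 - frac) * contour[lo] + frac * contour[hi])
    return samples


# Hypothetical syllable contour in Hz (illustrative values, not from the paper).
syllable_f0 = [100.0, 110.0, 120.0, 130.0, 140.0]
dct_repr = dct_coefficients(syllable_f0, n_coef=3)
samp_repr = sample_contour(syllable_f0, n_points=3)
```

In both cases a model would predict a short fixed-length vector per syllable instead of one value per frame. The DCT vector lives in a transform domain and has to be inverted to recover F0 values, whereas the sampled points are F0 values at known positions within the syllable.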

#sample	Natural speech	Frame-based RNN	DCT-based RNN	SAMP-based RNN
1
2
3
4
5
6
7
8
9
10
11
