TITLE:
An Intonation Speech Synthesis Model for Indonesian Using Pitch Pattern and Phrase Identification
AUTHORS:
Yohanes Suyanto, Subanar, Agus Harjoko, Sri Hartati
KEYWORDS:
Speech Synthesis, PESQ, Intonation, Indonesian
JOURNAL NAME:
Journal of Signal and Information Processing, Vol.5 No.3, July 30, 2014
ABSTRACT:
Prosody in speech
synthesis systems (text-to-speech) is a determinant of tone, duration, and
loudness of speech sound. Intonation is a part of prosody which determines the
speech tone. In Indonesian, intonation is determined by the structure of
sentences, types of sentences, and also the position of the word in a sentence.
In this study, a model of speech synthesis that focuses on its intonation is
proposed. The speech intonation is determined by sentence structure, intonation
patterns of the example sentences, and general rules of Indonesian
pronunciation. The model receives texts and intonation patterns as inputs.
Based on the general principles of Indonesian pronunciation, an initial prosody
file is made. Based on the input text, the sentence structure is determined, and
then the intervals between parts of a sentence (phrases) can be determined. These
intervals are used to correct the durations in the initial prosody file.
Furthermore, the frequencies in the prosody file are corrected using the
intonation patterns. The final result is a prosody file that can be pronounced
by a speech engine application.
Experimental results using the original voice of a radio news announcer
and the synthesized speech show that the peaks of F0 are determined by the general
rules or the intonation patterns, whichever is dominant. A similarity test with
the PESQ method shows that the synthesis result scores 1.18 on the MOS-LQO scale.
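
The two correction steps described above (adjusting durations at phrase boundaries, then adjusting frequencies with an intonation pattern) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the `Segment` record, the function names, the use of inserted pauses for the phrase intervals, and the representation of an intonation pattern as a list of multiplicative F0 factors interpolated over the utterance are all assumptions made for illustration only.

```python
from dataclasses import dataclass, replace
from typing import List, Set


@dataclass(frozen=True)
class Segment:
    """One entry of a hypothetical prosody file."""
    phoneme: str        # "_" marks a pause (assumed convention)
    duration_ms: float
    f0_hz: float        # 0.0 for pauses / unvoiced segments


def insert_phrase_pauses(segments: List[Segment],
                         phrase_ends: Set[int],
                         pause_ms: float) -> List[Segment]:
    """Duration correction (assumed form): add a pause after each
    segment index that closes a phrase."""
    out: List[Segment] = []
    for i, seg in enumerate(segments):
        out.append(seg)
        if i in phrase_ends:
            out.append(Segment("_", pause_ms, 0.0))
    return out


def apply_intonation_pattern(segments: List[Segment],
                             pattern: List[float]) -> List[Segment]:
    """Frequency correction (assumed form): scale each segment's F0 by a
    factor linearly interpolated from `pattern` at the segment's temporal
    midpoint. `pattern` must be non-empty and total duration positive."""
    total = sum(s.duration_ms for s in segments)
    out: List[Segment] = []
    elapsed = 0.0
    for seg in segments:
        mid = (elapsed + seg.duration_ms / 2) / total   # position in [0, 1]
        elapsed += seg.duration_ms
        pos = mid * (len(pattern) - 1)
        lo = int(pos)
        hi = min(lo + 1, len(pattern) - 1)
        factor = pattern[lo] + (pattern[hi] - pattern[lo]) * (pos - lo)
        out.append(replace(seg, f0_hz=seg.f0_hz * factor))
    return out


# Hypothetical usage: two segments, a phrase boundary after the first,
# and a rising pattern across the utterance.
initial = [Segment("a", 100.0, 120.0), Segment("p", 100.0, 120.0)]
with_pauses = insert_phrase_pauses(initial, {0}, pause_ms=50.0)
final = apply_intonation_pattern(with_pauses, [1.0, 2.0])
```

In this sketch the pause keeps an F0 of zero after scaling, so only voiced segments are affected by the pattern; how the actual model treats pauses and unvoiced sounds is not stated in the abstract.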