Title page for etd-0830110-111455
Title
基於隱藏式馬可夫模型之語者相關情緒語音合成
A Hidden Markov Model-Based Approach for Emotional Speech Synthesis
Department
Year, semester
Language
Degree
Number of pages
47
Author
Advisor
Convenor
Advisory Committee
Date of Exam
2010-07-28
Date of Submission
2010-08-30
Keywords (Chinese)
model combination, linear regression, model interpolation, Mahalanobis distance, emotional speech, hidden Markov model, speech synthesis
Keywords (English)
speech synthesis, HMM, emotional expressiveness, model combination, linear regression, model interpolation, Mahalanobis distance
Statistics
The thesis/dissertation has been browsed 5680 times and downloaded 0 times.
Chinese Abstract
In this thesis, we use hidden Markov models to develop two methods for synthesizing emotional speech from a target speaker's neutral speech.
In the first method, we synthesize the target speaker's emotional speech by interpolating between the target speaker's neutral model and an emotional model from a database. We propose the monophone-based Mahalanobis distance (MBMD) to select an appropriate model and to estimate the interpolation weights.
In the second method, we use linear regression to describe the difference between the neutral model and an emotional model, and combine the parameters obtained from training the regression with the target speaker's neutral model to achieve the desired effect.
In our experiments, we synthesize speech conveying anger, happiness, and sadness and conduct objective evaluations. The results show that our methods can effectively synthesize emotional speech for the target speaker.
Abstract
In this thesis, we describe two approaches that automatically synthesize emotional speech for a target speaker based on hidden Markov models of his/her neutral speech.
In the interpolation-based method, the basic idea is model interpolation between the neutral model of the target speaker and an emotional model selected from a candidate pool. Both the selection of the emotional model and the computation of the interpolation weight are based on a model-distance measure; for this we propose the monophone-based Mahalanobis distance (MBMD).
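This record does not spell out the MBMD formula or the weight computation, so the following LaTeX lines are only a plausible sketch: a Mahalanobis distance averaged over the states s of a pair of monophone HMMs, followed by a convex interpolation of the state means.

d\bigl(\lambda^{(n)}, \lambda^{(e)}\bigr) = \frac{1}{S} \sum_{s=1}^{S} \sqrt{\bigl(\mu_s^{(n)} - \mu_s^{(e)}\bigr)^{\top} \Sigma_s^{-1} \bigl(\mu_s^{(n)} - \mu_s^{(e)}\bigr)}

\hat{\mu}_s = (1 - w)\,\mu_s^{(n)} + w\,\mu_s^{(e)}, \qquad 0 \le w \le 1

Here \mu_s^{(n)} and \mu_s^{(e)} are the neutral and emotional state means and \Sigma_s is a state covariance (which model it is taken from is an assumption here); the candidate emotional model with the smallest distance would be selected, with the weight w derived from that distance.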
In the parallel model combination (PMC) based method, the basic idea is to model the mismatch between the neutral model and the emotional model. We train a linear regression model to describe this mismatch and then combine the target speaker's neutral model with the trained regression model.
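A minimal code sketch of this combination step, assuming the state mean vectors of the neutral and emotional models are stacked into matrices and the mismatch is fitted per feature dimension by ordinary least squares; all names and the toy data below are illustrative assumptions, not the thesis's actual implementation.

import numpy as np

def fit_mismatch(neutral_means, emotional_means):
    """Fit one linear regression per feature dimension that maps
    neutral HMM state means to the corresponding emotional means."""
    slopes, intercepts = [], []
    for d in range(neutral_means.shape[1]):
        # np.polyfit with deg=1 returns (slope, intercept) of a least-squares line.
        a, b = np.polyfit(neutral_means[:, d], emotional_means[:, d], deg=1)
        slopes.append(a)
        intercepts.append(b)
    return np.array(slopes), np.array(intercepts)

def combine(target_neutral_means, slopes, intercepts):
    """Apply the learned neutral-to-emotional mismatch to a target
    speaker's neutral model, yielding a pseudo-emotional model."""
    return target_neutral_means * slopes + intercepts

# Toy example: 10 HMM states with 3-dimensional mean vectors.
rng = np.random.default_rng(0)
neutral = rng.normal(size=(10, 3))
emotional = 1.2 * neutral + 0.5 + rng.normal(scale=0.01, size=(10, 3))
slopes, intercepts = fit_mismatch(neutral, emotional)
target_emotional = combine(rng.normal(size=(10, 3)), slopes, intercepts)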
We evaluate our approaches on synthesized emotional speech conveying anger, happiness, and sadness with several subjective tests. Experimental results show that the implemented system is able to synthesize speech with the emotional expressiveness of the target speaker.
Table of Contents
List of Tables iii
List of Figures iv
Acknowledgments vi
Chapter 1 Introduction 1
1.1 Background 1
1.2 Motivation 2
1.3 Organization of the Thesis 3
Chapter 2 Hidden Markov Model-Based Speech Synthesis System 4
2.1 HMM-Based Speech Synthesis 4
2.2 HMM-Based Mandarin Speech Synthesis System 7
2.2.1 Segmental Tonal Modeling 7
2.2.2 Question Set 8
2.2.3 Context-Dependent Label Format 9
Chapter 3 The Proposed Algorithm 11
3.1 Interpolation Based 11
3.1.1 Model Interpolation 11
3.1.2 The Proposed Algorithm 13
3.2 PMC Based 17
3.2.1 Model of the Environment 17
3.2.2 The Proposed Algorithm 18
Chapter 4 Experimental Results 22
4.1 Test Data 22
4.2 Experiments on the Interpolation-Based Method 23
4.2.1 Emotional Expressiveness Test 23
4.2.2 Naturalness Test 23
4.2.3 Similarity Test 24
4.2.4 Comparison with Naive Interpolation 25
4.3 Experiments on the PMC-Based Method 27
4.3.1 Linear Regression Model 27
4.3.2 Emotional Expressiveness Test 27
4.3.3 Naturalness Test 27
4.3.4 Similarity Test 33
4.4 Paired Comparison Test 33
Chapter 5 Conclusion and Future Work 35
5.1 Conclusion 35
5.2 Future Work 35
Fulltext
This electronic full text is licensed only for individual, non-profit retrieval, reading, and printing for academic research purposes. Please comply with the Copyright Act of the Republic of China (Taiwan); do not reproduce, distribute, adapt, repost, or broadcast it without authorization.
Thesis access permission: not available on or off campus
Available:
Campus: never available
Off-campus: never available

Printed copies
Information on public access to printed copies is relatively complete from academic year 102 (ROC calendar) onward. To check access information for printed copies from academic year 101 or earlier, please contact the printed thesis service desk of the Library and Information Office. We apologize for any inconvenience.
Availability: publicly available