National Sun Yat-sen University,thesis/dissertation,Singing Voice Synthesis System Based On Time Domain Pitch Synchronized Overlap-Add

Title	Singing Voice Synthesis System Based On Time Domain Pitch Synchronized Overlap-Add (基於時域上基週同步疊加法之歌聲合成系統)
Department	Department of Computer Science and Engineering
Year, semester	Spring semester of Academic Year 101	Language	Chinese
Degree	Master	Number of pages	49
Author	Ming-Kuan Wu (吳銘冠)
Advisor	Chia-Ping Chen (陳嘉平)
Convenor	Chung-hsien Wu (吳宗憲)
Advisory Committee	Wen-Hsing Lai (賴玟杏); Hsin-Min Wang (王新民)
Date of Exam	2013-07-25	Date of Submission	2013-09-02
Keywords	concatenation synthesis, singing synthesis, TD-PSOLA
Statistics	The thesis/dissertation has been browsed 5691 times and downloaded 1111 times.

Abstract
In this thesis, we propose and implement a concatenative synthesis system that produces a synthesized singing voice with background music. The corpus is recorded according to the Zhuyin (phonetic symbol) syllable lookup table, with every syllable recorded at three different pitches. The synthesis information is taken from the main melody of a MIDI file and includes velocity, note number, start time, and end time; information about runs and riffs is then added. We use TD-PSOLA (time-domain pitch-synchronous overlap-add) to modify the synthesis units in the time domain. Finally, the background music extracted from the MIDI file is added back to the synthesized song. We implemented a song-selection interface through which users synthesize songs and adjust the result, for example by shifting the note numbers of the whole song or modifying the lyrics. We also conducted listening tests to evaluate the quality, clarity, and similarity of the synthesized songs. In the quality evaluation, adding background music improved the synthesized songs. In the clarity and similarity evaluations, simple songs performed better than fast songs. The evaluated songs fall into seven categories: nursery rhymes, folk, lyrical, fast-paced, solemn and stirring, Chinese style, and rhythm and blues. The proposed method can be extended to singing voice synthesis in other languages and can also be applied to humming synthesis.
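The pitch modification step named in the abstract can be illustrated with a minimal Python sketch. It is not the thesis implementation: the function names `midi_to_hz`, `hann`, and `td_psola` are hypothetical, the pitch marks are assumed to be known and evenly spaced, the test signal is synthetic, and the note numbers 60 and 72 are example values. The only external fact used is the standard equal-tempered mapping in which MIDI note 69 is A4 at 440 Hz.

```python
import math

def midi_to_hz(note):
    """Equal-tempered frequency of a MIDI note number (note 69 = A4 = 440 Hz)."""
    return 440.0 * 2.0 ** ((note - 69) / 12.0)

def hann(n):
    """Symmetric Hann window with n points (n >= 2)."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * i / (n - 1)) for i in range(n)]

def td_psola(signal, marks, period, new_period, stretch=1.0):
    """Overlap-add two-period grains at a new spacing (simplified TD-PSOLA).

    signal     : list of samples
    marks      : analysis pitch marks (sample indices), assumed known
    period     : analysis pitch period in samples
    new_period : desired synthesis pitch period in samples
    stretch    : output duration divided by input duration
    """
    out_len = int(len(signal) * stretch)
    out = [0.0] * out_len
    win = hann(2 * period + 1)          # one grain spans two pitch periods
    t = float(marks[0])                 # position of the first synthesis mark
    while t < out_len:
        # Pick the analysis mark closest to this synthesis mark on the input time axis.
        src = min(marks, key=lambda m: abs(m - t / stretch))
        centre = int(round(t))
        for k in range(-period, period + 1):
            i, j = src + k, centre + k
            if 0 <= i < len(signal) and 0 <= j < out_len:
                out[j] += signal[i] * win[k + period]
        t += new_period
    return out

# Hypothetical example: a unit with a 100-sample period is raised by one octave,
# as if it were recorded at note 60 and needed at note 72.
period = 100
signal = [math.sin(2.0 * math.pi * n / period) + 0.5 * math.sin(4.0 * math.pi * n / period)
          for n in range(2000)]
marks = list(range(period, len(signal) - period, period))
ratio = midi_to_hz(72) / midi_to_hz(60)      # target pitch / source pitch
new_period = round(period / ratio)           # shorter period means higher pitch
shifted = td_psola(signal, marks, period, new_period)
```

Because the grains are re-spaced rather than resampled, the duration stays the same while the pitch period changes. Placing grains closer together also raises the overlap density, so a practical system would likely rescale the output amplitude; this sketch leaves the gain uncorrected.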

Table of Contents
Chapter 1 Introduction
1.1 Motivation and Objectives
1.2 Review of Singing Voice Synthesis Research
1.3 Research Method
1.4 Thesis Organization
Chapter 2 Information Processing and Construction of Synthesis Units
2.1 Synthesis Information Processing
2.2 Syllable Recording
2.3 Segmentation and Labeling
2.4 Volume Processing
2.5 Synthesis Unit Selection
Chapter 3 Chinese Singing Voice Synthesis Method
3.1 Syllable Volume Adjustment
3.2 Introduction to TD-PSOLA
3.3 Post-processing
3.3.1 Runs and Riffs
3.3.2 Syllable Concatenation
3.4 Singing Voice Synthesis with Background Music
Chapter 4 Implementation of the Chinese Singing Voice Synthesis System
4.1 System Setup
4.2 System Architecture
4.2.1 Selection Stage
4.2.2 Synthesis Stage
4.3 Implementation of the Singing Synthesis System with Background Music
Chapter 5 Chinese Singing Voice Synthesis Experiments
5.1 Listening Test Design
5.1.1 Quality Evaluation
5.1.2 Clarity Evaluation
5.1.3 Similarity Evaluation
5.1.4 Selection of Evaluation Songs
5.2 Listening Test Results
Chapter 6 Conclusion


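Fulltext
This electronic full text is licensed to users solely for personal, non-profit searching, reading, and printing for the purpose of academic research. Please comply with the Copyright Act of the Republic of China and do not reproduce, distribute, adapt, repost, or broadcast it without authorization. Thesis access permission: unrestricted (fully open on and off campus). Available: Campus: available; Off-campus: available. etd-0727113-132932.pdf
Printed copies
Public information on printed copies is relatively complete from Academic Year 102 onward. To inquire about printed theses from Academic Year 101 or earlier, please contact the printed thesis service desk of the Office of Library and Information Services. We apologize for any inconvenience. Available: available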

Office of Library and Information Services, National Sun Yat-sen University │ Contact Us: 2452 Thesis Format Review Team, Mail │ Development and operations: Knowledge Innovation Division, LIS