Thesis/Dissertation etd-0727113-132932: Details
Title
Singing Voice Synthesis System Based On Time Domain Pitch Synchronized Overlap-Add
(Original title: 基於時域上基週同步疊加法之歌聲合成系統)
Department:
Year, semester:
Language:
Degree:
Number of pages: 49
Author:
Advisor:
Convenor:
Advisory Committee:
Date of Exam: 2013-07-25
Date of Submission: 2013-09-02
Keywords
concatenation synthesis, singing synthesis, TD-PSOLA
Statistics
This thesis/dissertation has been viewed 5691 times and downloaded 1111 times.
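Chinese Abstract (translated)
In this thesis, we propose and implement a concatenative singing voice synthesis system that produces synthesized singing with background music. The corpus was recorded following the Zhuyin (Mandarin phonetic symbol) character lookup table, with every syllable recorded at three different pitches. The main melody of a MIDI file serves as the synthesis information, including velocity, note number, start time, and end time, supplemented with information on runs and riffs. The accompaniment is then extracted from the MIDI file and used to synthesize singing with background music. The synthesis units are modified in the time domain using Time-Domain Pitch-Synchronous Overlap-Add (TD-PSOLA). We provide a song-selection interface through which users can synthesize songs, together with several adjustments to the synthesized song, such as shifting the note numbers of the whole song and editing the lyrics. In addition, listening tests were conducted to evaluate the quality, clarity, and similarity of the synthesized songs. In the quality evaluation, adding background music improved the synthesized songs. In the clarity and similarity evaluations, simple songs performed better. The evaluated songs fall into seven categories: nursery rhymes, folk songs, lyrical songs, fast-paced songs, solemn and stirring songs, Chinese-style songs, and rhythm and blues. The proposed approach can be extended to singing voice synthesis in other languages and can also be applied to the synthesis of hummed singing.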
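The abstract lists the MIDI fields that drive synthesis (velocity, note number, start time, end time) and a global note-number adjustment. A minimal sketch of turning those fields into per-note targets for a pitch- and time-modification stage might look like the following. It is not the author's implementation: the `Note` class, the `synthesis_targets` helper, the 16 kHz sampling rate, and the linear velocity-to-gain mapping are illustrative assumptions, while the note-to-frequency formula is standard equal temperament (note 69 = A4 = 440 Hz).

```python
from dataclasses import dataclass


@dataclass
class Note:
    """One main-melody note read from a MIDI file (hypothetical container)."""
    note_number: int  # MIDI note number, 0-127 (60 = middle C)
    velocity: int     # MIDI velocity, 0-127
    start: float      # onset time in seconds
    end: float        # offset time in seconds


def midi_to_hz(note_number):
    """Equal-temperament frequency; note 69 (A4) = 440 Hz."""
    return 440.0 * 2.0 ** ((note_number - 69) / 12.0)


def synthesis_targets(notes, transpose=0, fs=16000):
    """Convert melody notes into per-note targets for pitch/time modification.

    transpose -- semitones added to every note (global note-number shift)
    fs        -- output sampling rate in Hz (assumed value)
    The velocity -> gain mapping below is a simple linear assumption.
    """
    targets = []
    for n in notes:
        f0 = midi_to_hz(n.note_number + transpose)
        targets.append({
            "f0_hz": f0,
            "period_samples": fs / f0,  # target pitch period in samples
            "duration_samples": round((n.end - n.start) * fs),
            "gain": n.velocity / 127.0,
        })
    return targets
```

For example, `synthesis_targets([Note(69, 100, 0.0, 0.5)], transpose=12)` raises A4 by an octave to 880 Hz, giving a target pitch period of about 18.2 samples and a duration of 8000 samples at 16 kHz.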
Abstract
In this thesis, we propose and implement a concatenative synthesis system that synthesizes singing voice with background music. To build the corpus, we recorded every syllable in the Mandarin phonetic symbol (Zhuyin) table at three different pitches. The synthesis information, including velocity, note number, start time, and end time, is extracted from the main melody of a MIDI file, and information on runs and riffs is then added. We use TD-PSOLA to modify the synthesis units in the time domain, and finally add the background music extracted from the MIDI file back to the synthesized song. We also implemented a user interface through which users can synthesize songs and adjust them, for example by shifting the overall pitch of a song or modifying syllables. Listening experiments were conducted to evaluate the quality, clarity, and similarity of the synthesized songs, which were divided into seven categories: nursery rhymes, folk, lyrical, fast-paced, solemn and stirring, Chinese style, and rhythm and blues. The results show that the proposed method achieves better results with simple songs than with fast songs. The proposed method can feasibly be applied to other languages and to the synthesis of hummed singing.
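TD-PSOLA, the time-domain modification method named in the abstract, re-pitches a voiced unit by cutting windowed grains two pitch periods long, centred on the analysis pitch marks, and overlap-adding them at the target pitch period; the output duration changes by reusing or skipping grains. The sketch below is a minimal pure-Python illustration of that idea, not the author's implementation. It assumes the pitch marks are already known and strictly increasing, uses a Hann window, and picks the nearest analysis mark for each synthesis mark; the function name `td_psola` and its parameters are hypothetical.

```python
import bisect
import math


def hann(n):
    """Symmetric Hann window of length n (n >= 2)."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]


def td_psola(x, marks, target_period, out_len):
    """Minimal TD-PSOLA for one voiced unit.

    x             -- input samples (list of floats)
    marks         -- strictly increasing analysis pitch marks (sample indices), assumed given
    target_period -- desired pitch period in samples (smaller = higher pitch)
    out_len       -- desired output length in samples (time-scale change)
    """
    if len(marks) < 2 or out_len <= 0 or target_period <= 0:
        raise ValueError("need >= 2 pitch marks and positive lengths")
    y = [0.0] * out_len
    scale = len(x) / out_len  # maps output time to input time
    t_s = 0.0                 # current synthesis mark
    while t_s < out_len:
        # Analysis mark closest to the corresponding input time.
        t_a = t_s * scale
        i = bisect.bisect_left(marks, t_a)
        k = min((j for j in (i - 1, i) if 0 <= j < len(marks)),
                key=lambda j: abs(marks[j] - t_a))
        # Local pitch period taken from the neighbouring mark.
        p = marks[k + 1] - marks[k] if k + 1 < len(marks) else marks[k] - marks[k - 1]
        w = hann(2 * p + 1)
        center = int(round(t_s))
        # Overlap-add the windowed two-period grain at the synthesis mark.
        for j in range(-p, p + 1):
            src, dst = marks[k] + j, center + j
            if 0 <= src < len(x) and 0 <= dst < out_len:
                y[dst] += x[src] * w[j + p]
        t_s += target_period
    return y
```

For a pulse train with a 100-sample period, `td_psola(x, marks, 50, len(x))` halves the period (raising the pitch by an octave) while keeping the length, and `td_psola(x, marks, 100, 2 * len(x))` doubles the duration at the original pitch.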
Table of Contents
Chapter 1 Introduction
1.1 Research Motivation and Objectives
1.2 Review of Singing Voice Synthesis Research
1.3 Research Method
1.4 Thesis Organization
Chapter 2 Information Processing and Construction of Synthesis Units
2.1 Processing of Synthesis Information
2.2 Syllable Recording
2.3 Segmentation and Labeling
2.4 Volume Processing
2.5 Synthesis Unit Selection
Chapter 3 Mandarin Singing Voice Synthesis Method
3.1 Syllable Volume Adjustment
3.2 Introduction to Time-Domain Pitch-Synchronous Overlap-Add
3.3 Post-processing
3.3.1 Runs and Riffs
3.3.2 Syllable Concatenation
3.4 Singing Voice Synthesis with Background Music
Chapter 4 Implementation of the Mandarin Singing Voice Synthesis System
4.1 System Setup
4.2 System Architecture
4.2.1 Selection Stage
4.2.2 Synthesis Stage
4.3 Implementation of the Singing Synthesis System with Background Music
Chapter 5 Mandarin Singing Voice Synthesis Experiments
5.1 Listening Test Design
5.1.1 Quality Evaluation
5.1.2 Clarity Evaluation
5.1.3 Similarity Evaluation
5.1.4 Selection of Evaluation Songs
5.2 Listening Test Results
Chapter 6 Conclusion
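Fulltext
This electronic full text is licensed to users only for personal, non-profit searching, reading, and printing for academic research purposes. Please comply with the relevant provisions of the Copyright Act of the Republic of China (Taiwan), and do not reproduce, distribute, adapt, repost, or broadcast it without authorization, so as not to violate the law.
Thesis access permission: unrestricted (fully open on and off campus)
Available:
Campus: available
Off-campus: available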


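Printed copies
Public-access information for printed theses is relatively complete from academic year 102 (2013-14) onward. To look up public-access information for printed theses from academic year 101 or earlier, please contact the printed-thesis service counter of the Office of Library and Information Services. We apologize for any inconvenience.
Available: available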
