Master's/Doctoral Thesis etd-0816110-123828: Detailed Record
Title page for etd-0816110-123828
Title
流行音樂之主副歌判別與其情緒分析
Popular Music Analysis: Chorus and Emotion Detection
Department
Year, semester
Language
Degree
Number of pages
102
Author
Advisor
Convenor
Advisory Committee
Date of Exam
2010-07-06
Date of Submission
2010-08-16
Keywords
MFCCs, tempo, rhythm, emotion, neural network
Statistics
This thesis/dissertation has been browsed 5642 times and downloaded 0 times.
Abstract (Chinese)
With the rapid development of multimedia in recent years, popular music has become increasingly easy to obtain, and the demand for music retrieval systems has grown accordingly. In addition, singing competitions and television programs have attracted wide attention since 2007; contestants are often judged by whether they convey the same emotion as the original singer and resonate with the audience. In this thesis, we propose chorus detection and emotion detection to build an emotion-based music retrieval system. The chorus is the soul of a song and the part that expresses its principal emotion, so the chorus segments are detected first. Chorus detection extracts each frequency band of the music to generate a colormap, partitions the colormap into segments by color clustering to reveal the song's structure, and finally computes the Mel-frequency cepstral coefficients (MFCCs) and similarity of each segment to decide whether it is a chorus. After the chorus segments are detected, we extract intensity, rhythm, and tempo features from them to represent Thayer's emotion model. In this thesis, a neural network classifier and an Adaboost classifier are each trained and tested for emotion classification accuracy. Experimental results show that both the chorus detection and the emotion detection achieve accuracies above 88%, so the proposed chorus and emotion detection can realize an emotion-based music retrieval system.
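As a rough illustration of how features placed in Thayer's arousal/valence plane correspond to the four emotion classes used here, consider the following sketch. It is purely hypothetical: the thesis learns the mapping with trained classifiers, whereas this sketch uses fixed, made-up thresholds on normalized arousal/valence scores.

```python
# Hypothetical sketch: mapping a normalized (arousal, valence) point,
# derived from intensity, tempo, and rhythm features, onto the four
# quadrants of Thayer's emotion model. The 0.5 thresholds are
# illustrative only; the thesis learns this boundary from data.

def thayer_quadrant(arousal: float, valence: float) -> str:
    """Map an (arousal, valence) point in [0, 1]^2 to an emotion class."""
    if arousal >= 0.5:
        return "happy" if valence >= 0.5 else "angry"
    return "relaxed" if valence >= 0.5 else "depressed"

# High arousal with positive valence lands in the "happy" quadrant.
print(thayer_quadrant(0.9, 0.8))
```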
Abstract
In this thesis, a chorus detection and an emotion detection algorithm for popular music are proposed. First, a popular song is decomposed into chorus and verse segments based on its color representation and MFCCs (Mel-frequency cepstral coefficients). Four features, including intensity, tempo, and rhythm regularity, are extracted from these structured segments for emotion detection. The emotion of a song is classified into one of four classes, happy, angry, depressed, and relaxed, via two classification methods: one is a back-propagation neural network classifier and the other is an Adaboost classifier. A test database consisting of 350 popular songs is utilized in our experiment. Experimental results show that the average recall and precision of the proposed chorus detection are approximately 95% and 84%, respectively; the average precision rate of emotion detection is 86% for the neural network classifier and 92% for the Adaboost classifier. The emotions of a song with different cover versions are also detected in our experiment, with a precision rate of 92%.
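The chorus-detection step compares MFCC representations of candidate segments, with repeated sections (choruses) showing high mutual similarity. A minimal sketch of that comparison, assuming each segment has already been summarized as a mean MFCC vector (a NumPy illustration, not the thesis's actual implementation):

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two MFCC feature vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def self_similarity(segments: np.ndarray) -> np.ndarray:
    """Pairwise similarity matrix over segment-level MFCC vectors.

    segments: (n_segments, n_coeffs) array, one mean MFCC vector per
    segment. Repeated sections such as choruses appear as high
    off-diagonal entries in the resulting matrix.
    """
    n = len(segments)
    sim = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            sim[i, j] = cosine_similarity(segments[i], segments[j])
    return sim

# Toy example: segments 0 and 2 are near-duplicates (a repeated chorus),
# so sim[0, 2] is close to 1, while sim[0, 1] is much lower.
segs = np.array([[1.0, 2.0, 3.0],
                 [3.0, -1.0, 0.5],
                 [1.1, 2.0, 2.9]])
sim = self_similarity(segs)
print(sim[0, 2])
```

In practice, segments whose mutual similarity exceeds a threshold and which repeat across the song would be designated as chorus candidates.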
Table of Contents
Abstract (Chinese)
Abstract
Contents
List of Figures
List of Tables
Chapter 1 Introduction
1.1 Overview of Music
1.2 Music Retrieval System
1.3 Motivation
1.4 Contribution
1.5 Organization
Chapter 2 Background Review
2.1 Audio Signal Processing
2.2 Audio Features
2.3 Chorus Detection and Emotion Model
2.4 Adaboost
Chapter 3 Chorus Detection
3.1 Overview
3.2 Colormap Generation
3.3 Chorus and Verse Designation
Chapter 4 Emotion Detection
4.1 Overview
4.2 Preprocessing
4.3 Neural Network Classifier
4.4 Adaboost Classifier
Chapter 5 Experimental Results
5.1 Chorus Detection
5.2 Emotion Detection
5.3 Emotion Detection of Cover Songs
5.4 Discussion
Chapter 6 Conclusions and Future Work
References
Curriculum Vitae
Publications
Fulltext
This electronic fulltext is licensed only for personal, non-profit searching, reading, and printing for the purpose of academic research. Please comply with the relevant provisions of the Copyright Act of the Republic of China (Taiwan); do not reproduce, distribute, adapt, repost, or broadcast it without authorization.
Thesis access permission: not available on campus or off campus
Available:
Campus: permanently closed (not available)
Off-campus: permanently closed (not available)
Printed copies
Public-access information for printed theses is relatively complete from academic year 102 (2013) onward. To inquire about the access status of printed theses from academic year 101 (2012) or earlier, please contact the printed-thesis service desk of the Office of Library and Information Services. We apologize for any inconvenience.
Available: open access
