Thesis/Dissertation etd-0801114-112329: Detailed Record
Title page for etd-0801114-112329
論文名稱
Title
片段音樂情緒辨識
Emotion Recognition of Music Clips
系所名稱
Department
畢業學年期
Year, semester
語文別
Language
學位類別
Degree
頁數
Number of pages
60
研究生
Author
指導教授
Advisor
召集委員
Convenor
口試委員
Advisory Committee
口試日期
Date of Exam
2014-07-24
繳交日期
Date of Submission
2014-09-01
關鍵字
Keywords
後處理、融合架構、支撐向量機、音樂情緒辨識
Post-Process, Fusion Architecture, SVM, Music Emotion Recognition
統計
Statistics
本論文已被瀏覽 5719 次,被下載 2386 次
The thesis/dissertation has been browsed 5719 times, has been downloaded 2386 times.
中文摘要 Abstract (Chinese)
This thesis proposes an architecture for music emotion recognition on short song clips. First, we adopt Thayer's emotion plane as our music emotion space, dividing emotions into four classes that each map onto a region of the plane. Next, two tools, Psysound3 and MIRtoolbox, are used to extract a total of 25 kinds of music features. Our music database consists of 525 clips from English-language songs, each 10 seconds long; clips from 500 songs serve as training data and clips from 25 songs as test data. After feature extraction, we obtain a 40-dimensional and a 37-dimensional feature vector, which we use to train SVM models. Finally, a voting strategy decides the classification result: the class with the most votes represents the emotion of the test clip. In addition, we implement two fusion methods, an early fusion architecture and a late fusion architecture. Early fusion directly concatenates the 40- and 37-dimensional vectors into a 77-dimensional feature vector used for training and classification. Late fusion first trains models on the 40- and 37-dimensional vectors separately, takes the resulting decision values as a new feature vector, and trains a second model on it to obtain a new classification result. Finally, we propose a post-processing scheme: the confusion matrix, combined with an algorithm, identifies the worst-classified emotion class, and the remaining three classes are re-compared against that class's two-class classifiers to obtain improved results. The average accuracy of the experiments is 81.9%, and the accuracy for the happy class exceeds 90.0%.
Abstract
In this thesis, we propose a music emotion recognition architecture for short song clips. First, we adopt Thayer's two-dimensional plane as our music emotion model and divide emotions into four classes, each corresponding to a region of the plane. Two tools, Psysound3 and MIRtoolbox, are then used to extract a total of 25 kinds of music features. Our music database consists of 525 clips from English-language songs, each only 10 seconds long; 500 clips are used for training and 25 for testing. After feature extraction, we obtain a 40-dimensional and a 37-dimensional feature vector, which we use to train SVM models. A voting strategy then decides the classification result: the class receiving the most votes represents the emotion of the test clip. In addition, we implement two fusion methods, an early fusion architecture and a late fusion architecture. Early fusion concatenates the 40- and 37-dimensional vectors into a 77-dimensional feature vector, which is used to train an SVM model and predict the classification result. Late fusion trains separate SVM models on the 40- and 37-dimensional vectors, takes the decision values produced during prediction as a new feature vector, and trains a second model on it to obtain a new classification result. Finally, we propose a post-processing architecture: the confusion matrix, together with an algorithm, identifies the worst-performing emotion class, and the remaining three classes are re-compared against that class's two-class classifiers to obtain improved results. The average accuracy of the experiments is 81.9%, and the accuracy for the happy class exceeds 90.0%.
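The early- and late-fusion schemes described above can be sketched as follows. This is a minimal, hypothetical illustration using scikit-learn's SVC (a LIBSVM wrapper) on random stand-in features; the kernel choice, variable names, and data are illustrative assumptions, not the thesis's actual configuration.

```python
# Hypothetical sketch of the early- and late-fusion schemes in the abstract,
# with random stand-in features in place of real Psysound3/MIRtoolbox output.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Stand-in data: 500 training clips and 25 test clips, 4 emotion classes,
# a 40-dim feature set (e.g. Psysound3) and a 37-dim set (e.g. MIRtoolbox).
X40_tr, X37_tr = rng.normal(size=(500, 40)), rng.normal(size=(500, 37))
y_tr = rng.integers(0, 4, size=500)
X40_te, X37_te = rng.normal(size=(25, 40)), rng.normal(size=(25, 37))

# Early fusion: concatenate into one 77-dim vector and train a single SVM.
# SVC's multi-class prediction is itself a one-vs-one vote, which matches
# the voting strategy the abstract mentions.
early = SVC(kernel="rbf").fit(np.hstack([X40_tr, X37_tr]), y_tr)
pred_early = early.predict(np.hstack([X40_te, X37_te]))

# Late fusion: train one SVM per feature set, then feed their pairwise
# decision values (6 per model for 4 classes) into a second-stage SVM.
svm40 = SVC(decision_function_shape="ovo").fit(X40_tr, y_tr)
svm37 = SVC(decision_function_shape="ovo").fit(X37_tr, y_tr)

def decision_features(X40, X37):
    # 6 + 6 = 12 decision values form the new second-stage feature vector.
    return np.hstack([svm40.decision_function(X40),
                      svm37.decision_function(X37)])

late = SVC(kernel="rbf").fit(decision_features(X40_tr, X37_tr), y_tr)
pred_late = late.predict(decision_features(X40_te, X37_te))

print(pred_early.shape, pred_late.shape)  # one predicted label per test clip
```

With real features, the two schemes differ mainly in where fusion happens: early fusion lets one model see all raw dimensions at once, while late fusion combines only each model's per-pair decision values.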
目次 Table of Contents
Thesis Certification ii
Acknowledgments iii
Abstract (Chinese) iv
ABSTRACT v
Table of Contents vii
List of Tables ix
List of Figures x
Chapter 1 Introduction 1
1.1 Background 1
1.2 Motivation and Objectives 1
1.3 Thesis Organization 3
Chapter 2 System Architecture 4
2.1 Architecture Flowchart 4
2.2 Emotion Model 4
2.3 Manual Labeling 5
Chapter 3 Audio Analysis and Model Training 9
3.1 Feature Extraction 9
3.1.1 Psysound3 9
3.1.2 MIRtoolbox 12
3.2 Model Training 18
3.2.1 Support Vector Machines 18
3.2.2 Multi-class Classification with SVMs and Sample Scoring 21
Chapter 4 System Description 23
4.1 Music Database Collection and Emotion Expressiveness Testing 23
4.2 Experimental Procedure 24
4.2.1 Results before Fusion 25
4.2.1.1 40-Dimensional Results (baseline) 25
4.2.1.2 37-Dimensional Results 25
4.2.2 Results after Fusion 26
4.2.2.1 Early Fusion Architecture 26
4.2.2.2 Late Fusion Architecture 27
4.2.3 Processing the 40-Dimensional Features 29
4.2.4 Two-Class Classifiers with a Two-Layer SVM 31
4.2.5 Adding Different Prediction Methods 31
4.2.5.1 Score Strategy 32
4.2.5.2 Sigmoid Strategy 33
4.2.6 Post-Processing Experiment 33
4.2.7 DSVM Experiment 35
Chapter 5 Conclusions and Future Work 41
5.1 Conclusions 41
5.2 Future Work 41
電子全文 Fulltext
This electronic full text is licensed only for personal, non-profit searching, reading, and printing for the purpose of academic research. Please comply with the relevant provisions of the Copyright Act of the Republic of China; do not reproduce, distribute, adapt, repost, or broadcast it without authorization.
論文使用權限 Thesis access permission: 自定論文開放時間 user-defined release period
開放時間 Available:
校內 Campus: 已公開 available
校外 Off-campus: 已公開 available


紙本論文 Printed copies
Public-access information for printed theses is relatively complete from academic year 102 onward. To inquire about the access status of printed theses from academic year 101 or earlier, please contact the printed-thesis service counter of the Office of Library and Information Services. We apologize for any inconvenience.
開放時間 Available: 已公開 available
