Thesis/Dissertation etd-0801114-112329: Detailed Record
Title page for etd-0801114-112329
論文名稱
Title
片段音樂情緒辨識
Emotion Recognition of Music Clips
系所名稱
Department
畢業學年期
Year, semester
語文別
Language
學位類別
Degree
頁數
Number of pages
60
研究生
Author
指導教授
Advisor
召集委員
Convenor
口試委員
Advisory Committee
口試日期
Date of Exam
2014-07-24
繳交日期
Date of Submission
2014-09-01
關鍵字
Keywords
後處理、融合架構、支撐向量機、音樂情緒辨識
Post-Process, Fusion Architecture, SVM, Music Emotion Recognition
統計
Statistics
本論文已被瀏覽 5719 次,被下載 2386 次
The thesis/dissertation has been browsed 5719 times, has been downloaded 2386 times.
中文摘要 Abstract (Chinese)
This thesis proposes an architecture for music emotion recognition on short song clips. First, we adopt Thayer's emotion plane as our music emotion space, dividing emotions into four classes that each map onto a region of the plane. Next, two tools, Psysound3 and MIRtoolbox, are used to extract a total of 25 kinds of music features. Our music database consists of 525 clips from English-language songs, each 10 seconds long; clips from 500 songs serve as training data and clips from 25 songs as test data. After feature extraction, we obtain a 40-dimensional and a 37-dimensional feature vector, which we use to train SVM models. Finally, a voting strategy decides the classification result: the class with the most votes represents the emotion of the test clip. In addition, we implement two fusion methods, an early fusion architecture and a late fusion architecture. Early fusion directly concatenates the 40- and 37-dimensional vectors into a 77-dimensional feature vector used for training and classification. Late fusion first trains models on the 40- and 37-dimensional vectors separately, takes the resulting decision values as a new feature vector, and trains a second model on it to obtain a new classification result. Finally, we propose a post-processing scheme: the confusion matrix, combined with an algorithm, identifies the worst-classified emotion class, and the remaining three classes are re-compared against that class's two-class classifiers to obtain improved results. The average accuracy of the experiments is 81.9%, and the accuracy for the happy class exceeds 90.0%.
Abstract
In this thesis, we propose a music emotion recognition architecture for short song clips. First, we adopt Thayer's two-dimensional plane as our music emotion model and divide emotions into four classes, each corresponding to a region of the plane. Two tools, Psysound3 and MIRtoolbox, are then used to extract a total of 25 kinds of music features. Our music database consists of 525 clips from English-language songs, each only 10 seconds long; 500 clips are used for training and 25 for testing. After feature extraction, we obtain a 40-dimensional and a 37-dimensional feature vector, which we use to train SVM models. A voting strategy then decides the classification result: the class receiving the most votes represents the emotion of the test clip. In addition, we implement two fusion methods, an early fusion architecture and a late fusion architecture. Early fusion concatenates the 40- and 37-dimensional vectors into a 77-dimensional feature vector, which is used to train an SVM model and predict the classification result. Late fusion trains separate SVM models on the 40- and 37-dimensional vectors, takes the decision values produced during prediction as a new feature vector, and trains a second model on it to obtain a new classification result. Finally, we propose a post-processing architecture: the confusion matrix, together with an algorithm, identifies the worst-performing emotion class, and the remaining three classes are re-compared against that class's two-class classifiers to obtain improved results. The average accuracy of the experiments is 81.9%, and the accuracy for the happy class exceeds 90.0%.
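The early- and late-fusion schemes described above can be sketched as follows. This is a minimal, hypothetical illustration using scikit-learn's SVC (a LIBSVM wrapper) on random stand-in features; the kernel choice, variable names, and data are illustrative assumptions, not the thesis's actual configuration.

```python
# Hypothetical sketch of the early- and late-fusion schemes in the abstract,
# with random stand-in features in place of real Psysound3/MIRtoolbox output.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Stand-in data: 500 training clips and 25 test clips, 4 emotion classes,
# a 40-dim feature set (e.g. Psysound3) and a 37-dim set (e.g. MIRtoolbox).
X40_tr, X37_tr = rng.normal(size=(500, 40)), rng.normal(size=(500, 37))
y_tr = rng.integers(0, 4, size=500)
X40_te, X37_te = rng.normal(size=(25, 40)), rng.normal(size=(25, 37))

# Early fusion: concatenate into one 77-dim vector and train a single SVM.
# SVC's multi-class prediction is itself a one-vs-one vote, which matches
# the voting strategy the abstract mentions.
early = SVC(kernel="rbf").fit(np.hstack([X40_tr, X37_tr]), y_tr)
pred_early = early.predict(np.hstack([X40_te, X37_te]))

# Late fusion: train one SVM per feature set, then feed their pairwise
# decision values (6 per model for 4 classes) into a second-stage SVM.
svm40 = SVC(decision_function_shape="ovo").fit(X40_tr, y_tr)
svm37 = SVC(decision_function_shape="ovo").fit(X37_tr, y_tr)

def decision_features(X40, X37):
    # 6 + 6 = 12 decision values form the new second-stage feature vector.
    return np.hstack([svm40.decision_function(X40),
                      svm37.decision_function(X37)])

late = SVC(kernel="rbf").fit(decision_features(X40_tr, X37_tr), y_tr)
pred_late = late.predict(decision_features(X40_te, X37_te))

print(pred_early.shape, pred_late.shape)  # one predicted label per test clip
```

With real features, the two schemes differ mainly in where fusion happens: early fusion lets one model see all raw dimensions at once, while late fusion combines only each model's per-pair decision values.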
目次 Table of Contents
Thesis Certification ii
Acknowledgments iii
Abstract (Chinese) iv
ABSTRACT v
Table of Contents vii
List of Tables ix
List of Figures x
Chapter 1 Introduction 1
1.1 Background 1
1.2 Motivation and Objectives 1
1.3 Thesis Organization 3
Chapter 2 System Architecture 4
2.1 Architecture Flowchart 4
2.2 Emotion Model 4
2.3 Manual Labeling 5
Chapter 3 Audio Analysis and Model Training 9
3.1 Feature Extraction 9
3.1.1 Psysound3 9
3.1.2 MIRtoolbox 12
3.2 Model Training 18
3.2.1 Support Vector Machines 18
3.2.2 Multi-class Classification with SVMs and Sample Scoring 21
Chapter 4 System Description 23
4.1 Music Database Collection and Emotion Expressiveness Testing 23
4.2 Experimental Procedure 24
4.2.1 Results before Fusion 25
4.2.1.1 40-Dimensional Results (baseline) 25
4.2.1.2 37-Dimensional Results 25
4.2.2 Results after Fusion 26
4.2.2.1 Early Fusion Architecture 26
4.2.2.2 Late Fusion Architecture 27
4.2.3 Processing the 40-Dimensional Features 29
4.2.4 Two-Class Classifiers with a Two-Layer SVM 31
4.2.5 Adding Different Prediction Methods 31
4.2.5.1 Score Strategy 32
4.2.5.2 Sigmoid Strategy 33
4.2.6 Post-Processing Experiment 33
4.2.7 DSVM Experiment 35
Chapter 5 Conclusions and Future Work 41
5.1 Conclusions 41
5.2 Future Work 41
電子全文 Fulltext
This electronic full text is licensed only for personal, non-profit searching, reading, and printing for the purpose of academic research. Please comply with the relevant provisions of the Copyright Act of the Republic of China; do not reproduce, distribute, adapt, repost, or broadcast it without authorization.
論文使用權限 Thesis access permission: 自定論文開放時間 user-defined release period
開放時間 Available:
校內 Campus: 已公開 available
校外 Off-campus: 已公開 available


紙本論文 Printed copies
Public-access information for printed theses is relatively complete from academic year 102 onward. To inquire about the access status of printed theses from academic year 101 or earlier, please contact the printed-thesis service counter of the Office of Library and Information Services. We apologize for any inconvenience.
開放時間 Available: 已公開 available
