National Sun Yat-sen University, thesis/dissertation, An Efficient Method to Assess Audio Quality and Its Hardware Implementation

Title	An Efficient Method to Assess Audio Quality and Its Hardware Implementation (高效率音訊品質評估方法與硬體之實現)
Department	Department of Electrical Engineering
Year, semester	Spring semester of Academic Year 105 (ROC calendar, 2016-2017)	Language	Chinese
Degree	Master	Number of pages	73
Author	Mei-Jung Wu (吳美蓉)
Advisor	Tong-Yu Hsieh (謝東佑)
Convenor	Shiann-Rong Kuang (鄺獻榮)
Advisory Committee	Jih-Ching Chiu (邱日清); Hsin-Wen Ting (丁信文)
Date of Exam	2017-05-24	Date of Submission	2017-09-01
Keywords	audio processing circuits, audio acceptability evaluation, error-tolerance, PEAQ, erroneous audio
Statistics	This thesis has been browsed 5692 times and downloaded 23 times.

Abstract
In an era of massive data volumes, audio is usually compressed with lossy methods to ease storage and transmission, even though such compression may degrade audio quality. At the same time, shrinking transistor sizes make audio processing chips more susceptible to process defects and noise, which can produce erroneous audio, and circuit aging may also harm audio quality. Corrupted audio makes the messages it carries easy to misinterpret. This problem is especially pressing in Internet of Things applications, where speech recognition plays an increasingly important role. Fortunately, audio is error-tolerant: because human hearing has limited sensitivity, minor variations in audio are often imperceptible, so the output of a faulty audio chip may still be acceptable and the chip's lifetime can be extended. The critical issue is therefore how to evaluate the acceptability of audio effectively. A number of accurate audio quality assessment methods have been proposed in the literature, but their high computational complexity leads to long execution times in software and prohibitive cost when implemented in hardware. The goal of this work is to develop methods that, by taking the characteristics of human hearing into account, judge whether audio quality is acceptable with less computation and simpler operations.

PEAQ (Perceptual Evaluation of Audio Quality) is one of the most widely used audio quality assessment methods. In this work, PEAQ results serve as the reference for judging the accuracy of the proposed methods, and performance is evaluated as well. Two efficient methods are proposed. Compared with PEAQ, they reduce software execution time by 94.24% and 77.27%, respectively. The first method effectively assesses the output quality of faulty audio processing circuits, reaching 82.33% accuracy for general audio and 92.42% for speech. It is also well suited to hardware implementation: its hardware cost is only 3.98% of that of a commercial MP3 decoder. The second method is effective for compressed audio. It achieves 100% accuracy when the bandwidth of the original audio is wide enough, and 80.80% accuracy for erroneous compressed audio. It requires a higher hardware cost than the first method and is therefore better suited to software implementation. Nevertheless, its hardware cost can be reduced substantially if hardware is properly shared and integrated with the audio processing system.
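The figures in the abstract can be read more concretely with a little arithmetic. The sketch below converts the reported reductions in software execution time into speedup factors relative to PEAQ, and shows one plausible way an accuracy figure could be computed when PEAQ serves as the reference. The function names and the `odg_threshold` parameter are hypothetical: the abstract does not say which PEAQ score separates acceptable from unacceptable audio, so the cut-off is left to the caller. PEAQ's Objective Difference Grade (ODG) runs from 0 (imperceptible impairment) down to -4 (very annoying).

```python
def speedup_from_reduction(reduction_percent: float) -> float:
    """Convert a percentage reduction in run time into a speedup factor."""
    if not 0.0 <= reduction_percent < 100.0:
        raise ValueError("reduction must be in [0, 100)")
    remaining = 1.0 - reduction_percent / 100.0  # fraction of PEAQ's run time still needed
    return 1.0 / remaining


def agreement_accuracy(method_accepts, peaq_odg, odg_threshold):
    """Fraction of clips where a method's accept/reject decision matches
    the decision derived from PEAQ's ODG score.

    odg_threshold is a HYPOTHETICAL cut-off: a clip counts as acceptable
    when its ODG is at or above it. The abstract does not state one.
    """
    if len(method_accepts) != len(peaq_odg) or not peaq_odg:
        raise ValueError("inputs must be non-empty and of equal length")
    matches = sum(
        accepted == (odg >= odg_threshold)
        for accepted, odg in zip(method_accepts, peaq_odg)
    )
    return matches / len(peaq_odg)


if __name__ == "__main__":
    # Reductions reported in the abstract for the two proposed methods.
    for name, reduction in [("method 1", 94.24), ("method 2", 77.27)]:
        print(f"{name}: {speedup_from_reduction(reduction):.1f}x faster than PEAQ")
```

Under this reading, a 94.24% reduction means the first method needs roughly one seventeenth of PEAQ's run time, and a 77.27% reduction means the second needs a little under one quarter.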

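Table of Contents
Thesis Approval Form
Acknowledgments
Abstract (Chinese)
Abstract
Table of Contents
List of Figures
List of Tables
Chapter 1 Introduction
  1.1 Motivation
  1.2 Contributions
  1.3 Thesis Outline
Chapter 2 Background and Related Work
  2.1 Audio Error Tolerance
  2.2 Characteristics of Human Hearing
    2.2.1 Hearing Threshold
    2.2.2 Masking Effect [10]-[12]
  2.3 MP3 (MPEG-1 Layer 3)
  2.4 Perceptual Evaluation of Audio Quality (PEAQ)
Chapter 3 Error-Tolerant Quality Evaluation Method Based on Time-Domain Intensity Differences and Its Implementation
  3.1 Introduction
  3.2 Temporal Masking of Audio and Error Tolerance
  3.3 Partitioning of Time-Domain Intensity Intervals
  3.4 Time-Domain Intensity Difference and Acceptability
  3.5 Number of Audio Samples and Acceptability
  3.6 Error-Tolerant Quality Evaluation Method Based on Time-Domain Intensity Differences
    3.6.1 Evaluation Flow
    3.6.2 Accuracy Analysis
  3.7 Performance Evaluation
  3.8 Hardware Implementation
    3.8.1 Architecture
    3.8.2 Memory Contents
    3.8.3 Operation Flow
    3.8.4 Accuracy
    3.8.5 Cost Analysis
  3.9 Application to Speech
    3.9.1 Importance of Speech Quality
    3.9.2 Software Accuracy
    3.9.3 Hardware Accuracy
  3.10 Discussion and Extensions
Chapter 4 Error-Tolerant Quality Evaluation Method Integrating Frequency-Domain and Time-Domain Differences and Its Implementation
  4.1 Introduction
  4.2 Audio Frequency and Error Tolerance
    4.2.1 Frequency and Audio Quality
    4.2.2 Time-Domain Intensity Turning Points
    4.2.3 Frequency Intensity Difference
  4.3 Analysis of Time-Domain Intensity Turning Points
    4.3.1 Turning-Point Differences and Spectral Changes
    4.3.2 Turning-Point Differences and Acceptability
  4.4 Frequency Intensity Difference and Acceptability Analysis
    4.4.1 3 kHz
    4.4.2 12 kHz
  4.5 Integrated Evaluation Method
  4.6 Accuracy Analysis
  4.7 Performance Evaluation
  4.8 Hardware Implementation
    4.8.1 Architecture
    4.8.2 Operation Flow
    4.8.3 Memory Contents
    4.8.4 Accuracy
    4.8.5 Cost Analysis
  4.9 Overall Comparison and Discussion
Chapter 5 Conclusions and Future Work
Chapter 6 References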

References
[1] Dinu Câmpeanu and Andrei Câmpeanu, “PEAQ - an Objective Method to Assess the Perceptual Quality of Audio Compressed Files,” the International Symposium on Theory, Automation, Robotics, Computers, Informatics, Electronics and Instrumentation, pp. 487-492, 2005.
[2] Rainer Huber and Birger Kollmeier, “PEMO-Q - a New Method for Objective Audio Quality Assessment Using a Model of Auditory Perception,” IEEE Transactions on Audio, Speech, and Language Processing, 14(6): pp. 1902-1910, 2006.
[3] John G. Beerends, Christian Schmidmer, Jens Berger, Matthias Obermann, Raphael Ullmann, Joachim Pomy and Michael Keyhl, “Perceptual Objective Listening Quality Assessment (POLQA), the Third Generation ITU-T Standard for End-to-end Speech Quality Measurement Part I - Temporal Alignment,” Journal of the Audio Engineering Society, 61(6): pp. 366-384, 2013.
[4] Henry H. Kuok, “Audio Recording Apparatus Using an Imperfect Memory Circuit,” United States Patent 5414758, 1995.
[5] Melvin A. Breuer and Haiyang Zhu, “An Illustrated Methodology for Analysis of Error Tolerance,” IEEE Design and Test of Computers, 25(2): pp. 168-177, 2008.
[6] Kaoru Ashihara, “Hearing Thresholds for Pure Tones above 16 KHz,” Acoustic Society of America, 122(3): pp. 52-57, 2007.
[7] Kaoru Ashihara, Kenji Kurakata, Taru Mizunami and Kazuma Matsushita, “Hearing Threshold for Pure Tones above 20kHz,” Acoustical Science and Technology, pp. 12-19, 2006.
[8] Harvey Fletcher, “Auditory Patterns,” Review of Modern Physics, 12: pp. 47-65, 1940.
[9] ISO 389-7, Acoustics-Reference Zero for the Calibration of Audiometric Equipment-Part 7: Reference Threshold of Hearing under Free-field and Diffuse-field listening Conditions, 1996.
[10] Peter Noll, “Wideband Speech and Audio Coding,” IEEE Communication Magazine, 31(11): pp. 34-44, 1993.
[11] Eliathamby Ambikairajah, Andrew Davis and Kenneth Wong, “Auditory Masking and MPEG-1 Audio Compression,” Electronics and Communication Engineering Journal, 9(4): pp. 165-175, 1997.
[12] Hugo Fastl and Eberhard Zwicker, Psychoacoustics: Facts and Models, Springer, 3rd Edition, 2006.
[13] Munesh Chandra Trivedi and Naresh Kumar Trivedi, “Audio Masking for Watermark Embedding Under Time Domain Audio Signals,” Proceedings of International Conference on Computational Intelligence and Communication Networks, pp. 771-775, 2014.
[14] Karlheinz Brandenburg and Harald Popp, An Introduction to MPEG Layer-3, EBU Technical Review, 2000.
[15] Tze-Ying Chang, Research and Implementation of MP3 Encoding Algorithm, Master Thesis, National Chiao Tung University, 2002.
[16] Rassol Raissi, The Theory behind MP3, 2002.
[17] Staffan Gadd and Thomas Lenart, A Hardware Accelerated MP3 Decoder with Bluetooth Streaming Capabilities, Master of Science Thesis in Cooperation with C Technologies AB, 2001.
[18] LAME MP3 Encoder Website. Accessed on June 30, 2017. [Online]. Available: http://lame.sourceforge.net/.
[19] Peter Pocta and John G. Beerends, “Subjective and Objective Assessment of Perceived Audio Quality of Current Digital Audio Broadcasting Systems and Web-Casting Applications,” IEEE Transactions on Broadcasting, 61(3): pp. 407-415, 2015.
[20] Mohammad Mosleh, Hadi Latifpour, Mohammad Kheyrandish, Mahdi Mosleh and Najmeh Hosseinpour, “A Robust Intelligent Audio Watermarking Scheme Using Support Vector Machine,” Frontiers of Information Technology and Electronic Engineering, 17(12): pp. 1320-1330.
[21] Maciej Niedźwiecki, Marcin Ciołek and Krzysztof Cisowski, “Elimination of Impulsive Disturbances From Stereo Audio Recordings Using Vector Autoregressive Modeling and Variable-order Kalman Filtering,” IEEE/ACM Transactions on Audio, Speech and Language Processing, 23(6): pp. 970-981, 2015.
[22] Hwai-Tsu Hu, Hsien-Hsin Chou, Chu Yu and Ling-Yuan Hsu, “Incorporation of Perceptually Adaptive QIM with Singular Value Decomposition for Blind Audio Watermarking,” EURASIP Journal on Advances in Signal Processing, 2014(1): pp. 1-12.
[23] ITU-T Recommendation P.862, “Perceptual Evaluation of Speech Quality (PESQ), An Objective Method for End-to-end Speech Quality Assessment of Narrowband Telephone Networks and Speech Codecs,” International Telecommunication Union, 2001.
[24] Amaro A De Lima, Fabio P Freeland, Rafael A De Jesus, et al., “On the Quality Assessment of Sound Signals,” IEEE International Symposium on Circuits and Systems, pp. 416-419, 2008.
[25] Peter Kabal, An Examination and Interpretation of ITU-R BS.1387: Perceptual Evaluation of Audio Quality, TSP Lab Technical Report, McGill University, 2002.
[26] Telecommunications and Signal Processing Laboratory Multimedia Signal Processing Packages Website. Accessed on June 30, 2017. [Online]. Available: http://www-mmsp.ece.mcgill.ca/MMSP/Documents/Software/.
[27] David Gunawan and D. Sen, “Iterative Phase Estimation for the Synthesis of Separated Sources from Single-Channel Mixtures,” IEEE Signal Processing Letters, 17(5): pp. 421-424, 2010.
[28] LAME MP3 Encoder Website. Accessed on June 30, 2017. [Online]. Available: http://lame.sourceforge.net/.
[29] EBU (European Broadcasting Union) SQAM CD, Sound Quality Assessment Material Recordings for Subjective Tests Website. Accessed on June 30, 2017. [Online]. Available: https://tech.ebu.ch/publications/sqamcd.
[30] Public Multiformat Listening Test Website. Accessed on June 30, 2017. [Online]. Available: http://listening-test.coresv.net/results.htm.
[31] Xuanlei Zhang, Weibei Dou and Ming Dong, “A Reusable Architecture Design and Implementation for Inverse Quantization of MP3 Decoding,” Audio, Language and Image Processing, pp. 247-251, 2008.
[32] MP3-SO3E Specification, Silicon Ocean Ultra Low Power MP3 Decoder ASIC Core, Maojet Technology Corporation Website. Accessed on June 30, 2017. [Online]. Available: http://www.maojet.com.tw/product/content/main.jspx?i=4c8f537e-3f72-11e0-aefe-000c299701c8.
[33] SONY Digital Voice Recorder PCM-D100 Product Specification Website. Accessed on June 30, 2017. [Online]. Available: http://store.sony.com.tw/product/PCM-D100#tabs-spec.
[34] SONY MP3 Player NW-WM1Z Product Specification Website. Accessed on June 30, 2017. [Online]. Available: http://www.sony.com.tw/zh/electronics/walkman/nw-wm1z/specifications.
[35] Bernhard Birkl, Bridget Hooser, Marc Janssens, Frank Lenke and Vlado Vorisek, “Design Integration, DFT, and Verification Methodology for an MPEG 1/2 Audio Layer 3 (MP3) SoC Device,” Proceedings of IEEE Custom Integrated Circuits Conference, pp. 303-306, 2002.
[36] Roland Dobai, Marcel Baláž, Peter Trebatický, Peter Malik and Elena Gramatová, “A Low-Overhead BIST Architecture for Digital Data Processing Circuits,” in Tarek Sobh and Khaled Elleithy (editors), Emerging Trends in Computing, Informatics, Systems Sciences, and Engineering, Lecture Notes in Electrical Engineering, 151: pp. 647-659, Springer, 2013.
[37] Audacity, Free, Open Source, Cross-platform Audio Software for Multi-track Recording and Editing Website. Accessed on June 30, 2017. [Online]. Available: http://www.audacityteam.org/download/windows/.
[38] Stanley A. White, “A Simple FFT Butterfly Arithmetic Unit,” IEEE Transactions on Circuits and Systems, 28(2): pp. 352-355, 1981.

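Fulltext
This electronic full text is licensed to users solely for personal, non-profit searching, reading, and printing for academic research purposes. Please comply with the Copyright Act of the Republic of China and do not reproduce, distribute, adapt, repost, or broadcast it without authorization.
Thesis access permission: user-defined availability date	Campus: available	Off-campus: available	etd-0801117-101956.pdf
Printed copies
Availability information for printed theses is relatively complete from Academic Year 102 onward. For printed theses from Academic Year 101 or earlier, please contact the printed thesis service desk of the Office of Library and Information Services.
Availability: available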

Office of Library and Information Services, National Sun Yat-sen University │ Contact Us: 2452, Thesis Format Review Team │ Development and operations: Knowledge Innovation Division, LIS