National Sun Yat-sen University (國立中山大學), thesis/dissertation: Implementation of Vectorization-Based VLIW DSP with Compact Instructions (實現以向量化運算為基礎之VLIW指令碼可壓縮之數位訊號處理器)

Title	Implementation of Vectorization-Based VLIW DSP with Compact Instructions (實現以向量化運算為基礎之VLIW指令碼可壓縮之數位訊號處理器)
Department	Department of Electrical Engineering
Year, semester	Spring semester of Academic Year 93	Language	Chinese
Degree	Master	Number of pages	91
Author	Chun-Hsien Lee (李俊憲)
Advisor	Jih-Ching Chiu (邱日清)
Convenor	Chung-Ping Chung (鍾崇斌)
Advisory Committee	Tsung Lee (李聰); Shen-Fu Hsiao (蕭勝夫)
Date of Exam	2005-07-08	Date of Submission	2005-08-23
Keywords	FFT, Compressed Instruction, VLIW, DSP, Vector Instruction
Statistics	The thesis/dissertation has been browsed 5682 times and downloaded 1889 times.

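Chinese Abstract (English translation)
The main goal of this thesis is to design and implement a high-performance digital signal processor that meets the computational needs of the algorithms used in DVB-T digital video broadcasting receivers, processing data in real time to sustain the required output signal stream. Completing an 8192-point FFT in real time is the most critical requirement. Given that the processor clock must not be too high, raising instruction-level parallelism is the only way to meet the real-time FFT requirement. The thesis therefore designs a VLIW processing core tailored to the FFT algorithm, called DVB-T DSP, which provides enough execution units to support parallel instruction issue, and uses software pipelining to reschedule loops so that FFT butterfly operations run with the best achievable instruction parallelism. In addition, to keep data flowing smoothly, the thesis exploits the characteristics of FFT vector operations to improve the DSP's modulo addressing mechanism. The result, called Extended Modulo Addressing, lets originally scattered vectors be treated as one new contiguous vector, avoiding pipeline stalls caused by non-contiguous data. Code size expansion is a major problem in VLIW processors. To address it, the thesis proposes an instruction compression mechanism that raises program density roughly twofold without affecting the processor's execution efficiency. Simulation and analysis show that the architecture needs a clock of only 133 MHz to meet the DVB-T requirement when executing the FFT, and that it also performs very well on other digital signal processing algorithms.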
Abstract
The main goal of this thesis is to design and implement a high-performance processor core for the digital signal processing algorithms used in DVB-T systems. The DSP must sustain the signal flow in real time, and completing an 8192-point FFT in real time is the key requirement. To meet the FFT timing requirement while keeping the DSP clock frequency as low as possible, the thesis increases the degree of instruction-level parallelism (ILP). It designs a VLIW processing core, called DVB-T DSP, with enough execution units to support parallel instruction issue, and uses software pipelining to schedule loops so that FFT butterfly operations achieve the highest ILP. Furthermore, to provide a smooth data stream to the pipeline, the thesis improves modulo addressing with a mechanism called extended modulo addressing, which gathers scattered vectors into one contiguous vector. Program size is a major problem for VLIW processors, which tend to have larger code than other processor architectures. To address this, the thesis proposes an instruction compression mechanism that roughly doubles program density without affecting the processor's execution efficiency. Simulation results show that DVB-T DSP meets the FFT timing requirement at 133 MHz. DVB-T DSP also performs well on other digital signal processing algorithms.
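The timing constraint described in the abstract can be made concrete with a rough cycle budget. The following is a minimal sketch, not a calculation from the thesis: it assumes a radix-2 decomposition (the abstract does not state the radix) and a symbol period of 896 µs, which is the useful symbol duration of DVB-T 8K mode in an 8 MHz channel. The function names are hypothetical.

```python
def butterfly_count(n_points: int) -> int:
    """Number of radix-2 butterflies in an n-point FFT: (n / 2) * log2(n)."""
    if n_points < 2 or n_points & (n_points - 1):
        raise ValueError("n_points must be a power of two >= 2")
    stages = n_points.bit_length() - 1  # log2(n) for a power of two
    return (n_points // 2) * stages


def cycles_per_butterfly(n_points: int, clock_hz: float, symbol_time_s: float) -> float:
    """Clock cycles available per butterfly if one FFT must finish per symbol."""
    return clock_hz * symbol_time_s / butterfly_count(n_points)


N_POINTS = 8192          # FFT size from the abstract
CLOCK_HZ = 133e6         # clock frequency from the abstract
SYMBOL_TIME_S = 896e-6   # assumption: DVB-T 8K useful symbol time, 8 MHz channel

print(butterfly_count(N_POINTS))  # 53248
print(round(cycles_per_butterfly(N_POINTS, CLOCK_HZ, SYMBOL_TIME_S), 2))  # 2.24
```

Under these assumptions the core has a little over two cycles per butterfly. Each radix-2 butterfly involves a complex multiplication plus complex additions and the associated loads and stores, so a budget this small is only reachable when several operations issue every cycle, which is consistent with the abstract's emphasis on ILP.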

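Table of Contents
Abstract (Chinese)	I
Abstract (English)	III
Contents	V
List of Figures	VII
List of Tables	IX
Chapter 1 Introduction	1
  1.1 DSP Architectures and Current Development Trends	1
  1.2 Motivation and Objectives	5
  1.3 Thesis Organization	6
Chapter 2 Related Work	7
  2.1 VLIW	7
  2.2 Compilers and Instruction-Level Parallelism	9
  2.3 Vector Processors	11
  2.4 DSPs with Vector Operations	13
Chapter 3 Vectorized Operations	16
  3.1 FFT Algorithm	16
  3.2 Vectorized Instruction Scheduling	23
  3.3 Improving Vectorized Instruction Efficiency and Analysis of Scheduling Results	25
Chapter 4 Instruction Compression	27
  4.1 Instruction Compression Mechanism	27
  4.2 Decompression Hardware Structure	31
  4.3 Compression Ratio	33
Chapter 5 DVB-T DSP Architecture	34
  5.1 Architecture Overview	34
  5.2 Registers	37
  5.3 Memory Addressing Modes	41
    5.3.1 Indirect Addressing	41
    5.3.2 Modulo Addressing	42
    5.3.3 Extended Modulo Addressing	43
    5.3.4 Bit Reversal Addressing	44
  5.4 Zero Overhead Looping	46
  5.5 Instruction Set	48
    5.5.1 ALU Instruction	48
    5.5.2 Multiplier Instruction	50
    5.5.3 Load Instruction	52
    5.5.4 Store Instruction	54
    5.5.5 Branch Instruction	56
    5.5.6 Zero Overhead Looping Instruction	57
Chapter 6 Verification and Analysis	58
  6.1 Verification and Simulation Environment	59
  6.2 Hardware Synthesis and Verification Results	60
  6.3 Analysis Results	62
Chapter 7 Conclusion	70
Appendix	72
References	78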

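Fulltext
This electronic full text is licensed to users only for personal, non-profit searching, reading, and printing for academic research purposes. Please comply with the Copyright Act of the Republic of China; do not reproduce, distribute, adapt, repost, or broadcast it without authorization, to avoid breaking the law. Thesis access permission: withheld for one year, both on and off campus. Available: Campus: available; Off-campus: available. etd-0823105-181117.pdf
Printed copies
Public availability information for printed theses is relatively complete from Academic Year 102 onward. To look up availability information for printed theses from Academic Year 101 or earlier, please contact the printed thesis service desk of the Office of Library and Information Services. We apologize for any inconvenience. Available: available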

Office of Library and Information Services, National Sun Yat-sen University │ Contact Us : 2452 Thesis Format Review Team , Mail │ Development and operations : Knowledge Innovation Division, LIS