National Sun Yat-sen University, thesis/dissertation, Improving ILP with the Vectorized Computing Mechanism in VLIW DSP Architecture

Title	Improving ILP with the Vectorized Computing Mechanism in VLIW DSP Architecture
Department	Department of Electrical Engineering
Year, semester	Spring semester of Academic Year 91 (2002-2003)	Language	English
Degree	Master	Number of pages	84
Author	楊得鑫 Te-Shin Yang
Advisor	邱日清 Jih-Ching Chiu
Convenor	蕭勝夫 Shen-Fu Hsiao
Advisory Committee	李錫智 Shie-Jue Lee
Date of Exam	2003-06-10	Date of Submission	2003-06-25
Keywords	VLIW, vector computing, instruction level parallelism
Statistics	The thesis/dissertation has been browsed 5702 times and downloaded 3317 times.

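Abstract (translated from the Chinese)
Modern DSP processor designs commonly use VLIW architectures to raise the degree of instruction level parallelism and thereby improve performance. Two bottlenecks limit instruction level parallelism: first, whether the hardware resources are sufficient to process all parallel instructions at the same time; second, dependences between instructions that prevent them from being processed in parallel. This thesis designs DVBTDSP, a VLIW processing core targeted at the FFT algorithm, and uses software pipelining to reschedule the instruction loop so that FFT butterfly operations reach the best instruction level parallelism. In addition, to provide a smooth data flow, the thesis exploits the characteristics of FFT vector operations to improve the modulo addressing mechanism of conventional DSPs, so that originally discrete vectors can be treated as one new continuous vector, which avoids the pipeline stalls caused by vector interruptions. According to the simulation and analysis results, this architecture needs only half the computation time of the C6200 for FFT processing, and for other algorithms such as FIR, IIR, and DCT its performance is not inferior to that of the C6200.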
Abstract
To improve performance for real-time applications, current digital signal processors use VLIW architectures to increase the degree of instruction level parallelism (ILP). Two factors limit the ILP: one is whether there are enough hardware resources for all parallel instructions; the other is the dependence relations between instructions. This thesis designs a VLIW processing core called DVBTDSP, tailored to the FFT algorithm, and uses software pipelining to schedule the loop so as to achieve the highest degree of ILP when executing FFT butterfly operations. Furthermore, to provide a smooth data stream for pipeline operations, we design a mechanism that improves modulo addressing by collecting discrete vectors into one continuous vector. The simulation results show that the DVBTDSP delivers double the performance of the C6200 for FFT processing, and performs well for FIR, IIR, and DCT computations.
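The addressing idea in the abstract can be illustrated with a small sketch. It is not the DVBTDSP mechanism, whose details this record does not give. It only shows, for a radix-2 FFT stage, conventional wrap-around (modulo) addressing and how the operand indices of the butterflies fall into separate runs that a single continuous counter can walk through. The function names `circular_next` and `butterfly_pairs` and the parameter `half` (the butterfly span) are hypothetical, chosen for illustration.

```python
def circular_next(offset, step, length):
    """Conventional modulo (circular) addressing: advance by step, wrap at length."""
    return (offset + step) % length


def butterfly_pairs(n, half):
    """Return (top, bottom) operand indices for one radix-2 stage of an n-point FFT.

    half is the butterfly span. The top indices form discrete runs
    (0..half-1, 2*half..3*half-1, ...); a single counter i visits them
    as if they were one continuous vector. Illustrative assumption only.
    """
    pairs = []
    for i in range(n // 2):            # one continuous counter over all butterflies
        group, k = divmod(i, half)     # which run, and position inside that run
        top = group * 2 * half + k     # skip the partner half of each group
        pairs.append((top, top + half))
    return pairs


print(circular_next(6, 3, 8))          # 1
print(butterfly_pairs(8, 2))           # [(0, 2), (1, 3), (4, 6), (5, 7)]
```

With `n = 8` and `half = 2`, the top operands are 0, 1, 4, 5: two separate runs that the loop counter traverses without a break, which is the kind of access pattern the abstract refers to as collecting discrete vectors into one continuous vector.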

Table of Contents
Abstract (Chinese)
Abstract (English)
Contents
List of Figures
List of Tables
Chapter 1 Introduction
	1.1 The Development of DSP and Vector Processors
	1.2 Standard DSP Architecture
	1.3 Motivation and Goal
Chapter 2 Survey
	2.1 VLIW
	2.2 Basic Compiler ILP
	2.3 Vector Processors
	2.4 Current DSP processors with vector computing (VFP, C3x, C6x)
Chapter 3 Design of an Instruction Pipeline Decoder
	3.1 The Characteristics of ARM Instruction Set
		3.1.1 Instruction types
		3.1.2 Multi-cycle instruction
		3.1.3 Instruction stream
		3.1.4 Forwarding controller
	3.2 A Single Instruction Pipeline Decoder Design
		3.2.1 Architecture
		3.2.2 Resolution unit
	3.3 Decoder design in VLIW DSP architecture
Chapter 4 Vectorized computing algorithm in VLIW architecture
	4.1 FFT algorithm with DSP processing
	4.2 Vectorized code scheduling
	4.3 Circular Index Register setting instructions
	4.4 Conditional load instruction
	4.5 Modulo addressing mode
	4.6 The Architecture of DVBTDSP
	4.7 Super Element Architecture
		4.7.1 ALUL
		4.7.2 ALUR & MUL
		4.7.3 Load
		4.7.4 Store
		4.7.5 Register File
Chapter 5 Verification and Analysis result
	5.1 Verification environment
	5.2 Synthesis results
	5.3 Analysis results
Chapter 6 Conclusions and Future Work
Appendix
Reference

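Fulltext
This electronic full text is licensed to users only for personal, non-profit searching, reading, and printing for the purpose of academic research. Please comply with the Copyright Act of the Republic of China and do not reproduce, distribute, adapt, repost, or broadcast it without authorization, to avoid violating the law.
Thesis access permission: unrestricted (fully open on and off campus)
Available: Campus: available; Off-campus: available
etd-0625103-115444.pdf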
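Printed copies
Public-access information for printed theses is relatively complete from Academic Year 102 onward. To inquire about public-access information for printed theses from Academic Year 101 or earlier, please contact the printed thesis service counter of the Office of Library and Information Services. We apologize for any inconvenience.
Available: available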

Office of Library and Information Services, National Sun Yat-sen University │ Contact Us: 2452 Thesis Format Review Team, Mail │ Development and operations: Knowledge Innovation Division, LIS