Thesis/dissertation details for etd-0823105-181117
Title
實現以向量化運算為基礎之VLIW指令碼可壓縮之數位訊號處理器
Implementation of Vectorization-Based VLIW DSP with Compact Instructions
Department
Year, semester
Language
Degree
Number of pages
91
Author
Advisor
Convenor
Advisory Committee
Date of Exam
2005-07-08
Date of Submission
2005-08-23
Keywords
FFT, Compressed Instruction, VLIW, DSP, Vector Instruction
Statistics
This thesis/dissertation has been browsed 5682 times and downloaded 1889 times.
Abstract
The main goal of this thesis is to design and implement a high-performance digital signal processor (DSP) core for the signal processing algorithms used in DVB-T digital video broadcasting receivers. The DSP must process data in real time to sustain the required output signal stream. The most critical requirement is completing an 8192-point FFT in real time.
To meet the FFT timing requirement while keeping the processor clock frequency as low as possible, the thesis increases the degree of instruction-level parallelism (ILP). It designs a VLIW processing core for the FFT algorithm, called DVB-T DSP, with enough execution units to support parallel instruction execution, and uses software pipelining to reschedule loops so that FFT butterfly operations run at the highest ILP. In addition, to keep data flowing smoothly into the pipeline, the thesis improves the DSP's modulo addressing mechanism based on the characteristics of FFT vector operations. The resulting scheme, called extended modulo addressing, lets originally scattered vectors be treated as one new contiguous vector, which avoids pipeline stalls caused by non-contiguous data.
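As a rough illustration of the conventional modulo (circular) addressing that the thesis sets out to improve, the following Python sketch generates addresses that wrap around inside a fixed-length buffer. The function name, its parameters, and the buffer layout are hypothetical and chosen only for illustration. The abstract does not describe how extended modulo addressing works, so the sketch makes no attempt to model it.

```python
def modulo_addresses(base, length, start, step, count):
    """Generate `count` addresses inside a circular buffer.

    Hypothetical model of plain DSP modulo addressing: the buffer
    occupies `length` words starting at `base`, and the offset wraps
    back to the start of the buffer instead of running past its end.
    """
    offset = start % length
    addresses = []
    for _ in range(count):
        addresses.append(base + offset)
        # Post-increment with wrap-around, as an address generator would do.
        offset = (offset + step) % length
    return addresses


# Walk an 8-word buffer at 0x100, starting two words before its end.
wrapped = modulo_addresses(base=0x100, length=8, start=6, step=1, count=4)
print([hex(a) for a in wrapped])
```

With a step larger than one, the same generator visits a strided subset of the buffer, which is the kind of non-contiguous access pattern that FFT stages produce.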
Code-size expansion is a major problem in VLIW processors: programs are larger than on other processor architectures. To address this, the thesis proposes an instruction compression mechanism that roughly doubles program density without affecting processor execution efficiency.
Simulation results show that DVB-T DSP meets the DVB-T timing requirement for the FFT at a clock frequency of only 133 MHz. It also performs well on other digital signal processing algorithms.
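To give a sense of scale for the 133 MHz figure, the following Python sketch estimates the cycle budget per butterfly. It rests on assumptions that are not taken from the thesis: a radix-2 decomposition, a useful symbol duration of 896 microseconds (DVB-T 8K mode in an 8 MHz channel), and the entire clock budget being spent on the FFT. The thesis's actual algorithm and timing budget may differ.

```python
def radix2_butterflies(n):
    """Number of butterflies in a radix-2 FFT of size n (n a power of two)."""
    stages = n.bit_length() - 1  # log2(n) for a power of two
    return (n // 2) * stages


FFT_POINTS = 8192          # from the abstract
CLOCK_HZ = 133_000_000     # from the abstract
SYMBOL_US = 896            # assumption: DVB-T 8K useful symbol time, 8 MHz channel

# Clock cycles available during one useful symbol period.
cycles_available = CLOCK_HZ * SYMBOL_US // 1_000_000
butterflies = radix2_butterflies(FFT_POINTS)
cycles_per_butterfly = cycles_available / butterflies

print(butterflies, cycles_available, round(cycles_per_butterfly, 2))
```

Under these assumptions the processor has only a little over two cycles per butterfly, which suggests why a scalar core at this clock rate would struggle and why the design relies on several execution units working in parallel.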
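Table of Contents
Abstract (Chinese) I
Abstract (English) III
Table of Contents V
List of Figures VII
List of Tables IX
Chapter 1 Introduction 1
1.1 DSP Architectures and Current Development Trends 1
1.2 Research Motivation and Objectives 5
1.3 Thesis Organization 6
Chapter 2 Related Work 7
2.1 VLIW 7
2.2 Compilers and Instruction-Level Parallelism 9
2.3 Vector Processors 11
2.4 DSPs with Vector Operations 13
Chapter 3 Vectorized Operations 16
3.1 FFT Algorithm 16
3.2 Vector Instruction Scheduling 23
3.3 Improving Vector Instruction Efficiency and Analysis of Scheduling Results 25
Chapter 4 Instruction Compression 27
4.1 Instruction Compression Mechanism 27
4.2 Decompression Hardware Structure 31
4.3 Compression Ratio 33
Chapter 5 DVB-T DSP Architecture 34
5.1 Architecture Overview 34
5.2 Registers 37
5.3 Memory Addressing Modes 41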
5.3.1 Indirect Addressing 41
5.3.2 Modulo Addressing 42
5.3.3 Extended Modulo Addressing 43
5.3.4 Bit Reversal Addressing 44
5.4 Zero Overhead Looping 46
5.5 Instruction Set 48
5.5.1 ALU Instruction 48
5.5.2 Multiplier Instruction 50
5.5.3 Load Instruction 52
5.5.4 Store Instruction 54
5.5.5 Branch Instruction 56
5.5.6 Zero Overhead Looping Instruction 57
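Chapter 6 Verification and Analysis 58
6.1 Verification and Simulation Environment 59
6.2 Hardware Synthesis and Verification Results 60
6.3 Analysis Results 62
Chapter 7 Conclusion 70
Appendix 72
References 78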
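Fulltext
This electronic full text is licensed to users solely for personal, non-profit searching, reading, and printing for the purpose of academic research. Please comply with the Copyright Act of the Republic of China, and do not reproduce, distribute, adapt, repost, or broadcast it without authorization, to avoid breaking the law.
Thesis access permission: withheld, then released both on and off campus after one year
Available:
Campus: available
Off-campus: available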


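Printed copies
Public-availability information for printed theses is relatively complete from academic year 102 onward. To look up availability information for printed theses from academic year 101 or earlier, please contact the printed-thesis service desk of the Office of Library and Information Services. We apologize for any inconvenience.
Available: available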
