博碩士論文 etd-0816108-003258 詳細資訊
Title page for etd-0816108-003258
論文名稱
Title
使用時脈閘控之三維頂點處理器功率最佳化
Power Optimization for 3D Vertex Shader Using Clock Gating
系所名稱
Department
畢業學年期
Year, semester
語文別
Language
學位類別
Degree
頁數
Number of pages
99
研究生
Author
指導教授
Advisor
召集委員
Convenor
口試委員
Advisory Committee
口試日期
Date of Exam
2008-07-16
繳交日期
Date of Submission
2008-08-16
關鍵字
Keywords
時脈閘控、三維頂點處理器、指令排程、低功率
instruction schedule, 3D vertex shader, clock gating, low power
統計
Statistics
本論文已被瀏覽 5679 次,被下載 2264
The thesis/dissertation has been browsed 5679 times and downloaded 2264 times.
中文摘要
隨著製程技術日益進步以及高效能和多功能的設計需求,功率消耗已經變成微處理器的一大瓶頸。其中,clock power佔整體功率消耗極大的比率。在本論文中我們將針對SIMD的pipelined 3D vertex shader,提出一個有效的clock gating methodology,盡可能地在沒有過多的overhead之下,降低整體的功率消耗。除了將所有指令依照指令流程做一分類外,我們也把pipeline stage關係考慮進去,達到以register bank為單位,如此可以使clock gating有較高的彈性來控制運算單元。
使用clock gating除了可以減少clock power,還可以藉由控制具有clock gating的暫存器,讓所連接的硬體模組功率跟著減少。在本論文中,我們亦分析關到什麼程度是適合的,不一定要全部的pipeline register都關掉,也不一定要把握住全部可以關的機會,這些都會在實驗數據上加以說明,並且列出最終的低功率版本。
由實驗結果可以得知,在面積增加不到2%的overhead之下,我們提出的clock gating方法可以減少大約30%的功率。並且在搭配以改善效能為目標的指令排程演算法後,可以降低能量最多到41%。而在同時執行四個頂點的新版本上,使用clock gating也可以減少大約10%的功率消耗,並且在搭配針對此一版本設計的指令排程的演算法中,除了將效能變為較佳的結果之外,能量最多也可以降低到16%。
Abstract
As process technology advances and demand grows for high-performance, multifunction designs, power dissipation has become a major bottleneck in microprocessors, and clock power accounts for a large share of the total. In this thesis we propose an effective clock gating methodology for a pipelined SIMD 3D vertex shader that lowers total power dissipation while keeping overhead as small as possible. Besides classifying all instructions by their instruction flow, we take the relationships between pipeline stages into account and gate clocks at register-bank granularity, which gives clock gating more flexibility in controlling the execution units.
Clock gating reduces clock power, and controlling the gated registers also reduces the power of the hardware modules connected to them. We also analyze how much gating is appropriate: it is not necessary to gate every pipeline register, nor to exploit every opportunity to disable the clock. The experimental results illustrate these trade-offs and lead to the final low-power version.
The experimental results show that the proposed clock gating methodology reduces power by about 30% with an area overhead of less than 2%. Combined with an instruction scheduling algorithm aimed at improving performance, it reduces energy by up to 41%. In a newer version that executes four vertices at a time, clock gating reduces power dissipation by about 10%; combined with an instruction scheduling algorithm designed for that version, it improves performance and reduces energy by up to 16%.
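The idea of gating clocks per register bank can be illustrated with a small counting model. The sketch below is only an illustration under stated assumptions: the pipeline register names follow the stage names in the table of contents, but the mapping from instruction to the banks it needs (`BANKS_USED`) and the sample program are hypothetical and not taken from the thesis. It counts how many register banks receive a clock edge, which is a rough proxy for clock activity and not a power estimate.

```python
# Pipeline register banks, named after the stages listed in the table of contents.
ALL_BANKS = ["ID/EXE1", "EXE1/EXE2", "EXE2/EXE3", "EXE3/EXE4", "EXE4/EXE5", "EXE5/WB"]

# Hypothetical mapping: which banks each instruction actually needs clocked.
# These sets are assumptions made for illustration only.
BANKS_USED = {
    "MOV": {"ID/EXE1", "EXE5/WB"},
    "MUL": {"ID/EXE1", "EXE1/EXE2", "EXE2/EXE3", "EXE5/WB"},
    "DP4": set(ALL_BANKS),
}


def clocked_bank_events(program, gated):
    """Count bank clock events as each instruction passes through the pipeline.

    Without gating, every bank is clocked once per instruction.
    With gating, only the banks the instruction needs are clocked.
    Pipeline fill and drain cycles are ignored to keep the model small.
    """
    total = 0
    for op in program:
        total += len(BANKS_USED[op]) if gated else len(ALL_BANKS)
    return total


# Hypothetical four-instruction program.
program = ["MOV", "MUL", "DP4", "MOV"]

if __name__ == "__main__":
    ungated = clocked_bank_events(program, gated=False)
    gated = clocked_bank_events(program, gated=True)
    print(f"ungated events: {ungated}, gated events: {gated}")
    print(f"fraction of clock events avoided: {(ungated - gated) / ungated:.2f}")
```

In this toy model the saving depends entirely on the instruction mix: a program dominated by instructions that need every bank leaves few gating opportunities, which is consistent with the abstract's point that not every opportunity has to be exploited.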
目次 Table of Contents
CHAPTER 1. INTRODUCTION 1
1.1 MOTIVATION 1
1.2 PAPER ORGANIZATION 3
1.3 CONTRIBUTION 3
CHAPTER 2. RELATED WORK 4
2.1 RELATED RESEARCH OF CLOCK GATING METHODOLOGY 4
2.2 RELATED RESEARCH OF GATED MULTIPLEXER 6
2.3 CLOCK GATING CIRCUIT 7
CHAPTER 3. VERTEX SHADER 10
3.1 INTRODUCTION 10
3.2 ARCHITECTURE 12
3.3 REGISTER FILES 14
3.4 INSTRUCTION SET 16
3.4.1 Instruction format 16
3.4.2 Other fields 17
3.4.3 Instructions 19
3.5 THE RELATIONSHIP BETWEEN HARDWARE MODULES AND INSTRUCTIONS 23
3.5.1 MOV/MAX/MIN/SGE/SLT 24
3.5.2 RCP/RSQ/POW2/LOG2 25
3.5.3 MUL/ADD/MAD 28
3.5.4 DP3/DP4 29
3.6 STALL DETECTION 29
3.7 FORWARDING DETECTION UNIT 33
3.7.1 Forwarding detection policy 33
3.7.2 Forwarding architecture 35
CHAPTER 4. LOW POWER FEATURES 38
4.1 PROPOSED CLOCK GATING METHODOLOGY 38
4.2 ID/EXE1 PIPELINE REGISTERS AND EXE1 PIPELINE STAGE 43
4.3 EXE1/EXE2 PIPELINE REGISTERS AND EXE2 PIPELINE STAGE 45
4.4 EXE2/EXE3 PIPELINE REGISTERS AND EXE3 PIPELINE STAGE 47
4.5 EXE3/EXE4 PIPELINE REGISTERS AND EXE4 PIPELINE STAGE 48
4.6 EXE4/EXE5 PIPELINE REGISTERS AND EXE5 PIPELINE STAGE 48
4.7 EXE5/WB PIPELINE REGISTERS AND WB PIPELINE STAGE 48
CHAPTER 5. PERFORMANCE IMPROVEMENT 54
5.1 INSTRUCTION SCHEDULE (CPF) 54
5.2 FOUR VERTEX VERSION AND INSTRUCTION SCHEDULE 59
5.2.1 Four vertex version hardware design 59
5.2.2 Four vertex version instruction schedule 61
CHAPTER 6. EXPERIMENTAL RESULTS 62
6.1 EXPERIMENTAL METHODOLOGY 62
6.2 EFFECTIVENESS OF LOW POWER DESIGN IN ONE VERTEX VERSION 63
6.2.1 Comparison of different clock gating versions 63
6.2.2 Comparison of with/without extra pipeline registers in EXE1/EXE2 65
6.2.3 Comparison of with/without extra multiplexers 67
6.2.4 Comparison of doing low power design in forwarding multiplexer 69
6.2.5 Comparison of register bank-based and module-based 71
6.2.6 Final version and instruction schedule 73
6.3 EFFECTIVENESS OF LOW POWER DESIGN IN FOUR VERTEX VERSION 78
CHAPTER 7. CONCLUSION AND FUTURE WORK 83
7.1 CONCLUSION 83
7.2 FUTURE WORK 83
REFERENCES 85
參考文獻 References
[1] R. I. Bahar and S. Manne, “Power and energy reduction via pipeline balancing,” in Proc. 28th Int. Symp. Computer Architecture (ISCA), pp. 218–229, July 2001.
[2] Hai Li, S. Bhunia, Yiran Chen, K. Roy, and T. N. Vijaykumar, “DCG: deterministic clock-gating for low-power microprocessor design,” IEEE Transactions on Very Large Scale Integration (VLSI) Systems, Vol. 12, No. 3, pp. 245–254, March 2004.
[3] Xiaotao Chang, Mingming Zhang, Ge Zhang, Zhimin Zhang, and Jim Wang, “Adaptive Clock Gating Technique for Low Power IP Core in SoC Design,” IEEE International Symposium on Circuits and Systems (ISCAS 2007), pp. 2120-2123, May 2007.
[4] Chris Maughan and Matthias Wloka, "Vertex Shader Introduction," NVIDIA Corporation.
[5] N. Banerjee, A. Raychowdhury, K. Roy, S. Bhunia, and H. Mahmoodi, “Novel Low-Overhead Operand Isolation Techniques for Low-Power Datapath Synthesis,” IEEE Transactions on Very Large Scale Integration (VLSI) Systems, Vol. 14, No. 9, pp. 1034-1039, September 2006.
[6] Ju-Ho Sohn, Jeong-Ho Woo, Min-Wuk Lee, Hye-Jung Kim, Ramchan Woo, and Hoi-Jun Yoo, “A 155-mW 50-Mvertices/s Graphics Processor With Fixed-Point Programmable Vertex Shader for Mobile Applications,” IEEE Journal of Solid-State Circuits, Vol. 41, No. 5, pp. 1081-1091, May 2006.
[7] Erik Lindholm, Mark J Kilgard, and Henry Moreton, "A User-Programmable Vertex Engine," NVIDIA Corporation, pp. 149-158.
[8] Richard Atwater Thomson, "The Direct3D Graphics Pipeline," August 13, 2006.
[9] Kuo-Chuan Chao, Kuan-Hung Chen, and Yuan-Sun Chu, "Low-Power Mechanism with Power Block Management," (ISCAS 2006), pp. 2233-2236.
[10] “GPU ShaderAnalyzer 1.40,” ATI.
[11] Byeong-Gyu Nam, Jeabin Lee, Kwanho Kim, Seung Jin Lee, and Hoi-Jun Yoo, “A 52.4mW 3D Graphics Processor with 141 Mvertices/s Vertex Shader and 3 Power Domains of Dynamic Voltage and Frequency Scaling,” ISSCC 2007, pp. 278-279, 2007.
[12] “NVIDIA nfiniteFX Engine: Programmable Vertex Shaders,” NVIDIA Corporation.
[13] L. Benini and G. De Micheli, “State Assignment for low power dissipation,” IEEE Journal of Solid-State Circuits, 1995.
[14] J. Monteiro, S. Devadas, and A. Ghosh, “Retiming sequential circuits for low power,” Proceedings of the ICCAD, 1993.
[15] Wang Bing, Peng Rui-hua, and Wang Qin, “A Design Flow for Clock Controller Hard-macro Generation,” ASIC, pp. 94-97, 2007.
[16] Rani Bhutada and Yiannos Manoli, “Complex clock gating with integrated clock gating logic cell,” Design & Technology of Integrated Systems in Nanoscale Era, pp. 164-169, 2007.
[17] A. Parikh, M. Kandemir, N. Vijaykrishnan, and M. J. Irwin, “Instruction Scheduling based on Energy and Performance Constraints,” IEEE Computer Society Annual Workshop on VLSI, pp. 37-42, April 2000.
[18] A. Parikh, M. Kandemir, N. Vijaykrishnan, and M. J. Irwin, “VLIW scheduling for energy and performance,” IEEE Computer Society Workshop on VLSI, pp. 111-117, 2001.
[19] G. Sinevriotis and T. Stouraitis, “A Novel List-scheduling Algorithm For Circuits and Systems,” ISCAS 2002, pp. IV-97–IV-100, May 2002.
[20] Miloš D Ercegovac and Tomás Lang, “Digital Arithmetic,” 2003.
[21] Wei-sen Lin, “Design of Unified Arithmetic Units for 3D Graphics Vertex Shader,” 2008.
電子全文 Fulltext
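This electronic full text is licensed to users solely for personal, non-profit searching, reading, and printing for academic research purposes. Please comply with the Copyright Act of the Republic of China and do not reproduce, distribute, adapt, repost, or broadcast it without authorization, to avoid violating the law.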
論文使用權限 Thesis access permission: 校內外都一年後公開 withheld (released both on and off campus after one year)
開放時間 Available:
校內 Campus: 已公開 available
校外 Off-campus: 已公開 available


紙本論文 Printed copies
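Availability information for printed theses is relatively complete from academic year 102 onward. To look up availability information for printed theses from academic year 101 or earlier, please contact the printed thesis service desk at the Office of Library and Information Services. We apologize for any inconvenience.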
開放時間 available 已公開 available
