Title page for etd-0901109-043304
Title
同時支援浮點和定點格式運算之可程式化頂點處理器設計、實作與驗證
Design, Implementation, and Verification of a Programmable Floating- and Fixed-Point Vertex Shader
Department
Year, semester
Language
Degree
Number of pages
136
Author
Advisor
Convenor
Advisory Committee
Date of Exam
2009-07-29
Date of Submission
2009-09-01
Keywords
Geometry operations, 3D graphics, vertex processor
Vertex Shader, SIMD, Programmable
Statistics
The thesis/dissertation has been browsed 5674 times and downloaded 9 times.
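Chinese Abstract (English translation)
The 3D graphics pipeline can be broadly divided by function into a geometry subsystem and a rendering subsystem. Hardware for the geometry subsystem falls into two categories. The first is the fixed-function pipeline, whose hardware pipeline is fixed, so the computation flow is fixed and inflexible. The second is the user-programmable vertex shader, which can produce different results according to the user's needs, is more flexible, and is gradually becoming the mainstream design approach. This thesis designs a programmable geometry subsystem according to the OpenGL ES 2.0 specification. It proposes a SIMD (Single-Instruction Multiple-Data) multiplier-accumulator (MAC) as the main architecture of the programmable vertex processor, integrating floating-point and fixed-point vector arithmetic units with a floating-point special function unit, and provides a custom instruction set that lets users choose fixed-point or floating-point operation and vector or purely scalar operation. In addition, the thesis optimizes the hardware for area, performance, and power consumption.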
Abstract
The 3D graphics pipeline can be divided into two subsystems: the geometry subsystem and the rendering subsystem.
Hardware implementations of transformation and lighting in the geometry subsystem fall into two categories: the fixed-function pipeline and the programmable vertex shader. This thesis proposes a programmable vertex shader design based on the OpenGL ES 2.0 specification. We start from the design of the instruction set and use a multiplier-accumulator (MAC)-based SIMD (Single-Instruction Multiple-Data) structure. The vertex shader supports both floating-point and fixed-point operations in both scalar and vector formats. In addition, a special function unit for evaluating complicated functions is integrated into the vertex shader. We also do our best to minimize cost, power, and delay throughout the design process.
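To make the MAC-based SIMD idea in the abstract concrete, the following minimal Python sketch applies a 4x4 transformation matrix to a vertex as four 4-lane multiply-accumulate steps, in both floating-point and fixed-point modes. It is an illustration only: the Q16.16 fixed-point format, the column-wise matrix layout, and all function names (`to_fixed`, `to_float`, `simd_mac`, `transform`) are assumptions made here, not details taken from the thesis.

```python
from typing import List, Sequence

FRAC_BITS = 16  # assumed Q16.16 fixed-point format (hypothetical, not from the thesis)


def to_fixed(x: float) -> int:
    """Convert a float to the assumed Q16.16 integer representation."""
    return int(round(x * (1 << FRAC_BITS)))


def to_float(q: int) -> float:
    """Convert a Q16.16 integer back to a float."""
    return q / (1 << FRAC_BITS)


def simd_mac(acc: Sequence, a: Sequence, b: Sequence, fixed: bool = False) -> List:
    """One 4-lane multiply-accumulate: acc[i] + a[i] * b[i] for every lane."""
    if fixed:
        # The product of two Q16.16 values has 32 fraction bits; shift back to 16.
        # Python's >> floors toward negative infinity for negative products.
        return [c + ((x * y) >> FRAC_BITS) for c, x, y in zip(acc, a, b)]
    return [c + x * y for c, x, y in zip(acc, a, b)]


def transform(columns: Sequence[Sequence], vertex: Sequence, fixed: bool = False) -> List:
    """Compute M * v as four MAC steps: acc += column_j * broadcast(v[j])."""
    acc = [0, 0, 0, 0] if fixed else [0.0, 0.0, 0.0, 0.0]
    for column, component in zip(columns, vertex):
        acc = simd_mac(acc, column, [component] * 4, fixed)
    return acc


if __name__ == "__main__":
    # Translation by (2, 3, 4), stored column by column.
    cols = [[1.0, 0.0, 0.0, 0.0],
            [0.0, 1.0, 0.0, 0.0],
            [0.0, 0.0, 1.0, 0.0],
            [2.0, 3.0, 4.0, 1.0]]
    v = [1.0, 1.0, 1.0, 1.0]
    print(transform(cols, v))  # floating-point mode
    fixed_cols = [[to_fixed(x) for x in col] for col in cols]
    fixed_v = [to_fixed(x) for x in v]
    print([to_float(q) for q in transform(fixed_cols, fixed_v, fixed=True)])
```

Sharing one accumulate loop between the two number formats mirrors, very loosely, the idea of a single datapath serving both modes; a real design would also have to handle rounding, overflow, and normalization, which this sketch ignores.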
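Table of Contents
Chapter 1 Introduction ..... 1
1.1 Thesis Outline ..... 1
1.2 Motivation ..... 1
1.3 Contributions ..... 2
Chapter 2 Background and Related Work ..... 3
2.1 3D Graphics Pipeline ..... 3
2.1.1 3D Graphics API Specifications ..... 3
2.1.2 Comparison of Geometry Architectures: Fixed-Function Geometry Engine vs. Programmable Geometry Engine (Vertex Shader) ..... 7
2.2 Operations Performed by the Geometry System ..... 10
2.2.1 Coordinate Transformation (Transformation) ..... 10
2.2.2 Color Computation for Transformed Coordinates (Lighting) ..... 18
2.2.3 Culling & Clipping ..... 20
2.3 Related Research Papers ..... 21
2.3.1 Designs Before 2005 (From 1999 to 2004) ..... 22
2.3.2 Fixed-Point SIMD Vertex Shader ..... 24
2.3.3 Floating-Point SIMD Vertex Shader ..... 27
2.3.4 Multi-Thread VLIW Vertex Shader ..... 29
2.3.5 LNS-Based Vertex Shader ..... 32
2.3.6 Unified Vertex/Pixel Shader ..... 40
2.3.7 Multimedia Processor ..... 41
2.3.8 Comparison and Discussion ..... 42
2.4 Overview of the Proposed Architecture ..... 44
Chapter 3 Vertex Shader Overview ..... 46
3.1 Overall Architecture ..... 46
3.2 Importance and Impact of Arithmetic Units on the Vertex Processor ..... 48
3.3 Arithmetic Operations Required in the Vertex Processor ..... 49
3.3.1 Coordinate Transformation (Transformation) ..... 49
3.3.2 Lighting Computation (Lighting) ..... 51
3.4 Vertex Shader Instruction Set ..... 55
3.4.1 Instruction Set Design ..... 55
3.4.2 Composing the Mathematical Operations Required by the Geometry System from the Instruction Set ..... 61
3.5 Vertex Shader Hardware Design ..... 65
3.5.1 Four-Way Floating- and Fixed-Point SIMD Vector Unit ..... 65
3.5.2 Special Function Unit (SFU) ..... 72
3.6 Communication Interface ..... 79
Chapter 4 Pipelining (Pipeline) ..... 83
4.1 Pipelined Design of the Vertex Shader ..... 83
4.2 Data Hazards and Forwarding (Hazard and Forwarding) ..... 85
4.2.1 Resolving Data Hazards: Forwarding ..... 85
4.2.2 Timing Diagrams of Geometry System Operations ..... 86
4.2.3 Analysis of the Efficiency and Correctness of the Ordering Method ..... 94
4.3 Performance Evaluation Under the Pipelined Architecture ..... 95
Chapter 5 Implementation and Verification Results ..... 96
5.1 Synthesis Results ..... 96
5.2 Verification Method and Flow ..... 99
5.3 Comparison ..... 102
Chapter 6 Conclusions and Future Work ..... 105
6.1 Conclusions ..... 105
6.2 Future Work ..... 106
References ..... 108
Appendix A ..... 112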
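List of Figures
[Figure 2-1] Overview of OpenGL ES Operation ..... 4
[Figure 2-2] Primitive Assembly ..... 5
[Figure 2-3] Rasterization ..... 6
[Figure 2-4] OpenGL ES 1.X Fixed-Function Graphics Pipeline ..... 7
[Figure 2-5] Distribution of Geometry Engine Operations ..... 8
[Figure 2-6] Geometry Subsystem Engine Architecture ..... 8
[Figure 2-7] OpenGL ES 2.0 Programmable 3D Graphics Pipeline ..... 9
[Figure 2-8] Geometry Operations in OpenGL ..... 10
[Figure 2-9] Coordinate Transformation Pipeline ..... 11
[Figure 2-10] Effect of Scaling Text ..... 11
[Figure 2-11] Effect of Translating Text ..... 12
[Figure 2-12] Effect of Rotating Text ..... 13
[Figure 2-13] Coordinate System After the Viewing Transformation ..... 14
[Figure 2-14] Viewport Transformation ..... 16
[Figure 2-15] Lighting Effects ..... 19
[Figure 2-16] Vectors Used in the Lighting Computation ..... 19
[Figure 2-17] Clipping Operation ..... 20
[Figure 2-18] Culling Operation ..... 21
[Figure 2-19] Vector Processor Architecture ..... 23
[Figure 2-20] VPU1 Architecture ..... 23
[Figure 2-21] Overview of the Programmable Vertex Processor ..... 24
[Figure 2-22] Operating Modes ..... 25
[Figure 2-23] (A)(B) Matrix Operations and Their Throughput, (C) Hardware Architecture of a Single Multiply-Adder ..... 26
[Figure 2-24] Instruction-Level Power Management ..... 26
[Figure 2-25] Geometry Transformation Engine Architecture ..... 28
[Figure 2-26] Special Function Unit Architecture ..... 29
[Figure 2-27] VLIW Architecture ..... 31
[Figure 2-28] Pre-Cache/Post-Cache Schematic ..... 32
[Figure 2-29] Logarithmic Converter Architecture ..... 34
[Figure 2-30] Fraction Part Generator (FPGEN) ..... 34
[Figure 2-31] Unified Arithmetic Unit Architecture ..... 36
[Figure 2-32] Internal Architecture of the LNS Stage ..... 36
[Figure 2-33] Programmable CPA Tree Architecture ..... 37
[Figure 2-34] Hardware Resources for the Dot-Product Operation ..... 38
[Figure 2-35] Matrix Operation Throughput ..... 38
[Figure 2-36] Hardware Resources for the Matrix Operation ..... 39
[Figure 2-37] Graphics Processor Architecture ..... 39
[Figure 2-38] Pixel-Vertex Multithreading ..... 40
[Figure 2-39] Fully Programmable 3-D Graphics Processor ..... 40
[Figure 2-40] Stream Processor ..... 42
[Figure 2-41] Overall Architecture Overview ..... 44
[Figure 2-42] Hierarchical Architecture Overview ..... 45
[Figure 3-1] Vertex Shader System Overview ..... 46
[Figure 3-2] Multiply-Accumulator Hardware Architecture ..... 51
[Figure 3-3] Lighting Computation Flow ..... 51
[Figure 3-4] Negative/Swizzle ..... 56
[Figure 3-5] Write Mask ..... 57
[Figure 3-6] Matrix Multiplication ..... 61
[Figure 3-7] Flow of the Operations Required by the Geometry System ..... 63
[Figure 3-8] Vertex Shader 4-Way SIMD Vector Unit ..... 65
[Figure 3-9] Swizzle/Negative ..... 66
[Figure 3-10] Floating-Point Adder ..... 67
[Figure 3-11] Floating-Point Multiplier ..... 67
[Figure 3-12] Vertex Shader 4-Way SIMD Datapath and Special Function Unit ..... 72
[Figure 3-13] Relationship Among Approximation Order, Table Size, and Absolute Error ..... 73
[Figure 3-14] N-th Order Interpolation Approximation Hardware Unit ..... 74
[Figure 3-15] Special Function Unit Architecture ..... 78
[Figure 3-16] Interface Blocks Under Discussion ..... 79
[Figure 3-17] Vertex Shader System Flow ..... 82
[Figure 4-1] Pipeline Timing Diagram ..... 83
[Figure 4-2] Hardware Pipeline ..... 84
[Figure 4-3] Timing Diagram of Transformation I ..... 86
[Figure 4-4] Timing Diagram of Transformation II ..... 87
[Figure 4-5] Timing Diagram of Transformation III ..... 87
[Figure 4-6] Timing Diagram of Transformation IV ..... 88
[Figure 4-7] Timing Diagram of Transformation V ..... 88
[Figure 4-8] Timing Diagram of Transformation VI ..... 89
[Figure 4-9] Timing Diagram of Lighting I ..... 89
[Figure 4-10] Timing Diagram of Lighting II ..... 90
[Figure 4-11] Timing Diagram of Lighting III ..... 90
[Figure 4-12] Timing Diagram of Lighting IV ..... 91
[Figure 4-13] Timing Diagram of Lighting V ..... 91
[Figure 4-14] Timing Diagram of Lighting VI ..... 92
[Figure 4-15] Timing Diagram of Lighting VII ..... 92
[Figure 4-16] Timing Diagram of Lighting VIII ..... 93
[Figure 4-17] Timing Diagram of Lighting IX ..... 93
[Figure 4-18] Timing Diagram of Lighting X ..... 94
[Figure 4-19] Timing Diagram of Lighting XI ..... 94
[Figure 5-1] Percentage of Area Occupied by Each Part of the Vertex Shader System ..... 96
[Figure 5-2] Proportion of Each Arithmetic Logic Unit in the Vertex Shader Datapath ..... 97
[Figure 5-3] Function- and Gate-Level Verification Flow for the Arithmetic Units ..... 99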
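List of Tables
[Table 2-1] Summary of Transformation Matrices ..... 17
[Table 2-2] Clipping ..... 20
[Table 2-3] Summary of Features of Related Papers ..... 43
[Table 3-1] Analysis of Arithmetic Units ..... 54
[Table 3-2] Instruction Format Description ..... 58
[Table 3-3] Instruction Set ..... 59
[Table 4-1] Performance Under Pipelining ..... 95
[Table 5-1] Vertex Shader System Synthesis Results and Area Proportion of Each Unit ..... 96
[Table 5-2] Analysis of Vertex Shader Datapath Synthesis Results ..... 98
[Table 5-3] Relative Error of Special Function Instructions ..... 101
[Table 5-4] Functional Comparison of Related Papers ..... 102
[Table 5-5] Data Comparison ..... 104
[Table 6-1] Vertex Specification and Characteristics ..... 106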
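Fulltext
This electronic full text is licensed to users solely for personal, non-profit searching, reading, and printing for academic research. Please comply with the Copyright Act of the Republic of China: do not reproduce, distribute, adapt, repost, or broadcast it without authorization, to avoid breaking the law.
Thesis access permission: open on campus after one year; never open off campus
Available:
Campus: available
Off-campus: not available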

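Printed copies
Public-availability information for printed theses is relatively complete from academic year 102 onward. To look up this information for printed theses from academic year 101 or earlier, please contact the printed-thesis service counter of the Office of Library and Information Services. We apologize for any inconvenience.
Available: available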
