National Sun Yat-sen University (國立中山大學), thesis/dissertation: Design of Unified Arithmetic Units for 3D Graphics Vertex Shader (三維繪圖頂點處理器之整合型算術運算單元設計)

Title	Design of Unified Arithmetic Units for 3D Graphics Vertex Shader (三維繪圖頂點處理器之整合型算術運算單元設計)
Department	Department of Computer Science and Engineering
Year, semester	Academic Year 96, second (spring) semester	Language	Chinese
Degree	Master	Number of pages	99
Author	Wei-Sen Lin (林為森)
Advisor	Shen-Fu Hsiao (蕭勝夫)
Convenor	Yun-Nan Chang (張雲南)
Advisory Committee	Chuen-Yau Chen (陳春僥)
Date of Exam	2008-06-13	Date of Submission	2008-09-02
Keywords	Vertex shader, higher-order approximation, throughput of matrix computation
Statistics	The thesis/dissertation has been browsed 5650 times and downloaded 1614 times.

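Chinese abstract (English translation)
The vertex shader is one of the cores of a 3D computer graphics system-on-chip. Its purpose is to accelerate coordinate transformation and lighting computation in the 3D graphics pipeline, and the arithmetic unit is its main hardware. This thesis proposes unified arithmetic unit architectures that integrate the floating-point vector arithmetic unit with the floating-point special function unit so that some hardware can be shared, which saves area. Three architectures are proposed. Proposed Architecture I comprises a SIMD floating-point vector arithmetic unit and a floating-point special function unit implemented with first-order interpolation approximation. Proposed Architecture II improves on the area of Architecture I: it uses higher-order interpolation approximation to reduce the total area of the special function tables, but it requires additional hardware such as a squarer, a cubing unit, and a fourth-power unit. To save further area, the inner product required by the higher-order interpolation is computed by the existing floating-point vector dot-product unit, which in turn adds latency to the vector instructions. Proposed Architecture III mainly improves vector instruction latency and matrix computation throughput. It has two floating-point vector dot-product units: one stands alone to reduce the latency of vector instructions, and the other is part of the floating-point special function unit. The two dot-product units can also work together on matrix operations, doubling the matrix computation throughput.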
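The trade-off described in the abstract above, larger tables with first-order interpolation versus smaller tables with higher-order interpolation evaluated through a dot product, can be illustrated with a minimal software sketch. Everything in it is an assumption made for illustration and is not taken from the thesis: the target function (1/x on [1, 2)), the segment counts, the use of Taylor coefficients about each segment midpoint, and all function names. The constant term is added separately so that a fourth-order polynomial maps onto a 4-element dot product of coefficients with d, d^2, d^3, d^4.

```python
# Illustrative sketch only: table-based polynomial approximation of 1/x on [1, 2).
# Function choice, segment counts, and coefficient method are assumptions,
# not details from the thesis.

def build_table(segments, order):
    """Return per-segment Taylor coefficients of 1/x about each segment midpoint."""
    width = 1.0 / segments
    table = []
    for s in range(segments):
        m = 1.0 + (s + 0.5) * width  # segment midpoint
        # 1/(m + d) = sum over k of (-1)^k * d^k / m^(k+1)
        table.append([(-1.0) ** k / m ** (k + 1) for k in range(order + 1)])
    return table

def dot(a, b):
    """Plain dot product, standing in for a hardware dot-product unit."""
    return sum(x * y for x, y in zip(a, b))

def approx_reciprocal(x, table):
    """Approximate 1/x for x in [1, 2) using the coefficient table."""
    segments = len(table)
    width = 1.0 / segments
    s = min(int((x - 1.0) * segments), segments - 1)  # table index
    d = x - (1.0 + (s + 0.5) * width)                 # offset from midpoint
    coeffs = table[s]
    powers = [d ** k for k in range(1, len(coeffs))]  # d, d^2, d^3, ...
    return coeffs[0] + dot(coeffs[1:], powers)

def max_error(table, samples=4096):
    """Largest absolute error over a uniform grid on [1, 2)."""
    xs = [1.0 + i / samples for i in range(samples)]
    return max(abs(approx_reciprocal(x, table) - 1.0 / x) for x in xs)

def table_entries(table):
    """Total number of stored coefficients."""
    return sum(len(row) for row in table)

first_order = build_table(segments=64, order=1)   # many short segments
fourth_order = build_table(segments=4, order=4)   # few long segments
```

In this sketch the fourth-order table stores far fewer coefficients than the first-order one while giving a smaller worst-case error on the sampled grid, at the cost of computing the powers of d and a 4-element dot product per evaluation.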
Abstract
The vertex shader, one of the core components of a 3D graphics system, accelerates coordinate transformation and lighting in the 3D graphics pipeline, and the vector ALU is its key part. This thesis proposes several unified architectures that integrate the floating-point vector arithmetic unit and the special function unit so that they can share hardware resources. We propose three architectures for the unified vector ALU. The first includes a single-instruction-multiple-data (SIMD) vector arithmetic unit and uses a table-based method with first-order approximation to calculate special functions. The second uses higher-order approximation to reduce the table sizes and shares the floating-point multipliers of the SIMD vector unit. The third has two copies of the dot-product hardware, which can compute two dot products in parallel and thus double the throughput of matrix computation. Furthermore, the two dot-product units can be used to perform the interpolation for special function calculation.
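The throughput claim for two parallel dot-product units can be pictured with a small idealized model. In this sketch a 4x4 matrix-vector transform is treated as four row dot products, and each dot-product unit is assumed to finish one dot product per issue step. The step-counting model, the function names, and the example matrix are assumptions for illustration; real pipeline latency and the thesis's actual scheduling are not modeled.

```python
# Illustrative sketch only: a 4x4 transform as four dot products, issued to a
# configurable number of dot-product units. The one-dot-product-per-step model
# is an assumption, not a description of the thesis hardware.

def dot4(a, b):
    """4-element dot product, standing in for one hardware dot-product unit."""
    return sum(x * y for x, y in zip(a, b))

def transform(matrix, vertex, units):
    """Return (transformed vertex, number of issue steps) for `units` parallel units."""
    result = []
    steps = 0
    for start in range(0, len(matrix), units):
        rows = matrix[start:start + units]  # rows handled in this step
        result.extend(dot4(row, vertex) for row in rows)
        steps += 1
    return result, steps

# Hypothetical example: translate the point (1, 1, 1) by (2, 3, 4).
translation = [
    [1.0, 0.0, 0.0, 2.0],
    [0.0, 1.0, 0.0, 3.0],
    [0.0, 0.0, 1.0, 4.0],
    [0.0, 0.0, 0.0, 1.0],
]
vertex = [1.0, 1.0, 1.0, 1.0]

one_unit_result, one_unit_steps = transform(translation, vertex, units=1)
two_unit_result, two_unit_steps = transform(translation, vertex, units=2)
```

Under these assumptions the two-unit configuration produces the same result in half as many issue steps, which corresponds to the factor-of-two throughput gain stated in the abstract.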

Table of Contents
Chapter 1 Introduction
1.1 Research Motivation
1.2 Thesis Organization
Chapter 2 Introduction to the Arithmetic Units Required by a Vertex Shader
2.1 Introduction to 3D Graphics
2.1.1 3D Graphics Pipeline
2.1.2 Introduction to the Geometry Transformation Subsystem
2.2 Vertex Shader Overview
2.2.1 Basic Architecture of a Vertex Shader
2.2.2 Importance and Impact of Arithmetic Unit Design on the Vertex Shader
2.3 Arithmetic Operations Required in a Vertex Shader
2.3.1 Coordinate Transformation
2.3.2 Lighting Computation
Chapter 3 Review of Related Work
3.1 Designs before 2005 (from 1999 – 2004)
3.2 fixed-point SIMD Vertex Shader [5] [ISSCC’05][JSSC’06]
3.3 floating-point SIMD Vertex Shader [6] [ISSCC’05][JSSC’06]
3.4 Multi-Thread VLIW Vertex Shader [3] [ISSCC’06]
3.5 LNS-based Vertex Shader [7] [ISSCC’07][JSSC’07]
3.6 Unified Vertex/Pixel Shader
3.7 Comparison and Discussion
Chapter 4 Architecture Design of the Unified Arithmetic Units
4.1 Proposed Architecture I
4.1.1 Architecture Overview
4.1.2 Instruction Set
4.1.3 Vector Arithmetic Unit
4.1.4 Special Function Unit
4.2 Proposed Architecture II
4.2.1 Architecture Overview
4.2.2 Instruction Set
4.2.3 Improvements and Methods
4.3 Proposed Architecture III
4.3.1 Architecture Overview
4.3.2 Instruction Set
4.3.3 Improvements and Methods
4.4 Comparison of the Proposed Architectures with Related Work
4.4.1 Arithmetic Unit Complexity Analysis
4.4.2 Matrix Computation Throughput Analysis and Comparison
4.4.3 Instruction Latency Analysis and Comparison
Chapter 5 Implementation and Verification
5.1 Synthesis Results
5.2 Verification Method and Flow
5.3 Comparison
Chapter 6 Conclusions and Future Work
6.1 Conclusions
6.2 Future Work
References

References
[1] J. Sohn, et al., “A 50Mvertices/s Graphics Processor with Fixed-Point Programmable Vertex Shader for Mobile Applications,” IEEE International Solid-State Circuits Conference (ISSCC), Dig. Tech. Papers, pp. 192-193, Feb. 2005.
[2] Donghyun Kim, et al., “An SoC with 1.3Gtexels/s 3D Graphics Full Pipeline Engine for Consumer Applications,” IEEE International Solid-State Circuits Conference (ISSCC), Dig. Tech. Papers, pp. 190-191, Feb. 2005.
[3] Chang-Hyo Yu, et al., “A 120Mvertices/s Multi-threaded VLIW Vertex Processor for Mobile Multimedia Applications,” IEEE International Solid-State Circuits Conference (ISSCC), Dig. Tech. Papers, pp. 408-409, Feb. 2006.
[4] Byeong-Gyu Nam, et al., “A 52.4mW 3D Graphics Processor with 141Mvertices/s Vertex Shader and 3 Power Domains of Dynamic Voltage and Frequency Scaling,” IEEE International Solid-State Circuits Conference (ISSCC), Dig. Tech. Papers, pp. 278-603, Feb. 2006.
[5] Ju-Ho Sohn, et al., “A 155-mW 50-Mvertices/s Graphics Processor with Fixed-Point Programmable Vertex Shader for Mobile Applications,” IEEE Journal of Solid-State Circuits (JSSC), vol. 41, no. 5, May 2006.
[6] Donghyun Kim, et al., “An SoC With 1.3 Gtexels/s 3-D Graphics Full Pipeline for Consumer Applications,” IEEE Journal of Solid-State Circuits (JSSC), vol. 41, no. 1, Jan. 2006.
[7] Byeong-Gyu Nam, et al., “A Low-Power Unified Arithmetic Unit for Programmable Handheld 3-D Graphics Systems,” IEEE Journal of Solid-State Circuits (JSSC), vol. 42, no. 8, Aug. 2007.
[8] David Harris, “An Exponentiation Unit for an OpenGL Lighting Engine,” IEEE Transactions on Computers, vol. 53, no. 3, Mar. 2004.
[9] Byeong-Gyu Nam, et al., “Development of a 3-D Graphics Rendering Engine with Lighting Acceleration for Handheld Multimedia Systems,” IEEE Transactions on Consumer Electronics, vol. 51, no. 3, Aug. 2005.
[10] Nobuhiro Ide, et al., “2.44-GFLOPS 300-MHz Floating-Point Vector-Processing Unit for High-Performance 3-D Graphics Computing,” IEEE Journal of Solid-State Circuits (JSSC), vol. 35, no. 7, July 2000.
[11] C.-H. Chen and C.-Y. Lee, “A Cost Effective Lighting Processor for 3D Graphics Application,” International Conference on Image Processing, vol. 2, pp. 792-796, 24-28 Oct. 1999.
[12] Byeong-Gyu Nam, Hyejung Kim, Hoi-Jun Yoo, “Power and Area-Efficient Unified Computation of Vector and Elementary Functions for Handheld 3D Graphics Systems,” IEEE Transactions on Computers, vol. 57, no. 4, Apr. 2008.
[13] Hyejung Kim, Byeong-Gyu Nam, “A 231-MHz, 2.18-mW 32-bit Logarithmic Arithmetic Unit for Fixed-Point 3-D Graphics System,” IEEE Journal of Solid-State Circuits (JSSC), vol. 41, no. 11, Nov. 2006.
[14] Atsushi Kunimatsu, Nobuhiro Ide, et al., “Vector Unit Architecture for Emotion Synthesis,” IEEE Micro, vol. 20, issue 2, Mar.-Apr. 2000.
[15] Erik Lindholm, Mark J. Kilgard, Henry Moreton, “A User-Programmable Vertex Engine,” ACM SIGGRAPH, pp. 149-158, Aug. 2001.
[16] Jeong-Ho Woo, Ju-Ho Sohn, Hyejung Kim, “A 195mW, 9.1MVertices/s Fully Programmable 3D Graphics Processor for Low Power Mobile Devices,” IEEE Asian Solid-State Circuits Conference, pp. 372-375, Nov. 2007.
[17] Michael J. Schulte and Earl E. Swartzlander, “Hardware Designs for Exactly Rounded Elementary Functions,” IEEE Transactions on Computers, vol. 43, no. 8, Aug. 1994.
[18] http://www.microsoft.com
[19] http://www.opengl.org
[20] http://www.khronos.org
[21] John Kessenich, “OpenGL ES Shading Language,” language version 1.10, 2006.
[22] Chang-Hyo Yu, Donghyun Kim and Lee-Sup Kim, “A 33.2Mvertices/sec Programmable Geometry Engine for Multimedia Embedded Systems,” IEEE Circuits and Systems (ISCAS), vol. 5, pp. 4574-4577, May 2005.
[23] J. M. Muller, “‘Partially rounded’ small-order approximations for accurate, hardware-oriented, table-based methods,” 16th IEEE Symposium on Computer Arithmetic Proceedings, pp. 114-121, 15-18 June 2003.

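Fulltext
This electronic full text is licensed to users solely for personal, non-profit searching, reading, and printing for the purpose of academic research. Please comply with the Copyright Act of the Republic of China and do not reproduce, distribute, adapt, repost, or broadcast it without authorization. Thesis access permission: available on and off campus after one year (withheld). Available: Campus: available. Off-campus: available. etd-0902108-114207.pdf
Printed copies
Public availability information for printed theses is relatively complete from Academic Year 102 onward. To inquire about printed theses from Academic Year 101 or earlier, please contact the printed thesis service desk of the Office of Library and Information Services. We apologize for any inconvenience. Available: available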

Office of Library and Information Services, National Sun Yat-sen University │ Contact Us : 2452 Thesis Format Review Team , Mail │ Development and operations : Knowledge Innovation Division, LIS