National Sun Yat-sen University (國立中山大學), thesis/dissertation: Design of a Multi-Core Multi-thread Floating-Point Processor and Its Application in Computer Graphics

Title	Design of a Multi-Core Multi-thread Floating-Point Processor and Its Application in Computer Graphics (多核心多執行緒浮點格式處理器設計及其在電腦繪圖之應用)
Department	Department of Computer Science and Engineering
Year, semester	Spring semester of academic year 99 (ROC calendar, 2010-2011)	Language	Chinese
Degree	Master	Number of pages	115
Author	Chia-Yu Yeh (葉家裕)
Advisor	Shen-Fu Hsiao (蕭勝夫)
Convenor	Chung-Ho Chen (陳中和)
Advisory Committee	Ming-Chih Chen (陳銘志); Shiann-Rong Kuang (鄺獻榮); Yun-Nan Chang (張雲南)
Date of Exam	2011-07-25	Date of Submission	2011-09-06
Keywords (English list)	multi-threading, graphics processing unit (GPU), vertex shader, SIMD, matrix-vector multiplication, OpenGL ES 2.0
Keywords (translated from the Chinese list)	multi-threading, SIMD, vertex shader, multi-core, graphics processing unit (GPU), matrix multiplication
Statistics	This thesis has been browsed 5667 times and downloaded 460 times.

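Chinese Abstract (English translation)
With the rapid development of graphics processing units (GPUs) in recent years, developers have adopted many techniques to speed up the hardware, including single-instruction multiple-data (SIMD), very long instruction word (VLIW), multi-threading, and multi-core. Under the OpenGL ES 2.0 specification, a programmable vertex shader built on SIMD arithmetic units can efficiently execute matrix multiplication, the most common operation in 3D graphics. More recently, NVIDIA's Tesla series of high-performance graphics cards adopted multi-core scalar arithmetic units, a design that brings the GPU closer to the concept of a general-purpose graphics processor. In this thesis, we design a multi-core, multi-threaded floating-point processor (referred to as Scalar_based MT-GPU) that executes multi-threaded instructions. The Scalar_based MT-GPU contains four scalar processors (SPs) and one special function unit (SFU). At present it is applied mainly to transform and lighting (T&L) computations in 3D graphics. At the end of the thesis, we also compare it with a vertex shader in terms of performance and area.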
Abstract
Graphics processing unit (GPU) designs usually adopt various computer architecture techniques to boost computation speed, including single-instruction multiple-data (SIMD), very long instruction word (VLIW), multi-threading, and multi-core. In OpenGL ES 2.0, the user-programmable vertex shader (VS) hardware unit can be built around a vector SIMD computation unit so that it efficiently computes matrix-vector multiplication, one of the key operations in vertex transformation. Recent high-performance GPUs, such as the Tesla series from NVIDIA, use many-core architectures in which each core is responsible for scalar operations. The intention is to allow efficient execution of general-purpose computations in addition to specialized graphics computations. In this thesis, we design a scalar-based multi-threaded GPU that is composed of four scalar processors and one special-function unit and can execute multi-threaded instructions. We use vertex transformation as an example to demonstrate the execution efficiency of the scalar-based multi-threaded GPU, and we compare it with a vector-based SIMD GPU.
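
The contrast the abstract draws between a vector SIMD unit and scalar cores can be illustrated with a small software sketch of the vertex transformation (a 4x4 matrix times a 4-component vertex). This is not the thesis's hardware or instruction set. The function names, the row-major matrix layout, and the mapping of one vertex (thread) to each of the four scalar processors are assumptions made here for illustration only.

```python
from typing import List

Vec4 = List[float]
Mat4 = List[Vec4]  # assumed row-major layout: m[row][col]


def transform_vector_simd(m: Mat4, v: Vec4) -> Vec4:
    """Vector-SIMD style: one vertex, each step is a 4-lane multiply-add."""
    out = [0.0, 0.0, 0.0, 0.0]
    for col in range(4):
        # Models one 4-wide instruction: out[lane] += m[lane][col] * v[col]
        out = [out[lane] + m[lane][col] * v[col] for lane in range(4)]
    return out


def transform_scalar_threads(m: Mat4, vertices: List[Vec4],
                             num_sp: int = 4) -> List[Vec4]:
    """Scalar multi-thread style: each scalar processor (SP) owns one vertex.

    Assumption: up to num_sp threads are issued together, and every SP runs
    the same scalar multiply-add on its own thread's data.
    """
    results: List[Vec4] = []
    for base in range(0, len(vertices), num_sp):
        group = vertices[base:base + num_sp]
        outs = [[0.0] * 4 for _ in group]
        for row in range(4):
            for col in range(4):
                # Models one scalar instruction executed by every active SP.
                for sp, v in enumerate(group):
                    outs[sp][row] += m[row][col] * v[col]
        results.extend(outs)
    return results


if __name__ == "__main__":
    # Translation by (2, 3, 4) in homogeneous coordinates.
    translate = [[1.0, 0.0, 0.0, 2.0],
                 [0.0, 1.0, 0.0, 3.0],
                 [0.0, 0.0, 1.0, 4.0],
                 [0.0, 0.0, 0.0, 1.0]]
    verts = [[1.0, 1.0, 1.0, 1.0], [0.0, 0.0, 0.0, 1.0]]
    print(transform_scalar_threads(translate, verts)[0])  # [3.0, 4.0, 5.0, 1.0]
```

Both functions compute the same result. The vector version needs 4 wide multiply-add steps per vertex, while the scalar version issues 16 scalar multiply-adds per group but keeps all four processors busy whenever four vertices are available, including for work that does not naturally come in 4-component vectors.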

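Table of Contents
Chapter 1 Introduction
1.1 Thesis Outline
1.2 Research Motivation
Chapter 2 Introduction to 3D Graphics
2.1 3D Graphics Pipeline
2.1.1 3D Graphics APIs: DirectX and OpenGL
2.1.2 OpenGL ES 1.X vs. OpenGL ES 2.0
2.2 Operations Required by the Geometry System
2.2.1 Coordinate Transformation
2.2.2 Color Computation (Lighting)
2.2.3 Culling & Clipping
Chapter 3 Related Work on GPUs and Multi-threading
3.1 History of GPU Development
3.2 Related Work on Parallelism
3.2.1 Instruction-Level Parallelism (ILP)
3.2.2 Thread-Level Parallelism (TLP)
3.3 Related Work on CPU Multi-threading
3.4 Multi-threading GPU
3.4.1 SIMD Vertex Shader
3.4.2 Scalar_based MT-GPU
Chapter 4 Design and Implementation of the Scalar_based MT-GPU
4.1 Architecture Overview
4.2 Scalar_based MT-GPU Instruction Set
4.2.1 Instruction Set Design
4.2.2 Composing the Mathematical Operations Required by the Geometry System from the Instruction Set
4.3 Function Units
4.3.1 Floating-Point Scalar Processor
4.3.2 Special Function Unit
4.4 Pipeline Design
4.4.1 Pipeline Design of the Scalar_based MT-GPU
4.5 Multi-threading Design
Chapter 5 Verification and Performance of the Vertex Shader
5.1 Simulation and Verification
5.2 Performance Evaluation
5.2.1 Instruction Analysis
5.2.2 Performance Analysis
5.2.3 Synthesis Results
5.3 Comparison with Related Work
Chapter 6 Conclusions and Future Goals
6.1 Data Hazards and Forwarding Mechanism
References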

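Fulltext
This electronic full text is licensed to users only for personal, non-profit searching, reading, and printing for the purpose of academic research. Please comply with the Copyright Act of the Republic of China and do not reproduce, distribute, adapt, repost, or broadcast it without authorization.
Thesis access permission: user-defined availability
Available: Campus: available; Off-campus: available
etd-0906111-035109.pdf
Printed copies
Public-access information for printed theses is relatively complete from academic year 102 onward. To look up public-access information for printed theses from academic year 101 or earlier, please contact the printed thesis service counter of the Office of Library and Information Services.
Available: available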

Office of Library and Information Services, National Sun Yat-sen University │ Contact Us : 2452 Thesis Format Review Team , Mail │ Development and operations : Knowledge Innovation Division, LIS