國立中山大學 National Sun Yat-sen University, 學位論文 thesis/dissertation

論文名稱 Title	三維圖學呈像之頂點與像素處理器硬體設計 Design of Vertex and Per-Fragment Processor for 3D Graphics Rendering
系所名稱 Department	資訊工程學系 Department of Computer Science and Engineering
畢業學年期 Year, semester	95 學年度第 2 學期 The spring semester of Academic Year 95	語文別 Language	中文 Chinese
學位類別 Degree	碩士 Master	頁數 Number of pages	77
研究生 Author	蔡明其 Ming-chi Tsai
指導教授 Advisor	張雲南 Yun-Nan Chang
召集委員 Convenor	陳春僥 Chuen-Yau Chen
口試委員 Advisory Committee	鄺獻榮 Shiann-Rong Kuang
口試日期 Date of Exam	2007-07-24	繳交日期 Date of Submission	2007-09-04
關鍵字 Keywords	像素處理器硬體、頂點處理器、三維圖學呈像 per-fragment processor, vertex processor, 3D graphics rendering

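Chinese Abstract (English translation)
In recent years, thanks to the rapid progress of VLSI and multimedia technology, 3D graphics applications have developed widely and quickly in many fields and are no longer confined to specialized domains such as workstations. In the future, the 3D graphics engine will also become an independent unit in most multimedia systems, such as entertainment television equipment and personal electronic devices. In general, a 3D graphics engine can be divided into two parts: the geometry subsystem and the raster (rendering) subsystem. The main contributions of this thesis are the design of an efficient pipelined per-fragment processing flow and the development of the vertex processor, together with support for integrating the geometry and raster subsystems. The per-fragment processor comprises many processing steps, including fog blending, visibility testing, and alpha blending units. This thesis analyzes the dependences among these stages and parallelizes several of them to reduce the overall pipeline latency; the depth test is also moved to an earlier stage to reduce unnecessary texture reads. The thesis also proposes memory buffer access mechanisms suited to tile-based 3D graphics rendering in order to reduce the overall system memory bandwidth. The first method uses some additional control flags and integrates the frequent buffer clear operations into the standard rendering process, eliminating the extra memory accesses for clearing. The second method uses a modification record table that records the modification state of every pixel in a tile, reducing the number of pixels that are updated. Experimental results show that the number of memory accesses can be reduced by more than 50%. The proposed hardware design has been implemented in a 0.18 µm process; the gate count of the vertex processor is 201K and that of the per-fragment processor is 118K.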
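The reordering described above, in which the depth test runs before texturing, can be illustrated with a small software model. This is only a sketch under stated assumptions, not the thesis's hardware design: the `Fragment` fields, the `shade_fragments` function, and the `sample_texture` callback are hypothetical names, a smaller depth value is assumed to mean "closer", and no later stage (such as an alpha test) is assumed to discard a fragment after it passes the depth test.

```python
from dataclasses import dataclass


@dataclass
class Fragment:
    # Hypothetical fragment record: screen position, depth, texture coordinates.
    x: int
    y: int
    depth: float
    u: float
    v: float


def shade_fragments(fragments, depth_buffer, sample_texture):
    """Run the depth test before texturing and count texture fetches.

    depth_buffer is a list of rows (depth_buffer[y][x]); a fragment is
    visible when its depth is strictly smaller than the stored value.
    Returns (written, fetches), where written is a list of (x, y, color).
    """
    fetches = 0
    written = []
    for frag in fragments:
        # Early depth test: hidden fragments are dropped here, so they
        # never reach the texture stage and cost no texture read.
        if frag.depth >= depth_buffer[frag.y][frag.x]:
            continue
        depth_buffer[frag.y][frag.x] = frag.depth
        color = sample_texture(frag.u, frag.v)
        fetches += 1
        written.append((frag.x, frag.y, color))
    return written, fetches
```

In this model, a fragment that lands behind an already-drawn surface is rejected without calling `sample_texture`, which is the saving the abstract attributes to moving the depth test forward.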
Abstract
Over the past few years, with the rapid advance of VLSI and multimedia technology, three-dimensional (3D) graphics applications have spread widely and rapidly into various areas and are no longer limited to specific technical fields served by high-end workstations. In the near future, the 3D graphics engine will become an indispensable part of most multimedia systems, including entertainment television sets and personal electronic appliances. A general 3D graphics engine can be divided into the geometry subsystem and the raster subsystem. The main contribution of this thesis is the design of an efficient fragment pipeline; the work also supports the development of the vertex processor and the integration of the geometry and raster subsystems. Because the per-fragment processor contains various processing stages, such as fog blending, visibility testing, and alpha blending, this thesis analyzes the dependences between these stages so that several of them can run in parallel, which reduces the overall pipeline latency, and it reorders the stages to avoid unnecessary texture accesses. This thesis also proposes two memory buffer access mechanisms suited to a tile-based 3D graphics rendering engine to reduce the overall system memory bandwidth. The first adds control flags for each tile so that the frequent buffer clear operations can be integrated with the normal rendering process, avoiding additional memory accesses for clearing. The second builds a dirty table that identifies the unmodified pixels in each tile, reducing the number of pixels that must be updated. The experimental results show that the proposed methods reduce memory accesses by more than 50%. The proposed design has been realized in 0.18 µm technology. The gate counts of the vertex processor (without special functions) and the per-fragment processor are 201k and 118k, respectively.
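The two buffer access mechanisms can be pictured with a simplified software model of one tile. The sketch below is an illustration under assumptions, not the thesis's actual hardware: `render_tile`, its parameters, and the access-counting scheme are hypothetical, one external access is charged per pixel read or written, and a tile with a pending clear is assumed to write back every pixel because the external copy is stale.

```python
def render_tile(external, clear_pending, fragments, clear_value=0):
    """Render one tile and return the number of external memory accesses.

    external      -- list of pixel values for this tile in external memory
                     (updated in place on write-back)
    clear_pending -- per-tile flag: a clear was requested but not yet done
    fragments     -- iterable of (pixel_index, value) pairs for this tile
    """
    n = len(external)
    accesses = 0
    if clear_pending:
        # Clear folded into rendering: the on-chip tile is initialised
        # locally, so there is no separate clear pass and no tile load.
        local = [clear_value] * n
        dirty = [True] * n  # assumption: stale external data must be replaced
    else:
        local = list(external)
        accesses += n  # tile load from external memory
        dirty = [False] * n
    for index, value in fragments:
        local[index] = value
        dirty[index] = True  # dirty table: mark the pixel as modified
    for index in range(n):
        if dirty[index]:
            # Write back only the pixels the dirty table marks as modified.
            external[index] = local[index]
            accesses += 1
    return accesses
```

For a four-pixel tile, a frame that starts with a pending clear costs four accesses in this model (write-back only), and a later frame that touches one pixel costs five (four reads plus one write). A scheme with a separate clear pass and a full write-back would cost more in both cases; the exact savings in the thesis depend on details not given in the abstract.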

目次 Table of Contents
CHAPTER 1 Introduction
  1.1 Research Motivation
  1.2 Thesis Outline
CHAPTER 2 Background and Related Work
  2.1 Introduction to 3D Graphics and Its Applications
  2.2 Overview of the Geometry Subsystem and Raster Subsystem
  2.3 Memory Bandwidth of the Raster Subsystem
CHAPTER 3 Vertex and Per-Fragment Processor Design
  3.1 Vertex Processor Hardware Unit
    3.1.1 Vertex Processor Instruction Description and Design
  3.2 Per-Fragment Processor Description and Design
    3.2.1 Texture Mapping Unit
    3.2.2 Fog Blending Unit
    3.2.3 Per-Fragment Operations Unit
    3.2.4 Internal Buffer Controller of Per-Fragment Processor
CHAPTER 4 Per-Fragment Processor Optimization
  4.1 Per-Fragment Processor Improvements
    4.1.1 Adjustment of the Fragment Rate for Rasterizer
    4.1.2 Texture Cache & Early Depth Test
    4.1.3 Reduction of Per-Fragment Operations Pipeline Stages
    4.1.4 Reduction of Bus Bandwidth for External Memory
CHAPTER 5 Verification and Performance Analysis
  5.1 Functional Verification
    5.1.1 Table Accuracy Comparison
    5.1.2 Software/Hardware Verification
    5.1.3 Verification of Versatile FPGA
  5.2 Execution Results and Performance Analysis
  5.3 Hardware Synthesis Results
CHAPTER 6 Conclusions and Future Work
  6.1 Conclusions
  6.2 Future Work
    6.2.1 Programmable Per-Fragment Processor
    6.2.2 Unified Shader

電子全文 Fulltext
This electronic full text is licensed to users only for personal, non-profit searching, reading, and printing for the purpose of academic research. Please comply with the relevant provisions of the Copyright Act of the Republic of China; do not reproduce, distribute, adapt, repost, or broadcast it without authorization. Thesis access permission: open on campus after one year; never open off campus. Availability: Campus: available; Off-campus: not available.
紙本論文 Printed copies
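Availability: available. Public-access information for printed theses is relatively complete from academic year 102 onward. For printed theses from academic year 101 or earlier, please contact the printed thesis service desk of the Office of Library and Information Services.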
