Title page for etd-0808113-022414
Title
多執行緒SIMD統一圖形處理器的設計與實作
Design and implementation of a multi-thread unified SIMD graphics processor
Department
Year, semester
Language
Degree
Number of pages
71
Author
Advisor
Convenor
Advisory Committee
Date of Exam
2013-07-30
Date of Submission
2013-09-08
Keywords
Multi-threading, Unified GPU, Thread control, Fill unit, Multi-texture
Statistics
This thesis/dissertation has been browsed 5694 times and downloaded 188 times.
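Abstract (translated from the Chinese)
To meet the demand for graphics applications in embedded systems, this thesis implements a low-cost, single-core, multi-threaded unified graphics processing unit (GPU) that adopts several architectural features rarely used in the embedded GPU literature. First, in addition to supporting the basic vertex and pixel shader functions, the proposed GPU uses processor instructions to execute the fixed-function modules of the graphics pipeline, including clipping, back-face culling, and rasterization (scan conversion), which eliminates several hundred thousand hardware logic gates. Second, a three-bank, two-port register file architecture replaces the conventional four-port register file, greatly reducing the register file area. Based on the multi-bank register file architecture, vertex threads and pixel threads are assigned to different register files. To match the port architecture of the register file, threads in different register file banks access the banks by time-division multiplexing and are executed in turn. In addition, under the multi-thread scheduling scheme, a thread that encounters a stall (such as a texture miss) is replaced by another ready thread in the same register file bank. Thanks to this distinctive alternating execution scheme, the penalties caused by data dependencies and branches in the ten-stage pipeline are reduced. The thesis also takes into account the repeated vertices in the special triangle fan and triangle strip modes and designs a dedicated vertex fill unit that avoids redundant reprocessing of vertices; it further implements a multi-texture unit to support multi-texturing. The implemented multi-threaded unified GPU requires about 204K logic gates (excluding memory).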
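As a rough illustration of why a vertex fill unit pays off in triangle strip and fan modes, the sketch below counts how many vertex-shading invocations are needed with and without reuse of already processed vertices. It is not the thesis's hardware design, which this page does not describe; the function names `triangles` and `shading_cost` are hypothetical, and the winding rule for strips follows a common OpenGL-style convention assumed here.

```python
def triangles(mode, n):
    """Return the vertex-index triples formed by n vertices in 'strip' or 'fan' mode."""
    if mode == "strip":
        # Assumed convention: swap the first two indices of every odd
        # triangle so that all triangles keep the same winding order.
        return [(i, i + 1, i + 2) if i % 2 == 0 else (i + 1, i, i + 2)
                for i in range(n - 2)]
    if mode == "fan":
        # Every triangle shares vertex 0 and one edge with its predecessor.
        return [(0, i, i + 1) for i in range(1, n - 1)]
    raise ValueError(f"unknown mode: {mode}")


def shading_cost(mode, n):
    """Return (invocations without reuse, invocations with reuse)."""
    tris = triangles(mode, n)
    naive = 3 * len(tris)          # shade all three vertices of every triangle
    shaded = set()
    for tri in tris:
        shaded.update(tri)         # each distinct vertex is shaded only once
    return naive, len(shaded)


print(shading_cost("strip", 6))
print(shading_cost("fan", 6))
```

For a strip or fan of n vertices the count without reuse grows as 3(n - 2), while with reuse it stays at n, so the saving approaches a factor of three for long primitives.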
Abstract
This thesis presents the design and implementation of a low-cost, single-core, multi-threaded unified graphics processing unit (GPU) targeted at embedded graphics applications. The proposed GPU adopts several architectural features that are seldom found in the embedded GPU literature. First, in addition to the fundamental vertex and fragment shaders, the GPU executes software implementations of the fixed functions in the middle of the graphics rendering flow, including clipping, back-face culling, and rasterization, which saves several hundred thousand gates. Second, a three-bank, two-port register file architecture is proposed, which further reduces implementation cost by avoiding a four-port register file. With this multi-bank register file, vertex and fragment threads are distributed across and associated with different register banks. Threads in different banks are executed alternately in the GPU data path by time multiplexing. When a thread stalls because of a texture miss, it is swapped with another ready thread in the same bank. Owing to this alternating execution style, the penalty of both RAW and branch hazards in the 10-stage pipeline is at most one cycle. To avoid redundant processing of the same vertex in the triangle fan and triangle strip modes, a special vertex fill unit is also implemented. The GPU also implements a multi-texture unit that supports mapping multiple textures. The gate count of the proposed unified multi-threaded graphics processor is approximately 204K, excluding memory.
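The scheduling idea in the abstract (banks take turns in the data path, and a stalled thread is swapped with a ready thread from the same bank) can be sketched as a small simulation. This is a minimal model under stated assumptions, not the thesis's thread control logic: one issue slot per cycle, strict rotation over the banks, two threads per bank, and the names `Thread`, `Bank`, `schedule`, and the `misses` input are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Thread:
    name: str
    stalled_until: int = 0   # cycle at which a pending texture miss resolves

    def ready(self, cycle):
        return cycle >= self.stalled_until


@dataclass
class Bank:
    threads: list            # threads[0] is the currently active thread

    def pick(self, cycle):
        """Return the active thread, swapping in a ready one if it is stalled."""
        for i, thread in enumerate(self.threads):
            if thread.ready(cycle):
                # Move the first ready thread into the active slot.
                self.threads[0], self.threads[i] = self.threads[i], self.threads[0]
                return self.threads[0]
        return None          # every thread in this bank is stalled


def schedule(banks, cycles, misses):
    """Rotate over the banks, issuing one slot per cycle (assumed model).

    misses maps (cycle, thread name) to a stall length in cycles.
    """
    trace = []
    for cycle in range(cycles):
        bank = banks[cycle % len(banks)]       # time-multiplexed bank selection
        thread = bank.pick(cycle)
        if thread is None:
            trace.append(None)                 # pipeline bubble
            continue
        trace.append(thread.name)
        latency = misses.get((cycle, thread.name))
        if latency:
            thread.stalled_until = cycle + latency
    return trace


# One vertex bank and two fragment banks; the split is an assumption.
banks = [Bank([Thread("V0"), Thread("V1")]),
         Bank([Thread("F0"), Thread("F1")]),
         Bank([Thread("F2"), Thread("F3")])]
trace = schedule(banks, 7, {(1, "F0"): 6})
print(trace)
```

In this run thread F0 misses at cycle 1, so the next time its bank is selected (cycle 4) thread F1 is issued instead, and the other banks are unaffected.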
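Table of Contents
Thesis Approval Form
Abstract (Chinese)
Abstract (English)
Chapter 1 Introduction
1.1 Motivation
1.2 Thesis Outline
Chapter 2 Background and Related Work
2.1 Introduction to 3D Graphics and the Graphics Pipeline
2.1.1 Fixed-Function Pipeline: Geometry Transformation Subsystem
2.1.2 Fixed-Function Pipeline: Shading Subsystem
2.1.3 Programmable Pipeline: Vertex and Pixel Shaders
2.1.4 Programmable Pipeline: Unified Shader
2.2 Related Literature on Unified Processor Architectures
2.3 Unified Shader Development Prototype
2.3.1 Software Implementation of Scan Conversion
2.3.2 Back-End Pixel Operation Module
2.3.3 Multi-Threaded Unified Processor Prototype
Chapter 3 Architecture and Design of the Multi-Threaded SIMD Unified Graphics Processor
3.1 Instruction Set Architecture
3.2 Storage Unit Optimization and Design
3.3 Pipeline Stage Functions and Hardware Architecture
3.3.1 Instruction Fetch Stage (IF)
3.3.2 Instruction Decode (ID) and Register File Access Stage
3.3.3 Register Bank Selection Module (Bank_selector)
3.3.4 Register Write-Back Stage (WB)
3.4 Architecture Design of Thread Control and the Resource Allocator
3.4.1 Resource Allocator
3.4.2 Thread Status Table (thread info.)
3.4.3 Thread Dispatch Module (thread dispatch)
3.4.4 Thread Control Module
3.5 Submodules and System Architecture
3.5.1 System Architecture of the Multi-Threaded Unified Graphics Processor
3.5.2 Fill Unit Architecture Design
3.5.3 Multi-Texture Unit Design
Chapter 4 Experimental Results and Analysis
4.1 Verification and Debugging Environment
4.1.1 RTL Simulation and Verification
4.1.2 FPGA Emulation and Verification
4.2 Experimental Results
4.2.1 Demonstration of Results
4.2.2 Synthesis Data Analysis
4.2.3 Analysis and Comparison of Simulator Results
Chapter 5 Conclusion and Future Work
5.1 Conclusion
5.2 Future Work
References
Appendix: Memory Map Address Space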
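Fulltext
This electronic fulltext is licensed to users only for personal, non-profit searching, reading, and printing for the purpose of academic research. Please comply with the relevant provisions of the Copyright Act of the Republic of China, and do not reproduce, distribute, adapt, repost, or broadcast it without authorization, so as to avoid violating the law.
Thesis access permission: user-defined availability date
Available:
Campus: available
Off-campus: available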


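Printed copies
Public availability information for printed theses is relatively complete from academic year 102 onward. To inquire about printed theses from academic year 101 or earlier, please contact the printed thesis service desk of the Office of Library and Information Services. We apologize for any inconvenience.
Available: available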