博碩士論文 etd-0703115-172907 詳細資訊
Title page for etd-0703115-172907
論文名稱
Title
具可變精確度運算模式之多執行緒統一著色器
A Multi-thread Unified Shader with Variable Precision Modes
系所名稱
Department
畢業學年期
Year, semester
語文別
Language
學位類別
Degree
頁數
Number of pages
73
研究生
Author
指導教授
Advisor
召集委員
Convenor
口試委員
Advisory Committee
口試日期
Date of Exam
2015-07-27
繳交日期
Date of Submission
2015-08-10
關鍵字
Keywords
指令集、多執行緒、三維統一圖形處理器、多重精確度模式、低功率設計
Multi-thread, 3-D Graphics Processing Unit, Instruction Set, Multi-precision Mode, Low-power Design
統計
Statistics
本論文已被瀏覽 5659 次,被下載 22
The thesis/dissertation has been browsed 5659 times, has been downloaded 22 times.
中文摘要
隨著科技的進步,功能整合及運算效能的要求越來越高,產品也朝著移動式及短小輕薄的方向發展,因此電池的電量就成為設計產品時的重要考量之一。而在繪圖影像方面,使用者也逐漸趨向於追求真實度,這使得處理圖形時,需要更大量的計算。因此加速資料的計算成為解決問題的方法之一,但伴隨而來的就是所消耗的功率也會跟著增大,因此產品的開發開始注重低功耗的設計。
本論文實作一個具有可變精確度運算模式之多執行緒統一圖形處理單元,主要使用ATTILA模擬器來協助開發,除了將傳統的頂點處理器及像素處理器結合之外,多執行緒則提供了八個執行緒來提升資料的生產量及使用率。為了減少功率消耗,調整了輸入的精確度,將向量運算單元及特殊函數運算單元都設計為四種精確度模式,並在算數處理單元做了低功耗的設計。此外,加入特殊的電路設計來減少資料切換,進而達到省電的效果。本論文使用clock gating的方式減少了clock power的消耗,以及其後所連接硬體的切換動作都跟著減少了。最終輸出的資料所呈現的影像誤差以人眼可以接受的程度為目標。
Abstract
As technology advances, power consumption has become one of the most important design concerns, because products are moving toward portable and lightweight devices. People use electronic products every day and can hardly live without them. In image processing applications on these products, users increasingly care about image quality, which means more computation has to be performed. Increasing the computation speed of electronic products is therefore one way to address this problem. However, higher computing capability consumes more power, so designers have begun to focus on low-power design.
In this thesis, we present a multi-thread unified shader with variable precision modes, developed with the help of the ATTILA simulator. The design combines the vertex shader and the pixel shader into one unified shader and provides eight threads to improve data throughput and utilization. Moreover, we integrate four precision modes into the vector arithmetic unit and the special function unit, reducing power consumption by adjusting the precision of the input and output operands. In addition, clock gating is applied to reduce the clock power of the register files and pipeline registers, as well as the switching activity of the hardware circuits that follow them. With these techniques, the proposed multi-precision unified shader achieves significant power savings while keeping image distortion at an acceptable level.
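The idea of trading operand precision for power can be illustrated in software. The following is only a minimal sketch, not the thesis's hardware design: it assumes that each precision mode keeps a fixed number of float32 mantissa bits, and the four widths in `MODE_MANTISSA_BITS`, along with the helper names `truncate_mantissa` and `dp4`, are hypothetical choices made for illustration. It shows how a four-component dot product loses accuracy as fewer mantissa bits are kept.

```python
import struct

# Hypothetical mantissa widths for four precision modes.
# These values are assumptions for illustration, not taken from the thesis.
MODE_MANTISSA_BITS = {0: 23, 1: 16, 2: 12, 3: 8}


def truncate_mantissa(x: float, keep_bits: int) -> float:
    """Round x to float32, then zero the low (23 - keep_bits) mantissa bits."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    drop = 23 - keep_bits
    mask = (0xFFFFFFFF << drop) & 0xFFFFFFFF
    return struct.unpack("<f", struct.pack("<I", bits & mask))[0]


def dp4(a, b, mode: int) -> float:
    """Four-component dot product with operands and result reduced to the mode's precision."""
    keep = MODE_MANTISSA_BITS[mode]
    acc = 0.0
    for x, y in zip(a, b):
        acc += truncate_mantissa(x, keep) * truncate_mantissa(y, keep)
    return truncate_mantissa(acc, keep)


if __name__ == "__main__":
    a = [0.1, 0.2, 0.3, 0.4]
    b = [0.5, 0.6, 0.7, 0.8]
    exact = sum(x * y for x, y in zip(a, b))
    for mode in sorted(MODE_MANTISSA_BITS):
        approx = dp4(a, b, mode)
        print(mode, approx, abs(approx - exact))
```

Running the script prints the result and the absolute error for each assumed mode. In hardware, the dropped low-order bits would correspond to datapath logic that no longer needs to switch, which is where the power saving would come from.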
目次 Table of Contents
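Contents
Thesis Approval Form i
Thesis Summary ii
Acknowledgments iii
Abstract (Chinese) iv
Abstract v
Contents vi
List of Figures viii
List of Tables xi
Chapter 1 Introduction 1
1.1 Motivation 1
1.2 Thesis Outline 2
Chapter 2 Background 3
2.1 Introduction to 3D Graphics 3
2.1.1 3D Rendering Pipeline 3
2.1.2 Coordinate Transformation (Transformation Matrix) 5
2.1.3 Lighting 6
2.2 Introduction to OpenGL ES 7
2.3 ATTILA 3D Graphics Simulator 10
Chapter 3 Architecture and Design of the Proposed Unified Shader 14
3.1 Architecture Overview 15
3.2 Register files 16
3.3 Instruction Set 18
3.3.1 Instruction Format 18
3.3.2 Instruction Types 23
3.3.3 Banks 24
3.4 Pipeline Stages and Hardware Architecture 25
3.4.1 Instruction Fetch Stage 26
3.4.2 Thread Unit Architecture 27
3.4.3 Instruction Decode Stage 28
3.4.4 Negative/Swizzle Architecture 29
3.4.5 Forwarding detection policy 30
3.4.6 Forwarding Architecture 30
3.4.7 Register Write-Back Stage 31
3.4.8 Thread Control Module 32
3.4.9 Thread Status Table (thread_info) 33
3.4.10 RAW detection 34
3.5 Arithmetic Units 35
3.5.1 DP4 Architecture 36
3.5.2 Special Function Unit Architecture 38
3.6 Register file gating 39
3.7 Texture Unit 41
Chapter 4 Experimental Results 43
4.1 Experimental Method and Procedure 43
4.2 Experimental Data 43
Chapter 5 Conclusions and Future Work 58
5.1 Conclusions 58
5.2 Future Work 58
References 59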
電子全文 Fulltext
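This electronic full text is licensed to users solely for personal, non-profit searching, reading, and printing for the purpose of academic research. Please comply with the relevant provisions of the Copyright Act of the Republic of China; do not reproduce, distribute, adapt, repost, or broadcast it without authorization, to avoid violating the law.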
論文使用權限 Thesis access permission:自定論文開放時間 user define
開放時間 Available:
校內 Campus: 已公開 available
校外 Off-campus: 已公開 available


紙本論文 Printed copies
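Availability information for printed theses is relatively complete from academic year 102 onward. To inquire about the availability of printed theses from academic year 101 or earlier, please contact the printed thesis service desk of the Office of Library and Information Services. We apologize for any inconvenience.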
開放時間 available 已公開 available
