博碩士論文 etd-0710116-140815 詳細資訊
Title page for etd-0710116-140815
論文名稱
Title
多重精確度貼圖單元之高效率管線化架構
An Efficient Pipelined Architecture for the Multi-precision Texture Unit
系所名稱
Department
畢業學年期
Year, semester
語文別
Language
學位類別
Degree
頁數
Number of pages
72
研究生
Author
指導教授
Advisor
召集委員
Convenor
口試委員
Advisory Committee
口試日期
Date of Exam
2016-07-25
繳交日期
Date of Submission
2016-08-10
關鍵字
Keywords
管線化架構、多重精確度、貼圖單元、低功率設計、三維圖形處理器
multi-precision, pipelined architecture, texture unit, low-power design, 3-D graphics processing unit
統計
Statistics
本論文已被瀏覽 5657 次,被下載 19
The thesis/dissertation has been browsed 5657 times, has been downloaded 19 times.
中文摘要
隨著科技的日新月異與三維繪圖處理器技術的成熟,三維繪圖處理器已經被廣泛的應用在穿戴式裝置,而穿戴式裝置對於繪圖處理器的繪圖能力需求也越來越高。但這些複雜的運算將消耗大量電力,因此如何在電池容量有限的穿戴式裝置節省功率的消耗就成為一個重要的議題。在三維繪圖處理器中,貼圖單元扮演一個不可或缺的角色。貼圖單元可以讓影像看起來更加細緻,並且可以利用貼圖來代替複雜的計算,以提升整體處理器的效能與影像的畫質。人類的眼睛可以允許影像有些許失真而無法察覺差異,所以我們引進多重精確度的概念,在允許影像有些許失真的情況下,執行我們選定的精確度模式,達到節省功率的效果。另外,我們可以觀察到貼圖單元需要進行很多複雜的運算,而這些運算可以利用多個內積運算指令(DP4)來完成,因此我們設計一個供貼圖單元使用的內積運算單元(DP4),並且透過貼圖單元的管線化架構來增加效能。 
  本論文提出一個多重精確度貼圖單元的管線化架構,此架構可以提供多種精確度模式給使用者選擇,達到降低功率消耗的目標。此外,我們也在貼圖單元中加入快取機制以增加效能,並且重複執行內積運算單元來實現線性過濾功能,用以降低硬體的面積。最後我們將貼圖單元的運算進行管線化排程,並且設計對應的管線化架構,大大地提升貼圖單元的整體效能。
Abstract
As technology advances and 3-D graphics processing units (GPUs) mature, 3-D GPUs have been widely adopted in wearable devices, and the graphics performance these devices demand keeps increasing. Unfortunately, the complex computations involved consume a lot of power, so reducing power consumption within the limited battery capacity of wearable devices has become an important issue. The texture unit is an indispensable part of a 3-D GPU. It makes images look more detailed, and texture mapping can substitute for complex computations, which improves both the overall performance of the processor and the image quality. Human eyes cannot clearly perceive slight distortion in an image. We therefore introduce the concept of multi-precision: when slight image distortion is acceptable, a lower-precision mode can be selected to save power. In addition, the texture unit performs many complex computations, which can be carried out with several four-component dot product (DP4) instructions. We therefore design a dot product (DP4) unit for the texture unit and use a pipelined texture unit architecture to increase performance.
This thesis proposes a pipelined architecture for a multi-precision texture unit. The architecture offers several precision modes for the user to choose from in order to reduce power consumption. In addition, we add a cache mechanism to the texture unit to increase performance, and we execute the dot product unit repeatedly to implement linear filtering, which reduces the hardware area. Finally, we perform pipeline scheduling of the texture unit's operations and design the corresponding pipelined architecture, which greatly enhances the overall performance of the texture unit.
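As an illustration of the idea described in the abstract, the following minimal sketch shows how bilinear texture filtering can be written as one DP4 per color channel, with the interpolation weights truncated to a selectable number of fractional bits. It is not taken from the thesis: the function names (`dp4`, `quantize`, `bilinear_dp4`), the `frac_bits` parameter, and the choice of truncating only the weights are assumptions made for this example, and the actual precision modes of the proposed hardware are not specified in this record.

```python
def dp4(a, b):
    """Four-component dot product: a[0]*b[0] + a[1]*b[1] + a[2]*b[2] + a[3]*b[3]."""
    return sum(x * y for x, y in zip(a, b))


def quantize(value, frac_bits):
    """Truncate a value in [0, 1] to frac_bits fractional bits.

    Hypothetical stand-in for a lower-precision mode; not the thesis's scheme.
    """
    scale = 1 << frac_bits
    return int(value * scale) / scale


def bilinear_dp4(texels, u_frac, v_frac, frac_bits=8):
    """Bilinear filter over four RGBA texels (t00, t10, t01, t11).

    u_frac and v_frac are the fractional texel coordinates in [0, 1].
    """
    u = quantize(u_frac, frac_bits)
    v = quantize(v_frac, frac_bits)
    # The four bilinear weights form one operand of the dot product.
    weights = ((1 - u) * (1 - v), u * (1 - v), (1 - u) * v, u * v)
    # One DP4 per color channel: the same dot product routine is reused four times.
    return tuple(dp4(weights, [t[c] for t in texels]) for c in range(4))
```

Calling `bilinear_dp4` with a smaller `frac_bits` value gives a coarser result for the same inputs, which mirrors the trade-off between image distortion and computation precision that the abstract describes.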
目次 Table of Contents
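Thesis Summary i
Abstract (in Chinese) ii
Abstract iii
Table of Contents v
List of Figures vii
List of Tables x
Chapter 1 Introduction 1
1.1 Research Motivation 1
1.2 Thesis Outline 2
Chapter 2 Background 3
2.1 Introduction to 3-D Graphics 3
2.2 Overview of OpenGL ES 6
2.3 ATTILA 3-D Graphics Simulator 8
2.4 Arithmetic Units 11
2.4.1 Booth Encoder 11
2.4.2 Compressor Tree 15
2.4.3 Booth Multiplier 17
2.5 Texture Unit 18
2.5.1 Texture mapping 18
2.5.2 Mipmap 19
2.5.3 Texture Filter 21
Chapter 3 Pipelined Architecture for the Multi-precision Texture Unit 24
3.1 Texture Unit Processing Flow 24
3.2 Baseline Texture Unit Architecture 25
3.2.1 Calculate Address 27
3.2.2 Fetch and Read 28
3.2.3 Filter 29
3.3 Proposed Texture Unit Architecture 33
3.3.1 Cache Mechanism 35
3.3.2 Dot Product Unit 36
3.3.3 Multi-precision 39
3.3.4 Pipeline Scheduling 40
3.3.5 Controller 42
Chapter 4 Experimental Methods and Results 43
4.1 Experimental Procedure and Methods 43
4.2 Experimental Results 46
Chapter 5 Conclusions and Future Work 58
5.1 Conclusions 58
5.2 Future Work 58
References 59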
參考文獻 References
[1] 楊政峰,“具可變精確度運算模式之多執行緒統一著色器,”國立中山大學, 碩士論文, July 2015.
[2] Khronos Group: http://www.khronos.org/
[3] Woo-Young Kim, Bo-Haeng Lee, and Kwang-Yeob Lee, “Design of a Fully Programmable Shader Processor for Low Power Mobile Devices,” Institute of Electrical and Electronics Engineers, 2009.
[4] ATTILA: http://attila.ac.upc.edu/wiki/index.php/Main_Page
[5] Victor Moya del Barrio, Carlos González, Jordi Roca, Agustín Fernández, and Roger Espasa, “ATTILA: A Cycle-Level Execution-Driven Simulator for Modern GPU Architectures,” IEEE International Symposium on Performance Analysis of Systems and Software, pp. 231-241, March 2006.
[6] ARB Vertex Program Extension: http://oss.sgi.com/projects/ogl-sample/registry/ARB/vertex_program.txt
[7] ARB Fragment Program Extension: http://oss.sgi.com/projects/ogl-sample/registry/ARB/fragment_program.txt
[8] Zhijun Huang, “High-level optimization techniques for low-power multiplier design,” PhD dissertation, Univ. of California, Los Angeles, 2003.
[9] D. Radhakrishnan and A. P. Preethy, “Low-power CMOS pass logic 4-2 compressor for high-speed multiplication,” 43rd IEEE Midwest Symp. on Circuits & Systems, Vol. 3, pp. 1296-1298, 2000.
[10] P. Mokrian, G. Howard, G. Jullien, and M. Ahmadi, “On the use of 4:2 compressor for partial product reduction,” IEEE Canadian Conf. on Electrical and Computer Engineering, Vol. 1, pp. 121-124, May 2003.
[11] D. Villeger and V. Oklobdzija, “Analysis of Booth Encoding Efficiency in Parallel Multipliers Using Compressors for Reduction of Partial Products,” 27th Ann. Asilomar Conf. on Signals, Systems, and Computers, Vol. 1, pp. 781-784, 1993.
[12] Victor Moya del Barrio, Carlos González, Jordi Roca, Agustín Fernández, and Roger Espasa, “ATTILA: A Cycle-Level Execution-Driven Simulator for Modern GPU Architectures,” IEEE International Symposium on Performance Analysis of Systems and Software, pp. 231-241, March 2006.
[13] 張尹貞,“多重精確度貼圖單元的設計與實作,”國立中山大學, 碩士論文, July 2015.
[14] Yusra A. Y. Al-Najjar and Der Chen Soong, “Comparison of Image Quality Assessment: PSNR, HVS, SSIM, UIQI,” International Journal of Scientific & Engineering Research, Vol. 3, No. 8, pp. 041-045, August 2012.
電子全文 Fulltext
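This electronic full text is licensed to users only for personal, non-profit searching, reading, and printing for the purpose of academic research. Please comply with the Copyright Act of the Republic of China and do not reproduce, distribute, adapt, repost, or broadcast it without authorization, so as to avoid violating the law.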
論文使用權限 Thesis access permission:自定論文開放時間 user define
開放時間 Available:
校內 Campus: 已公開 available
校外 Off-campus: 已公開 available


紙本論文 Printed copies
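Public-access information for printed theses is relatively complete from academic year 102 onward. To look up public-access information for printed theses from academic year 101 or earlier, please contact the printed thesis service desk of the Office of Library and Information Services. We apologize for any inconvenience.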
開放時間 available 已公開 available
