National Sun Yat-sen University, thesis/dissertation, A High Performance Register Allocator for Vector Architectures with a Unified Register-Set

Title	A High Performance Register Allocator for Vector Architectures with a Unified Register-Set
Department	Department of Computer Science and Engineering
Year, semester	Spring semester of Academic Year 100	Language	English
Degree	Master	Number of pages	43
Author	Yu-Dan Su (蘇鈺登)
Advisor	Steve W. Haga (希家史提夫)
Convenor	Shen-Fu Hsiao (蕭勝夫)
Advisory Committee	Chung-Nan Lee (李宗南); Tsung-Chuan Huang (黃宗傳)
Date of Exam	2012-02-14	Date of Submission	2012-06-29
Keywords	instruction scheduling, register allocator, compiler optimization, unified register set, vector architecture, novel Graphics Processing Unit
Statistics	This thesis has been browsed 5649 times and downloaded 759 times.

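Chinese Abstract (English translation)
The compiler optimization described in this thesis targets a vector-based, unified register set. The optimization combines register allocation and instruction scheduling. It examines the places where a program computes on scalar variables and optimizes them as far as possible. Our goal is to pack instructions that perform similar operations together so that our allocator can optimize them. Although other researchers have pursued similar packing methods, most of that work was limited by the hardware, which typically spends a great deal of time moving data between scalar and vector registers. This thesis differs in that it targets a novel hardware architecture that requires no data movement between registers of different types and whose features allow some scalar variables to be computed in parallel. As a result, we obtain significant speedups. The hardware architecture we consider is a GPU for embedded systems under development at National Sun Yat-sen University. This GPU has only a single register set, which is used to store and compute on integers, floating-point values, and vectors.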
Abstract
This thesis describes a compiler optimization targeted at machines with unified, vector-based register sets. The optimization combines register allocation and instruction scheduling. It examines places where the code performs computations on scalar variables, with the goal of identifying instances where the same operation is performed on different operands. For example, a program might calculate “base+offset” and then calculate “i+j”. Although these computations are unrelated, they use the same operator; if “base” and “i” are packed into one vector register, while “offset” and “j” are packed into another, then the two computations can be performed simultaneously by a single parallel vector addition. This reduces the execution time of the compiled code. Although other researchers have considered similar packing methods, their work has been limited by the hardware that they were studying. Such hardware usually imposed high costs for moving data between scalar and vector register banks. The present thesis, however, considers a novel hardware architecture that imposes no such costs. As a consequence, we are able to obtain significant speedups. The architecture that we consider is a Graphics Processing Unit (GPU) for embedded systems that is under development at this university. This GPU has a single register set for integers, floats, and vectors.
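
The packing idea in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's allocator: the names `pack_scalar_ops`, `execute`, and `VECTOR_WIDTH`, the four-lane width, and the tuple encoding of instructions are all assumptions made for illustration, and the operations are assumed to be independent of one another.

```python
from collections import defaultdict

VECTOR_WIDTH = 4  # assumed lane count; the abstract does not state one


def pack_scalar_ops(ops, width=VECTOR_WIDTH):
    """Group scalar ops that share an operator into lane-wise vector ops.

    Each op is a tuple (dest, operator, lhs, rhs). The ops are assumed to
    be mutually independent; a real allocator would check dependences.
    """
    by_operator = defaultdict(list)
    for op in ops:
        by_operator[op[1]].append(op)

    packed = []
    for operator, group in by_operator.items():
        # Split each group into chunks that fit in one vector register.
        for start in range(0, len(group), width):
            lanes = group[start:start + width]
            packed.append({
                "operator": operator,
                "dest": [dest for dest, _, _, _ in lanes],
                "lhs": [lhs for _, _, lhs, _ in lanes],
                "rhs": [rhs for _, _, _, rhs in lanes],
            })
    return packed


def execute(packed, env):
    """Simulate each packed op: compute all lanes, then write all results."""
    apply_op = {"+": lambda a, b: a + b, "*": lambda a, b: a * b}
    for vop in packed:
        fn = apply_op[vop["operator"]]
        results = [fn(env[a], env[b]) for a, b in zip(vop["lhs"], vop["rhs"])]
        for dest, value in zip(vop["dest"], results):
            env[dest] = value
    return env


# The abstract's example: "base+offset" and "i+j" share the "+" operator.
ops = [("addr", "+", "base", "offset"), ("k", "+", "i", "j")]
packed = pack_scalar_ops(ops)
env = execute(packed, {"base": 100, "offset": 8, "i": 2, "j": 3})
```

Here the two additions collapse into one packed operation whose left-hand register holds `base` and `i` and whose right-hand register holds `offset` and `j`. On hardware with separate scalar and vector register banks, filling those lanes would itself cost extra move instructions, which is the overhead the abstract says a unified register set avoids.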

Table of Contents
Thesis Approval Form
Abstract (Chinese)
Abstract
Index
1. Introduction
2. Basic Concepts
  2.1 Concepts of Compiler
    2.1.1 SSA-Form
    2.1.2 Live Variables
    2.1.3 Trace Scheduling
  2.2 How Novel Features in Our GPU Affect the Register Allocator
3. Related Work
4. Implementation
  4.1 Machine Code Representation Rewriting
  4.2 Scheduling and Register Allocation Algorithm
5. Experimental Results
7. Reference
8. Appendix

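Fulltext
This electronic full text is licensed to users solely for personal, non-profit searching, reading, and printing for academic research purposes. Please comply with the relevant provisions of the Copyright Act of the Republic of China; do not reproduce, distribute, adapt, repost, or broadcast it without authorization, to avoid violating the law. Thesis access permission: user-defined availability date. Available: Campus: available; Off-campus: available. etd-0629112-150235.pdf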
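Printed copies
Availability information for printed theses is relatively complete from Academic Year 102 onward. To inquire about printed theses from Academic Year 101 or earlier, please contact the printed-thesis service desk of the Office of Library and Information Services. We apologize for any inconvenience. Availability: available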

Office of Library and Information Services, National Sun Yat-sen University │ Contact Us: 2452, Thesis Format Review Team │ Mail │ Development and operations: Knowledge Innovation Division, LIS