博碩士論文 etd-0706118-131509 詳細資訊
Title page for etd-0706118-131509
論文名稱
Title
設計基於多重迴圈指令序列之語意分析之動態迴圈展開器
Design of Dynamic Loop Unrolling Mechanism Based on Semantic Analysis of Nested Loop
系所名稱
Department
畢業學年期
Year, semester
語文別
Language
學位類別
Degree
頁數
Number of pages
76
研究生
Author
指導教授
Advisor
召集委員
Convenor
口試委員
Advisory Committee
口試日期
Date of Exam
2018-07-24
繳交日期
Date of Submission
2018-08-06
關鍵字
Keywords
多重迴圈語意、多重迴圈展開、迴圈語意、超多純量、迴圈展開、迴圈指令並行度
hyper-scalar, semantic of nested loop, ILP of loop, loop unrolling, semantic of loop, nested loop unrolling
統計
Statistics
本論文已被瀏覽 5634 次,被下載 0
The thesis/dissertation has been browsed 5634 times, has been downloaded 0 times.
中文摘要
現今支援ILP處理器並不具備分析指令,與主動對程式指令進行編排提升並行度之功能,只能依照經編譯器編譯過的指令順序進行指令抓取,再根據指令之間的相依性,盡量將能夠派發的指令向下派發以提升並行度。高效能需求的程式中,無論是影像處理或是目前主流發展的機器學習,其程式皆使用了大量的迴圈結構,因迴圈結構的特殊性,使支援ILP處理器難以提升並行度,需要利用編譯器對程式進行特殊編譯以提升程式並行度,此做法缺乏彈性且無法作用於已經編譯完成的程式。
本論文提出基於多重迴圈指令序列之語意分析之動態迴圈展開器,並將之設計在Hyperscalar架構的指令分析器上,此動態迴圈展開器之架構分為以下三個部分:(1)迴圈偵測單元、(2)展開控制單元、(3)迴圈展開單元。其根據訂定的語意對指令序列進行分析,找出迴圈之指令區間並蒐集資料,通過分析已蒐集的資料、迴圈之間的關係以及當前迴圈展開的情形,將迴圈程式分段展開、消除迴圈結構的特殊性,並在重新編排指令順序後,派送至處理器中的各個核心。
為了驗證加入動態迴圈展開器後對處理器效能的提升,將機器學習中大量使用的的卷積運算、矩陣乘積運算以及當前常被使用的AES中的 mix column,利用通用的
Abstract
Today's ILP processors cannot analyze the semantic information of instructions or reorder the instruction sequence on their own to increase ILP. They can only fetch instructions in the order produced by the compiler, analyze the dependencies between instructions, and dispatch those instructions whose data are ready. High-performance programs, such as image processing or machine learning, contain many loop structures. The particular nature of loops makes it hard for these processors to increase ILP. The program must instead be compiled with a special compiler, a method that is inflexible and cannot be applied to code that has already been compiled.
This thesis proposes a dynamic loop unrolling mechanism based on semantic analysis of nested loops, designed into the instruction analyzer of the Hyperscalar architecture. Its architecture consists of three units: the loop detect unit (LDU), the unrolling control unit (UCU), and the loop unrolling unit (LUU). The mechanism parses the semantics of the instruction sequence to find the closed interval of each loop body and collects information about the instructions in that loop body. Using the information from the LDU, the relationships between loops, and the current state of unrolling, it cuts the instruction sequence of a nested loop into several segments and unrolls them, then reorders the instructions and dispatches them to the cores.
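The detection and unrolling steps described above can be illustrated with a small software sketch. It is not the thesis's hardware design. It assumes a common heuristic (a backward branch closes a loop whose body is the closed interval from the branch target to the branch), a known trip count, and a toy ARM-like program. The names `Instr`, `find_loop_intervals`, `is_nested`, `unroll`, and `demo_program` are hypothetical and introduced only for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Instr:
    """Hypothetical toy instruction: text plus an optional branch target index."""
    text: str
    branch_target: Optional[int] = None  # index into the program; None if not a branch


def find_loop_intervals(program: List[Instr]) -> List[Tuple[int, int]]:
    """Return the closed interval (start, end) of every loop.

    Assumption: a branch at index i with target t <= i closes a loop
    whose instructions are program[t..i].
    """
    return [
        (ins.branch_target, i)
        for i, ins in enumerate(program)
        if ins.branch_target is not None and ins.branch_target <= i
    ]


def is_nested(inner: Tuple[int, int], outer: Tuple[int, int]) -> bool:
    """True if the inner interval lies inside the outer interval."""
    return inner != outer and outer[0] <= inner[0] and inner[1] <= outer[1]


def unroll(program: List[Instr], interval: Tuple[int, int], trip_count: int) -> List[Instr]:
    """Fully unroll one loop with a known trip count, dropping its closing branch.

    Simplification: branch targets that point past the unrolled region
    are not re-indexed, so this only suits the toy program below.
    """
    start, end = interval
    body = program[start:end]  # excludes the closing branch at index `end`
    copies: List[Instr] = []
    for k in range(trip_count):
        copies.extend(Instr(f"{ins.text} ; iter {k}") for ins in body)
    return program[:start] + copies + program[end + 1:]


def demo_program() -> List[Instr]:
    """Toy two-level nested loop in ARM-like text (illustrative only)."""
    return [
        Instr("MOV r3, #2"),                      # 0: outer counter
        Instr("MOV r2, #3"),                      # 1: outer loop start, inner counter
        Instr("ADD r0, r0, r1"),                  # 2: inner loop start
        Instr("SUBS r2, r2, #1"),                 # 3
        Instr("BNE inner", branch_target=2),      # 4: closes inner loop
        Instr("SUBS r3, r3, #1"),                 # 5
        Instr("BNE outer", branch_target=1),      # 6: closes outer loop
        Instr("STR r0, [r4]"),                    # 7
    ]


if __name__ == "__main__":
    prog = demo_program()
    inner, outer = find_loop_intervals(prog)
    print("loops:", inner, outer, "nested:", is_nested(inner, outer))
    for ins in unroll(prog, inner, trip_count=3):
        print(ins.text)
```

In this sketch the inner loop is replaced by three straight-line copies of its body, which removes the inner branch and leaves only data dependencies for a scheduler to consider. Segmenting the unrolled stream, renaming registers, and dispatching to multiple cores, as the mechanism described above does, are outside its scope.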
The verifications use ARM instructions generated by
目次 Table of Contents
Contents
Thesis Approval Form ... i
Acknowledgments ... ii
Abstract (Chinese) ... iii
ABSTRACT ... iv
Contents ... v
List of Figures ... viii
Chapter 1 Introduction ... 1
1.1 Research Motivation ... 1
1.2 Research Objectives ... 2
1.3 Thesis Organization ... 3
Chapter 2 Related Work ... 4
2.1 Superscalar Processors ... 4
2.1.1 Overview of the Superscalar Processor Architecture ... 4
2.1.2 Exception Handling in Out-of-Order Execution ... 8
2.2 Hyperscalar Processors ... 8
2.2.1 Overview of the Hyperscalar Processor Architecture ... 8
2.2.2 Instruction Analyzer ... 10
2.2.3 Virtual Shared Register File ... 11
2.3 Improving Loop Efficiency in Pipeline and Superscalar Processors ... 13
2.3.1 Loop Buffers ... 13
2.3.2 Loop Cache ... 14
2.3.3 Branch Prediction ... 14
2.3.4 Loop Unrolling When Writing the Program ... 14
2.4 Improving Loop Execution Efficiency in VLIW Processors ... 15
2.5 Improving Loop Execution Efficiency in the Hyperscalar Architecture ... 15
Chapter 3 Design of a Dynamic Loop Unroller Based on Semantic Analysis of Nested-Loop Instruction Sequences in the Hyperscalar Architecture ... 17
3.1 System Architecture for Loop Unrolling Based on Nested-Loop Semantic Analysis ... 17
3.1.1 System Design Concept ... 17
3.1.2 System Architecture ... 19
3.2 Loop Detect Unit ... 24
3.2.1 Single-Level Loop Detection ... 24
3.2.2 Nested-Loop Identification ... 26
3.2.3 Loop Data Storage Format ... 27
3.2.4 Example of the Loop Data Storage Format ... 28
3.2.5 Loop Detect Unit Architecture ... 29
3.3 Unrolling Control Unit ... 31
3.3.1 Outer-Loop Data Processing ... 32
3.3.2 Outer-Loop Counter Data Compensation ... 34
3.3.3 Processing of Inter-Loop Instruction Data ... 37
3.3.4 Loop Unrolling Data Storage Format ... 38
3.3.5 Unrolling Control Unit Architecture ... 38
3.4 Loop Unrolling Unit ... 40
3.4.1 Eliminating Dependencies of Overlapping Operations ... 41
3.4.2 Register Renaming ... 43
3.4.3 Handling of Inter-Loop Instructions and Outer-Loop Counter Compensation Instructions ... 44
3.4.4 Loop Compensation Mechanism ... 46
3.4.5 Loop Unrolling Unit Architecture ... 48
3.5 Overview of the Loop Unroller ... 50
Chapter 4 Simulation and Verification ... 51
4.1 Simulation Program Flow ... 51
4.1.1 Introduction to the Simulation Program Flow ... 51
4.1.2 Test Instructions ... 53
4.2 Simulation Results and Analysis ... 56
4.2.1 Simulation Results ... 56
4.2.2 Analysis of Simulation Results ... 58
Chapter 5 Conclusion ... 62
References ... 63
電子全文 Fulltext
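This electronic full text is licensed to users solely for personal, non-profit searching, reading, and printing for the purpose of academic research. Please comply with the Copyright Act of the Republic of China. Do not reproduce, distribute, adapt, repost, or broadcast it without authorization, to avoid violating the law.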
論文使用權限 Thesis access permission:自定論文開放時間 user define
開放時間 Available:
校內 Campus: 已公開 available
校外 Off-campus: 已公開 available


紙本論文 Printed copies
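Public-availability information for printed theses is relatively complete from academic year 102 (ROC calendar) onward. To look up public-availability information for printed theses from academic year 101 or earlier, please contact the printed thesis service desk of the Office of Library and Information Services. We apologize for any inconvenience.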
開放時間 available 已公開 available
