博碩士論文 etd-0723115-165419 詳細資訊
Title page for etd-0723115-165419
論文名稱
Title
超多純量架構中的指令分析器設計
Design Instruction Analyzer in the Hyper-scalar Architecture
系所名稱
Department
畢業學年期
Year, semester
語文別
Language
學位類別
Degree
頁數
Number of pages
69
研究生
Author
指導教授
Advisor
召集委員
Convenor
口試委員
Advisory Committee
口試日期
Date of Exam
2015-08-17
繳交日期
Date of Submission
2015-08-24
關鍵字
Keywords
指令分析器、虛擬共享暫存器、超多純量、指令並行度
ILP, virtual shared register file, hyper-scalar, instruction analyzer
統計
Statistics
本論文已被瀏覽 5719 次,被下載 28
The thesis/dissertation has been browsed 5719 times, has been downloaded 28 times.
中文摘要
以超多純量(Hyper-scalar)微處理器系統架構執行同一執行緒時,會因指令間的相依性,因而虛擬共享暫存器的核心間之資料交互傳輸的次數頻繁,導致執行因資料交通的延遲造成效能不彰。故本論文之目的為提出一指令分析器以解決因指令間的相依性所導致的問題,盡可能的將具有相依性之指令派發至同一核心,大量減少核心間之資料交互傳輸次數,以增進整體結構之運算效能。
指令分析器在指令抓取之初依據相依性進行分析,再派發至最適當的核心,依序分為四個階段:一、指令抓取(Instruction Fetch):配合跳躍預測器(Branch Predictor)同時抓取四道指令,以提升指令並行度;二、暫存器標籤(Register Tag):根據相依性產生運算元標籤與條件標籤,並依據兩標籤結果決定最適當的目的暫存器標籤;三、相依性分析器(Dependence Analyzer):由暫存器標籤生成核心標籤,用以決定指令派發之核心;四、派發(Dispatch):根據核心標籤產生週期標籤,代表指令之派發時機,並將結果記錄於延遲表(Defer Table),派發為指令分析器之核心階段,必需有一PC偵測器(PC Detector)用以判斷是否正確抓取指令,當發生指令抓取錯誤時,有ㄧ補償機制將PC導向正確指令位址。
而在效能評估方面,利用程式驗證此架構是否能有效的派發指令,降低核心要求運算元之次數,模擬結果顯示,在跳躍預測器正確率達83%的條件之下,加入指令分析器後核心經由虛擬共享暫存器要求運算元次數降為原本的二分之一。故本論文實現之指令分析器,可有效的增加核心使用率且降低核心請求運算元之次數。
Abstract
When the Hyper-scalar microprocessor architecture executes a single thread, dependences between instructions cause frequent data transfers between cores through the Virtual Shared Register File (VSRF), and the resulting transfer latency degrades performance. We therefore propose an Instruction Analyzer to address this problem: whenever possible, instructions that depend on one another are issued to the same core. This substantially reduces the number of inter-core data transfers through the VSRF and improves the performance of the overall architecture.
The Instruction Analyzer analyzes instructions for dependences as they are fetched and then issues each one to the most appropriate core. The procedure has four stages. First, Instruction Fetch: to raise instruction-level parallelism, this stage works with the Branch Predictor to fetch four instructions at a time. Second, Register Tag: operand tags and conditional tags are generated from the dependences between instructions, and the two together determine the most appropriate destination register tag. Third, Dependence Analyzer: core tags are generated from the register tags and determine the core to which each instruction is issued. Fourth, Dispatch: cycle tags are generated from the core tags and determine when each instruction is issued; the results are recorded in the Defer Table. Dispatch is the core stage of the Instruction Analyzer, and it requires a PC Detector to judge whether the correct instructions were fetched. When a wrong instruction is fetched, a compensation mechanism redirects the PC to the correct instruction address.
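The tagging stages can be illustrated with a minimal Python sketch. It is not the thesis's hardware design: the `Instr` type, the `tag_instructions` function, the "follow the most recent producer" rule, the least-loaded fallback, and the one-cycle latency are all assumptions made for illustration. It only shows how a core tag and a cycle tag could be derived from register dependences and recorded in a defer-table-like list.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Instr:
    dest: Optional[str]      # destination register, or None if nothing is written
    srcs: Tuple[str, ...]    # source registers


def tag_instructions(instrs, num_cores=4):
    """Return a hypothetical (core tag, cycle tag) pair for each instruction."""
    producer = {}                 # register -> (core, cycle) of its latest writer
    next_free = [0] * num_cores   # first free issue cycle on each core
    defer_table = []
    for ins in instrs:
        # Dependences on registers written by earlier instructions.
        deps = [producer[r] for r in ins.srcs if r in producer]
        if deps:
            # Assumed rule: follow the most recent producer so its result stays local.
            core = max(deps, key=lambda d: d[1])[0]
        else:
            # No dependence: pick the core with the earliest free cycle.
            core = min(range(num_cores), key=lambda c: next_free[c])
        # Assumed latency: a result is usable one cycle after its producer issues.
        ready = max((cycle + 1 for _, cycle in deps), default=0)
        cycle = max(ready, next_free[core])
        next_free[core] = cycle + 1
        if ins.dest is not None:
            producer[ins.dest] = (core, cycle)
        defer_table.append((core, cycle))
    return defer_table


# A four-instruction fetch group: r1 -> r2 form a chain, r3 is independent,
# and the last instruction consumes both r2 and r3.
group = [
    Instr("r1", ("r0",)),
    Instr("r2", ("r1",)),
    Instr("r3", ("r4",)),
    Instr("r5", ("r2", "r3")),
]
print(tag_instructions(group))  # [(0, 0), (0, 1), (1, 0), (0, 2)]
```

In this sketch the dependent chain stays on core 0 while the independent instruction runs in parallel on core 1, so only the final instruction needs one operand from another core.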
For evaluation, we use test programs to verify that the architecture issues instructions effectively and reduces the number of operand requests made by the cores. Simulation results show that, with a branch predictor accuracy of 83%, adding the Instruction Analyzer halves the number of operand requests that cores make through the VSRF. The Instruction Analyzer implemented in this thesis therefore both raises core utilization and reduces the number of operand requests made by the cores.
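The metric discussed in the abstract, the number of operands a core must request from another core, can be made concrete with a toy count. The following Python sketch rests on stated assumptions: the function names, the two dispatch policies, and the seven-instruction program are hypothetical, and the numbers it produces are not the thesis's simulation results.

```python
def cross_core_requests(instrs, assign_core):
    """Count source operands whose latest producer ran on a different core."""
    owner = {}      # register -> core that last wrote it
    requests = 0
    for index, (dest, srcs) in enumerate(instrs):
        producer_cores = [owner[r] for r in srcs if r in owner]
        core = assign_core(index, producer_cores)
        requests += sum(1 for c in producer_cores if c != core)
        if dest is not None:
            owner[dest] = core
    return requests


def round_robin(num_cores):
    # Baseline policy: ignore dependences and rotate through the cores.
    return lambda index, producer_cores: index % num_cores


def follow_producer(num_cores):
    # Dependence-aware policy: issue to the core of the first producer, if any.
    def policy(index, producer_cores):
        return producer_cores[0] if producer_cores else index % num_cores
    return policy


# Two interleaved dependence chains (on r1 and r2) joined by a final instruction.
program = [
    ("r1", ()), ("r2", ()),
    ("r1", ("r1",)), ("r2", ("r2",)),
    ("r1", ("r1",)), ("r2", ("r2",)),
    ("r3", ("r1", "r2")),
]
baseline = cross_core_requests(program, round_robin(4))
analyzed = cross_core_requests(program, follow_producer(4))
print(baseline, analyzed)  # 6 1
```

The size of the reduction depends entirely on the instruction stream and the policy; this toy case only illustrates why keeping dependent instructions on one core lowers the request count.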
目次 Table of Contents
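Thesis Approval Form i
Thesis Authorization Form ii
Acknowledgments iii
Abstract (Chinese) iv
Abstract v
Table of Contents vii
List of Figures x
List of Tables xii
Chapter 1 Introduction 1
1.1 Motivation 1
1.2 Objectives 1
1.3 Thesis Organization 2
Chapter 2 Background and Related Work 3
2.1 Introduction to the Single-Core Architecture 3
2.2 Distributed Instruction Set Architecture 5
2.2.1 Introduction to the Distributed Instruction Set Architecture 5
2.2.2 Instruction Operation Example 6
2.3 Two-Level Adaptive Branch Predictor 7
2.4 ARM Instruction Format Analysis 8
2.4.1 Data Processing Instructions 8
2.4.2 Load and Store Instructions 9
2.4.3 Load and Store Multiple Instructions 10
2.4.4 Branch Instructions 11
2.5 Introduction to the Hyper-scalar Architecture 11
2.5.1 Virtual Shared Register File 14
2.5.2 Handling of Register Data Flow 17
2.5.3 Handling of Memory Data Flow 19
2.5.4 Handling of Instruction Flow 20
Chapter 3 Design of the Instruction Analyzer 23
3.1 Instruction Analyzer Architecture 23
3.1.1 Instruction Fetch Mechanism 24
3.1.2 Register Tag Mechanism 27
3.1.3 Dependence Analysis Mechanism 29
3.1.4 Dispatch 31
3.2 Instruction Tagging Flow 35
3.3 Compensation Mechanism for Branch Mispredictions 43
3.3.1 Branch Instructions in Dispatch 43
3.3.2 Compensation Mechanism 44
Chapter 4 Simulation and Verification 47
4.1 Example Program: Sums of Odd and Even Numbers 47
4.2 Matrix Multiplication 48
4.3 Result Analysis and Summary 50
Chapter 5 Conclusion 51
References 53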
參考文獻 References
[1] L Wang, CL Wu, “Distributed Instruction Set Computer Architecture,” IEEE Transactions on Computers, vol. 40, pp. 915-934, 1991.
[2] Tse-Yu Yeh, Yale N. Patt, “Alternative Implementations of Two-Level Adaptive Branch Prediction,” 1992, Department of Electrical Engineering and Computer Science The University of Michigan.
[3] ARM Architecture Reference Manual: https://www.scss.tcd.ie/~waldroj/3d1/arm_arm.pdf
[4] Ding-Siang Su, “Design of the Execution-driven Simulation Environment for Hyper-scalar Architecture,” 2008, Department of Electrical Engineering National Sun Yat-Sen University.
[5] Jih-Ching Chiu, Yu-Liang Chou, Po-Kai Chen and Ding-Siang Su, “A Unitable Computing Architecture for Chip Multiprocessors,” The Computer Journal, Vol. 54, No. 12, pp. 2033-2052, Nov. 2011.
[6] Po-Kai Chen, “ESL Model of the Hyper-scalar Processor on a Chip,” 2007, Department of Electrical Engineering National Sun Yat-Sen University.
[7] Jih-ching Chiu, Yin-Jou Huang, Yi-Lin Ye, “Design of the Optimized Group management Unit by Detecting Thread Parallelism on the Hyperscalar Architecture,” 2013 National Computer Symposium, Dec. 2013.
[8] Yu-Ren Lai, “Design of the Superscalar Dual-Core Architecture using Single-Issue Out-of-Order Instruction Pipe for Embedded System,” 2009, Department of Electrical Engineering National Sun Yat-Sen University.
[9] Jih-Ching Chiu, Kai-Ming Yang, Yu-Liang Chou, Chih-Kang Wu, “A relation-exchanging buffering mechanism for instruction and data streaming,” Computers & Electrical Engineering, Vol. 39, No. 4, pp. 1129-1141, May 2013.
[10] Kai-ming Yang, Kin-fong Lei, Jih-ching Chiu, “Design of an Asynchronous Ring Bus Architecture for Multi-Core Systems,” 2010 International Computer Symposium (ICS), pp. 682-687, Dec. 2010.
[11] JC Chiu, YL Chou, PK Chen, “A Superscalar Dual-Core Architecture for ARM ISA,” Proceedings of the International Computer Symposium 2006, pp. 21-26, Dec. 2006.
[12] Cong J.; Han G.; Jagannathan A.; Reinman G.; Rutkowski K., “Accelerating Sequential Applications on CMPs Using Core Spilling,” IEEE Transactions on Parallel and Distributed Systems, accepted for future publication, 2007.
[13] B. A. Nayfeh and K. Olukotun, "A single-chip multiprocessor," Computer, IEEE, vol. 30, pp. 79-85, 1997.
[14] B Lee, AR Hurson, “Dataflow architectures and multithreading,” Computer, vol. 27, no. 8, pp. 27-39, Aug. 1994.
[15] A. E.-Moursy, R. Garg, D. H. Albonesi, and S. Dwarkadas, “Partitioning multi-threaded processors with a large number of threads,” in Proceedings of the International Symposium on Performance Analysis of Systems and Software, 2005, pp. 112-123.
[16] ST Srinivasan, H Akkary, T Holman, K Lai, “A minimal dual-core speculative multi-threading architecture,” Computer Design: VLSI in Computers and Processors, 2004. ICCD 2004. Proceedings. IEEE International Conference 2004, pp. 360-367.
[17] Jih-Ching Chiu, Kai-Ming Yang and Yu-Liang Chou, “A hyperscalar dual-core architecture for embedded systems,” Microprocessors and Microsystems, Vol. 37, No. 8, No. B, pp. 929–940, Nov. 2013.
電子全文 Fulltext
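This electronic full text is licensed to users solely for personal, non-profit retrieval, reading, and printing for the purpose of academic research. Please comply with the Copyright Act of the Republic of China. To avoid violating the law, do not reproduce, distribute, adapt, repost, or broadcast it without authorization.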
論文使用權限 Thesis access permission:自定論文開放時間 user define
開放時間 Available:
校內 Campus: 已公開 available
校外 Off-campus: 已公開 available


紙本論文 Printed copies
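Public-availability information for printed theses is relatively complete from academic year 102 (ROC calendar) onward. To inquire about printed theses from academic year 101 or earlier, please contact the printed thesis service desk of the Office of Library and Information Services. We apologize for any inconvenience.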
開放時間 available 已公開 available
