National Sun Yat-sen University, thesis/dissertation, Implementation of Core Element for Hyper-scalar Architecture (實現適用於超多純量架構之核心單元)

Title	Implementation of Core Element for Hyper-scalar Architecture (實現適用於超多純量架構之核心單元)
Department	Department of Electrical Engineering
Year, semester	Spring semester of Academic Year 104	Language	Chinese
Degree	Master	Number of pages	83
Author	Yu-Long Lai (賴郁龍)
Advisor	Jih-Ching Chiu (邱日清)
Convenor	Shiann-Rong Kuang (鄺獻榮)
Advisory Committee	Tong-Yu Hsieh (謝東佑); Kai-Ming Yang (楊凱名)
Date of Exam	2016-07-26	Date of Submission	2016-08-19
Keywords	out-of-order execution, reconfigurable, hyper-scalar, in-order commit, asymmetric multicore processors
Statistics	The thesis/dissertation has been browsed 5653 times and downloaded 132 times.

Abstract
To cope with today's diverse programs and workloads, the design of multi-core processors has become a major research topic, and asymmetric multi-core processors (AMPs) have been developed in response to this demand. The hyper-scalar microprocessor system architecture is a highly flexible asymmetric multi-core processor architecture. When the cores execute the same single-threaded program, the Instruction Analyzer (IA) analyzes the dependencies between instructions, and the cores exchange data through the Virtual Shared Register Files (VSRF), so that multiple cores jointly accelerate a single thread. So that a core can both execute a single thread on its own and cooperate with other cores to accelerate a single thread, the hyper-scalar architecture proposes a core element with out-of-order execution and in-order commit. This thesis implements a core element for the hyper-scalar architecture. Two data processing stages are added to the pipeline, giving the core a data processing unit that supports out-of-order execution with in-order commit, while the ordering of memory accesses and the handling of branch instructions are adjusted so that programs still behave correctly. To suit the hyper-scalar processor architecture, the instruction operation mechanism is discussed and implemented in three parts. First, Instruction Flow: the data processing unit exchanges instruction information with the instruction sequence table and the register source table, so that instructions execute out of order and complete in order. Second, Memory Access Flow: the design of the memory unit keeps memory read and write instructions in the correct order under out-of-order execution. Third, Branch Ordered Flow: under out-of-order execution, the result of a branch instruction is held until the instruction completes in order, and only then is the branch performed. Finally, the design is written in Verilog and synthesized for an FPGA to evaluate it through simulation and testing.
The results confirm that the core element meets the hyper-scalar architecture's requirements for instruction flow control, memory access management, and correct branch instruction execution. In the future, this core element can be used to build dual-core, quad-core, and larger multi-core hyper-scalar processors.
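
The idea of out-of-order execution with in-order commit in the abstract can be illustrated with a small software model. The following is a minimal sketch in Python, not the thesis's Verilog design: the class `SequenceTable`, its methods `issue`, `complete`, and `commit`, and the register names `r1` to `r3` are all hypothetical names chosen for illustration. It assumes each instruction receives a program-order sequence number, may finish executing in any order, and updates the register file only once every older instruction has committed.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Entry:
    seq: int                      # program-order sequence number
    dest: Optional[str]           # destination register, or None
    value: Optional[int] = None   # result, filled in when execution finishes
    done: bool = False            # True once the result is available


class SequenceTable:
    """Toy in-order commit buffer (hypothetical; not the thesis's table layout)."""

    def __init__(self) -> None:
        self.entries: Dict[int, Entry] = {}
        self.next_alloc = 0            # next sequence number to hand out
        self.next_commit = 0           # oldest instruction not yet committed
        self.regs: Dict[str, int] = {} # architectural register file

    def issue(self, dest: Optional[str]) -> int:
        """Allocate a sequence number in program order."""
        seq = self.next_alloc
        self.entries[seq] = Entry(seq, dest)
        self.next_alloc += 1
        return seq

    def complete(self, seq: int, value: int) -> None:
        """Record a result; may be called in any order (out-of-order execution)."""
        entry = self.entries[seq]
        entry.value = value
        entry.done = True

    def commit(self) -> List[int]:
        """Commit finished instructions strictly in program order."""
        committed: List[int] = []
        while (self.next_commit in self.entries
               and self.entries[self.next_commit].done):
            entry = self.entries.pop(self.next_commit)
            if entry.dest is not None:
                self.regs[entry.dest] = entry.value
            committed.append(entry.seq)
            self.next_commit += 1
        return committed


# Usage: three instructions finish in reverse order but commit in order.
table = SequenceTable()
a = table.issue("r1")
b = table.issue("r2")
c = table.issue("r3")
table.complete(c, 30)
table.complete(b, 20)
first = table.commit()    # nothing commits: the oldest instruction is not done
table.complete(a, 10)
second = table.commit()   # all three commit, oldest first
```

In this sketch the younger results wait in the table until the oldest instruction finishes. A branch result could be held the same way, which is the behaviour the abstract describes for Branch Ordered Flow.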

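Table of Contents
Thesis Approval Form
Thesis Public Access Authorization
Acknowledgments
Abstract (Chinese)
Abstract (English)
Table of Contents
List of Figures
List of Tables
Chapter 1 Introduction
  1.1 Research Motivation
  1.2 Research Objectives
  1.3 Thesis Organization
Chapter 2 Background and Related Work
  2.1 Classification of Current Asymmetric Multicore Processor (AMP) Designs
  2.2 WiDGET (Wisconsin Decoupled Grid Execution Tiles)
    2.2.1 Pipeline Stage
    2.2.2 Front-end
    2.2.3 Execution Unit
  2.3 Bahurupi
    2.3.1 Bahurupi Architecture
    2.3.2 Bahurupi Execution model
    2.3.3 Bahurupi Architecture details
  2.4 Introduction to the Hyper-scalar Architecture
    2.4.1 Instruction Analyzer
    2.4.2 Virtual Shared Register File
  2.5 Summary of Features of Processor Architectures with Dynamically Allocatable Cores
Chapter 3 Implementation of Core Element for Hyper-scalar Architecture
  3.1 Instruction Operation Mechanism of the Hyper-scalar Processor Core Element
    3.1.1 Instruction Flow
    3.1.2 Memory Access Flow
    3.1.3 Branch Ordered Flow
  3.2 Implementation of Instruction Flow
    3.2.1 Pre-decoder and Fetch stage
    3.2.2 Decoder stage
    3.2.3 Data stage
    3.2.4 Execution stage
    3.2.5 Commit stage
    3.2.6 Writeback stage
    3.2.7 Register Source Table
    3.2.8 Sequence Table
  3.3 Implementation of Memory Access Flow
    3.3.1 Memory stage
    3.3.2 Decoder stage, Execution stage, Sequence Table, Commit stage
  3.4 Implementation of Branch Ordered Flow
    3.4.1 Branch Instruction Execution at Commit
    3.4.2 Branch Instruction Execution at Pre-decoder and Fetch
  3.5 Overall Architecture of the Core Element
    3.5.1 Fetch stage
    3.5.2 Decoder stage
    3.5.3 Data stage
    3.5.4 Execution stage
    3.5.5 Memory stage
    3.5.6 Commit stage
    3.5.7 Writeback stage
Chapter 4 Simulation and Verification
  4.1 Synthesis Results of the Processor Core Element
  4.2 Simulation Analysis of Instruction Operation in the Processor Core Element
    4.2.1 Simulation Analysis of Instruction Flow and Branch Ordered Flow
    4.2.2 Simulation Analysis of Memory Access Flow
    4.2.3 Analysis of Instruction Simulation Test Results
  4.3 Instruction Simulation Verification of the Dual-core Architecture
    4.3.1 Instruction Simulation Verification of Odd-number and Even-number Sums
    4.3.2 Instruction Simulation of Matrix Multiplication
    4.3.3 Comparison of Instruction Simulation Verification
Chapter 5 Conclusion
References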


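Fulltext
This electronic full text is licensed to users solely for personal, non-profit searching, reading, and printing for the purpose of academic research. Please observe the Copyright Act of the Republic of China and do not reproduce, distribute, adapt, repost, or broadcast it without authorization. Thesis access permission: user-defined release date. Available: Campus: available; Off-campus: available. etd-0719116-164809.pdf
Printed copies
Public-access information for printed theses is relatively complete from Academic Year 102 onward. For information on printed theses from Academic Year 101 or earlier, please contact the printed thesis service desk of the Office of Library and Information Services. Available: available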
