National Sun Yat-sen University (國立中山大學), thesis/dissertation, Design of the Execution-driven Simulation Environment for Hyper-scalar Architecture

Title	Design of the Execution-driven Simulation Environment for Hyper-scalar Architecture
Department	Department of Electrical Engineering
Year, semester	Spring semester of Academic Year 96	Language	Chinese
Degree	Master	Number of pages	94
Author	Ding-Siang Su (蘇鼎翔)
Advisor	Jih-Ching Chiu (邱日清)
Convenor	Da-Wei Chang (張大緯)
Advisory Committee	Tsung Lee (李聰)
Date of Exam	2008-07-23	Date of Submission	2008-08-21
Keywords	Hyper-scalar, multi-core architecture

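Chinese Abstract (translated)
Driven by advances in microprocessor system research and VLSI process technology, single-chip multi-core architectures have become the mainstream for performance-oriented processor systems. However, most of today's homogeneous multi-core architectures are designed around the symmetric multiprocessor (SMP) concept. Under a traditional SMP mechanism the cores are interconnected only for data, so a single thread can be processed by only one core, which limits multi-core utilization and prevents performance from improving. This thesis therefore proposes a single-chip multi-core microprocessor system architecture, named the Hyper-scalar architecture. Its main feature is the addition of an interconnect control mechanism between instructions in a multi-core processor, so that a single thread can be processed in parallel across multiple cores, achieving instruction-level parallelism. The architecture combines the advantages of superscalar and multithreaded designs: multiple cores can improve single-thread performance while multithreaded parallel computation is still supported. To meet the needs of multithread management, the architecture adds a set of dynamic operation instructions that let the system dynamically allocate the number of cores used by each thread, so hardware resources can be adjusted to program demand and computing resources are used effectively. Based on the ARM instruction set, this thesis examines how to build the instruction interaction control mechanism between cores and how to improve single-thread performance on the Hyper-scalar architecture. The discussion has four parts: register data flow, memory data flow, instruction execution control flow, and multi-cycle instruction decomposition. When instructions are fetched, each is marked with dependence tags according to the registers it needs. Operand data are passed between instructions through a virtual shared register, and, following the data-driven execution principle, an instruction executes only after all of its operands are ready. For memory data flow, a simpler approach is taken: memory access instructions are executed in order. For instruction execution control flow, a speculative execution mechanism is added to improve performance, allowing out-of-order execution across basic blocks. Finally, because some instructions in the ARM instruction set are multi-cycle instructions, they can be decomposed into several one-cycle instructions and processed in parallel on this architecture, improving performance further. For performance evaluation, an ESL model was written in SystemC, and simulation verified that the algorithms above operate correctly. With the MediaBench suite as the benchmark, the Hyper-scalar architecture obtains, on average, a performance improvement of roughly 1.5 to 4 times with 2 to 8 cores.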
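The multi-cycle instruction decomposition mentioned in the abstract can be pictured with a small sketch. The record does not say which ARM instructions the thesis decomposes or how, so this example assumes an ARM load-multiple (LDMIA) and splits it into independent single-register loads; `SingleLoad` and `splitLoadMultiple` are hypothetical names used only for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical record for one single-register load produced by the split.
struct SingleLoad {
    int reg;                // destination register number (0..15)
    std::uint32_t address;  // word address this load reads from
};

// Assumption: an LDMIA-style load-multiple, where the lowest-numbered
// register in the 16-bit register list is loaded from the base address and
// each further listed register from the next 4-byte word. Each resulting
// load is independent of the others, so they could be issued in parallel.
std::vector<SingleLoad> splitLoadMultiple(std::uint16_t regList,
                                          std::uint32_t base) {
    std::vector<SingleLoad> loads;
    std::uint32_t address = base;
    for (int reg = 0; reg < 16; ++reg) {
        if (regList & (1u << reg)) {
            loads.push_back({reg, address});
            address += 4;  // next word in memory
        }
    }
    assert(loads.size() <= 16);
    return loads;
}
```

For a register list naming R1, R2, and R4 with base address 0x1000, the sketch yields three loads at 0x1000, 0x1004, and 0x1008, one per listed register.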
Abstract
As a result of microprocessor system research and advances in VLSI manufacturing process technology, high-performance computing has trended toward multi-core architectures. However, current multi-core architectures are designed around the symmetric multiprocessor (SMP) concept. In a traditional SMP mechanism there are only data links between processor cores, so a single thread can be handled by only a single core; this limits multi-core utilization and keeps performance from increasing. This thesis proposes a scalable chip multiprocessor architecture called Hyper-scalar. The principal characteristic of the architecture is the design of interconnect control mechanisms for instructions across the cores. Several scalar processor cores in the Hyper-scalar architecture can be dynamically grouped into an n-way superscalar accelerator, called an accelerator group, to improve instruction-level parallelism. Hyper-scalar combines the advantages of superscalar and multithreaded architectures; hence it can not only enhance single-threaded performance by using accelerator groups but also support multithreaded applications. Based on the ARM instruction set, the thesis analyzes how to create the interactive control mechanisms for instructions across the cores and how to enhance the performance of a single thread on the Hyper-scalar architecture. The analysis is divided into four parts: register flow, memory flow, instruction control flow, and the chopping of multi-cycle instructions. When instructions are issued into the processor, they are given dependence tags that resolve the dependences among all issued instructions. Instructions exchange data through the virtual shared register file (VSRF) mechanism, and an instruction executes only when its operands are available. In the memory flow part, the dependence problem is solved with a simple technique: memory access instructions execute in instruction order. In the instruction control flow part, a speculative execution mechanism is used to improve performance, so instructions can execute out of order beyond a basic block. Finally, because the ARM instruction set contains some multi-cycle instructions, the Hyper-scalar framework can chop each of them into several one-cycle instructions to further enhance performance. The simulation model is written in SystemC, a C++-based modeling language that provides a hardware-oriented simulation platform, and the MediaBench suite is selected for the experiments. On average, the Hyper-scalar architecture accelerates single-threaded performance by 50% to 300% using 2 to 8 cores.
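The dependence tagging and operand-availability rule described in the abstract can be illustrated with a minimal sketch. It is not the thesis's VSRF design, whose details this record does not give. It assumes each instruction has one destination and at most two source registers, and it tags each source with the index of the most recent earlier instruction that writes that register. The names `Instr`, `DependenceTag`, `tagDependences`, and `isReady` are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

constexpr int kNoProducer = -1;  // operand has no pending producer

// Hypothetical instruction: register numbers 0..15, or -1 when unused.
struct Instr {
    int dest;
    int src1;
    int src2;
};

// Hypothetical tag: index of the earlier instruction producing each source.
struct DependenceTag {
    int producer1;
    int producer2;
};

// Walk the instructions in fetch order, remembering the last writer of each
// register, and tag every source operand with that writer's index.
std::vector<DependenceTag> tagDependences(const std::vector<Instr>& program) {
    std::array<int, 16> lastWriter;
    lastWriter.fill(kNoProducer);
    std::vector<DependenceTag> tags;
    tags.reserve(program.size());
    for (std::size_t i = 0; i < program.size(); ++i) {
        const Instr& in = program[i];
        assert(in.dest < 16 && in.src1 < 16 && in.src2 < 16);
        DependenceTag tag{kNoProducer, kNoProducer};
        if (in.src1 >= 0) tag.producer1 = lastWriter[static_cast<std::size_t>(in.src1)];
        if (in.src2 >= 0) tag.producer2 = lastWriter[static_cast<std::size_t>(in.src2)];
        tags.push_back(tag);
        if (in.dest >= 0) lastWriter[static_cast<std::size_t>(in.dest)] = static_cast<int>(i);
    }
    return tags;
}

// Data-driven rule: an instruction may execute once every producer it waits
// on has completed, regardless of its position in fetch order.
bool isReady(const DependenceTag& tag, const std::vector<bool>& done) {
    auto satisfied = [&done](int producer) {
        return producer == kNoProducer || done[static_cast<std::size_t>(producer)];
    };
    return satisfied(tag.producer1) && satisfied(tag.producer2);
}
```

In a three-instruction sequence where the second instruction reads the register written by the first and the third reads only untouched registers, the third is ready immediately while the second waits for the first to complete.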

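Table of Contents
Chinese Abstract
Abstract
List of Figures
List of Tables
Chapter 1 Introduction
    1-1 Research Motivation
    1-2 Research Objectives
    1-3 Thesis Organization
Chapter 2 Related Work
    2-1 Introduction to Single-Core Architectures
    2-2 Current Multi-Core Processor Architectures
        2-2-1 Architectures That Use Multiple Cores to Improve Multithreaded Performance
        2-2-2 Architectures That Use Multiple Cores to Improve Single-Thread Performance
    2-3 Data-Driven Execution Mechanism
    2-4 Concept of the Hyper-scalar System Architecture
Chapter 3 Design of the Hyper-scalar System Architecture
    3-1 Register Data Flow: ALU Instruction Processing
        3-1-1 Instruction Notation
        3-1-2 Handling Register Data Flow
        3-1-3 Introduction to the Virtual Register Concept
        3-1-4 VSRF Architecture Design and Operation Example
    3-2 Memory Data Flow: Load/Store Instruction Processing
        3-2-1 Handling Memory Data Flow
        3-2-2 Introduction to the Memory Access Tag Concept
        3-2-3 Introduction to the Memory Access Tag Architecture
    3-3 Instruction Flow: Branch Instruction Processing
        3-3-1 Handling Instruction Flow Control
        3-3-2 Introduction to the Branch Instruction Tag Concept
        3-3-3 Complete Operation Example of the Branch Instruction Tag Architecture
    3-4 Other Related Instruction Flow Control Mechanisms
        3-4-1 Mechanism for Fetching Multiple Instructions Simultaneously
        3-4-2 Other Instructions with Branch Behavior
        3-4-3 Control Mechanism for Writing Back Execution Results
        3-4-4 How Instructions Obtain the Correct R15 in Multi-Core Mode
        3-4-5 Stall Control
        3-4-6 How the VSRF OPND Replier Finds Operands
        3-4-7 How to Determine Whether an Instruction Has Finished Executing
    3-5 Parallel Processing of Decomposed Multi-Cycle Instructions
        3-5-1 Instruction Decomposition Method
        3-5-2 Instruction Decomposition Actions in the Hyper-scalar Architecture
        3-5-3 Control Problems Caused by Instruction Decomposition
        3-5-4 Dependence Problems after Instruction Decomposition
    3-6 Multi-Core System Operating Modes
        3-6-1 Processor Grouping
        3-6-2 Newly Added System Instructions
Chapter 4 Verification and Simulation
    4-1 Simulation Environment Setup
        4-1-1 Simulator Construction and Configuration
        4-1-2 Performance Evaluation and Simulation Method
        4-1-3 Performance Evaluation Programs
    4-2 Relationship between Multi-Core Architecture and Performance Improvement
    4-3 Relationship between Multi-Core Architecture and Memory Access Latency
Chapter 5 Conclusion
Appendix
References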


Fulltext
Thesis access permission: not available, either on campus or off campus.
Printed copies
Availability: publicly available.
