國立中山大學,National Sun Yat-sen University,學位論文,thesis/dissertation,以單指令派發亂序執行之指令管道設計適用於嵌入式系統之超純量雙核心架構,Design of the Superscalar Dual-Core Architecture using Single-Issue Out-of-Order Instruction Pipe for Embedded System

論文名稱 Title	以單指令派發亂序執行之指令管道設計適用於嵌入式系統之超純量雙核心架構 Design of the Superscalar Dual-Core Architecture using Single-Issue Out-of-Order Instruction Pipe for Embedded System
系所名稱 Department	電機工程學系 Department of Electrical Engineering
畢業學年期 Year, semester	97 學年度第 2 學期 The spring semester of Academic Year 97	語文別 Language	中文 Chinese
學位類別 Degree	碩士 Master	頁數 Number of pages	77
研究生 Author	賴鈺仁 Yu-ren Lai
指導教授 Advisor	邱日清 Jih-chin Chiu
召集委員 Convenor	鍾崇斌 Chung-ping Chung
口試委員 Advisory Committee	蕭勝夫, 李聰 Shen-fu Hsiao; Tsung Lee
口試日期 Date of Exam	2009-07-24	繳交日期 Date of Submission	2009-07-29
關鍵字 Keywords	亂序執行、嵌入式系統、超純量、雙核心、單指令派發 Dual-Core, Superscalar, Embedded System, Out-of-Order, Single-Issue
統計 Statistics	本論文已被瀏覽 5683 次，被下載 2081 次 The thesis/dissertation has been browsed 5683 times, has been downloaded 2081 times.

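Chinese Abstract (English translation)
Embedded systems now serve a wide range of applications. Beyond low power consumption and reduced design complexity, flexible and appropriately scaled computing performance has become an important concern in processor architecture. As microprocessor systems have evolved and process technology has advanced, integrating multiple cores into a single processor has become easier. A multi-core architecture delivers good multi-threaded performance, but it cannot effectively improve single-threaded performance. As a result it offers little support for existing programs, whose code only benefits under good execution scheduling and a suitable operating environment. This thesis considers multi-core processor design for embedded systems and discusses how a multi-core architecture can effectively improve single-threaded performance while remaining compatible with existing programs. To reduce the complexity of the multi-core design problem, the thesis uses a dual-core architecture to discuss how to construct the instruction execution paths that arise from data dependences and control dependences. The design focuses on four points: 1. Building a simple out-of-order execution core. 2. Designing an instruction analyzer capable of dynamic scheduling. 3. Providing a mechanism for sharing operands across cores. 4. Committing instructions with synchronization detection between the two cores. Each core in the architecture is a single-issue out-of-order instruction pipe. When instructions are fetched, the instruction analyzer writes an instruction tag for each one according to the data dependences among the instructions, schedules the instructions dynamically, and dispatches them to the two cores. Inside a core, an instruction can use its tag to obtain the operands it needs from the other core, which completes the exchange of operand data between the two cores and lets instructions achieve maximal parallelism in the dual-core architecture. Instructions in a core execute in a dataflow-driven manner but follow the in-order completion principle to preserve the correctness of program execution. Based on the ARM instruction set, the thesis implements and examines how to establish the instruction interaction control mechanism between the two cores and how to improve single-thread performance in this architecture. For performance evaluation, a behavioral model of the superscalar dual-core architecture was written in software and functionally verified by trace-driven simulation, and the MediaBench suite was used as the benchmark. Compared with a single-core five-stage pipelined architecture, the simulation results show an average performance improvement of more than 1.4 times.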
Abstract
With advances in VLSI technology, integrating multiple processor cores on a single chip has become easier, and more and more users run applications on multi-core architectures. A multi-core system performs well on multi-threaded applications, but it does not improve the performance of single-threaded applications. This thesis proposes a multi-core architecture for enhancing single-threaded performance in embedded systems and focuses on four points: 1. Construct a simple out-of-order execution core. 2. Design a dynamically scheduling instruction analyzer. 3. Design a mechanism for sharing operands between the two cores. 4. Design a mechanism for committing instructions synchronously between the two cores. Each core is a single-issue out-of-order instruction pipe. The instruction analyzer first fetches instructions and generates instruction dependence tags by detecting the dependences among the fetched instructions, then schedules the instructions dynamically and dispatches them to the cores. Within a core, the tag tells each instruction where to obtain its required operands, which allows data to be shared between the two cores. Instructions execute in a data-driven manner but complete in order to preserve program correctness. Based on the ARM instruction set, the thesis explores how to implement the interaction control mechanisms between the two cores and how to accelerate a single thread on the dual-core architecture. A simulation model of the proposed architecture was written in C as a trace-driven simulation framework, and the MediaBench suite was selected for the experiments. According to the simulation results, the architecture achieves an average speedup of 40% over a five-stage pipelined architecture.
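The tagging and dispatch step described in the abstract can be illustrated with a small sketch. This is not the thesis's analyzer: the names `Instr`, `Tagged`, `analyze`, and `cross_core_sources`, the tag layout (producer core, producer index), and the dispatch heuristic (follow the most recent producer, otherwise pick the less loaded core) are all assumptions made for illustration. The sketch tracks only read-after-write register dependences and ignores memory dependences, branches, and commit.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

# Hypothetical tag layout: (core that produces the value, program index of producer).
Tag = Tuple[int, int]


@dataclass(frozen=True)
class Instr:
    text: str                 # ARM-like mnemonic, for display only
    dest: Optional[str]       # destination register, if any
    srcs: Tuple[str, ...]     # source registers


@dataclass
class Tagged:
    index: int                            # program order, needed for in-order completion
    core: int                             # core the instruction is dispatched to
    instr: Instr
    src_tags: Dict[str, Optional[Tag]]    # None means "read the register file"


def analyze(program: List[Instr], num_cores: int = 2) -> List[Tagged]:
    """Tag each instruction with the producers of its sources and pick a core."""
    last_writer: Dict[str, Tag] = {}      # register -> most recent producer
    load = [0] * num_cores                # instructions dispatched per core so far
    out: List[Tagged] = []
    for index, instr in enumerate(program):
        src_tags = {reg: last_writer.get(reg) for reg in instr.srcs}
        producers = [tag for tag in src_tags.values() if tag is not None]
        if producers:
            # Assumed heuristic: follow the most recent producer.
            core = max(producers, key=lambda tag: tag[1])[0]
        else:
            # Independent instruction: send it to the less loaded core.
            core = min(range(num_cores), key=lambda c: load[c])
        load[core] += 1
        if instr.dest is not None:
            last_writer[instr.dest] = (core, index)
        out.append(Tagged(index, core, instr, src_tags))
    return out


def cross_core_sources(tagged: Tagged) -> List[str]:
    """Source registers whose value must be fetched from the other core."""
    return [reg for reg, tag in tagged.src_tags.items()
            if tag is not None and tag[0] != tagged.core]


PROGRAM = [
    Instr("MOV r0, #1", "r0", ()),
    Instr("MOV r1, #2", "r1", ()),
    Instr("ADD r2, r0, r0", "r2", ("r0",)),
    Instr("ADD r3, r1, r1", "r3", ("r1",)),
    Instr("ADD r4, r2, r3", "r4", ("r2", "r3")),
]
TAGGED = analyze(PROGRAM)
```

In this example the two independent chains land on different cores, and the final `ADD` is placed on core 1 with a tag saying that `r2` must come from core 0, which is the kind of cross-core operand sharing the abstract refers to.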

目次 Table of Contents
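Abstract (Chinese)
Abstract (English)
Table of Contents
List of Figures
List of Tables
Chapter 1 Introduction
  1-1 Research Motivation
  1-2 Research Objectives
  1-3 Thesis Organization
Chapter 2 Related Work
  2-1 Introduction to Single-Core Architectures
  2-2 Introduction to Superscalar Processors
    2-2-1 Statically Scheduled Superscalar Processors
    2-2-2 Dynamically Scheduled Superscalar Processors
    2-2-3 Pre-Execution Superscalar Processors
  2-3 Introduction to Multi-Core Architectures
Chapter 3 Design of the Single-Issue Out-of-Order Superscalar Dual-Core Architecture
  3-1 The Single-Issue Out-of-Order Superscalar Dual-Core Architecture
  3-2 Design of the Instruction Analyzer
  3-3 Design of the Single-Issue Out-of-Order Core Architecture
    3-3-1 Fetch Stage
    3-3-2 Data Stage
    3-3-3 Memory Stage
    3-3-4 Commit Stage
Chapter 4 Simulation and Analysis
  4-1 Architecture Verification
    4-1-1 Example: Odd-Sum and Even-Sum Program
    4-1-2 Example: Matrix Multiplication Program
    4-1-3 Comparison of the Example Programs on Other Architectures
  4-2 Performance Simulation
    4-2-1 Simulation Environment
    4-2-2 Simulator Implementation
    4-2-3 Benchmark Programs
  4-3 Analysis of Simulation Results
Chapter 5 Conclusion
References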


電子全文 Fulltext
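This electronic full text is licensed to users solely for personal, non-profit searching, reading, and printing for academic research purposes. Please observe the Copyright Act of the Republic of China and do not reproduce, distribute, adapt, repost, or broadcast it without authorization, to avoid breaking the law.
論文使用權限 Thesis access permission	校內校外完全公開 unrestricted
開放時間 Available	校內 Campus: 已公開 available	校外 Off-campus: 已公開 available
etd-0729109-173040.pdf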
紙本論文 Printed copies
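Availability information for printed theses is relatively complete from Academic Year 102 onward. To check the availability of a printed thesis from Academic Year 101 or earlier, please contact the printed thesis service desk of the Office of Library and Information Services. We apologize for any inconvenience.
開放時間 Available	已公開 available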


Office of Library and Information Services, National Sun Yat-sen University │ Contact Us : 2452 Thesis Format Review Team , Mail │ Development and operations : Knowledge Innovation Division, LIS