National Sun Yat-sen University, thesis/dissertation, Design of a Basic Block Reassembling Instruction Stream Buffer for X86 ISA

Title	Design of a Basic Block Reassembling Instruction Stream Buffer for X86 ISA
Department	Department of Electrical Engineering
Year, semester	Spring semester of Academic Year 93	Language	Chinese
Degree	Master	Number of pages	69
Author	Tseng-Kuei Lin (林增奎)
Advisor	邱日清
Convenor	蕭勝夫
Advisory Committee	黃英哲, 李聰
Date of Exam	2005-07-08	Date of Submission	2005-08-22
Keywords	instruction stream buffer, branch target buffer, X86
Statistics	The thesis/dissertation has been browsed 5690 times and downloaded 1858 times.

Abstract
Today's X86 processors are all superscalar. A superscalar architecture can fetch, execute, and commit more than one instruction per cycle, which exposes more instruction-level parallelism. Even so, if the processor cannot fetch multiple instructions efficiently and the back-end hardware is left idle, the achievable speedup is limited. Discontinuity in program flow is one of the main reasons the front end cannot fetch efficiently: it limits the number of consecutive instructions the front end can see in one cycle, and enlarging the front end's fetch capacity does not help. In this thesis, we present a new branch target buffer and instruction stream buffer architecture that can obtain branch information in advance and reassemble cache lines. By reassembling the original cache line with the cache line that contains the next basic block, the front end sees the instructions of two basic blocks as one contiguous sequence, so it sees more valid instructions per cycle and can easily extract instructions across a basic block boundary. Simulation and implementation results show a 43.2% improvement in fetch efficiency over the original system with a 64-byte cache line and a fetch width of 6 instructions, and an average of 3.6 valid instructions fetched per cycle with an ABP buffer that is 4 cache lines deep.
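The fetch limitation described in the abstract can be illustrated with a toy cycle count. The sketch below is not the thesis's hardware design or its simulator. It is a minimal model under stated assumptions: every basic block ends in a taken branch, an instruction belongs to the cache line of its first byte, and a cycle delivers instructions from at most one cache line per basic block. The function name `fetch_cycles`, the parameter `max_blocks_per_cycle`, and the example addresses are hypothetical, and the model does not attempt to reproduce the 43.2% or 3.6 figures.

```python
LINE_BYTES = 64   # cache line size quoted in the abstract
FETCH_WIDTH = 6   # front-end width (instructions per cycle) quoted in the abstract


def fetch_cycles(blocks, max_blocks_per_cycle):
    """Count the cycles needed to fetch a dynamic sequence of basic blocks.

    blocks: list of (start_address, [instruction lengths in bytes]).
        Assumption: each block ends in a taken branch to the next block.
    max_blocks_per_cycle: 1 models a front end limited to one basic block
        per cycle; 2 models a front end that sees two reassembled lines.
    """
    # Flatten the trace into (block index, cache line number) per instruction.
    insts = []
    for block_index, (start, lengths) in enumerate(blocks):
        addr = start
        for length in lengths:
            insts.append((block_index, addr // LINE_BYTES))
            addr += length

    cycles = 0
    i = 0
    while i < len(insts):
        # Cache line used for each basic block fetched in this cycle.
        line_of_block = {insts[i][0]: insts[i][1]}
        taken = 0
        while i < len(insts) and taken < FETCH_WIDTH:
            block_index, line = insts[i]
            if block_index not in line_of_block:
                if len(line_of_block) == max_blocks_per_cycle:
                    break  # no room for another basic block this cycle
                line_of_block[block_index] = line
            elif line != line_of_block[block_index]:
                break  # the block spills into a further cache line
            i += 1
            taken += 1
        cycles += 1
    return cycles


# Hypothetical trace: four 3-instruction blocks, each in its own cache line.
example_blocks = [
    (0, [3, 3, 3]),
    (128, [2, 2, 2]),
    (256, [4, 4, 4]),
    (400, [1, 1, 1]),
]
total = sum(len(lengths) for _, lengths in example_blocks)
for limit in (1, 2):
    cycles = fetch_cycles(example_blocks, limit)
    print(f"blocks per cycle = {limit}: {cycles} cycles, "
          f"{total / cycles:.1f} instructions per cycle")
```

In this toy trace the short basic blocks, rather than the fetch width, bound the one-block case, which is the situation the abstract describes when it says that enlarging the fetch capacity alone does not help.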

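Table of Contents
Chinese Abstract 2
English Abstract 3
Table of Contents 4
List of Figures 6
List of Tables 8
Chapter 1 Introduction 9
1-1 Motivation 10
1-2 Objectives 13
1-3 Thesis Organization 14
Chapter 2 Related Work 15
2-1 Instruction Prefetching 15
2-2 Branch Prediction 18
2-3 ABP Buffer and Instruction Predecoding 21
2-3-1 ABP Buffer 22
2-3-2 Instruction Predecoding 23
Chapter 3 Design of the Branch Target Buffer 27
3-1 Read Mechanism of the Branch Target Buffer 28
3-2 Internal Structure of the Branch Target Buffer 29
3-3 Combining the Branch Target Buffer with the Prediction Unit 34
Chapter 4 Design of the Instruction Stream Buffer 36
4-1 Overview 36
4-2 Fetch Sequencer 38
4-3 Reassembly Units and Instruction Extraction Unit 41
4-3-1 Cache Line Reassembly Unit 41
4-3-2 Instruction Group Reassembly Unit 43
4-3-3 Program Counter Generator 45
4-4 Hardware Architecture for High-Speed Design 46
Chapter 5 Simulation and Analysis 49
5-1 Benchmark Programs 49
5-2 Simulation and Analysis of the Branch Target Buffer 51
5-3 Simulation and Analysis of the Instruction Stream Buffer 55
5-4 Hardware Synthesis and Analysis 62
Chapter 6 Conclusion 65
References 67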

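Fulltext
This electronic full text is licensed to users only for personal, non-profit searching, reading, and printing for the purpose of academic research. Please comply with the Copyright Act of the Republic of China and do not reproduce, distribute, adapt, repost, or broadcast it without authorization. Thesis access permission: unrestricted (fully open on and off campus). Available: Campus: available; Off-campus: available. etd-0822105-162955.pdf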
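Printed copies
Public availability information for printed theses is relatively complete only from Academic Year 102 onward. To check the public availability of printed theses from Academic Year 101 or earlier, please contact the printed thesis service desk of the Office of Library and Information Services. We apologize for any inconvenience. Available: available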

Office of Library and Information Services, National Sun Yat-sen University │ Contact Us : 2452 Thesis Format Review Team , Mail │ Development and operations : Knowledge Innovation Division, LIS