論文使用權限 Thesis access permission:校內校外完全公開 unrestricted
開放時間 Available:
校內 Campus: 已公開 available
校外 Off-campus: 已公開 available
論文名稱 Title: X86指令集之具有基本區塊重組的指令流緩衝器的設計 Design of a Basic Block Reassembling Instruction Stream Buffer for X86 ISA
系所名稱 Department:
畢業學年期 Year, semester:
語文別 Language:
學位類別 Degree:
頁數 Number of pages: 69
研究生 Author:
指導教授 Advisor:
召集委員 Convenor:
口試委員 Advisory Committee:
口試日期 Date of Exam: 2005-07-08
繳交日期 Date of Submission: 2005-08-22
關鍵字 Keywords: 指令流緩衝器、分支目標緩衝器 instruction stream buffer, branch target buffer, X86
統計 Statistics: 本論文已被瀏覽 5690 次,被下載 1858 次 The thesis/dissertation has been browsed 5690 times and downloaded 1858 times.
中文摘要 Chinese Abstract
Modern x86 processors are all superscalar. A superscalar architecture can fetch, execute, and commit multiple instructions in a single cycle, and thereby exploits greater instruction-level parallelism. Even with superscalar execution resources, however, if the processor cannot fetch multiple instructions efficiently, the back-end hardware sits idle and the achievable speedup is limited.

Discontinuity in the program's instruction stream is one of the main causes of poor fetch efficiency: it limits the number of consecutive instructions the front-end can see in one cycle, and raising the front-end's fetch width alone does not improve the situation. In this thesis we propose a branch target buffer and instruction stream buffer architecture that can obtain branch information in advance and reassemble cache lines. By reassembling the original cache line with the cache line containing the next basic block, the front-end sees the consecutive instructions of two basic blocks; it therefore not only sees more valid instructions but can also easily extract instructions that cross a basic-block boundary. Simulation and implementation results show that, with 64-byte cache lines and a front-end width of six instructions, fetch efficiency improves by 43.2% over the original system, and that with a 4-cache-line-deep ABP buffer the front-end fetches 3.6 valid instructions per cycle on average.
Abstract |
Modern x86 CPUs are all superscalar. A superscalar architecture can fetch, execute, and commit more than one instruction per cycle, which helps exploit more instruction-level parallelism. If a superscalar processor fetches instructions inefficiently, however, its speedup is limited. Discontinuity in the program flow is one of the main reasons the front-end cannot fetch efficiently, and simply enlarging the fetch capacity of the front-end or other units then yields no further speedup. In this thesis, we present a new branch target buffer and instruction stream buffer structure that can obtain branch information in advance and reassemble cache lines. By reassembling the original line with the line that contains the instructions of the next basic block, the front-end can fetch more valid instructions per cycle. Simulation and implementation results show a 43.2% speedup in fetch efficiency with a 64-byte cache line and a fetch capacity of six instructions, and 3.6 valid instructions fetched per cycle with an ABP buffer that holds four cache lines.
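The fetch-bandwidth effect the abstract describes can be illustrated with a toy model (this is not the thesis's simulator; the basic-block lengths below are made-up examples). A conventional front-end stops fetching at each taken-branch boundary, while a reassembled line lets one cycle's fetch continue into the next basic block:

```python
# Toy sketch: per-cycle fetch bandwidth when fetch stops at every
# basic-block boundary (baseline) versus when a reassembled cache line
# lets the front-end fetch across one boundary per cycle.

def cycles_to_fetch(blocks, width, max_blocks_per_cycle):
    """Cycles needed to fetch all instructions, taking up to `width`
    instructions per cycle from at most `max_blocks_per_cycle` blocks."""
    cycles = 0
    i = 0      # index of the basic block currently being fetched
    off = 0    # instructions already consumed from block i
    while i < len(blocks):
        cycles += 1
        budget = width
        blocks_seen = 0
        while i < len(blocks) and budget > 0 and blocks_seen < max_blocks_per_cycle:
            take = min(budget, blocks[i] - off)
            budget -= take
            off += take
            if off == blocks[i]:   # finished a basic block this cycle
                i += 1
                off = 0
                blocks_seen += 1
            else:
                break              # fetch width exhausted mid-block
    return cycles

blocks = [4, 3, 5, 2, 6, 3, 4]     # hypothetical basic-block lengths
total = sum(blocks)
base = cycles_to_fetch(blocks, width=6, max_blocks_per_cycle=1)
reasm = cycles_to_fetch(blocks, width=6, max_blocks_per_cycle=2)
print(f"baseline:    {total / base:.2f} valid instructions/cycle")
print(f"reassembled: {total / reasm:.2f} valid instructions/cycle")
```

With these example block lengths, allowing the fetch group to span two basic blocks raises the valid instructions delivered per cycle, mirroring the 43.2% fetch-efficiency gain the thesis reports for a six-instruction front-end.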
目次 Table of Contents |
Chinese Abstract 2
English Abstract 3
Table of Contents 4
List of Figures 6
List of Tables 8
Chapter 1 Introduction 9
1-1 Motivation 10
1-2 Research Goals 13
1-3 Thesis Organization 14
Chapter 2 Related Work 15
2-1 Instruction Prefetching 15
2-2 Branch Prediction 18
2-3 ABP Buffer and Instruction Predecoding 21
2-3-1 ABP Buffer 22
2-3-2 Instruction Predecoding 23
Chapter 3 Design of the Branch Target Buffer 27
3-1 Branch Target Buffer Read Mechanism 28
3-2 Internal Structure of the Branch Target Buffer 29
3-3 Combining the Branch Target Buffer with the Prediction Unit 34
Chapter 4 Design of the Instruction Stream Buffer 36
4-1 Overview 36
4-2 Fetch Sequencer 38
4-3 Reassembly Unit and Instruction Extraction Unit 41
4-3-1 Cache Line Reassembly Unit 41
4-3-2 Instruction Group Reassembly Unit 43
4-3-3 Program Counter Generator 45
4-4 Hardware Architecture for High-Speed Design 46
Chapter 5 Simulation and Analysis 49
5-1 Benchmarks 49
5-2 Simulation and Analysis of the Branch Target Buffer 51
5-3 Simulation and Analysis of the Instruction Stream Buffer 55
5-4 Hardware Synthesis and Analysis 62
Chapter 6 Conclusions 65
References 67
電子全文 Fulltext |
This electronic full text is licensed solely for personal, non-profit retrieval, reading, and printing for the purpose of academic research. Please comply with the relevant provisions of the Copyright Act of the Republic of China; do not reproduce, distribute, adapt, repost, or broadcast it without authorization.
紙本論文 Printed copies |
Public-access information for printed theses is relatively complete from academic year 102 (2013) onward. To look up the public-access status of printed theses from academic year 101 or earlier, please contact the printed-thesis service desk of the Office of Library and Information Services. We apologize for any inconvenience. 開放時間 Available: 已公開 available
QR Code |