博碩士論文 etd-0822105-162955 詳細資訊
Title page for etd-0822105-162955
論文名稱
Title
X86指令集之具有基本區塊重組的指令流緩衝器的設計
Design of a Basic Block Reassembling Instruction Stream Buffer for X86 ISA
系所名稱
Department
畢業學年期
Year, semester
語文別
Language
學位類別
Degree
頁數
Number of pages
69
研究生
Author
指導教授
Advisor
召集委員
Convenor
口試委員
Advisory Committee
口試日期
Date of Exam
2005-07-08
繳交日期
Date of Submission
2005-08-22
關鍵字
Keywords
指令流緩衝器、分支目標緩衝器
instruction stream buffer, branch target buffer, X86
統計
Statistics
本論文已被瀏覽 5690 次,被下載 1858
This thesis/dissertation has been viewed 5690 times and downloaded 1858 times.
中文摘要
現今的X86處理器都具有超純量的處理能力。超純量架構具有在一個週期內抓取、運算及結束多個指令能力,藉此來獲得更大的指令層級的平行度。但即使擁有超純量的處理能力,若處理器無法有效率地抓取多個指令而造成後端硬體空置,所能提升的效能也就有限。
程式指令的不連續是造成抓取效率低落的主因之一。這造成了前端在一個週期內所能看見的連續指令有限,即使提高了前端的指令抓取數也無法改善此情形。本論文中,我們提出了一分支目標緩衝器與指令流緩衝器架構,此架構具有預先取得分支資訊以及重組快取行的能力。我們藉由重組原快取行與下一個基本區塊所在的快取行,來讓前端能看到兩個基本區塊的連續指令,如此一來,前端不但能看到更多的有效指令,也能輕易地擷取跨越基本區塊的指令。模擬與實作的結果也顯示,在64位元組快取行與前端寬度為6個指令的系統下,可比原來的系統增加43.2%的抓取效率;並在4個快取行深度ABP緩衝器支援下,平均每週期能抓取3.6個有效指令。
Abstract
Today's X86 processors are all superscalar. A superscalar architecture can fetch, execute, and commit more than one instruction per cycle, which exposes more instruction-level parallelism. However, if a superscalar processor cannot fetch multiple instructions efficiently, the back-end hardware sits idle and the achievable performance gain is limited.
Discontinuous program flow is one of the main reasons the front end cannot fetch efficiently. It limits the number of consecutive instructions the front end can see in one cycle, so simply enlarging the fetch capacity of the front end does not help. In this thesis, we present a new branch target buffer and instruction stream buffer architecture that can obtain branch information in advance and reassemble cache lines. By reassembling the original cache line with the line that contains the next basic block, the front end sees consecutive instructions from two basic blocks; it therefore sees more valid instructions per cycle and can easily fetch instructions across basic-block boundaries. Simulation and implementation results show a 43.2% improvement in fetch efficiency over the original system with a 64-byte cache line and a front-end width of 6 instructions, and an average of 3.6 valid instructions fetched per cycle with an ABP buffer that is 4 cache lines deep.
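The effect described in the abstract can be illustrated with a toy model. The sketch below is not the thesis design. It assumes one instruction per address slot (real X86 instructions are variable length), that every basic block ends in a taken branch, and that every needed line is always available. The names `expand`, `count_fetch_cycles`, `line_slots`, and `max_blocks` are hypothetical. With `max_blocks=1` a fetch stops at the first taken branch; with `max_blocks=2` it may continue into the line that holds the next basic block.

```python
def expand(blocks):
    """Turn (start_slot, n_instructions) basic blocks into a dynamic address trace."""
    return [start + k for start, n in blocks for k in range(n)]


def count_fetch_cycles(addrs, width, line_slots, max_blocks):
    """Count cycles needed to fetch the trace `addrs`.

    Each cycle fetches up to `width` instructions. A fetch stops when it
    would run sequentially past the end of the current cache line, or when
    it meets a taken branch after already covering `max_blocks` basic blocks.
    """
    cycles = 0
    i = 0
    n = len(addrs)
    while i < n:
        cycles += 1
        taken = 1                      # the first instruction is always fetched
        blocks_seen = 1
        line = addrs[i] // line_slots  # cache line of the current basic block
        while taken < width and i + taken < n:
            prev, cur = addrs[i + taken - 1], addrs[i + taken]
            if cur != prev + 1:        # discontinuity: a taken branch
                if blocks_seen == max_blocks:
                    break
                blocks_seen += 1
                line = cur // line_slots   # switch to the target's line
            elif cur // line_slots != line:
                break                  # sequential fetch crossed a line boundary
            taken += 1
        i += taken
    return cycles


if __name__ == "__main__":
    trace = expand([(0, 3), (20, 3), (40, 3)])   # three 3-instruction blocks
    for max_blocks in (1, 2):
        cycles = count_fetch_cycles(trace, width=6, line_slots=16,
                                    max_blocks=max_blocks)
        print(max_blocks, cycles, len(trace) / cycles)
```

In this made-up trace the two-block fetch needs fewer cycles than the one-block fetch, because short basic blocks no longer leave most of the 6-wide fetch unused. The numbers it prints are properties of the toy trace only and are not comparable to the 43.2% and 3.6 figures reported in the thesis.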
目次 Table of Contents
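Chinese Abstract 2
English Abstract 3
Table of Contents 4
List of Figures 6
List of Tables 8
Chapter 1 Introduction 9
1-1 Motivation 10
1-2 Research Objectives 13
1-3 Thesis Organization 14
Chapter 2 Related Work 15
2-1 Instruction Prefetching 15
2-2 Branch Prediction 18
2-3 ABP Buffer and Instruction Predecoding 21
2-3-1 ABP Buffer 22
2-3-2 Instruction Predecoding 23
Chapter 3 Design of the Branch Target Buffer 27
3-1 Read Mechanism of the Branch Target Buffer 28
3-2 Internal Structure of the Branch Target Buffer 29
3-3 Combining the Branch Target Buffer with the Prediction Unit 34
Chapter 4 Design of the Instruction Stream Buffer 36
4-1 Overview 36
4-2 Fetch Sequencer 38
4-3 Reassembly Units and Instruction Extraction Unit 41
4-3-1 Cache Line Reassembly Unit 41
4-3-2 Instruction Group Reassembly Unit 43
4-3-3 Program Counter Generator 45
4-4 Hardware Architecture for High-Speed Design 46
Chapter 5 Simulation and Analysis 49
5-1 Benchmark Programs 49
5-2 Simulation and Analysis of the Branch Target Buffer 51
5-3 Simulation and Analysis of the Instruction Stream Buffer 55
5-4 Hardware Synthesis and Analysis 62
Chapter 6 Conclusion 65
References 67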
參考文獻 References
[1] N. P. Jouppi, “Improving Direct-Mapped Cache Performance by the Addition of a Small Fully-Associative Cache and Prefetch Buffer.” In Proceedings of 17th Annual International Symposium on Computer Architecture, pp. 28-31, May 1990.
[2] T.-Y. Yeh and Y. N. Patt, “Increasing the Instruction Fetch Rate via Multiple Branch Prediction and a Branch Address Cache.” In Proceedings of the 7th International Conference on Supercomputing, July 1993.
[3] C.-C. Lee, I.-C. K. Chen and T. N. Mudge, “The Bi-Mode Branch Predictor.” In Proceedings of Thirtieth Annual IEEE/ACM International Symposium on Microarchitecture, pp. 4-13, December 1997.
[4] S. McFarling, “Combining Branch Predictors.” WRL Technical Report TN-36, Digital Equipment Corp., June 1993.
[5] T.-Y. Yeh and Y. N. Patt, “Alternative Implementations of Two-Level Adaptive Branch Prediction.” In Proceedings of the 19th Annual International Symposium on Computer Architecture, pp. 124-134, May 1992.
[6] T. M. Conte, K. N. Menezes, P. M. Mills, and B. A. Patel, “Optimization of Instruction Fetch Mechanisms for High Issue Rate.” In 22nd Annual International Symposium on Computer Architecture, pp. 333-334, June 1995.
[7] G. Reinman, B. Calder and T. Austin, “Optimizations Enabled by a Decoupled Front-End Architecture.” IEEE Transactions on Computers, pp. 338-355, April 2001.
[8] E. Rotenberg, S. Bennett and J. E. Smith, “Trace Cache: a Low Latency Approach to High Bandwidth Instruction Fetching.” In Proceedings of the 29th International Symposium on Microarchitecture, pp. 24-34, December 1996.
[9] S. Jourdan, L. Rappoport, Y. Almog, M. Erez, A. Yoaz and R. Ronen, “eXtended Block Cache.” In Proceedings of the 6th International Symposium on High-Performance Computer Architecture, pp. 61-70, January 2000.
[10] B. Black, B. Rychlik and J. P. Shen, “The Block-based Trace Cache.” In Proceedings of the 26th International Symposium on Computer Architecture, pp. 196-207, May 1999.
[11] J.-C. Chiu and C.-P. Chung, “High-bandwidth x86 instruction fetching based on instruction pointer table.” In IEE Proceedings - Computers and Digital Techniques, pp. 113-118, May 2001.
[12] M. Slater, “AMD’s K5 Designed to Outrun Pentium.” Microprocessor Report, volume 8, number 14, Oct. 1994.
[13] L. Gwennap, “Intel’s P6 Uses Decoupled Superscalar Design.” Microprocessor Report, volume 9, number 2, Feb. 1995.
[14] AMD Corporation, “Software Optimization Guide for AMD Athlon 64 and AMD Opteron.” Technical Document, March 2004.
[15] G. Hinton, D. Sager, M. Upton, D. Boggs, D. Carmean, A. Kyker and P. Roussel, “The Microarchitecture of the Pentium 4 Processor.” Intel Technology Journal, Q1 2001.
[16] S. J. E. Wilton and N. P. Jouppi, “An Enhanced Access and Cycle Time Model for On-Chip Caches.” WRL Technical Report 93/5, Digital Equipment Corp., July 1994.
[17] J. L. Hennessy and D. A. Patterson, “Computer Architecture: A Quantitative Approach,” third edition. Morgan Kaufmann Publishers, 2003.
[18] Standard Performance Evaluation Corporation, http://www.spec.org/spec/contact.html
電子全文 Fulltext
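This electronic full text is licensed to users only for personal, non-profit searching, reading, and printing for the purpose of academic research. Please comply with the Copyright Act of the Republic of China and do not reproduce, distribute, adapt, repost, or broadcast it without authorization, so as not to violate the law.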
論文使用權限 Thesis access permission:校內校外完全公開 unrestricted
開放時間 Available:
校內 Campus: 已公開 available
校外 Off-campus: 已公開 available


紙本論文 Printed copies
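Availability information for printed theses is relatively complete from academic year 102 onward. To inquire about the availability of printed theses from academic year 101 or earlier, please contact the printed thesis service desk of the Office of Library and Information Services (圖資處). We apologize for any inconvenience.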
開放時間 available 已公開 available
