Title page for etd-0803104-231024
Title
增進效能的超執行緒指令排程機制設計與實現
Design of instructions scheduling Mechanism in Hyper-Threading Architecture for Improving Performance
Department
Year, semester
Language
Degree
Number of pages
58
Author
Advisor
Convenor
Advisory Committee
Date of Exam
2004-06-26
Date of Submission
2004-08-03
Keywords
Parallel processing, instruction scheduling
ILP, scheduling
Statistics
This thesis/dissertation has been browsed 5735 times and downloaded 3611 times.
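Chinese Abstract (in translation)
In microprocessor systems, instruction-level parallelism (ILP) is the key factor affecting system performance. Making the instruction scheduling mechanism more complex to raise ILP also adds considerable hardware cost. To keep that cost from growing too large, current processor architectures dispatch instructions through multiple queues. With this scheduling method, several adjacent, mutually dependent instructions can block executable instructions behind them in the scheduler queue from using the execution units, so the execution units are not fully utilized. In an architecture that can execute multiple threads simultaneously, the instruction parallelism within a scheduler queue is higher than with a single thread. If the probability of dispatching adjacent, dependent instructions can be reduced, the utilization of the execution units can be raised. We therefore propose a mechanism called the priority-scheduling buffer to replace the original scheduler queue of each execution unit. Using the dependences among the instructions accumulated in the scheduler queue, the mechanism virtually partitions it into multiple scheduler queues, so that instructions can be dispatched from different virtual scheduler queues. This reduces the consecutive dispatch of dependent instructions and raises execution-unit utilization. Simulation results on SPEC CINT2000, with the Intel Pentium 4 as the base architecture, show that as the scheduler queue holds more instructions, the mechanism improves performance by 7.14% over the original scheduler queues when five threads execute simultaneously.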
Abstract
In a microprocessor system, exploiting instruction-level parallelism (ILP) is key to improving performance. However, as the instruction scheduling mechanism becomes more complex in order to exploit ILP more efficiently, its hardware cost grows as well. To keep this cost down, current processors issue instructions through multiple scheduler queues. With this mechanism, however, dependent instructions may be issued in succession, which leaves the execution units underutilized. In a Hyper-Threading architecture, the instructions in a scheduler queue have a high degree of parallelism. If we can reduce the probability that dependent instructions are issued in succession, the utilization of the execution units will rise. In this thesis, we propose a scheduling mechanism called the priority-scheduling buffer to replace the original scheduler queues. The mechanism divides an original scheduler queue into multiple virtual scheduler queues according to the dependences among instructions: instructions that depend on one another are dispatched into the same virtual scheduler queue, and instructions can be issued from the heads of different virtual scheduler queues. This reduces the probability that dependent instructions are issued in succession. Our simulation uses the Intel Pentium 4 as the base architecture and the SPEC CINT2000 benchmarks. With five threads executing simultaneously, performance improves by 7.14% on average compared with the original scheduler queues.
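The idea of virtual scheduler queues can be illustrated with a small software model. The sketch below is not the thesis's hardware design: the dependence tags and priority logic of Chapter 3 are not part of this record. The names `Instr`, `build_virtual_queues`, and `issue_cycles` are hypothetical. It assumes registers are already renamed (each register is written exactly once), that every result becomes available one cycle after issue, and that at most one instruction is taken from the head of each virtual queue per cycle.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Instr:
    name: str
    dest: str          # renamed destination register (assumed written once)
    srcs: tuple = ()   # renamed source registers


def build_virtual_queues(instrs):
    """Place each instruction in the queue that holds a producer of one of
    its sources; instructions with no queued producer start a new queue."""
    queues = []          # one deque per virtual scheduler queue
    producer_queue = {}  # register -> index of the queue that writes it
    for ins in instrs:
        idx = next((producer_queue[s] for s in ins.srcs if s in producer_queue), None)
        if idx is None:
            queues.append(deque())
            idx = len(queues) - 1
        queues[idx].append(ins)
        producer_queue[ins.dest] = idx
    return queues


def issue_cycles(queues, width=2):
    """Per cycle, issue up to `width` ready instructions, taking at most one
    from the head of each virtual queue. Returns the names issued per cycle."""
    queues = [deque(q) for q in queues]                # work on a copy
    pending = {i.dest for q in queues for i in q}      # results not yet produced
    schedule = []
    while any(queues):
        issued = []
        for q in queues:
            if len(issued) == width:
                break
            if q and not any(s in pending for s in q[0].srcs):
                issued.append(q.popleft())
        if not issued:
            raise RuntimeError("no queue head is ready; input violates the assumptions")
        pending -= {i.dest for i in issued}            # visible from the next cycle
        schedule.append([i.name for i in issued])
    return schedule


# Two independent dependence chains, interleaved in program order.
prog = [
    Instr("i1", "r1"),
    Instr("i2", "r2", ("r1",)),
    Instr("i3", "r3", ("r2",)),
    Instr("i4", "r4"),
    Instr("i5", "r5", ("r4",)),
]
queues = build_virtual_queues(prog)
schedule = issue_cycles(queues, width=2)
```

In this toy program the chain i1, i2, i3 lands in one virtual queue and i4, i5 in another, so the dependent instructions i2 and i3 no longer block i4 and i5 from reaching the execution units.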
Table of Contents
Abstract (Chinese)....2
ABSTRACT....3
Contents ....5
List of Figures ....7
List of Tables....8
Chapter 1 Introduction ....9
1.1 The Problems of Instruction Window....11
1.2 Motivations and Purposes ....12
1.3 Organization of This Thesis ....12
Chapter 2 Survey....14
2.1 Summary of MTA....14
2.1.1 MTA....14
2.1.2 Hyper-Threading Technology ....16
2.2 Relative Research and Technology ....18
2.2.1 First-Use Issue Logic ....19
2.2.2 Palacharla’s Dependence-Based FIFO schedulers....20
2.2.3 LeBeck’s WIB scheduler ....21
Chapter 3 Design of the Priority-Scheduling Buffer ....23
3.1 The Concept of Priority-Scheduling Buffer....23
3.1.1 Dynamic Adjusting Virtual Scheduler Queue....23
3.2 The Architecture of Priority-Scheduling Buffer ....27
3.2.1 The Tag of Dependence ....27
3.2.2 The Procedure of Operations in our Scheduler....31
3.2.3 Deadlock Free ....33
3.3 The Hardware of Priority-Scheduling Buffer ....34
Chapter 4 Simulation and Analysis.....37
4.1 Simulation Environment ....37
4.1.1 Simulator....37
4.1.2 Benchmark Programs....42
4.2 The Result of Simulation and Analysis....43
4.2.1 The Effect of Buffer Size and Number of Threads ....44
4.2.2 The Effect of Retirement....46
4.2.3 The Effect of Number of ALU....48
4.3 Compare with the other Scheduling Strategies ....49
4.4 Verification....51
Chapter 5 Conclusion....53
Reference ....55
References
[1] S. Palacharla, N.P. Jouppi, J.E. Smith, “Complexity-Effective Superscalar Processors”, in Proc. of the 24th Int. Symp. on Computer Architecture, 1997, pp. 1-13.

[2] D. Folegnani, A. Gonzalez, “Reducing Power Consumption of the Issue Logic”, in the Workshop on Complexity-Effective Design, Vancouver, June 2000.

[3] S. Önder, R. Gupta, “Superscalar Execution with Dynamic Data Forwarding”, in Proc. Int. Conference on Parallel Architectures and Compilation Techniques, pp. 130-135, 1998.

[4] V.V. Zyuban, “Inherently Lower-Power High-Performance Superscalar Architectures”, PhD Thesis, Dept. of Computer Science and Engineering, University of Notre Dame, Indiana, January 2000. 320

[5] J.E. Smith, G.S. Sohi, “The Microarchitecture of Superscalar Processors”, in Proc. of the IEEE, vol. 83, no. 12, December 1995, pp. 1609-1624.

[6] Deborah T. Marr, Frank Binns, David L. Hill, Glenn Hinton, David A. Koufaty, J. Alan Miller, Michael Upton, “Hyper-Threading Technology Architecture and Microarchitecture”, Intel Technology Journal, Q1, 2002.

[7] D. Koufaty, D.T. Marr, “Hyperthreading Technology in the NetBurst Microarchitecture”, IEEE Micro, vol. 23, no. 2, March-April 2003, pp. 56-65.

[8] Glenn Hinton, Dave Sager, Mike Upton, Darrell Boggs, Doug Carmean, Alan Kyker, Patrice Roussel, “The Microarchitecture of the Pentium 4 Processor”, Intel Technology Journal, Q1, 2001.

[9] J.L. Cruz, A. González, M. Valero, N. Topham, “Multiple-Banked Register File Architectures”, in Proc. of the 27th Int. Symp. on Computer Architecture, 2000.

[10] Ramon Canal, Antonio González, “A Low-Complexity Issue Logic”, Proceedings of the 14th International Conference on Supercomputing, May 2000.

[11] S. Palacharla, N.P. Jouppi, J.E. Smith, “Complexity-Effective Superscalar Processors”, in Proc. of the 24th Int. Symp. on Computer Architecture, 1997, pp. 1-13.

[12] Dan Ernst, Andrew Hamel, Todd Austin, “Cyclone: A Broadcast-Free Dynamic Instruction Scheduler with Selective Replay”, ACM SIGARCH Computer Architecture News, Proceedings of the 30th Annual International Symposium on Computer Architecture, vol. 31, no. 2, May 2003.

[13] Alvin R. Lebeck, Jinson Koppanalil, Tong Li, Jaidev Patwardhan, Eric Rotenberg, “A Large, Fast Instruction Window for Tolerating Cache Misses”, in Proceedings of the International Symposium on Computer Architecture (ISCA-29), May 2002.

[14] Ilhyun Kim, Mikko H. Lipasti, “Macro-op Scheduling: Relaxing Scheduling Loop Constraints”, Proceedings of the 36th Annual IEEE/ACM International Symposium on Microarchitecture, December 2003.

[15] Jared Stark, Mary D. Brown, Yale N. Patt, “On Pipelining Dynamic Instruction Scheduling Logic”, Proceedings of the 33rd Annual ACM/IEEE International Symposium on Microarchitecture, December 2000.

[16] David W. Wall, “Limits of Instruction-Level Parallelism”, Proceedings of the Fourth International Conference on Architectural Support for Programming Languages and Operating Systems, April 1991.

[17] J. Gonzalez, A. Gonzalez, “Limits of Instruction Level Parallelism with Data Speculation”, Proc. of the VECPAR Conf., pp. 585-598, 1998.

[18] M.D. Smith, M. Johnson, M.A. Horowitz, “Limits on Multiple Instruction Issue”, ACM SIGARCH Computer Architecture News, Proceedings of the Third International Conference on Architectural Support for Programming Languages and Operating Systems, vol. 17, no. 2, April 1989.

[19] Haitham Akkary, Ravi Rajwar, Srikanth T. Srinivasan, “Checkpoint Processing and Recovery: Towards Scalable Large Instruction Window Processors”, Proceedings of the 36th Annual IEEE/ACM International Symposium on Microarchitecture, December 2003.
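Fulltext
This electronic full text is licensed to users solely for personal, non-profit searching, reading, and printing for academic research purposes. Please comply with the relevant provisions of the Copyright Act of the Republic of China and do not reproduce, distribute, adapt, repost, or broadcast it without authorization.
Thesis access permission: unrestricted (fully open on and off campus)
Available:
Campus: available
Off-campus: available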


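Printed copies
Public-availability information for printed theses is relatively complete from academic year 102 onward. To look up this information for printed theses from academic year 101 or earlier, please contact the printed thesis service desk of the Office of Library and Information Services. We apologize for any inconvenience.
Available: available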
