博碩士論文 etd-1027114-184928 詳細資訊
Title page for etd-1027114-184928
論文名稱
Title
支援有效快取空間分享與使用之多用途快取記憶體架構
A Versatile Cache Architecture for Better Utilization of the Cache Space
系所名稱
Department
畢業學年期
Year, semester
語文別
Language
學位類別
Degree
頁數
Number of pages
151
研究生
Author
指導教授
Advisor
召集委員
Convenor
口試委員
Advisory Committee
口試日期
Date of Exam
2014-10-24
繳交日期
Date of Submission
2014-11-27
關鍵字
Keywords
快取記憶體、追蹤資料壓縮、追蹤資料儲存區、多用途快取記憶體、可重組態快取記憶體、快取標籤陣列、草稿記憶體
Trace Compression, Trace Storage, Multi-Role Cache, Cache Tag Arrays, Reconfigurable Cache, SPM, Cache
統計
Statistics
本論文已被瀏覽 5670 次,被下載 235
The thesis/dissertation has been browsed 5670 times and downloaded 235 times.
中文摘要
隨著半導體製程的進步,更多的電晶體數目可以被整合至單一晶片中,因此現今的處理器其快取記憶體容量也隨之不斷加以因應未來更加多樣化與複雜的應用程式其記憶體存取需求。雖然較大的快取記憶體容量對於不同的應用程式負載而言,大多數可帶來效能提昇的好處。然而對於快取記憶體使用量低的應用程式負載而言,這樣的設計是較無效率且浪費的,因為就算是增加快取記憶體的容量,其所能提昇的效能也是有限的;因此結果就是使用率較低的快取記憶體不斷的帶來能量消耗,但卻無助於整體效能的提昇。雖然目前已有許多文獻提出方法以根據應用程式負載來進行快取記憶體的重組態,然而這些方法都僅考量快取記憶體的資料陣列,但快取記憶體的標籤陣列卻都未被有效率的使用。因此本論文提出一種多用途快取記憶體架構,以將使用率低的快取記憶體分享給晶片中的其它元件,進一步擴充快取記憶體的功能性。本論文共提出三種重組態的方式;首先,我們利用指令快取記憶體的特性來進行程式追蹤資料的壓縮。其次,資料快取記憶體的一部分可被分割出來儲存追蹤資料,以提供處理器的監測及除錯支援。最後,原先未被使用到的快取記憶體標籤陣列也可被利用來提供較多的草稿記憶體容量。此多用途快取記憶體架構同時也已經與一般用途處理器以及三維圖形繪圖處理器系統晶片進行整合,並於晶片製作的各階層中進行軟硬體整合及驗證以證明其可行性與效果。實驗結果顯示此多用途快取記憶體架構僅帶來少量的額外硬體成本,同時其所需要的電路及快取記憶體修改並不會降低處理器的運作頻率以及效能。因此本論文所提出的多用途快取記憶體架構對於開發多樣性的快取記憶體功能是相當可行的方案,同時快取記憶體中的資料陣列及標籤陣列於重組態時皆都被有效的開發與使用。
Abstract
Larger on-chip caches are a clear design trend in general-purpose systems, driven by the increasingly diverse applications of modern SoCs. While larger caches are effective for a wide range of conventional workloads, this design philosophy is inadequate for workloads that benefit little from large caches. The consequence is that the underutilized cache space consumes power constantly without contributing to performance. Although many research efforts have attempted to build flexibility into the cache so that it adapts to the behavior of different applications, most of these studies consider only the cache data arrays and ignore the cache tag arrays. In this thesis, we propose a versatile cache architecture in which the underutilized portion of the cache can be organized in ways other than conventional caching. First, the instruction cache is reused to perform real-time program trace compression without incurring any additional cache misses. Second, part of the data cache can be configured as a trace buffer for monitoring and debugging support, and this buffer is protected from regular cache operations. Finally, the unused cache tag arrays can be transformed into an extension of the scratchpad memory (SPM) when the corresponding cache ways are configured as other types of functional units (SPM or buffers/lookup tables). The proposed versatile cache architecture has been integrated with an academic ARM general-purpose processor and with a programmable-shader 3D graphics (3DG) SoC at the RTL, FPGA, and chip levels to demonstrate its feasibility and effectiveness. The results show that the hardware overhead is minor: the TC (trace-compression) cache and the DT (data/trace) cache require only 3.652K gates and 2.383K gates, which is 0.123% and 0.081% overhead, respectively, relative to a modern SoC. In addition, the required support circuits and cache modifications do not impair the global critical path delay.
Therefore, the proposed approaches are highly feasible solutions for exploring a diversity of cache functionalities. Furthermore, both the cache data arrays and the cache tag arrays are considered in achieving efficient cache utilization.
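The overhead figures in the abstract can be sanity-checked with a little arithmetic. The sketch below is illustrative only: the gate counts and percentages come from the abstract, while the function names and the cache geometry used for the tag-array estimate (16 KB, 4-way, 32-byte lines, 32-bit addresses) are assumptions chosen for the example, not the configuration used in the thesis.

```python
# Back-of-the-envelope checks for the figures quoted in the abstract.
# Gate counts and percentages are taken from the abstract; the cache
# geometry passed to tag_array_bytes() is a hypothetical example.

def implied_soc_kgates(added_kgates: float, overhead_percent: float) -> float:
    """Total SoC size (in K gates) implied by an added block and its relative overhead."""
    return added_kgates / (overhead_percent / 100.0)


def tag_array_bytes(cache_bytes: int, line_bytes: int, ways: int, addr_bits: int = 32) -> int:
    """Tag storage of ONE cache way, counting address tag bits only.

    Assumes power-of-two sizes and ignores valid/dirty/replacement bits.
    """
    sets = cache_bytes // (line_bytes * ways)
    offset_bits = line_bytes.bit_length() - 1   # log2(line_bytes)
    index_bits = sets.bit_length() - 1          # log2(sets)
    tag_bits = addr_bits - index_bits - offset_bits
    return sets * tag_bits // 8


tc_total = implied_soc_kgates(3.652, 0.123)   # TC cache figures
dt_total = implied_soc_kgates(2.383, 0.081)   # DT cache figures
way_tags = tag_array_bytes(16 * 1024, 32, 4)  # hypothetical geometry

print(f"SoC size implied by TC cache figures: {tc_total:.0f}K gates")
print(f"SoC size implied by DT cache figures: {dt_total:.0f}K gates")
print(f"Tag bytes per way (hypothetical cache): {way_tags}")
```

Both overhead figures imply a reference SoC of a little under three million gates, so the two percentages are mutually consistent. The tag-array estimate only indicates the order of magnitude of storage that one unused way's tags could offer as extra SPM under the assumed geometry.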
目次 Table of Contents
Thesis Approval Form
Thesis Declaration
Acknowledgments
Abstract (Chinese)
Abstract
List of Figures
List of Tables
Chapter 1 Introduction
1.1 Background: Plenty of On-Chip Memory Capacity
1.2 Motivation
1.3 Contributions of the Thesis
1.4 Organization of the Thesis
Chapter 2 Related Works
2.1 Innovative Cache Design for Better Cache Utilization
2.1.1 Cache Resizing for Power Reduction
2.1.2 Efficient Cache Architecture for Cache Miss Reduction
2.1.3 Cache Sharing with Other Activities
2.1.3.1 Cache Sharing among Processors Cores
2.1.3.2 Cache Space Used for Other Purposes
2.2 Innovative Tag Organization
2.2.1 Reduction of Tag Arrays for Power/Area Efficiency
2.2.2 Free Tag Utilization
2.3 Trace-Based Debugging
2.3.1 Hardware-Based Program Trace Compression Techniques
2.3.2 Trace Buffer Based Approaches
2.4 Discussions
Chapter 3 Thesis Statement and Design Overview
3.1 Background
3.2 Target System Architecture and Research Goals
3.3 Overview of Design Solutions
3.3.1 Cache as Program Trace Compressor
3.3.2 Cache as Trace Buffer
3.3.3 Free Tag as SPM
3.4 Summary
Chapter 4 Versatile Instruction Cache (TC Cache): as a Trace Compressor
4.1 Introduction
4.2 Features of Trace-Compression Instruction Cache
4.3 Trace-Compression Instruction Cache Architecture
4.3.1 Branch/Target Identifier
4.3.2 Cache Module
4.3.3 Trace Generator
4.4 Data Paths and Operations of Bypass and On-line Mode
4.4.1 The Bypass Mode
4.4.2 The On-Line Mode
4.4.3 Comparison of the Bypass Mode and the On-Line Mode
4.4.4 Extension to Data Trace
4.5 Decompression for the Compressed Program Trace
4.5.1 Decompression Flow of the Bypass Mode
4.5.2 Decompression Flow of the On-Line Mode
4.6 Limitation of the TC Cache
Chapter 5 Versatile Data Cache (DT Cache): as a Trace Buffer
5.1 Introduction
5.2 Features and Feasibility of Data/Trace Cache
5.2.1 Features of Data/Trace Cache
5.2.2 Feasibility of Data/Trace Cache
5.3 Data/Trace Cache Architecture
5.3.1 Data Cache Modification for Trace Buffer
5.3.1.1 Line Index Calculator
5.3.1.2 D/T Configuration Logic
5.3.1.3 Trace Protection Logic
5.3.1.4 Trace Dump Logic
5.3.2 Static DT Cache Configuration
5.3.2.1 Cache Size Sensitive Partitioning
5.3.2.2 Trace-Oriented Partitioning
5.4 Extension for Dynamic Reconfiguration
5.4.1 Post-Triggering (Post-T) Trace
5.4.1.1 Victim Tags
5.4.1.2 HW Assisted TB Allocator
5.4.2 Pre-Triggering (Pre-T) Trace
5.4.3 Improvement for Trace Dump
Chapter 6 Versatile Cache Tag Arrays (Tag SPM): Free Tag Arrays as the SPM
6.1 Introduction
6.2 Features of Tag SPM
6.3 Tag SPM Architecture
6.3.1 Required Modification for Tag SPM
6.3.2 Data/Tag SPM Controller and Operations
6.4 Consideration of Tag Bit Width
6.4.1 Normal Cache Operation
6.4.2 Tag SPM Operation
Chapter 7 Experimental Results
7.1 Integration with an Academic ARM Compatible CPU
7.1.1 Hardware Implementation
7.1.1.1 Individual Results of TC Cache, DT Cache, and Tag SPM
7.1.1.2 Integration Results of TC Cache, DT Cache, and Tag SPM
7.1.2 TC Cache Effectiveness: Compression Quality
7.1.2.1 Program Trace Compression Ratio
7.1.2.2 Comparison with Related Trace Compression Techniques
7.1.3 DT Cache Application: MiBench Tracing
7.1.3.1 Trace Quality vs. Cache Performance
7.1.3.2 Comparison with Related Cache-Enhancement Works
7.1.4 Tag SPM Effectiveness: Capacity Reclaiming
7.2 Deployment in SoC Chip: the 3D Graphics SoC Chip
7.2.1 FPGA Prototyping
7.2.2 Prototype Chip Fabrication
Chapter 8 Considerations of Versatile Cache Architecture in Multi-Core and Higher Level Cache Hierarchy (L2/L3 Cache)
8.1 Considerations of TC (Trace-Compression) Cache
8.1.1 TC Cache Applied to Multi-Core SoCs
8.1.2 TC Cache Applied to Higher Level Cache Hierarchy (L2/L3 Cache)
8.1.3 TC Cache Applied to Superscalar Architecture
8.2 Considerations of DT (Data/Trace) Cache
8.3 Considerations of Tag SPM
Chapter 9 Conclusions and Future Work
9.1 Conclusions
9.2 Future Work
Bibliography
電子全文 Fulltext
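This electronic full text is licensed to users only for individual, non-profit searching, reading, and printing for the purpose of academic research. Please comply with the relevant provisions of the Copyright Act of the Republic of China and do not reproduce, distribute, adapt, repost, or broadcast it without authorization, to avoid violating the law.
論文使用權限 Thesis access permission:自定論文開放時間 user-defined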
開放時間 Available:
校內 Campus: 已公開 available
校外 Off-campus: 已公開 available


紙本論文 Printed copies
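Public information on printed theses is relatively complete from academic year 102 (ROC calendar) onward. To look up public information on printed theses from academic year 101 or earlier, please contact the printed-thesis service desk of the Office of Library and Information Services. We apologize for any inconvenience.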
開放時間 available 已公開 available
