National Sun Yat-sen University, thesis/dissertation: A Versatile Cache Architecture for Better Utilization of the Cache Space

Title	A Versatile Cache Architecture for Better Utilization of the Cache Space (支援有效快取空間分享與使用之多用途快取記憶體架構)
Department	Department of Computer Science and Engineering
Year, semester	Fall semester of Academic Year 103	Language	English
Degree	Ph.D.	Number of pages	151
Author	Chun-Hung Lai (賴俊宏)
Advisor	Ing-Jer Huang (黃英哲)
Convenor	Chia-Lin Yang (楊佳玲)
Advisory Committee	Guang-Zhi Liu (劉廣治); Fu-Ching Yang (楊馥璟); Chi-Feng Wu (吳奇峰); Shen-Fu Hsiao (蕭勝夫)
Date of Exam	2014-10-24	Date of Submission	2014-11-27
Keywords	Trace Compression, Trace Storage, Multi-Role Cache, Cache Tag Arrays, Reconfigurable Cache, Scratchpad Memory (SPM), Cache
Statistics	The thesis/dissertation has been browsed 5670 times and downloaded 235 times.

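Abstract (translated from the Chinese original)
As semiconductor process technology advances, more transistors can be integrated on a single chip, so the cache capacity of modern processors keeps growing to meet the memory access demands of increasingly diverse and complex applications. A larger cache improves performance for most workloads. For workloads with low cache usage, however, such a design is inefficient and wasteful, because even a larger cache yields only limited performance gains; the underutilized cache keeps consuming energy without improving overall performance. Many published methods reconfigure the cache according to the workload, but they consider only the cache data arrays and leave the cache tag arrays inefficiently used. This thesis therefore proposes a versatile cache architecture that shares underutilized cache space with other on-chip components, extending the functionality of the cache. Three reconfiguration approaches are proposed. First, the properties of the instruction cache are exploited to compress program trace data. Second, part of the data cache can be partitioned off to store trace data, supporting processor monitoring and debugging. Finally, the otherwise unused cache tag arrays can be used to provide additional scratchpad memory (SPM) capacity. The versatile cache architecture has been integrated with a general-purpose processor and a 3D graphics processor SoC, with hardware/software integration and verification carried out at each level of chip implementation to demonstrate its feasibility and effectiveness. Experimental results show that the architecture adds only a small hardware cost, and that the required circuits and cache modifications do not reduce the processor's operating frequency or performance. The proposed versatile cache architecture is therefore a highly feasible way to develop diverse cache functions, and both the data arrays and the tag arrays are effectively exploited during reconfiguration.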
Abstract
Larger on-chip caches are a clear design trend in general-purpose systems, driven by the need to accommodate the ever more diverse applications of modern SoCs. While larger caches are effective for a wide range of conventional workloads, this design philosophy is poorly suited to workloads that benefit little from large caches: the underutilized cache space consumes power constantly without contributing to performance. Although many research efforts have built flexibility into the cache to adapt to the behavior of different applications, most of them consider only the cache data arrays and ignore the cache tag arrays. In this thesis, we propose a versatile cache architecture in which the underutilized portion of the cache can be organized for purposes other than conventional caching. First, the instruction cache is reused to perform real-time program trace compression without incurring any additional cache misses. Second, the data cache can be configured as a trace buffer that supports monitoring and debugging and is protected from regular cache operations. Finally, the unused cache tag arrays can be turned into an extension of the scratchpad memory (SPM) while the corresponding cache ways are configured as other types of functional units (SPM or buffers/lookup tables). The proposed architecture has been integrated with an academic ARM general-purpose processor and a programmable-shader 3D graphics (3DG) SoC at the RTL, FPGA, and chip levels to demonstrate its feasibility and effectiveness. The results show that the hardware overhead is very minor: the TC (trace-compression) cache and the DT (data/trace) cache require only 3.652K gates and 2.383K gates, an overhead of only 0.123% and 0.081%, respectively, relative to a modern SoC. In addition, the required support circuits and cache modifications do not impair the global critical path delay. Therefore, the proposed approaches are highly feasible solutions for exploring a diversity of cache functionalities, and both the cache data arrays and the tag arrays are taken into consideration for efficient cache utilization.
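To give a rough sense of why idle tag arrays are worth reclaiming as SPM, the sketch below estimates how much tag storage becomes free when some cache ways stop acting as a conventional cache. It is only an illustration using the textbook tag-width formula (address bits minus set-index bits minus line-offset bits). The class `CacheGeometry`, the function `reclaimable_tag_bytes`, and all parameter values are hypothetical and are not taken from the thesis; valid/dirty bits and any controller details are ignored, and sizes are assumed to be powers of two.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CacheGeometry:
    """Hypothetical cache parameters; none of these values come from the thesis."""
    size_bytes: int      # total data-array capacity
    ways: int            # associativity
    line_bytes: int      # cache line size
    addr_bits: int = 32  # address width seen by the cache

    @property
    def sets(self) -> int:
        return self.size_bytes // (self.ways * self.line_bytes)

    @property
    def tag_bits(self) -> int:
        # Tag width = address bits left after the set-index and line-offset bits.
        offset_bits = (self.line_bytes - 1).bit_length()
        index_bits = (self.sets - 1).bit_length()
        return self.addr_bits - index_bits - offset_bits


def reclaimable_tag_bytes(geom: CacheGeometry, ways_reassigned: int) -> int:
    """Bytes of tag storage left idle when `ways_reassigned` ways are no longer cached."""
    if not 0 <= ways_reassigned <= geom.ways:
        raise ValueError("ways_reassigned must be between 0 and geom.ways")
    free_bits = ways_reassigned * geom.sets * geom.tag_bits
    return free_bits // 8


if __name__ == "__main__":
    # Example geometry chosen only for illustration: 16 KiB, 4-way, 32-byte lines.
    geom = CacheGeometry(size_bytes=16 * 1024, ways=4, line_bytes=32)
    print(geom.sets, geom.tag_bits, reclaimable_tag_bytes(geom, 2))
```

Under these assumed parameters, each way carries 128 tag entries of 20 bits, so reassigning two ways leaves 640 bytes of tag storage that would otherwise sit unused.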

Table of Contents
Thesis Approval Form
Thesis Declaration
Acknowledgments
Abstract (Chinese)
Abstract
List of Figures
List of Tables
Chapter 1 Introduction
  1.1 Background: Plenty of On-Chip Memory Capacity
  1.2 Motivation
  1.3 Contributions of the Thesis
  1.4 Organization of the Thesis
Chapter 2 Related Works
  2.1 Innovative Cache Design for Better Cache Utilization
    2.1.1 Cache Resizing for Power Reduction
    2.1.2 Efficient Cache Architecture for Cache Miss Reduction
    2.1.3 Cache Sharing with Other Activities
      2.1.3.1 Cache Sharing among Processors Cores
      2.1.3.2 Cache Space Used for Other Purposes
  2.2 Innovative Tag Organization
    2.2.1 Reduction of Tag Arrays for Power/Area Efficiency
    2.2.2 Free Tag Utilization
  2.3 Trace-Based Debugging
    2.3.1 Hardware-Based Program Trace Compression Techniques
    2.3.2 Trace Buffer Based Approaches
  2.4 Discussions
Chapter 3 Thesis Statement and Design Overview
  3.1 Background
  3.2 Target System Architecture and Research Goals
  3.3 Overview of Design Solutions
    3.3.1 Cache as Program Trace Compressor
    3.3.2 Cache as Trace Buffer
    3.3.3 Free Tag as SPM
  3.4 Summary
Chapter 4 Versatile Instruction Cache (TC Cache): as a Trace Compressor
  4.1 Introduction
  4.2 Features of Trace-Compression Instruction Cache
  4.3 Trace-Compression Instruction Cache Architecture
    4.3.1 Branch/Target Identifier
    4.3.2 Cache Module
    4.3.3 Trace Generator
  4.4 Data Paths and Operations of Bypass and On-line Mode
    4.4.1 The Bypass Mode
    4.4.2 The On-Line Mode
    4.4.3 Comparison of the Bypass Mode and the On-Line Mode
    4.4.4 Extension to Data Trace
  4.5 Decompression for the Compressed Program Trace
    4.5.1 Decompression Flow of the Bypass Mode
    4.5.2 Decompression Flow of the On-Line Mode
  4.6 Limitation of the TC Cache
Chapter 5 Versatile Data Cache (DT Cache): as a Trace Buffer
  5.1 Introduction
  5.2 Features and Feasibility of Data/Trace Cache
    5.2.1 Features of Data/Trace Cache
    5.2.2 Feasibility of Data/Trace Cache
  5.3 Data/Trace Cache Architecture
    5.3.1 Data Cache Modification for Trace Buffer
      5.3.1.1 Line Index Calculator
      5.3.1.2 D/T Configuration Logic
      5.3.1.3 Trace Protection Logic
      5.3.1.4 Trace Dump Logic
    5.3.2 Static DT Cache Configuration
      5.3.2.1 Cache Size Sensitive Partitioning
      5.3.2.2 Trace-Oriented Partitioning
  5.4 Extension for Dynamic Reconfiguration
    5.4.1 Post-Triggering (Post-T) Trace
      5.4.1.1 Victim Tags
      5.4.1.2 HW Assisted TB Allocator
    5.4.2 Pre-Triggering (Pre-T) Trace
    5.4.3 Improvement for Trace Dump
Chapter 6 Versatile Cache Tag Arrays (Tag SPM): Free Tag Arrays as the SPM
  6.1 Introduction
  6.2 Features of Tag SPM
  6.3 Tag SPM Architecture
    6.3.1 Required Modification for Tag SPM
    6.3.2 Data/Tag SPM Controller and Operations
  6.4 Consideration of Tag Bit Width
    6.4.1 Normal Cache Operation
    6.4.2 Tag SPM Operation
Chapter 7 Experimental Results
  7.1 Integration with an Academic ARM Compatible CPU
    7.1.1 Hardware Implementation
      7.1.1.1 Individual Results of TC Cache, DT Cache, and Tag SPM
      7.1.1.2 Integration Results of TC Cache, DT Cache, and Tag SPM
    7.1.2 TC Cache Effectiveness: Compression Quality
      7.1.2.1 Program Trace Compression Ratio
      7.1.2.2 Comparison with Related Trace Compression Techniques
    7.1.3 DT Cache Application: MiBench Tracing
      7.1.3.1 Trace Quality vs. Cache Performance
      7.1.3.2 Comparison with Related Cache-Enhancement Works
    7.1.4 Tag SPM Effectiveness: Capacity Reclaiming
  7.2 Deployment in SoC Chip: the 3D Graphics SoC Chip
    7.2.1 FPGA Prototyping
    7.2.2 Prototype Chip Fabrication
Chapter 8 Considerations of Versatile Cache Architecture in Multi-Core and Higher Level Cache Hierarchy (L2/L3 Cache)
  8.1 Considerations of TC (Trace-Compression) Cache
    8.1.1 TC Cache Applied to Multi-Core SoCs
    8.1.2 TC Cache Applied to Higher Level Cache Hierarchy (L2/L3 Cache)
    8.1.3 TC Cache Applied to Superscalar Architecture
  8.2 Considerations of DT (Data/Trace) Cache
  8.3 Considerations of Tag SPM
Chapter 9 Conclusions and Future Work
  9.1 Conclusions
  9.2 Future Work
Bibliography


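Fulltext
This electronic full text is licensed to users solely for personal, non-profit searching, reading, and printing for academic research purposes. Please observe the Copyright Act of the Republic of China and do not reproduce, distribute, adapt, repost, or broadcast it without authorization. Thesis access permission: user-defined availability date. Available: Campus: available; Off-campus: available. etd-1027114-184928.pdf
Printed copies
Public information on printed theses is relatively complete from Academic Year 102 onward. To look up public information on printed theses from Academic Year 101 or earlier, please contact the printed thesis service desk of the Office of Library and Information Services. We apologize for any inconvenience. Availability: available.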
