National Sun Yat-sen University, thesis/dissertation, Design and Analysis of A New Cache to Support Performance Degradation Tolerance

Title	Design and Analysis of A New Cache to Support Performance Degradation Tolerance
Department	Department of Electrical Engineering
Year, semester	Spring semester of Academic Year 101	Language	Chinese
Degree	Master	Number of pages	60
Author	Ya-Hsiu Chi (紀雅修)
Advisor	Tong-Yu Hsieh (謝東佑)
Convenor	Shiann-Rong Kuang (鄺獻榮)
Advisory Committee	Hsin-Wen Ting (丁信文); Chia-Hung Yeh (葉家宏)
Date of Exam	2013-07-30	Date of Submission	2013-09-10
Keywords	performance degrading fault, performance degradation tolerance, fault tolerance, cache, functionality fault, performance fault
Statistics	The thesis/dissertation has been browsed 5704 times and downloaded 89 times.

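Abstract (translated from the Chinese)
As semiconductor process technology continues to scale down, chips become increasingly sensitive to disturbances such as circuit defects introduced during manufacturing and process variation. Hardware faults may therefore be present in a chip and prevent the fabricated transistors from operating as expected, which lowers chip yield. Processors drive many of today's electronic products, and their performance determines how competitive those products are; the cache is a key component commonly used in processor designs to improve computing performance. If the data cells of a cache (the units that store data) contain hardware faults, the stored data may be corrupted and incorrect data may be supplied to the processor, producing wrong computation results. Ensuring the yield and reliability of caches has long been an important research topic in academia and industry. Many fault tolerance methods in the literature effectively prevent data errors in caches, but when yield is low their ability to improve yield may become quite limited. In recent years a new yield improvement approach, called performance degradation tolerance, has been proposed. Its core idea is the identification of performance degrading faults. Such faults are special in that they only degrade a chip's operating performance and never cause wrong computation results. If a chip contains only faults of this type and the resulting performance degradation is acceptable, the chip can still be used in some lower-end applications despite its defects. This idea is well suited to processor components dedicated to improving computing performance, such as branch predictors. Related studies in the literature show that all faults in a branch predictor are performance degrading faults and that more than 99% of them cause less than 1% performance degradation. Under performance degradation tolerance, the fraction of performance degrading faults in the target circuit and the performance loss they may cause are both critical issues worth investigating in depth. Without fault tolerance or performance degradation tolerance, the vast majority of faults in a cache are not performance degrading faults. Although some fault tolerance methods exist in the literature, none of them can increase the fraction of performance degrading faults in a cache. This thesis aims to design a cache architecture that supports performance degradation tolerance. In this architecture, all faults in the data storage cells are performance degrading faults. To evaluate the performance loss caused by these faults, we randomly inject multiple faults at different densities into the data storage cells and run simulations. The experimental results show that the performance loss is less than 1% when the fault density is within 1%, and the performance degradation is less than 16% when the fault density rises to 20%.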
Abstract
As transistor feature sizes shrink, chips become more sensitive to disturbances such as manufacturing defects and process variation, which may result in low chip yield. Processors are one of the main driving forces behind today's electronic products. Among the components of a processor, caches are critical for enhancing computation performance. In a cache, any fault in the data storage cells is likely to corrupt the stored data, which may lead the processor to produce wrong computation results. How to effectively improve the yield and reliability of caches has been an important research topic in academia and industry. Many fault tolerance methods have been developed in the literature to prevent operation errors in caches. However, when chip yield is low, the effectiveness of these methods may become limited. In recent years a new notion for improving yield, called performance degradation tolerance, has been proposed. This notion focuses on a particular type of fault, called a performance degrading fault. Faults of this type cause only some performance degradation and no computation errors. Therefore, as long as a defective chip contains only faults of this type and the resulting degraded performance is acceptable, the chip is still marketable for certain lower-end applications. This notion is well suited to components dedicated to enhancing the computation performance of processors, such as branch predictors. Our prior research has shown that all faults in a branch predictor are performance degrading faults, and that the induced performance degradation is less than 1% for over 99% of the faults.
Under the notion of performance degradation tolerance, the fraction of faults in the target circuit that are performance degrading and the degree of performance degradation they cause are critical issues worth investigating. It is important to point out that most hardware faults in a cache are not performance degrading faults. Although some fault tolerance methods have been developed in the literature, the number of performance degrading faults that result from these methods may be limited. These methods may also require large hardware overhead and induce significant performance loss. This thesis proposes a new cache design that supports performance degradation tolerance. In this design, all faults in the data storage cells are performance degrading faults. To evaluate the performance degradation caused by faults, we implement the proposed cache design in the SimpleScalar processor simulation tool. We then randomly inject multiple faults at various fault densities into the data cells of the cache and run a large number of simulations with several CPU2000 benchmark programs. The experimental results show that when the fault density is less than 1%, the performance degradation is also less than 1%, and that the performance degradation is less than 16% when the fault density is 20%.
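The fault injection experiment described above can be illustrated with a minimal Python sketch. It is not the thesis's SimpleScalar implementation, and the abstract does not specify the mechanism. The sketch assumes a direct-mapped cache with one hypothetical fault bit per line, set for a randomly chosen fraction of lines, and treats every access to a flagged line as a miss, so that a fault costs performance but never returns corrupted data. The names FaultAwareCache, miss_rate, and synthetic_trace, the cache geometry, and the random address trace are all illustrative assumptions, and miss rate stands in for the performance metric.

```python
import random


class FaultAwareCache:
    """Direct-mapped cache model in which flagged (faulty) lines always miss.

    Assumption: one fault bit per cache line. The thesis abstract does not
    state the granularity or mechanism it actually uses.
    """

    def __init__(self, num_lines, line_bytes, fault_density, seed=0):
        rng = random.Random(seed)
        self.num_lines = num_lines
        self.line_bytes = line_bytes
        # Randomly mark a fraction of lines as faulty (hypothetical fault map).
        num_faulty = round(num_lines * fault_density)
        self.faulty = [False] * num_lines
        for index in rng.sample(range(num_lines), num_faulty):
            self.faulty[index] = True
        self.tags = [None] * num_lines
        self.hits = 0
        self.misses = 0

    def access(self, address):
        """Return True on a hit, False on a miss."""
        block = address // self.line_bytes
        index = block % self.num_lines
        tag = block // self.num_lines
        if self.faulty[index]:
            # Never trust a faulty line: force a miss so the data would be
            # fetched from the next memory level instead.
            self.misses += 1
            return False
        if self.tags[index] == tag:
            self.hits += 1
            return True
        # Ordinary miss on a healthy line: fill the line.
        self.tags[index] = tag
        self.misses += 1
        return False


def synthetic_trace(length=20000, working_set_bytes=4096, seed=1):
    """Illustrative random address trace (not a CPU2000 benchmark)."""
    rng = random.Random(seed)
    return [rng.randrange(working_set_bytes) for _ in range(length)]


def miss_rate(fault_density, trace, num_lines=256, line_bytes=32, seed=0):
    """Miss rate of the modeled cache on a trace at a given fault density."""
    cache = FaultAwareCache(num_lines, line_bytes, fault_density, seed)
    for address in trace:
        cache.access(address)
    return cache.misses / len(trace)


if __name__ == "__main__":
    trace = synthetic_trace()
    for density in (0.0, 0.01, 0.05, 0.20):
        print(f"fault density {density:.2f}: miss rate {miss_rate(density, trace):.4f}")
```

Comparing miss_rate(0.0, trace) with miss_rate(0.2, trace) on the same trace shows the kind of degradation curve the experiments measure. In this model the miss rate can only rise as more lines are flagged, because flagged lines always miss and the remaining lines behave exactly as in the fault-free cache.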

Table of Contents
Thesis Approval Form
Acknowledgments
Chinese Abstract
English Abstract
Chapter 1 Introduction and Research Motivation
  1.1 Research Motivation
  1.2 Architecture Overview
  1.3 Contributions
  1.4 Chapter Organization
Chapter 2 Background and Related Work
  2.1 Fault Tolerance and Performance Degradation Tolerance
  2.2 Memory Access Mechanisms in Microprocessors
  2.3 Pipelined Operation in Microprocessors
  2.4 Fault Tolerance by Disabling Word Rows, Sets, or Ways
  2.5 Fault Tolerance by Remapping
  2.6 Fault Tolerance by Respecification
  2.7 Fault Tolerance with the Salvage Cache
  2.8 Fault Tolerance with Two-Dimensional Error-Correcting Codes
Chapter 3 Design and Analysis of a New Cache to Support Performance Degradation Tolerance
  3.1 Basic Concept
  3.2 Self-Test Mode and Fault Bit Setting
  3.3 Cache Design Details
  3.4 Advantages of the Architecture
Chapter 4 Simulation Results
  4.1 Simulation Software: SimpleScalar
  4.2 Simulation Parameter Settings and Benchmark Programs
  4.3 Simulation Results
Chapter 5 Conclusion
References


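Fulltext
This electronic full text is licensed to users solely for personal, non-profit searching, reading, and printing for academic research purposes. Please comply with the Copyright Act of the Republic of China and do not reproduce, distribute, adapt, repost, or broadcast it without authorization, to avoid breaking the law. Thesis access permission: user-defined release time. Available: Campus: available; Off-campus: available. etd-0810113-160440.pdf
Printed copies
Public information on printed theses is relatively complete only from Academic Year 102 onward. To look up public information on printed theses from Academic Year 101 or earlier, please contact the printed thesis service desk of the Office of Library and Information Services. We apologize for any inconvenience. Available: available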

Office of Library and Information Services, National Sun Yat-sen University │ Contact Us: 2452 Thesis Format Review Team, Mail │ Development and operations: Knowledge Innovation Division, LIS