國立中山大學,National Sun Yat-sen University,學位論文,thesis/dissertation,可容忍製程缺陷與軟性錯誤之快取記憶體設計與實現,Design and Implementation of A Defect and Soft-Error Tolerable Cache

論文名稱 Title	可容忍製程缺陷與軟性錯誤之快取記憶體設計與實現 Design and Implementation of A Defect and Soft-Error Tolerable Cache
系所名稱 Department	電機工程學系 Department of Electrical Engineering
畢業學年期 Year, semester	103 學年度第 2 學期 The spring semester of Academic Year 103	語文別 Language	中文 Chinese
學位類別 Degree	碩士 Master	頁數 Number of pages	71
研究生 Author	池宗亮 Tsung-Liang Chih
指導教授 Advisor	謝東佑 Tong-Yu Hsieh
召集委員 Convenor	丁信文 Hsin-Wen Ting
口試委員 Advisory Committee	王太平, 鄺獻榮 Tai-Ping Wang; Shiann-Rong Kuang
口試日期 Date of Exam	2015-07-28	繳交日期 Date of Submission	2015-09-08
關鍵字 Keywords	製程缺陷、軟性錯誤、快取記憶體、效能下降容忍、效能下降錯誤 cache, performance degradation tolerance, performance degrading faults, defect, soft-error
統計 Statistics	本論文已被瀏覽 5668 次，被下載 30 次 The thesis/dissertation has been browsed 5668 times, has been downloaded 30 times.

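Abstract (translated from the Chinese)
As semiconductor process technology advances, device dimensions shrink, which makes chips more susceptible to manufacturing defects and process variation and lowers overall chip yield. In addition, smaller devices have lower threshold voltages, so circuits become increasingly sensitive to soft errors, which in turn affects chip reliability. Performance degradation tolerance (PDT) has been proposed in recent years to improve circuit yield and stability. The concept rests on a special class of faults, called performance degrading faults (pdef), which only degrade system performance and do not produce incorrect computation results. If the performance degradation is still acceptable, the chip can remain in use, which raises the effective yield. Based on PDT, this thesis proposes a novel cache architecture that converts functional errors into performance degrading faults by disabling faulty storage blocks and exploiting the memory hierarchy. We further use an error correcting code (ECC) to correct errors and detect soft errors, which mitigates the performance degradation caused by errors and yields a cache architecture that tolerates both process defects and soft errors. Because ECC carries a large area overhead, we propose a design method that stores the required check bits in the cache's existing storage space to reduce the overall area cost. This method reduces the hardware area overhead by about 6.27% (from 16.64% to 10.27%). Experimental results with standard benchmark programs show that when 1% of the cache's storage blocks are faulty, the performance degradation is at most about 1%. When 20% of the storage blocks are faulty, the performance degradation averages 16.67%. With ECC correction taken into account, if a faulty block has a 10% probability of containing only a single, correctable error, the performance degradation is reduced to 15.27%; with a 50% correction probability it is reduced further to 8.72%; and with a 90% correction probability it drops to only 1.62%.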
Abstract
As transistor feature sizes shrink, chips become more sensitive to process defects and variation, which can lower both yield and reliability. Performance degradation tolerance (PDT) has been proposed in recent years as a method for improving yield and reliability. It builds on a particular type of fault, the performance degrading fault (pdef), which does not cause functional errors at system outputs but may degrade system performance. If defective chips contain only pdefs and the resulting performance degradation is acceptable, the chips are still marketable, which raises the effective yield. In this thesis, we propose a new cache architecture based on PDT. In this architecture, all functional errors in the storage part are transformed into pdefs by disabling faulty blocks and retrieving the data from the next level of the memory hierarchy. In addition, an error correcting code (ECC) is employed to correct and detect errors, so that the proposed cache can tolerate both hard and soft errors and the incurred performance degradation is mitigated. To limit area, we store the check bits in the existing storage cells, which lowers the incurred area overhead. Logic synthesis results show that the area overhead is thereby reduced by about 6.27% (from 16.64% to 10.27%). Experimental results on several large, realistic benchmark programs show that the performance degradation of the proposed cache is less than 1% when the fault density is 1%, and about 16.67% when the fault density is 20%. With our ECC mechanism, however, this degradation is reduced to 15.27%, 8.72%, and 1.62% when the probability that the errors can be corrected is 10%, 50%, and 90%, respectively.
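The idea of turning functional errors into performance degrading faults can be illustrated with a small simulation. The following is only a minimal sketch under stated assumptions, not the architecture implemented in the thesis: the class `PDTCacheSketch`, the function `run`, the access trace, and the cache geometry are all hypothetical, disabled blocks are modelled simply as reduced per-set capacity, and writes and ECC are omitted. It shows that disabling faulty blocks leaves the returned data unchanged while the miss count rises.

```python
from collections import OrderedDict


class PDTCacheSketch:
    """Hypothetical set-associative LRU cache whose faulty ways are disabled."""

    def __init__(self, num_sets, ways, faulty=frozenset()):
        self.num_sets = num_sets
        # Assumption: a disabled (faulty) way just reduces the usable
        # capacity of its set; `faulty` holds (set_index, way) pairs.
        self.capacity = [
            ways - sum((s, w) in faulty for w in range(ways))
            for s in range(num_sets)
        ]
        # One ordered dict per set; least recently used entry comes first.
        self.sets = [OrderedDict() for _ in range(num_sets)]
        self.hits = 0
        self.misses = 0

    def read(self, addr, next_level):
        s = addr % self.num_sets
        tag = addr // self.num_sets
        lines = self.sets[s]
        if tag in lines:
            lines.move_to_end(tag)  # mark as most recently used
            self.hits += 1
            return lines[tag]
        # Miss: the data always comes from the next memory level, so a
        # disabled block costs time but never yields wrong data.
        self.misses += 1
        data = next_level[addr]
        if self.capacity[s] > 0:
            if len(lines) >= self.capacity[s]:
                lines.popitem(last=False)  # evict the LRU line
            lines[tag] = data
        return data


def run(faulty):
    """Replay a fixed, made-up trace and return (values read, miss count)."""
    memory = {addr: addr * 7 for addr in range(64)}
    cache = PDTCacheSketch(num_sets=4, ways=2, faulty=faulty)
    trace = [0, 4, 0, 4, 8, 0, 1, 5, 1, 5] * 3
    values = [cache.read(addr, memory) for addr in trace]
    return values, cache.misses


if __name__ == "__main__":
    clean_values, clean_misses = run(frozenset())
    faulty_values, faulty_misses = run(frozenset({(0, 0)}))
    print("same data:", clean_values == faulty_values)
    print("misses without faults:", clean_misses)
    print("misses with one disabled block:", faulty_misses)
```

Comparing the two runs separates correctness from performance: the values read are identical, and only the number of accesses served by the next memory level changes. The percentages reported in the abstract come from the thesis's own experiments and are not reproduced by this toy trace.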

目次 Table of Contents
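Thesis Approval Form i
Chinese Abstract ii
Abstract iii
Table of Contents iv
List of Figures vii
List of Tables ix
Chapter 1 Introduction 1
1.1 Motivation 1
1.2 Contributions 5
1.3 Thesis Organization 6
Chapter 2 Background and Related Work 8
2.1 Fault-Tolerant Design 8
2.2 Performance Degrading Faults 9
2.3 Cache Architecture and Access Mechanism 10
2.4 Storage Block Disabling and Replacement 14
2.5 Replication Cache 17
2.6 Shadow Checking 18
2.7 Error Correcting Code (ECC) 19
2.8 Built-In Self-Test Circuits 20
2.9 Simulation Parameter Settings and Performance Benchmark Programs 21
2.10 Comparison of the Proposed Architecture with Previous Work 23
Chapter 3 Design and Implementation of a Defect-Tolerant Cache 24
3.1 Hardware Architecture 24
3.2 Read Operation of the Defect-Tolerant Cache Architecture 25
3.3 Write Operation of the Defect-Tolerant Cache Architecture 26
3.4 Analysis of the Number of Performance Degrading Faults 27
3.5 Hardware Implementation Results 28
3.6 Analysis of Performance Degradation Results 28
3.7 Adaptability Analysis for Cache Size 30
Chapter 4 Design and Implementation of a Defect and Soft-Error Tolerant Cache 31
4.1 PDT Cache with Single-Error-Correcting, Double-Error-Detecting Circuitry 31
4.1.1 Basic Concept 31
4.1.2 Storage Block Classification and Benefits 32
4.1.3 Read Operation of the Cache Architecture 33
4.1.4 Write Operation of the Cache Architecture 34
4.1.5 Hardware Architecture 35
4.1.6 Memory Unit Architecture 36
4.1.7 LRU Modification 38
4.1.8 Analysis of the Number of Performance Degrading Faults 39
4.1.9 Hardware Implementation Results 40
4.1.10 Analysis of Performance Degradation Results 42
4.2 PDT Cache with Single-Error-Correcting, Double-Error-Detecting Circuitry and Embedded Check Bits 44
4.2.1 Impact of Reduced Storage Blocks on Cache Performance 45
4.2.2 Memory Unit Architecture 46
4.2.3 LRU Modification 48
4.2.4 Analysis of the Number of Performance Degrading Faults 49
4.2.5 Hardware Implementation Results 52
4.2.6 Analysis of Performance Degradation Results 52
Chapter 5 Conclusion 57
References 58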

電子全文 Fulltext
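This electronic full text is licensed to users solely for personal, non-profit searching, reading, and printing for academic research purposes. Please comply with the Copyright Act of the Republic of China and do not reproduce, distribute, adapt, repost, or broadcast it without authorization, to avoid breaking the law.
Thesis access permission: user-defined availability date
Available: Campus: available; Off-campus: available
etd-0808115-142034.pdf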
紙本論文 Printed copies
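Public information on printed copies is relatively complete from Academic Year 102 onward. To look up public information on printed copies from Academic Year 101 or earlier, please contact the printed thesis service desk of the Office of Library and Information Services. We apologize for any inconvenience.
Available: available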

Office of Library and Information Services, National Sun Yat-sen University │ Contact Us : 2452 Thesis Format Review Team , Mail │ Development and operations : Knowledge Innovation Division, LIS