國立中山大學,National Sun Yat-sen University,學位論文,thesis/dissertation,智慧型記憶體系統之耗能分析與低耗能排程技術,Power Analysis and Low Power Scheduling Techniques for Intelligent Memory System

論文名稱 Title	智慧型記憶體系統之耗能分析與低耗能排程技術 Power Analysis and Low Power Scheduling Techniques for Intelligent Memory System
系所名稱 Department	電機工程學系 Department of Electrical Engineering
畢業學年期 Year, semester	89 學年度第 2 學期 The spring semester of Academic Year 89	語文別 Language	中文 Chinese
學位類別 Degree	碩士 Master	頁數 Number of pages	64
研究生 Author	鄭連福 Lien-Fu Cheng
指導教授 Advisor	黃宗傳 Tsung-Chuan Huang
召集委員 Convenor	陳澤生 Tse-Sheng Chen
口試委員 Advisory Committee	楊竹星, 竇其仁, 侯廷偉 Chu-Sing Yang; Chyi-Ren Dow; Ting-Wei Hou
口試日期 Date of Exam	2001-07-25	繳交日期 Date of Submission	2001-07-27
關鍵字 Keywords	none Energy-oriented low power scheduling, Power Analysis and Low Power Scheduling Techniqu, Performance-oriented low power scheduling, Intelligent Memory System, A new scheduling methodology in source code leve
統計 Statistics	本論文已被瀏覽 5650 次，被下載 23 次 The thesis/dissertation has been browsed 5650 times, has been downloaded 23 times.

中文摘要
在今日的計算機系統設計中，如何減少能源消耗已成為一項重要的課題。目前有關低耗能的研究，大多著重在新的半導體科技與硬體架構上，而較少利用軟體最佳化技術來減低耗能。在本論文中，我們將針對智慧型記憶體系統，提出一個純粹使用編譯技術的原始碼階層（Source code level）排程法，它擁有兩種不同考量的選項：“效能導向低耗能排程”與“耗能導向低耗能排程”。本排程機制兼顧了高效能與低耗能的考量，且具有高度的使用彈性。我們也分別列出多組的實驗數據，並加以討論，藉以驗證其優異性。
Abstract
Power consumption has become an important issue in the design of computing systems. Most research on low power has focused on semiconductor technology or hardware architecture, and comparatively little has applied software optimization. This thesis presents a new source-code-level scheduling methodology for intelligent memory systems that reduces energy consumption purely through compilation techniques. The scheduler offers users two options, performance-oriented low power scheduling and energy-oriented low power scheduling, so that both high performance and low power are taken into account and the trade-off between them can be chosen flexibly. Several sets of experimental results are presented and discussed to validate the approach.
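The difference between the two scheduling options can be illustrated with a small sketch. It is not the thesis's algorithm: the block names, the per-block delay and energy weights, the two-processor model (`host` and `mem`), and the functions `performance_oriented` and `energy_oriented` are all hypothetical, and the exhaustive search stands in for whatever heuristic the thesis actually uses. The sketch only shows how one option can favour speed and then save energy, while the other caps energy and then seeks speed.

```python
from itertools import product

# Hypothetical weights per code block: (delay, energy) on each processor.
# All numbers are invented for illustration only.
BLOCKS = {
    "B1": {"host": (4, 10), "mem": (6, 3)},
    "B2": {"host": (5, 12), "mem": (5, 4)},
    "B3": {"host": (3, 8), "mem": (9, 2)},
}


def evaluate(assignment):
    """Return (time, energy) for one block-to-processor assignment.

    Assumption: blocks are independent and the two processors run in
    parallel, so the time is the larger of the two per-processor loads.
    """
    load = {"host": 0, "mem": 0}
    energy = 0
    for block, proc in assignment.items():
        delay, cost = BLOCKS[block][proc]
        load[proc] += delay
        energy += cost
    return max(load.values()), energy


def all_assignments():
    """Enumerate every way of placing the blocks on the two processors."""
    names = sorted(BLOCKS)
    for procs in product(("host", "mem"), repeat=len(names)):
        yield dict(zip(names, procs))


def performance_oriented(max_slowdown=0):
    """Lowest-energy schedule whose time is within max_slowdown of the best."""
    scored = [(evaluate(a), a) for a in all_assignments()]
    best_time = min(t for (t, _), _ in scored)
    ok = [(e, t, a) for (t, e), a in scored if t <= best_time + max_slowdown]
    e, t, a = min(ok, key=lambda x: (x[0], x[1]))
    return a, t, e


def energy_oriented(energy_budget):
    """Fastest schedule whose energy stays within energy_budget, or None."""
    ok = [(t, e, a) for a in all_assignments()
          for t, e in [evaluate(a)] if e <= energy_budget]
    if not ok:
        return None
    t, e, a = min(ok, key=lambda x: (x[0], x[1]))
    return a, t, e


if __name__ == "__main__":
    print(performance_oriented(max_slowdown=0))
    print(performance_oriented(max_slowdown=4))
    print(energy_oriented(energy_budget=16))
```

With these invented weights, allowing some slowdown in the performance-oriented mode and tightening the budget in the energy-oriented mode both move work from the host to the memory processor, which is the kind of trade-off the two options expose.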

目次 Table of Contents
Chinese Abstract
English Abstract
Table of Contents
List of Figures
List of Tables
List of Algorithms
Chapter 1 Introduction
Chapter 2 The FlexRAM Architecture
  2.1 Description of the FlexRAM Architecture
  2.2 Basic Parameters of FlexRAM
Chapter 3 Power Analysis and Low Power Scheduling
  3.1 Construction of the Weighted Block-Relation Graph (WPG, 加權區塊關係圖)
  3.2 Estimation of Delay Weights and Energy Weights
  3.3 Wavefront Generation and Schedule Determination
    3.3.1 Performance-Oriented Low Power Scheduling
    3.3.2 Energy-Oriented Low Power Scheduling
  3.4 Tiling for PIM
Chapter 4 Examples
  4.1 Example Using Performance-Oriented Low Power Scheduling
  4.2 Example Using Energy-Oriented Low Power Scheduling
Chapter 5 Experimental Results
  5.1 Benchmark: swim
  5.2 Benchmark: cgemm
  5.3 Benchmark: cg
  5.4 Benchmark: ft
  5.5 Benchmark: mdg
Chapter 6 Conclusions

List of Figures
Figure 1-1.a Organization of the L-Cache
Figure 1-1.b Priority order for placement in the L-Cache
Figure 1-2.a Program before reordering
Figure 1-2.b Program after reordering
Figure 2-1 Organization of FlexRAM
Figure 3-1 Flowchart of the low power scheduling algorithm
Figure 4-1.a Example code excerpt taken from swim
Figure 4-1.b Result after applying loop splitting
Figure 4-1.c The constructed WPG
Figure 5-1.a Measured results for swim: execution time bar chart
Figure 5-1.b Measured results for swim: energy consumption bar chart
Figure 5-2.a Measured results for cgemm: execution time bar chart
Figure 5-2.b Measured results for cgemm: energy consumption bar chart
Figure 5-3.a Measured results for cg: execution time bar chart
Figure 5-3.b Measured results for cg: energy consumption bar chart
Figure 5-4.a Measured results for ft: execution time bar chart
Figure 5-4.b Measured results for ft: energy consumption bar chart
Figure 5-5.a Measured results for mdg: execution time bar chart
Figure 5-5.b Measured results for mdg: energy consumption bar chart

List of Tables
Table 1-1 Energy transition consumption
Table 2-1 Parameters of the FlexRAM organization
Table 2-2 Energy parameters of the FlexRAM computing resources
Table 5-1 Measured results for swim
Table 5-2 Measured results for cgemm
Table 5-3 Measured results for cg
Table 5-4 Measured results for ft
Table 5-5 Measured results for mdg

List of Algorithms
Algorithm 1. Statement Splitting Algorithm
Algorithm 2. Delay_Weight_Determine Algorithm
Algorithm 3. Delay_Weight Patching Algorithm
Algorithm 4. Energy_Weight_Determine Algorithm
Algorithm 5. Energy_Weight Patching Algorithm
Algorithm 6. Low Power Scheduling Algorithm
Algorithm 7. Loop Splitting
Algorithm 8. Speedup Reduce and Get the Maximum Potential Energy Reduce Algorithm1
Algorithm 9. Constrain Energy and Get the Maximum Potential Speedup Algorithm1
Algorithm 10. Tiling for PIM

電子全文 Fulltext
The electronic full text is licensed to users solely for personal, non-profit retrieval, reading, and printing for the purpose of academic research. Please comply with the Copyright Act of the Republic of China and do not reproduce, distribute, adapt, repost, or broadcast it without authorization.
論文使用權限 Thesis access permission	校內公開，校外永不公開 Available on campus; never available off campus (restricted)
開放時間 Available	校內 Campus：已公開 available	校外 Off-campus：永不公開 not available
紙本論文 Printed copies
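Availability information for printed theses is relatively complete from Academic Year 102 onward. For availability information on printed theses from Academic Year 101 or earlier, please contact the printed thesis service desk of the Office of Library and Information Services. We apologize for any inconvenience.
開放時間 Available	已公開 available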

Office of Library and Information Services, National Sun Yat-sen University │ Contact Us : 2452 Thesis Format Review Team , Mail │ Development and operations : Knowledge Innovation Division, LIS