國立中山大學,National Sun Yat-sen University,學位論文,thesis/dissertation,低能量多重模式浮點數運算單元及指令精確度指定方法,Energy-Efficient Multiple-Mode Floating-Point Arithmetic Units and Instruction Precision Assignment Methods

論文名稱 Title	低能量多重模式浮點數運算單元及指令精確度指定方法 Energy-Efficient Multiple-Mode Floating-Point Arithmetic Units and Instruction Precision Assignment Methods
系所名稱 Department	資訊工程學系 Department of Computer Science and Engineering
畢業學年期 Year, semester	102 學年度第 2 學期 The spring semester of Academic Year 102	語文別 Language	英文 English
學位類別 Degree	博士 Ph.D.	頁數 Number of pages	150
研究生 Author	吳坤益 Kun-Yi Wu
指導教授 Advisor	鄺獻榮 Shiann-Rong Kuang
召集委員 Convenor	周哲民 Jer-Min Jou
口試委員 Advisory Committee	蕭勝夫, 鄺獻榮, 邱日清, 張雲南 Shen-Fu Hsiao; Shiann-Rong Kuang; Jih-ching Chiu; Yun-Nan Chang
口試日期 Date of Exam	2014-07-16	繳交日期 Date of Submission	2014-07-28
關鍵字 Keywords	低能量、仿射算術、多重模式浮點數運算單元、指令精確度指定、禁忌搜尋 low energy, affine arithmetic, instruction precision assignment, Tabu search, multiple-mode FP arithmetic units
統計 Statistics	本論文已被瀏覽 5718 次，被下載 62 次 The thesis/dissertation has been browsed 5718 times, has been downloaded 62 times.

中文摘要
隨著現代系統中浮點數運算應用的快速成長，使得浮點數運算單元已經成為這些系統主要的能量消耗來源。幸運地是許多浮點數應用可以忍受些許輸出資料的失真，而這些失真是人類感官可以忽略或是接受的。換句話說，我們可以利用多重模式浮點數運算單元，藉著調降浮點數運算單元的精確度(低於IEEE單精度浮點數指令)，犧牲整體應用輸出資料的準確度，以換取降低能量消耗。所以，如何在可容許的確實性損失下，能夠快速、有效率地指定每個浮點數指令一個合適的精確度模式並且達到最低能量消耗，已經成為非常重要的議題(稱為精確度指定問題，即PAP)。由於針對低能量、高效能或其他特殊目的，有些運算可以轉換成不同的指令，而如何選擇指令並且在多重模式運算單元上執行上述運算，則是一個關鍵的問題(我們稱為指令轉換問題，即ITP)。此外，指令排程問題(即ISP)對於追求高效能的系統而言，亦是非常重要的。因此，為了有效解決上述三個問題，本論文提出一個低能量指令精確度指定系統，其中包括多重模式浮點數運算單元的硬體實作以及誤差分析和指令精確度指定方法的軟體發展兩方面。首先，我們將介紹多重模式浮點數運算單元的設計與特色。我們利用堆疊和截斷技術實現多重精確度模式設計，其所有精確度模式皆可隨不同指令的需求動態調整，以達到降低更多能量消耗的目標。為了有效使用多重模式浮點數運算單元，並且確保應用程式的輸出資料確實性限制可以滿足，我們利用仿射算術(AA)建立一個區間分析之浮點數誤差模式，以便指出每一個浮點數指令的精確度和浮點數應用程式輸出資料確實性之間的關係。接著，我們將誤差模式所產生的輸出資料仿射算術格式儲存在確實性檢查函式，以便在PAP和ITP問題下，執行確實性限制檢查。此外，本論文採用簡化指令排程方法和應用程式的資料非循環圖(DAG)建立效能檢查函式，用來檢查是否滿足ISP的效能限制。基於多重模式浮點數運算單元的資訊和上述兩個檢查函式，我們發展出一個指令精確度指定方法，此方法結合了快速貪婪方法以及我們所改良的快速塔布搜尋(禁忌搜尋)演算法，能在應用程式的確實性和效能限制下，透過快速指定每個浮點指令的精確度模式並且重新排程所有指令，以同時解決PAP、ITP和ISP三個問題。從實際應用程式和人工隨機例子的實驗結果顯示，我們所提出的方法可以在有限的時間限制之下，找到比其他方法節省更多能量消耗的精確度指定解。
Abstract
With the rapid growth of floating-point (FP) arithmetic in modern systems, FP arithmetic units have become the main energy consumers in these systems. Fortunately, many FP applications tolerate a slight output distortion that human senses can neglect or accept. In other words, we can trade output quality for energy savings by using multiple-mode FP arithmetic units to execute FP instructions at reduced precision (lower than IEEE single precision). How to quickly and effectively assign each FP instruction a suitable precision mode, so as to maximize the energy saving under acceptable accuracy constraints, is therefore an essential problem, called the precision assignment problem (PAP). Because some operations can be transformed into different instructions for low energy, high performance, or other special purposes, deciding which instructions should perform these operations on the various multiple-mode FP arithmetic units is also a critical problem, called the instruction transform problem (ITP). Moreover, the instruction scheduling problem (ISP) is very important for many high-performance systems. This dissertation therefore proposes a low-energy instruction precision assignment system that comprises the hardware implementation of multiple-mode FP arithmetic units and the software development of an error analysis model and instruction precision assignment methods for efficiently solving PAP, ITP, and ISP. First, we introduce the design and characteristics of our multiple-mode FP arithmetic units, which use iterative and truncation techniques to support multiple modes with different errors and energy consumption. The precision mode of each unit can be changed dynamically from one FP instruction to the next to further reduce energy consumption.
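To make the trade between precision and output error concrete, the following Python sketch emulates a reduced-precision multiply in software by zeroing low-order fraction bits of IEEE single-precision values. It is only an illustration under stated assumptions: the names `truncate_mantissa`, `approx_mul`, and `relative_error` and the `kept_bits` parameter are hypothetical, and truncating operands and results merely stands in for the iterative and truncated multiplier hardware of the dissertation, whose datapath is not described in this record.

```python
import struct


def truncate_mantissa(x: float, kept_bits: int) -> float:
    """Round x to float32, then zero all but the top kept_bits fraction bits.

    Assumption for illustration: a precision mode is modeled as plain
    truncation of the 23-bit single-precision fraction field.
    """
    if not 0 <= kept_bits <= 23:
        raise ValueError("kept_bits must be in 0..23")
    # Reinterpret the float32 encoding of x as an unsigned 32-bit integer.
    (bits,) = struct.unpack("<I", struct.pack("<f", x))
    # Keep sign, exponent, and the kept_bits most significant fraction bits.
    mask = (0xFFFFFFFF << (23 - kept_bits)) & 0xFFFFFFFF
    (y,) = struct.unpack("<f", struct.pack("<I", bits & mask))
    return y


def approx_mul(a: float, b: float, kept_bits: int) -> float:
    """Multiply with operands and result truncated to kept_bits fraction bits."""
    product = truncate_mantissa(a, kept_bits) * truncate_mantissa(b, kept_bits)
    return truncate_mantissa(product, kept_bits)


def relative_error(a: float, b: float, kept_bits: int) -> float:
    """Relative error of approx_mul against the double-precision product."""
    exact = a * b
    return abs(approx_mul(a, b, kept_bits) - exact) / abs(exact)
```

With this model, `relative_error(1.1, 1.3, 8)` stays below 2 percent because each of the three truncations loses less than a relative 2^-8, while keeping all 23 fraction bits gives an error near single-precision rounding. A precision assignment method has to decide, per instruction, how much of this error the application can absorb.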
To use these multiple-mode FP arithmetic units effectively and ensure that the accuracy constraints of the application are satisfied, we modify affine arithmetic (AA) to build an FP error model, based on interval analysis, that relates the precision of each FP instruction to the accuracy of the output data of the given FP application. We then store the AA form of the output data generated by this error model in an accuracy check function, which checks the accuracy constraints in PAP and ITP. In addition, a simplified instruction scheduling method and the directed acyclic graph (DAG) of the application are used to build a performance check function, which checks the performance constraints in ISP. Based on the information about the multiple-mode arithmetic units and these two check functions, we develop a precision assignment method that integrates a fast greedy method with our modified fast Tabu search (TS) algorithm. It solves PAP, ITP, and ISP together by assigning a precision mode to each FP instruction and rescheduling all instructions under the accuracy and performance constraints of the given application. Experimental results for real applications and artificial random cases show that, within a limited run time, the proposed method finds precision assignments that on average save more energy than those found by previous methods.
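The interplay between an AA-based error bound and a greedy precision assignment can be sketched in a few lines of Python. This is a minimal sketch built on textbook affine arithmetic, not the modified AA model or the Tabu search of the dissertation. Everything named here is an assumption for illustration: the `Affine` class, the `rounded` noise model (one new noise symbol of relative size 2^-p per instruction), the three modes in `MODES`, the made-up `ENERGY` costs, and the toy program a*b + c*d.

```python
import itertools

_fresh = itertools.count()  # supplies unique ids for noise symbols


class Affine:
    """Affine form: center + sum(coef_i * eps_i), each eps_i in [-1, 1]."""

    def __init__(self, center, terms=None):
        self.center = float(center)
        self.terms = dict(terms or {})

    def radius(self):
        """Largest possible deviation from the center."""
        return sum(abs(c) for c in self.terms.values())

    def __add__(self, other):
        terms = dict(self.terms)
        for key, coef in other.terms.items():
            terms[key] = terms.get(key, 0.0) + coef
        return Affine(self.center + other.center, terms)

    def __mul__(self, other):
        terms = {}
        for key, coef in self.terms.items():
            terms[key] = terms.get(key, 0.0) + other.center * coef
        for key, coef in other.terms.items():
            terms[key] = terms.get(key, 0.0) + self.center * coef
        # Conservative bound on the second-order term, as a new noise symbol.
        terms[next(_fresh)] = self.radius() * other.radius()
        return Affine(self.center * other.center, terms)

    def rounded(self, frac_bits):
        """Add a noise symbol bounding truncation to frac_bits fraction bits."""
        bound = (abs(self.center) + self.radius()) * 2.0 ** -frac_bits
        terms = dict(self.terms)
        terms[next(_fresh)] = bound
        return Affine(self.center, terms)


MODES = [23, 16, 8]                   # fraction bits per mode (hypothetical)
ENERGY = {23: 1.0, 16: 0.6, 8: 0.3}   # made-up relative energy per instruction


def evaluate(bits):
    """Error-annotated result of a*b + c*d; bits[i] is instruction i's mode."""
    a, b, c, d = (Affine(v) for v in (1.1, 1.3, 0.7, 2.5))
    p = (a * b).rounded(bits[0])
    q = (c * d).rounded(bits[1])
    return (p + q).rounded(bits[2])


def greedy_assign(max_error):
    """Lower one instruction's precision at a time while the bound holds."""
    level = [0, 0, 0]  # index into MODES for each instruction
    improved = True
    while improved:
        improved = False
        for i in range(len(level)):
            if level[i] + 1 < len(MODES):
                trial = level.copy()
                trial[i] += 1
                bound = evaluate([MODES[j] for j in trial]).radius()
                if bound <= max_error:  # accuracy check
                    level, improved = trial, True
    return [MODES[j] for j in level]
```

Because the inputs are exact points, every noise symbol comes from rounding, so `radius()` of the output acts as the accuracy check. `greedy_assign(0.01)` then lowers precision wherever the bound still holds. A greedy pass like this can stop in a local optimum, which is the usual motivation for following it with a metaheuristic such as Tabu search.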

目次 Table of Contents
CHAPTER 1. Introduction
CHAPTER 2. Related Works
  2.1 Multiple-mode Floating-Point Arithmetic Unit
    2.1.1 Multiplier
    2.1.2 Multiply-add Fused Unit (MAF)
    2.1.3 Special Function Interpolator (SFI)
  2.2 Error Model
  2.3 Precision Assignment Method
CHAPTER 3. Proposed Multiple-mode Floating-point Arithmetic Units
  3.1 Multiple-mode Floating-point Multiplier
    3.1.1 Floating-point Multiplication
    3.1.2 Iterative Multiplication
    3.1.3 Iterative and Truncated Multiplication
    3.1.4 Error Analysis
  3.2 Multiple-mode Floating-Point Multiply-add Fused Unit
    3.2.1 Iterative Multiplication and Truncated Addition
    3.2.2 The Operations of MMAF
    3.2.3 Error Analysis
  3.3 Multiple-mode Floating-point Function Interpolator
    3.3.1 Fundamental Function Interpolator (FFI)
    3.3.2 Row-column-based Multiple-mode Mechanism (RCBMM)
CHAPTER 4. Modified Affine Arithmetic Based Error Model
  4.1 Interval Arithmetic (IA)
  4.2 Affine Arithmetic (AA)
  4.3 Proposed Affine Arithmetic Based Error Model
    4.3.1 Multiplication (MUL)
    4.3.2 Addition (ADD)
    4.3.3 Multiply and Accumulate Operation (MAC)
    4.3.4 Dot Product 3 (DP3) and Dot Product 4 (DP4)
    4.3.5 Reciprocal operation (REC)
    4.3.6 Reciprocal square root operation (RSQ)
    4.3.7 Logarithm operation (LG2)
    4.3.8 Exponential operation (EX2)
    4.3.9 Accuracy Constraint
CHAPTER 5. Proposed Precision Assignment Method
  5.1 PAP, ITP and ISP
  5.2 Our Proposed Precision Assignment Method
  5.3 Fast Greedy Method
  5.4 Our Modified Fast Tabu Search
    5.4.1 Initial Solution
    5.4.2 Neighbor Structure
    5.4.3 Fast Tabu List
    5.4.4 Tabu Length Adjust Mechanism
    5.4.5 Branch and Bound Strategies
    5.4.6 Aspiration Rule
    5.4.7 Diversification Strategies
    5.4.8 Stop Rule
CHAPTER 6. Experimental Results
  6.1 Experimental Environment
  6.2 Compared Methods
  6.3 The Results and Discussion of Real Applications
  6.4 The Results and Discussion of Artificial Cases
CHAPTER 7. Conclusions
CHAPTER 8. Future Work
References
Publication List

電子全文 Fulltext
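This electronic full text is licensed to users only for personal, non-profit searching, reading, and printing for the purpose of academic research. Please comply with the Copyright Act of the Republic of China and do not reproduce, distribute, adapt, repost, or broadcast it without authorization, so as not to violate the law.
論文使用權限 Thesis access permission: 自定論文開放時間 user define
開放時間 Available: 校內 Campus: 已公開 available; 校外 Off-campus: 已公開 available
etd-0628114-155539.pdf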
紙本論文 Printed copies
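Public-availability information for printed copies is relatively complete from academic year 102 onward. To inquire about printed theses from academic year 101 or earlier, please contact the printed thesis service desk of the Office of Library and Information Services. We apologize for any inconvenience.
開放時間 Available: 已公開 available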
