National Sun Yat-sen University, Theses and Dissertations: Power and Error Reduction Techniques of Multipliers for Multimedia Applications

Title	Power and Error Reduction Techniques of Multipliers for Multimedia Applications (適用多媒體應用之乘法器功率及誤差縮減技術)
Department	Department of Computer Science and Engineering
Year, semester	Fall semester of academic year 98 (ROC calendar; 2009-2010)	Language	English
Degree	Ph.D.	Number of pages	213
Author	Jiun-ping Wang (王俊評)
Advisor	Shiann-Rong Kuang (鄺獻榮)
Convenor	Jer-Min Jou (周哲民)
Examination Committee	Shih-Chang Hsia, Chao-Tang Yu, Pei-Yin Chen, Shen-Fu Hsiao, Ming-hwa Sheu (夏世昌, 余兆棠, 陳培殷, 蕭勝夫, 許明華)
Date of Exam	2010-01-27	Date of Submission	2010-02-03
Keywords	low power, multiplier, error compensation circuit

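Chinese Abstract (English translation)
In recent years, multimedia applications have been widely used in embedded and portable systems such as mobile phones, MP3 players, and personal digital assistants, all of which require low power consumption under high-performance constraints. Effective low-power design has therefore become an extremely important research topic in VLSI design. Moreover, the multiplication unit always lies on the critical path of multimedia system circuits and has a decisive influence on the performance and power consumption of digital electronic products. To achieve high performance and extend battery life, it is essential to develop a high-performance, low-power multiplier. In multimedia and digital signal processing systems, many low-power techniques reduce multiplier power consumption by turning off unnecessary computing circuitry. In addition, multiplications in these systems can usually tolerate a slight loss of output precision in exchange for further power savings. Based on these ideas, this dissertation exploits the input data characteristics and the arithmetic features of multiplication in various multimedia and DSP systems, and proposes new power-reduction and truncation techniques for designing energy-efficient multipliers and high-accuracy fixed-width multipliers.

For array and tree multipliers, we first propose a low-power pipelined fixed-width multiplier that dynamically turns off unneeded adder components according to the range of the input data. It also offers a flexible tradeoff between power consumption and output precision; this reconfigurability is very useful for systems with differing precision requirements. Second, we propose a low-power configurable Booth multiplier that provides several multiplication modes and eliminates as many redundant sign-bit computations as possible. For systems that require both computing performance and flexibility, this multiplier effectively reduces power consumption. Although these two low-power multipliers achieve considerable power savings, the complexity of their error compensation circuits and their mean-error and mean-square-error performance still make them unsuitable for multimedia systems containing a large number of multiply-accumulate operations. To improve both accuracy and circuit complexity, we propose new error compensation circuits for fixed-width tree multipliers and fixed-width modified Booth multipliers.

For floating-point multipliers, we propose a low-power variable-latency floating-point multiplier that complies with the IEEE standard for binary floating-point arithmetic (IEEE 754) and is suitable for 3-D graphics and multimedia applications. In this architecture, the significand multiplier is first split into upper and lower halves. Next, an effective prediction mechanism is proposed for the carry bit, the sticky bit, and the upper half of the significand product. When the prediction is correct, the lower half of the significand multiplier is turned off, so the floating-point multiplication consumes less power and completes earlier. For modular multipliers, we propose an efficient modular multiplication algorithm for designing high-performance, low-power modular multipliers. Using quotient pipelining and the elimination of unnecessary operations, it removes the data dependency and the redundant computation cycles of the radix-2 modular multiplication algorithm, effectively improving the speed, power consumption, and energy consumption of modular multipliers.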
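For reference, the bit-serial radix-2 Montgomery multiplication that the modular-multiplier work starts from can be sketched in a few lines of Python. This is a minimal sketch of the textbook algorithm (P. L. Montgomery, 1985), assuming an odd modulus N with N < 2**k and operands already reduced below N; it does not implement the proposed quotient pipelining or superfluous-operation elimination, and the function and variable names are illustrative only. In each iteration the quotient bit q is read from the partial sum formed in that same iteration; this serial dependence is the kind of data dependency that, according to the abstract, the proposed algorithm removes.

```python
def montgomery_radix2(a: int, b: int, n_mod: int, k: int) -> int:
    """Return a * b * 2**(-k) mod n_mod using the bit-serial radix-2 Montgomery loop.

    Assumes n_mod is odd, n_mod < 2**k, and 0 <= a, b < n_mod.
    """
    if n_mod % 2 == 0 or n_mod >= (1 << k):
        raise ValueError("modulus must be odd and smaller than 2**k")
    if not (0 <= a < n_mod and 0 <= b < n_mod):
        raise ValueError("operands must lie in [0, n_mod)")
    s = 0
    for i in range(k):
        s += ((a >> i) & 1) * b      # add a_i * B
        q = s & 1                    # quotient bit, taken from the sum just formed
        s = (s + q * n_mod) >> 1     # s + q*N is even, so the shift is exact
    # The loop keeps s < 2*N, so a single conditional subtraction suffices.
    return s - n_mod if s >= n_mod else s


# Usage: map x into the Montgomery domain (x * 2**k mod N); one Montgomery
# product with y then yields the ordinary product x * y mod N.
N, K = 97, 7                         # toy odd modulus, 2**7 = 128 > 97
x, y = 45, 73
x_mont = (x << K) % N
print(montgomery_radix2(x_mont, y, N, K), x * y % N)   # 84 84
```

In an RSA exponentiation, operands would stay in the Montgomery domain across many such products, so the conversion cost is paid only at the start and end.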
Abstract
Multimedia applications are now widely used in embedded and portable systems such as mobile phones, MP3 players, and PDAs, which require low power consumption under high-performance constraints. Power-efficient design has therefore become an increasingly important objective in Very Large Scale Integration (VLSI) design. Moreover, the multiplication unit always lies on the critical path and ultimately determines the performance and power consumption of arithmetic computing systems. To achieve high performance and extend battery lifetime, it is crucial to develop multipliers with high speed and low power consumption. In multimedia and digital signal processing (DSP) applications, many low-power approaches have been presented to reduce the power consumption of multipliers by eliminating spurious computations. In addition, the multiplications used in these systems can usually tolerate some loss of output accuracy in exchange for further power savings. Based on these observations, this dissertation considers the input data characteristics and the arithmetic features of multiplication in various multimedia and DSP applications, and presents novel power-reduction and truncation techniques for designing power-efficient multipliers and high-accuracy fixed-width multipliers.

For array and tree multipliers, we first propose a low-power pipelined truncated multiplier that dynamically deactivates non-effective circuitry based on the input range. The proposed multiplier also offers a flexible tradeoff between power reduction and product precision; this reconfigurability is very useful for systems with differing output-precision requirements. Second, we develop a low-power configurable Booth multiplier that supports several multiplication modes and eliminates as many redundant sign-bit computations as possible. This architecture can effectively decrease the power consumption of systems that demand both computing performance and flexibility. Although these two low-power multipliers achieve significant power savings, the hardware complexity of their error compensation circuits and their error performance, in terms of mean error and mean-square error, make them unsuitable for multimedia systems that perform a large number of multiply-accumulate operations. To improve accuracy with less hardware complexity, we propose new error compensation circuits for fixed-width tree multipliers and fixed-width modified Booth multipliers.

For floating-point multiplication, we propose a low-power variable-latency floating-point multiplier that is compliant with IEEE 754-1985 and suitable for 3-D graphics and multimedia applications. In this architecture, the significand multiplier is first partitioned into upper and lower parts. Next, an efficient prediction scheme for the carry bit, the sticky bit, and the upper part of the significand product is developed. When the prediction is correct, the lower part of the significand multiplier is shut down, so the floating-point multiplication consumes less power and completes early. For modular multiplication, we propose an efficient algorithm for building a high-performance, low-power modular multiplier. The proposed algorithm adopts quotient pipelining and superfluous-operation elimination to remove the data dependency and redundant computational cycles of the radix-2 Montgomery multiplication algorithm, significantly improving the operation speed, power dissipation, and energy consumption of modular multipliers.
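The mean-error and mean-square-error measures mentioned above can be made concrete with a small behavioral model of a fixed-width (truncated) multiplier. This is a minimal sketch, assuming unsigned n-bit operands, exhaustively enumerated (i.e. uniformly distributed) inputs, and a single constant correction in the spirit of Schulte and Swartzlander's truncated multiplication with a correction constant; it is not the thesis's signed, Booth-encoded, or pipelined compensation circuitry, and the function names are hypothetical. The model keeps only the partial-product bits in the n most significant columns, so the carries from the discarded columns are lost, and a constant equal to their expected value (rounded to one output LSB) is added back.

```python
from itertools import product


def fixed_width_product(a: int, b: int, n: int, correction: int = 0) -> int:
    """Behavioral model of an unsigned n x n fixed-width multiplier.

    Only partial-product bits a_i*b_j in columns i + j >= n are summed; the n
    low-order columns are discarded, and `correction` (in output LSBs) stands
    in for the carries they would have produced. The result approximates
    a*b / 2**n.
    """
    upper = 0
    for i in range(n):
        if (a >> i) & 1:
            # b >> (n - i) sums the bits b_j with j >= n - i, already scaled by 2**-n
            upper += b >> (n - i)
    return upper + correction


def error_stats(n: int, correction: int):
    """Mean error and mean-square error (in output LSBs) over all n-bit operand
    pairs, measured against the exact product a*b / 2**n."""
    scaled = [(fixed_width_product(a, b, n, correction) << n) - a * b
              for a, b in product(range(1 << n), repeat=2)]
    count = len(scaled)
    mean = sum(scaled) / (count << n)
    mse = sum(e * e for e in scaled) / (count << (2 * n))
    return mean, mse


n = 8
# Expected value of the discarded columns: each bit a_i*b_j is 1 with
# probability 1/4 and column k < n holds k + 1 bits, giving ((n - 1) * 2**n + 1) / 4.
best_constant = round(((n - 1) * 2 ** n + 1) / 2 ** (n + 2))
for c in (0, best_constant):
    mean, mse = error_stats(n, c)
    print(f"correction={c}: mean error={mean:+.4f} LSB, MSE={mse:.4f} LSB^2")
```

Because a constant shifts every error by the same amount, it moves the mean error toward zero without changing the error variance; lowering the MSE further calls for compensation that adapts to the operand bits.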

Table of Contents
Chapter 1 INTRODUCTION
  1.1 Power-Efficient Pipelined Truncated Multipliers with Various Precision
  1.2 Power-Efficient Configurable Booth Multiplier
  1.3 High-accuracy Low-cost Compensation Circuits for Signed Tree-based Fixed-width Multipliers
  1.4 High-Accuracy Fixed-Width Modified Booth Multipliers for Lossy Applications
  1.5 Variable-Latency Floating-Point Multipliers for Low-Power Applications
  1.6 High-Performance and Low-Power Montgomery Multiplication for RSA Cryptosystems
  1.7 Organization of the Dissertation
Chapter 2 POWER-EFFICIENT PIPELINED TRUNCATED MULTIPLIERS WITH VARIOUS PRECISION
  2.1 Introduction
  2.2 Problem Description
  2.3 Low-power Multiplier with Various Precision
    2.3.1 Basic Dynamic-range Detector
    2.3.2 Modified Dynamic-range Detector
    2.3.3 Sign-bit Generator and Sign-extension Unit
  2.4 Experimental Results
  2.5 Conclusion
Chapter 3 DESIGN OF POWER-EFFICIENT CONFIGURABLE BOOTH MULTIPLIER
  3.1 Introduction
  3.2 Configurable Multiplication with One-level Recursion
  3.3 Power-Aware Configurable Booth Multiplier
    3.3.1 Dynamic Range Detector
      3.3.1.1 Switching logic
      3.3.1.2 Shutdown logic
    3.3.2 Error-compensation Circuit
    3.3.3 EV generator and CV generator
    3.3.4 Adjustor
    3.3.5 Sign-bit Generator and Sign-extension Unit
    3.3.6 Error Analysis
  3.4 Experimental Results
  3.5 Conclusion
Chapter 4 HIGH-ACCURACY LOW-COST COMPENSATION CIRCUITS FOR SIGNED TREE-BASED FIXED-WIDTH MULTIPLIERS
  4.1 Introduction
  4.2 Background and Motivation
  4.3 Proposed Low-cost Compensation Circuits
  4.4 Experimental Results
  4.5 Conclusion
Chapter 5 HIGH-ACCURACY FIXED-WIDTH MODIFIED BOOTH MULTIPLIERS FOR LOSSY APPLICATIONS
  5.1 Introduction
  5.2 Fundamentals of Modified Booth Multiplier
  5.3 Proposed Fixed-width Modified Booth Multiplier
    5.3.1 Proposed Error Compensation Function
    5.3.2 Proposed Low Error Compensation Circuit
    5.3.3 Error Performance
  5.4 Experimental Results
  5.5 Conclusion
Chapter 6 VARIABLE-LATENCY FLOATING-POINT MULTIPLIERS FOR LOW-POWER APPLICATIONS
  6.1 Introduction
  6.2 Low-power Floating-point Multiplication
    6.2.1 Prediction of c0, r, and s
    6.2.2 Proposed Algorithm
  6.3 Architecture and Implementation
  6.4 Conclusion
Chapter 7 DESIGN OF HIGH-PERFORMANCE AND LOW-POWER MONTGOMERY MODULAR MULTIPLIER FOR RSA CRYPTOSYSTEMS
  7.1 Introduction
  7.2 Montgomery’s Modular Multiplication Algorithm
  7.3 Proposed Montgomery Modular Multiplication Algorithm
    7.3.1 Quotient Pipelining
    7.3.2 Superfluous Operation Elimination
    7.3.3 Proposed Modified Modular Multiplication Algorithm
  7.4 Architecture of Modified Modular Multiplier
    7.4.1 Barrel Register Full Adder Component
    7.4.2 Proposed Modular Multiplication Architecture
  7.5 Experimental Results
  7.6 Conclusion
Chapter 8 CONCLUSION AND FUTURE WORK
  8.1 Conclusion
  8.2 Future Work
References
Publication Lists


Full text
The electronic full text is licensed only for personal, non-profit searching, reading, and printing for academic research. Please comply with the Copyright Act of the Republic of China (Taiwan) and do not reproduce, distribute, adapt, repost, or broadcast it without authorization. Thesis access permission: on campus, released one year after submission; off campus, never released. Availability: on campus, available; off campus, not available.
Printed copies
Availability: available

