博碩士論文 etd-0729108-152635 詳細資訊
Title page for etd-0729108-152635
論文名稱
Title
低功率可變延遲浮點乘法器實作
Implementation of Variable-Latency Floating-Point Multipliers for Low-Power Applications
系所名稱
Department
畢業學年期
Year, semester
語文別
Language
學位類別
Degree
頁數
Number of pages
83
研究生
Author
指導教授
Advisor
召集委員
Convenor
口試委員
Advisory Committee
口試日期
Date of Exam
2008-07-16
繳交日期
Date of Submission
2008-07-29
關鍵字
Keywords
低功率、浮點乘法器、時脈閘控、可變延遲
clock-gating, variable latency, floating-point multiplier, low-power
統計
Statistics
本論文已被瀏覽 5728 次,被下載 0
The thesis/dissertation has been browsed 5728 times, has been downloaded 0 times.
中文摘要
在現今的許多嵌入式系統應用之中,設計者都偏好使用較省電的浮點乘法單元。在本論文中,我們提出了一個可變延遲(variable latency)之浮點乘法單元架構。它適合使用於低功率消耗、高效能以及高精確度的應用設計之中。此架構將有效數乘法器(significand multiplier)分成上、下兩半部,並且從上半部的有效數乘法器中,預測所需要的有效數乘積(significand product) 和黏著位元(sticky bit)。當可預測正確時,下半部的計算就可以運用時脈閘控使其不作運算,有助於節省功率消耗。此預測機制同時也可以將捨入模式(rounding mode)更加的簡化,幫助浮點乘法運算可以提早完成。
最後,我們詳細描述所提出之浮點乘法器的細部架構,並且將提出的雙精度浮點乘法單元與傳統式以及快速式的浮點乘法單元的功率消耗做比較。實驗結果顯示我們所提出的雙精度浮點乘法單元只比快速式的浮點乘法單元多出些許的面積及延遲,卻可以分別減少最多26.41%以及24.97%的功率消耗與能量消耗。實驗結果亦顯示我們所提出的浮點乘法單元的效能非常接近快速式浮點乘法單元的效能。因此,我們的浮點乘法單元亦非常適合使用於高速的浮點乘法應用上。
Abstract
Floating-point multipliers are typically power-hungry, which is undesirable in many embedded applications. This thesis proposes a variable-latency floating-point multiplier architecture that is suitable for low-power, high-performance, and high-accuracy applications. The architecture splits the significand multiplier into upper and lower parts and predicts the required significand product and sticky bit from the upper part. When the prediction is correct, the computation of the lower part is disabled by clock gating, and the rounding operation is significantly simplified so that the floating-point multiplication can complete early.
Finally, the detailed design and simulation of the floating-point multiplier are presented, and the multiplier is evaluated by comparing its power consumption with that of fast and conventional floating-point multipliers. Experimental results demonstrate that the proposed double-precision multiplier consumes up to 26.41% less power and 24.97% less energy than the fast floating-point multiplier, at the expense of only a small area and delay overhead. The results also show that the performance of the proposed floating-point multiplier is very close to that of fast floating-point multipliers.
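The prediction idea summarized in the abstract can be illustrated with a small software model. This is only a sketch under stated assumptions, not the thesis's circuit: the column boundary `SPLIT_COL`, the function names, and the conservative "round both ends of the uncertainty interval" test are all hypothetical choices made for illustration, and only round-to-nearest-even is modeled. The thesis's actual predictors for c0, r, and s (Chapter 3) are not described in this record.

```python
SIG_BITS = 53    # double-precision significand width, hidden bit included
SPLIT_COL = 40   # hypothetical boundary: partial-product columns below it form the lower part


def round_nearest_even(product):
    """Round a 105- or 106-bit significand product to SIG_BITS bits (RNE).
    Returns (significand, shift) so that the rounded value is significand * 2**shift."""
    shift = product.bit_length() - SIG_BITS
    kept = product >> shift
    rem = product & ((1 << shift) - 1)
    half = 1 << (shift - 1)
    if rem > half or (rem == half and kept & 1):
        kept += 1
        if kept >> SIG_BITS:          # rounding carried out to 2**53: renormalize
            kept, shift = kept >> 1, shift + 1
    return kept, shift


def column_sum(a, b, upper):
    """Sum the partial-product bits a_i*b_j whose column i+j is in the upper
    (>= SPLIT_COL) or lower (< SPLIT_COL) part of the partial-product array."""
    total = 0
    for j in range(SIG_BITS):
        if (b >> j) & 1:
            low_mask = (1 << max(SPLIT_COL - j, 0)) - 1
            row = (a & ~low_mask) if upper else (a & low_mask)
            total += row << j
    return total


def multiply_significands(a, b):
    """Multiply two normalized 53-bit significands.
    Returns (significand, shift, fast), where fast tells whether the
    result was settled from the upper part alone."""
    upper = column_sum(a, b, upper=True)
    # Safe upper bound on the lower part: column j holds at most j+1 bits,
    # so the lower part is smaller than SPLIT_COL * 2**SPLIT_COL.
    bound = SPLIT_COL << SPLIT_COL
    predicted = round_nearest_even(upper)
    # Rounding is monotonic, so if both ends of the interval
    # [upper, upper + bound] round to the same value, the lower part
    # cannot change the result and need not be computed.
    if predicted == round_nearest_even(upper + bound):
        return predicted + (True,)
    # Otherwise fall back to the full product (the longer-latency case).
    lower = column_sum(a, b, upper=False)
    return round_nearest_even(upper + lower) + (False,)
```

In this model, `multiply_significands(1 << 52, 1 << 52)` is settled on the fast path, while `multiply_significands(3 << 51, (1 << 52) + 3)` lands exactly on a rounding tie and therefore takes the fallback path. The test is conservative: it may fall back even when the lower part turns out to be zero, but it never returns a wrongly rounded result.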
目次 Table of Contents
Contents
Chapter 1 Introduction
1.1 Motivation
1.2 Contribution
1.3 Organization
Chapter 2 Background and Related Work
2.1 IEEE Floating-Point Multiplication
2.2 Compression Tree
2.3 Rounding Modes
2.4 Related Work
2.5 Low power technique
Chapter 3 Low Power Floating-Point Multiplication Algorithm
3.1 Low Power Floating-Point Multiplication
3.2 Prediction of c0 and r
3.3 Prediction of s
3.4 Proposed Algorithm
Chapter 4 Rounding and Normalization of Proposed Algorithm
4.1 Rounding and Normalization
4.2 RNE Mode
4.3 RI and RZ Mode
4.4 Rounding for Exact Prediction
Chapter 5 Implementation and Results
5.1 Architecture of Proposed Multiplier
5.2 Add-one Circuit and Carry-Select Adder
5.3 LPm
5.4 ORL
5.5 Parallel-Prefix Adder
5.6 Results
Chapter 6 Conclusion and Future Work
6.1 Conclusion
6.2 Future Work
References
參考文獻 References
[1] IEEE Standard 754 for Binary Floating-Point Arithmetic. New York: ANSI/IEEE 754-1985, 1985.
[2] L. Dadda, “Some Schemes for Parallel Multipliers,” Alta Frequenza, vol. 34, pp. 349-356, 1965.
[3] C.S. Wallace, “A Suggestion for Parallel Multipliers,” IEEE Trans. Electronic Computers, vol. 13, pp. 14-17, 1964.
[4] R.M. Owens, R.S. Bajwa, and M.J. Irwin, “Reducing the Number of Counters Needed for Integer Multiplication,” Proc. 12th Symp. Computer Arithmetic, pp. 38-41, 1995.
[5] Z. Wang, A. Jullien, and C. Miller, “A New Design Technique for Column Compression Multipliers,” IEEE Trans. Computers, vol. 44, no. 8, pp. 962-970, Aug. 1995.
[6] V.G. Oklobdzija, D. Villeger, and S.S. Liu, “A Method for Speed Optimized Partial Product Reduction and Generation of Fast Parallel Multipliers Using an Algorithmic Approach,” IEEE Trans. Computers, vol. 45, no. 3, pp. 294-306, Mar. 1996.
[7] N. Itoh, Y. Naemura, H. Makino, and Y. Nakase, “A compact 54×54-bit multiplier with improved Wallace-tree structure,” Proc. 9th Symp. VLSI Circuits, pp. 15-16, 1999.
[8] M.R. Santoro, G. Bewick, and M.A. Horowitz, “Rounding Algorithms for IEEE Multipliers,” Proc. 9th Symp. Computer Arithmetic, pp. 176-183, 1989.
[9] N. Quach, N. Takagi, and M. Flynn, “On Fast IEEE Rounding,” Technical Report CSL-TR-91-459, Stanford Univ., Jan. 1991.
[10] R.K. Yu and G.B. Zyner, “167 MHz radix-4 floating point multiplier,” Proc. 12th Symp. Computer Arithmetic, pp. 149-154, 1995.
[11] P.-M. Seidel, “How to Half the Latency of IEEE Compliant Floating-Point Multiplication,” Proc. 24th Euromicro Conf., vol. 1, pp. 329-332, 1998.
[12] G. Even and P.-M. Seidel, “A comparison of three rounding algorithms for IEEE floating-point multiplication,” IEEE Trans. Computers, vol. 49, no. 7, pp. 638-650, July 2000.
[13] N.T. Quach, N. Takagi, and M.J. Flynn, “Systematic IEEE rounding method for high-speed floating-point multipliers,” IEEE Trans. VLSI Systems, vol. 12, no. 5, pp. 511-521, May 2004.
[14] K.E. Wires, M.J. Schulte, and J.E. Stine, “Variable-correction truncated floating point multipliers,” Proc. 34th Asilomar Conference on Signals, Systems and Computers, vol. 2, pp. 1344-1348, 2000.
[15] J.Y.F. Tong, D. Nagle, and R.A. Rutenbar, “Reducing power by optimizing the necessary precision/range of floating-point arithmetic,” IEEE Trans. VLSI Systems, vol. 8, no. 3, pp. 273-286, June 2000.
[16] K.E. Wires, M.J. Schulte, and J.E. Stine, “Combined IEEE Compliant and Truncated Floating Point Multipliers for Reduced Power Dissipation,” Proc. International Conference on Computer Design, pp. 497-500, 2001.
[17] M. Olivieri, “Design of Synchronous and Asynchronous Variable-latency Pipelined Multipliers,” IEEE Trans. VLSI Systems, vol. 9, no. 2, pp. 365-376, April 2001.
[18] V. Raghunathan, S. Ravi, and G. Lakshminarayana, “Integrating variable-latency components into high-level synthesis,” IEEE Trans. Computer-Aided Design of Integrated Circuits and Systems, vol. 19, no. 10, pp. 1105-1117, Oct. 2000.
[19] Y. Kondo, N. Ikumi, K. Ueno, J. Mori, and M. Hirano, “An early-completion-detecting ALU for a 1 GHz 64 b datapath,” Proc. 44th IEEE International Solid-State Circuits Conference, pp. 418-419, Feb. 1997.
[20] G. Dimitrakopoulos and D. Nikolos, “High-Speed Parallel-Prefix VLSI Ling Adders,” IEEE Trans. Computers, vol. 54, no. 2, pp. 225-231, 2005.
[21] Y. Choi and E. E. Swartzlander, “Speculative Carry Generation with Prefix Adder,” IEEE Trans. VLSI Systems, vol. 16, no. 3, pp. 321-326, March 2008.
[22] Y. Kim and L.-S. Kim, “64-bit carry-select adder with reduced area,” Electronics Letters, vol. 37, no. 10, pp. 614-615, May 2001.
[23] M. Alioto, G. Palumbo, and M. Poli, “A Gate-level Strategy to Design Carry Select Adders,” Proc. International Symposium on Circuits and Systems, vol. 2, pp. 465-468, 2004.
[24] K. Rawat, T. Darwish, and M. Bayoumi, “A low power and reduced area carry select adder,” The 2002 45th Midwest Symposium on , vol. 1, pp. I-467-70, 4-7 Aug. 2002.
[25] M.J. Schulte, J.E. Stine, and J.G. Jansen, “Reduced power dissipation through truncated multiplication,” Proc. IEEE Alessandro Volta Memorial Workshop on , pp. 61-69, 4-5 Mar. 1999.
[26] Massoud Pedram and Jan M. Rabaey, Power Aware Design Methodologies. Kluwer Academic Publishers, 2002.
[27] TSMC 0.18 μm Process 1.8-Volt SAGE-X Standard Cell Library Databook. Artisan Components, Inc., Oct. 2003.
電子全文 Fulltext
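This electronic full text is licensed to users only for personal, non-profit searching, reading, and printing for the purpose of academic research. Please comply with the relevant provisions of the Copyright Act of the Republic of China; do not reproduce, distribute, adapt, repost, or broadcast it without authorization, to avoid violating the law.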
論文使用權限 Thesis access permission:校內校外均不公開 not available
開放時間 Available:
校內 Campus:永不公開 not available
校外 Off-campus:永不公開 not available

紙本論文 Printed copies
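Public availability information for printed theses is relatively complete from academic year 102 onward. To look up this information for printed theses from academic year 101 or earlier, please contact the printed thesis service desk of the Library and Information Services office. We apologize for any inconvenience.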
開放時間 available 已公開 available
