National Sun Yat-sen University (國立中山大學), thesis/dissertation: Multi-precision Floating Point Special Function Unit for Low Power Applications

Title	Multi-precision Floating Point Special Function Unit for Low Power Applications (適用於多媒體應用的低功率多重精確度浮點特殊功能運算器)
Department	Department of Computer Science and Engineering (資訊工程學系)
Year, semester	Spring semester of Academic Year 98	Language	Chinese
Degree	Master	Number of pages	71
Author	Ying-Chen Liao (廖英程)
Advisor	Shiann-Rong Kuang (鄺獻榮)
Convenor	Pei-Yin Chen (陳培殷)
Advisory Committee	Yeu-Horng Shiau (蕭宇宏); Ren-Der Chen (陳仁德)
Date of Exam	2010-07-29	Date of Submission	2010-09-07
Keywords	low-power special function unit (低功率特殊功能運算器), piecewise approximation method (多項式逼近法)

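Abstract (translated from Chinese)
Many of today's technologies involve multimedia applications, and these applications can often tolerate a certain amount of computational error. This thesis therefore proposes two iterative floating-point special function units with multiple precision levels. The first unit offers four execution modes of differing precision, so that the most precise result need not be computed every time. The second offers three precision modes for the user to choose from, thereby reducing power consumption. To reduce the area of the proposed architectures, this thesis adopts an iterative architecture for the multi-precision floating-point special function units.
The first architecture supports three operations: reciprocal, reciprocal square root, and logarithm. Once the user has decided which operation to perform, the precision mode can also be selected. The first architecture provides four modes, which we name, from lowest to highest precision, the first, second, third, and fourth modes. While implementing this architecture we also developed a C model to evaluate the error percentage of each operation at each precision. For the reciprocal operation, the highest-precision mode (fourth mode) provides 22-bit precision, takes two cycles, and has an error of 0.0001%. When the highest precision is not needed, the architecture offers two intermediate modes, the third and the second. The third mode achieves 15-bit accuracy, also requires two cycles, and has an error of about 0.01%. The second mode provides 12-bit precision; the hardware needs to execute only once to produce the result, and the error percentage is 0.05%. When visual or audio quality is not the user's main concern, the architecture offers a lowest-precision mode (first mode), which provides 8-bit precision, takes only one cycle, and has an error of 0.8%. The other two operations also offer four modes. The reciprocal square root provides the same precision as the reciprocal in each mode, with error percentages of 0.00004%, 0.01%, 0.06%, and 0.5% from highest to lowest precision. The logarithm provides 8, 12, 16, and 23 bits of precision from the first to the fourth mode, with error percentages of 0.00003%, 0.002%, 0.06%, and 0.3% from highest to lowest precision.
The main purpose of offering these lower-precision modes is to reduce power consumption: when running at lower precision, some hardware components can be turned off to save power. In addition to mode switching, we place tri-state gates in front of the inputs of some components to act as switches and improve the power savings. The experimental results show that the power savings of the first architecture fall short of expectations, because integrating the Newton method with the polynomial approximation method greatly increases both delay and area.
We therefore propose and implement a second iterative architecture. It adopts and modifies only the polynomial approximation architecture, giving it multi-precision execution modes and lower power consumption. It supports the same three operations as the first architecture and provides three modes: the first provides 8-bit precision, the second 14-bit precision, and the third, most precise mode 22-bit precision. Using our C model, we compute the maximum error percentage of each operation in each precision mode. For the reciprocal, the error percentages from lowest to highest precision are 0.19%, 0.00006%, and 0.000015%. For the reciprocal square root, the maximum error percentages are 0.09%, 0.000022%, and 0.000014%, and for the logarithm they are 0.33%, 0.000043%, and 0.000015%. After testing, the experimental data show that, compared with the conventional polynomial approximation architecture, the second proposed architecture has considerably smaller area and delay, and clearly improved power and energy consumption.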
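As an illustration of why the number of passes through an iterative Newton-Raphson datapath determines the precision, the following is a minimal software sketch for the reciprocal of a mantissa in [1, 2). It is not the thesis's hardware or its C model: the 64-entry midpoint seed table, the names `seed_reciprocal`, `reciprocal`, and `max_error_percent`, and the use of double-precision arithmetic are all assumptions made for this example. Each Newton-Raphson step roughly squares the relative error, so skipping a step trades accuracy for less work.

```python
def seed_reciprocal(m, index_bits=6):
    """Hypothetical lookup-table seed for 1/m, with m in [1, 2).

    The top index_bits fraction bits select a segment, and the seed is the
    exact reciprocal of that segment's midpoint (an assumption; a real table
    would store rounded fixed-point values).
    """
    segments = 1 << index_bits
    k = int((m - 1.0) * segments)           # segment index
    midpoint = 1.0 + (k + 0.5) / segments   # centre of the segment
    return 1.0 / midpoint


def reciprocal(m, iterations):
    """Refine the seed with the Newton-Raphson step y <- y * (2 - m * y)."""
    y = seed_reciprocal(m)
    for _ in range(iterations):
        y = y * (2.0 - m * y)
    return y


def max_error_percent(iterations, samples=4096):
    """Largest relative error, in percent, over evenly spaced mantissas."""
    worst = 0.0
    for i in range(samples):
        m = 1.0 + i / samples
        exact = 1.0 / m
        worst = max(worst, abs(reciprocal(m, iterations) - exact) / exact)
    return 100.0 * worst


if __name__ == "__main__":
    for n in range(3):
        print(n, "iteration(s):", max_error_percent(n), "%")
```

The printed errors shrink quadratically with each extra iteration. The figures depend on the assumed seed table and are not expected to reproduce the error percentages reported in the abstract.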
Abstract
Modern technology includes many kinds of multimedia applications. These applications do not necessarily have to be executed at the highest precision; in short, they are error tolerant. This thesis therefore proposes a multi-precision iterative floating-point special function unit that can be executed in different modes to meet the error requirements of each application while reducing power consumption. To minimize the area of the design, we have developed two iterative architectures to implement the multi-precision floating-point special function unit.
The first proposed architecture performs three operations: reciprocal, reciprocal square root, and logarithm. After deciding which function to perform, the user can choose among four precision modes, which we name, from lowest to highest precision, the first, second, third, and fourth modes. During implementation, a C model was also designed to evaluate the maximum error of each mode by comparison with the most accurate software result, which is the 23-bit result. When the reciprocal function is executed in full precision, the multi-precision special function unit must be executed twice, and the error rate is approximately 0.0001%. When less precision is required, there are two intermediate modes: one offers 15-bit accuracy and the other guarantees 12-bit precision. The former also requires the hardware to be executed twice, the latter only once. The 15-bit mode has an error rate of around 0.01%, and the 12-bit mode roughly 0.05%. In addition, when visual or audio quality is not the greatest concern, a least accurate mode is available. This mode maintains 8-bit accuracy and has an error rate of approximately 0.8%.
The reciprocal square root and logarithm operations also have four precision modes. The reciprocal square root guarantees the same accuracy in each mode as the reciprocal, with error rates of 0.00004%, 0.01%, 0.06%, and 0.5% from the highest precision mode to the lowest. The logarithm guarantees 23, 16, 12, and 8 bits from the highest precision mode to the lowest, with error rates of 0.00003%, 0.002%, 0.06%, and 0.3%, respectively. These precision choices are built into the proposed structure mainly to reduce power consumption: selecting a low precision mode allows some components of the design to be shut down. In addition to switching modes, we have added tri-state buffers at the inputs of certain components as another means of decreasing power. The experimental results show that the first architecture's effect on power reduction was smaller than expected. Because it integrates the Newton-Raphson method with the piecewise polynomial approximation method, its delay and area increase substantially, which undermines the power savings.
We have therefore developed a second architecture, based mainly on the piecewise polynomial approximation method. It is an iterative design that supports the same three operations as the first architecture and provides three precision modes. The lowest precision mode provides 8-bit accuracy, the second mode 14-bit accuracy, and the third, most precise mode 22-bit accuracy. Using our C model, we determine the maximum error rate of each function in each mode. From the lowest mode to the highest, the maximum error rate is 0.19%, 0.00006%, and 0.000015% for the reciprocal; 0.09%, 0.000022%, and 0.000014% for the reciprocal square root; and 0.33%, 0.000043%, and 0.000015% for the logarithm. The experimental results show that the second architecture compares favorably with the traditional piecewise polynomial approximation architecture: it has a smaller area and a shorter delay and, most importantly, it reduces power and energy consumption effectively.
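The idea of obtaining several precision modes from one piecewise polynomial datapath can be sketched in software as follows. This is only an illustration under stated assumptions, not the thesis's design: it splits the mantissa range [1, 2) into 64 segments, uses Taylor coefficients about each segment midpoint (a hardware unit would more likely use tuned, truncated fixed-point coefficients), and treats "mode" as the polynomial degree actually evaluated. The names `FUNCTIONS`, `approximate`, and `max_abs_error` are hypothetical.

```python
import math

LN2 = math.log(2.0)

# (f, f', f'') for each operation on the mantissa range [1, 2).
FUNCTIONS = {
    "reciprocal": (lambda x: 1.0 / x,
                   lambda x: -1.0 / x ** 2,
                   lambda x: 2.0 / x ** 3),
    "rsqrt": (lambda x: x ** -0.5,
              lambda x: -0.5 * x ** -1.5,
              lambda x: 0.75 * x ** -2.5),
    "log2": (lambda x: math.log2(x),
             lambda x: 1.0 / (x * LN2),
             lambda x: -1.0 / (x * x * LN2)),
}


def approximate(name, x, degree, index_bits=6):
    """Evaluate a degree-0, -1, or -2 piecewise approximation of f(x).

    Lower degrees skip the higher-order terms, which is the software
    analogue of switching off part of the datapath in a low-precision mode.
    """
    f, d1, d2 = FUNCTIONS[name]
    segments = 1 << index_bits
    k = int((x - 1.0) * segments)        # segment index from the top bits
    x0 = 1.0 + (k + 0.5) / segments      # segment midpoint
    dx = x - x0
    coeffs = [f(x0), d1(x0), d2(x0) / 2.0]
    result = 0.0
    for power in range(degree + 1):
        result += coeffs[power] * dx ** power
    return result


def max_abs_error(name, degree, samples=4096):
    """Largest absolute error over evenly spaced mantissas in [1, 2)."""
    f = FUNCTIONS[name][0]
    worst = 0.0
    for i in range(samples):
        x = 1.0 + i / samples
        worst = max(worst, abs(approximate(name, x, degree) - f(x)))
    return worst


if __name__ == "__main__":
    for name in FUNCTIONS:
        print(name, [max_abs_error(name, d) for d in range(3)])
```

Running the script prints, for each operation, the worst-case error with zero, one, and two polynomial terms beyond the constant enabled. The values reflect the assumed segmentation and coefficients, so they illustrate the trend across modes rather than the error rates quoted in the abstract.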

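Table of Contents
Chapter 1. Introduction
1.1 Motivation
1.2 Thesis outline
Chapter 2. Related work
2.1 Special function units
2.2 IEEE 754 single-precision reciprocal unit
2.3 Newton-Raphson method for the reciprocal function
2.4 Newton-Raphson method for the reciprocal square root function
2.5 Polynomial approximation method
2.6 Polynomial approximation method for the base-2 logarithm
Chapter 3. The first proposed iterative floating-point special function unit
3.1 Introduction
3.2 Design flow
3.3 Architecture of the first special function unit
3.4 Control circuit
Chapter 4. The second proposed iterative floating-point special function unit
4.1 Introduction
4.2 Architecture of the second special function unit
Chapter 5. Experimental results
5.1 Experimental procedure and design software
5.2 Overview of other architectures
5.3 Comparison of the first architecture with conventional architectures
5.4 Comparison of the second architecture with conventional architectures
Chapter 6. Conclusions and future work
6.1 Conclusions
6.2 Future work
References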


Fulltext
This electronic full text is licensed to users solely for personal, non-profit searching, reading, and printing for the purpose of academic research. Please comply with the Copyright Act of the Republic of China and do not reproduce, distribute, adapt, repost, or broadcast it without authorization. Thesis access permission: not available on campus or off campus. Available: Campus: not available. Off-campus: not available.
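Printed copies
Public-access information for printed theses is relatively complete from Academic Year 102 onward. For public-access information on printed theses from Academic Year 101 or earlier, please contact the printed thesis service desk of the Office of Library and Information Services. We apologize for any inconvenience. Availability: available.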

Office of Library and Information Services, National Sun Yat-sen University │ Contact Us : 2452 Thesis Format Review Team , Mail │ Development and operations : Knowledge Innovation Division, LIS