National Sun Yat-sen University, thesis/dissertation, A Variable-precision Architecture for Special Function and Floating-point Multiply-add-fused Operation

Title	A Variable-precision Architecture for Special Function and Floating-point Multiply-add-fused Operation (可執行特殊函數與浮點乘加運算之可變精確度架構)
Department	Department of Computer Science and Engineering
Year, semester	Spring semester of Academic Year 103	Language	Chinese
Degree	Master	Number of pages	74
Author	Li-wei Hsu (徐立緯)
Advisor	Shiann-Rong Kuang (鄺獻榮)
Convenor	Pei-yin Chen (陳培殷)
Advisory Committee	Chuen-Yau Chen (陳春僥); Tong-Yu Hsieh (謝東佑); Ko-chi Kuo (郭可驥)
Date of Exam	2015-07-21	Date of Submission	2015-07-30
Keywords	low power, variable-precision function interpolator, variable-precision floating-point multiply-add-fused unit
Statistics	This thesis/dissertation has been browsed 5674 times and downloaded 29 times.

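Chinese Abstract (English translation)
This thesis proposes a variable-precision floating-point arithmetic unit that conforms to the IEEE-754 single-precision floating-point standard. The unit combines a special function interpolator with a floating-point multiply-add unit. Users can execute exponential, logarithm, reciprocal, reciprocal square root, multiplication, addition, and multiply-add operations, and each operation can be run in different precision modes. The hardware is pipelined and suits architectures such as digital signal processors (DSPs) and graphics processing units (GPUs). The special function interpolator obtains an approximation of the target function by evaluating a quadratic polynomial. The coefficients of the polynomial are derived by piecewise (multi-interval) minimax approximation and stored in tables, which are looked up during evaluation. The floating-point multiply-add unit combines floating-point multiplication and floating-point addition into a single piece of hardware that performs multiply-accumulate, that is, A+B×C. While the multiplication is performed, the binary-point alignment for the addition is carried out in parallel to improve performance.

Put simply, the idea of a variable-precision floating-point arithmetic unit is that, when the user does not need a high-precision result, the operation can be performed at lower precision to save power. High-precision hardware uses more cells than low-precision hardware, so it consumes considerably more power. A variable-precision unit executes low-precision operations on the same hardware, with the precision chosen according to need, and does not require an additional low-precision unit. For example, if the original arithmetic unit is IEEE-754 double-precision hardware but the data only need IEEE-754 single precision, executing on the double-precision hardware consumes considerably more power than a single-precision unit would. The variable-precision unit in this thesis can execute IEEE-754 single-precision operations, or operations at other lower precisions, on the same hardware and thereby avoid this waste.

When an operation below the highest precision is executed, clock gating and latches shut off the idle cells, which reduces the power consumed in the lower-precision modes. In addition, the special function interpolator can execute only one of its four operations in any clock cycle, so it uses only the quadratic-polynomial coefficients in that operation's table. Latches can therefore be placed in front of the coefficient tables of the other operations to avoid dynamic power consumption in the tables of the other three operations. As a result, power is reduced even when the highest-precision operation is executed. In a conventional floating-point multiply-add unit, part of the addition hardware still consumes power when only a multiplication is executed, so latches are added to reduce the dynamic power of the addition hardware. Likewise, when only an addition is executed, latches reduce the dynamic power of the multiplier. The proposed architecture shuts off the floating-point multiply-add unit while a special function is executed, and shuts off the special function interpolator while an addition, multiplication, or multiply-accumulate is executed.

Keywords: low power, variable-precision floating-point multiply-add unit, variable-precision special function interpolator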
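The table-driven quadratic interpolation described in the abstract above can be sketched in software. This is a minimal illustration under stated assumptions, not the thesis's hardware design: the names `fit_quadratic`, `build_table`, `interpolate`, and `max_error` are hypothetical, the input is assumed to lie in [1, 2) as a normalized significand would, and the coefficients come from interpolation at Chebyshev nodes (a near-minimax substitute) rather than from a true minimax fit.

```python
import math


def fit_quadratic(f, lo, hi):
    """Return (c0, c1, c2) with f(lo + t) ~= c0 + c1*t + c2*t*t on [lo, hi].

    Assumption: the quadratic interpolates f at the three Chebyshev nodes of
    the segment, a near-minimax stand-in for a true minimax (Remez) fit.
    """
    half = (hi - lo) / 2
    # Chebyshev nodes expressed as offsets t from the segment start.
    ts = [half + half * math.cos((2 * k + 1) * math.pi / 6) for k in range(3)]
    ys = [f(lo + t) for t in ts]
    # Newton divided differences through the three points.
    d01 = (ys[1] - ys[0]) / (ts[1] - ts[0])
    d12 = (ys[2] - ys[1]) / (ts[2] - ts[1])
    d012 = (d12 - d01) / (ts[2] - ts[0])
    # Expand the Newton form into monomial coefficients.
    c2 = d012
    c1 = d01 - d012 * (ts[0] + ts[1])
    c0 = ys[0] - d01 * ts[0] + d012 * ts[0] * ts[1]
    return c0, c1, c2


def build_table(f, index_bits):
    """Precompute one coefficient triple per segment of [1, 2)."""
    segments = 1 << index_bits
    width = 1.0 / segments
    return [fit_quadratic(f, 1.0 + i * width, 1.0 + (i + 1) * width)
            for i in range(segments)]


def interpolate(table, x):
    """Approximate f(x) for 1 <= x < 2: one table lookup plus a Horner step."""
    segments = len(table)
    index = int((x - 1.0) * segments)   # upper fraction bits select the segment
    t = x - (1.0 + index / segments)    # remaining bits are the offset inside it
    c0, c1, c2 = table[index]
    return c0 + t * (c1 + t * c2)


def max_error(f, table, samples=10000):
    """Largest absolute error over evenly spaced sample points in [1, 2)."""
    points = (1.0 + i / samples for i in range(samples))
    return max(abs(interpolate(table, x) - f(x)) for x in points)


def reciprocal(x):
    return 1.0 / x


coarse = build_table(reciprocal, 6)   # 64 segments
fine = build_table(reciprocal, 8)     # 256 segments
```

Comparing `max_error(reciprocal, coarse)` with `max_error(reciprocal, fine)` shows the usual trade-off: a larger table buys a smaller approximation error for the same quadratic evaluation.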
Abstract
This thesis presents a variable-precision floating-point arithmetic unit compliant with the IEEE-754 single-precision floating-point standard. The unit combines a special function interpolator with a floating-point multiply-add-fused (MAF) unit. It provides exponential, logarithm, reciprocal, reciprocal square root, multiplication, addition, and multiply-add operations, and each operation can be executed in several precision modes. The hardware is pipelined and is suitable for DSPs, GPUs, and similar architectures.

The special function interpolator approximates the target function by evaluating a quadratic polynomial. The coefficients of the polynomial are computed by piecewise minimax approximation and stored in tables, which are looked up when the polynomial is evaluated. The floating-point MAF unit combines floating-point multiplication and floating-point addition into a single piece of hardware that executes multiply-accumulate operations of the form A+B×C. While the multiplication is executed, the MAF unit aligns the binary point of the addend in parallel to increase performance.

The idea behind the variable-precision floating-point arithmetic unit is that power can be saved by executing an operation at lower precision when the user does not need a high-precision result, because higher-precision hardware uses more cells than lower-precision hardware. The variable-precision unit uses the same hardware to execute operations at different precisions, and a suitable precision mode can be selected according to the user's requirements without adding extra hardware for the lower-precision modes. For example, if the original arithmetic unit is IEEE-754 double-precision hardware but the data only need IEEE-754 single precision, executing on the double-precision unit consumes much more power than a single-precision unit would. The variable-precision unit presented here avoids this waste by executing IEEE-754 single-precision operations, or other lower-precision operations, on the same hardware.

When an operation below the highest precision is executed, clock-gating cells and latches shut off the unneeded cells to reduce power consumption. Because the special function interpolator executes only one of its four operations in any clock cycle, and so uses only that operation's coefficient table, latches are placed in front of the coefficient tables to avoid dynamic power consumption in the tables of the other three operations. Power consumption is therefore reduced even when the highest-precision operation is executed. Furthermore, in a traditional MAF unit the addition hardware still consumes some power when only a multiplication is executed, so latches are used to reduce the dynamic power consumption of the addition hardware. When only an addition is executed, latches likewise reduce the dynamic power consumption of the multiplier.

Keywords: low power, variable-precision floating-point multiply-add-fused unit, variable-precision function interpolator
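The numerical effect of running the multiply-add operation A+B×C in a lower-precision mode can be illustrated with a small software model. This is only a sketch under assumptions that do not come from the thesis: each precision mode is modeled as keeping `bits` significand bits in the operands and the result, the product is kept exact and rounded once together with the sum, and the names `round_to_bits`, `maf`, and `relative_error` are hypothetical. It models accuracy only and says nothing about power.

```python
from fractions import Fraction


def round_to_bits(value, bits):
    """Round an exact rational to `bits` significant binary digits (ties to even)."""
    if value == 0:
        return Fraction(0)
    sign = -1 if value < 0 else 1
    mag = abs(value)
    # Find e such that 2**e <= mag < 2**(e + 1).
    e = mag.numerator.bit_length() - mag.denominator.bit_length()
    if Fraction(2) ** e > mag:
        e -= 1
    scale = Fraction(2) ** (bits - 1 - e)
    # round() on a Fraction returns an int and rounds halves to even.
    return sign * Fraction(round(mag * scale)) / scale


def maf(a, b, c, bits=24):
    """Model A + B*C with operands and result limited to `bits` significand bits."""
    a_q, b_q, c_q = (round_to_bits(Fraction(x), bits) for x in (a, b, c))
    # The product is kept exact; only the final sum is rounded.
    return float(round_to_bits(a_q + b_q * c_q, bits))


def relative_error(a, b, c, bits):
    """Relative error of the modeled result against the exact value of A + B*C."""
    exact = Fraction(a) + Fraction(b) * Fraction(c)
    return float(abs(Fraction(maf(a, b, c, bits)) - exact) / abs(exact))
```

Calling `relative_error(0.1, 1.7, 3.3, bits)` for `bits` of 24, 16, and 8 shows the error growing as the mode keeps fewer significand bits, which is the accuracy given up in exchange for the power savings the abstract describes.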

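Table of Contents
Acknowledgements
Thesis Summary
Abstract (Chinese)
Abstract (English)
Chapter 1 Introduction
  1.1 Motivation
  1.2 Thesis Outline
Chapter 2 Background
  2.1 IEEE-754 Single-precision Floating-point Standard
  2.2 Special Function Interpolator
    2.2.1 Polynomial Approximation
    2.2.2 Architecture of the Quadratic-polynomial Special Function Interpolator
  2.3 Conventional Special Function Interpolator
    2.3.1 Generation of Polynomial Coefficients
    2.3.2 Squarer
    2.3.3 Booth Encoder and Partial Product Generator
    2.3.4 Compression Tree
  2.4 Floating-point Multiply-add Unit
    2.4.1 Principles of Floating-point Multiplication and Addition
    2.4.2 Conventional Floating-point Multiply-add Unit
    2.4.3 Multiplier
    2.4.4 Shifter
    2.4.5 Rounding
Chapter 3 Multi-precision Special Function Interpolator
  3.1 Partial Product Arrangement of the Conventional Special Function Interpolator
  3.2 Variable-precision Special Function Interpolator
    3.2.1 Partial Product Row Arrangement of the Low-precision Special Function Interpolator
    3.2.2 Implementation of the Variable-precision Special Function Interpolator
Chapter 4 Multi-precision Floating-point Multiply-add Unit
  4.1 Partial Product Arrangement of the Conventional Floating-point Multiply-add Unit
  4.2 Variable-precision Floating-point Multiply-add Unit
    4.2.1 Partial Product Arrangement of the Low-precision Floating-point Multiply-add Unit
    4.2.2 Implementation of the Variable-precision Floating-point Multiply-add Unit
  4.3 Hardware Sharing between the Variable-precision Floating-point Multiply-add Unit and the Special Function Interpolator
Chapter 5 Experimental Results
  5.1 Experimental Procedure and Methods
  5.2 Verification and Data of the Special Function Interpolator
Chapter 6 Conclusions and Future Work
  6.1 Conclusions
  6.2 Future Work
References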


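Fulltext
This electronic full text is licensed to users solely for personal, non-profit searching, reading, and printing for the purpose of academic research. Please comply with the Copyright Act of the Republic of China. Do not reproduce, distribute, adapt, repost, or broadcast it without authorization, so as to avoid breaking the law.
Thesis access permission: user-defined availability date. Available: Campus: available; Off-campus: available. etd-0630115-162031.pdf
Printed copies
Public availability information for printed theses is relatively complete from academic year 102 onward. To inquire about printed theses from academic year 101 or earlier, please contact the printed thesis service desk of the Office of Library and Information Services. We apologize for any inconvenience. Availability: available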


Office of Library and Information Services, National Sun Yat-sen University │ Contact Us: 2452 Thesis Format Review Team, Mail │ Development and operations: Knowledge Innovation Division, LIS