National Sun Yat-sen University, thesis/dissertation, Design of Low-Power and Dynamically Reconfigurable Multi-Precision Special Function Units

Title	Design of Low-Power and Dynamically Reconfigurable Multi-Precision Special Function Units (低功耗可動態重組之多重精確度特殊功能單元設計)
Department	Department of Computer Science and Engineering
Year, semester	Spring semester of Academic Year 105	Language	Chinese
Degree	Master	Number of pages	132
Author	Hsiang-Hao Liang (梁翔皓)
Advisor	Shen-Fu Hsiao (蕭勝夫)
Convenor	Chung-Ho Chen (陳中和)
Advisory Committee	Shiann-Rong Kuang (鄺獻榮); Kun-Chih Chen (陳坤志)
Date of Exam	2017-07-25	Date of Submission	2017-08-28
Keywords	polynomial approximation, multi-precision, special function unit, digital arithmetic, function evaluation
Statistics	This thesis/dissertation has been browsed 5662 times and downloaded 17 times.

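Abstract (translated from the Chinese)
Today's technology products rely on special-function computations in applications such as stereo vision, imaging, and communication. Among the approximation methods used to implement special functions in hardware, polynomial approximation is the most common: the curve of the continuous function is divided into subintervals, the polynomial coefficients for each subinterval are stored in a ROM table, and the approximation is evaluated by table lookup. When higher precision is required, the polynomial degree can be raised from two to three to keep the table from growing, but the cuber used inside such an architecture greatly increases the total area. This thesis therefore adopts Horner's method, which is widely used in mathematics for high-degree polynomials. It reduces the amount of computation in the polynomial through iteration, and in hardware it replaces the squarer and cuber with multipliers and adders. The advantage is a smaller hardware area; the drawback is that the iterative computation lengthens the hardware delay. To reduce the computation time, we propose an improved Horner's method. Compared with the original degree-three polynomial architecture, and taking the reciprocal function as an example, the improved design is about 14% faster and about 50% smaller at 16-bit and 24-bit precision; at 32-bit precision it is 1% slower but about 67% smaller. In addition, power-consumption limits on products are becoming stricter. By switching among precisions, part of the hardware computation can be skipped, and the table requirements of the lower precisions can be met from within the high-precision table, which reduces power consumption. This thesis applies the proposed improved Horner's method architecture to a four-precision design.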
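For readers unfamiliar with Horner's method, the identity behind it can be written out for the degree-three case. The symbols below (coefficients $c_0,\dots,c_3$ read from the table and $x$ as the offset within a subinterval) are notation assumed here for illustration, not taken from the thesis.

```latex
\begin{aligned}
p(x) &= c_0 + c_1 x + c_2 x^2 + c_3 x^3
      && \text{(direct form: needs } x^2 \text{ and } x^3\text{)} \\
     &= c_0 + x\,\bigl(c_1 + x\,(c_2 + c_3 x)\bigr)
      && \text{(Horner form: three multiply-add steps)}
\end{aligned}
```

Each multiply-add in the Horner form depends on the result of the previous one, which is consistent with the longer delay described above. How the improved variant shortens this chain is not stated in the abstract.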
Abstract
Special functions are widely computed in applications such as stereo vision, image processing, and communication. Hardware implementations usually adopt piecewise polynomial approximation (PPA): the input range is partitioned into segments, and the coefficients of the polynomial that approximates the function in each segment are stored in lookup tables (LUTs). Optimizing the LUTs or the arithmetic components is the major design consideration in hardware function evaluation. This thesis focuses on low-cost, low-power hardware function evaluation that supports dynamically switchable multiple precisions. For high-precision requirements such as 32-bit accuracy, the degree of the per-segment approximation polynomial is usually raised to three in order to reduce the LUT size, but the arithmetic components involved then become more complicated. This thesis proposes several architectures based on Horner's rule, which evaluate the polynomial iteratively through lower-degree polynomials, and compares their area, delay, and power consumption. The architecture with the improved Horner's rule is then selected. For reciprocal computation with degree-three PPA, the proposed design saves up to 60% of the area while keeping nearly the same delay as a direct implementation of the degree-three polynomial. In dynamic multi-precision computation, the unused LUT and arithmetic components are turned off to reduce power consumption in low-precision modes.
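The following is a minimal floating-point sketch of degree-three PPA for the reciprocal function, evaluated once with plain Horner's rule and once in direct form. Several things here are assumptions made for illustration and do not come from the thesis: the input range [1, 2), the 64 uniform segments, the use of Taylor coefficients at each segment midpoint, and the names `build_table`, `reciprocal_horner`, and `reciprocal_direct`. It does not model fixed-point arithmetic, truncated multipliers, or the improved Horner variant the thesis selects.

```python
def build_table(seg_bits):
    """Build a coefficient table for 1/x on [1, 2).

    Assumption: each of the 2**seg_bits uniform segments stores the cubic
    Taylor coefficients of 1/x expanded about the segment midpoint m:
        1/(m + d) ~= 1/m - d/m**2 + d**2/m**3 - d**3/m**4
    """
    n = 1 << seg_bits
    table = []
    for i in range(n):
        m = 1.0 + (i + 0.5) / n  # segment midpoint
        table.append((1.0 / m, -1.0 / m**2, 1.0 / m**3, -1.0 / m**4))
    return table


def _lookup(x, table):
    """Return (d, coefficients) for x in [1, 2): offset from midpoint and table row."""
    n = len(table)
    i = int((x - 1.0) * n)           # segment index (leading fraction bits)
    d = x - (1.0 + (i + 0.5) / n)    # signed offset from the segment midpoint
    return d, table[i]


def reciprocal_horner(x, table):
    """Evaluate the cubic with Horner's rule: three sequential multiply-adds."""
    d, (c0, c1, c2, c3) = _lookup(x, table)
    return ((c3 * d + c2) * d + c1) * d + c0


def reciprocal_direct(x, table):
    """Evaluate the same cubic directly, forming d**2 and d**3 explicitly."""
    d, (c0, c1, c2, c3) = _lookup(x, table)
    return c0 + c1 * d + c2 * (d * d) + c3 * (d * d * d)
```

With `table = build_table(6)`, calling `reciprocal_horner(1.5, table)` returns a value close to `1 / 1.5`. Both functions compute the same polynomial; the Horner form simply avoids the explicit square and cube at the cost of a sequential dependency chain.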

Table of Contents
Thesis Approval Form
Abstract (Chinese)
Abstract
Table of Contents
List of Figures
List of Tables
Chapter 1. Introduction
  1.1 Motivation
  1.2 Thesis Organization
Chapter 2. Background and Related Work
  2.1 ANSI/IEEE Std. 754-1985 Floating-Point Standard
    2.1.1 Floating-Point Format Representation and Applications of Various Functions
  2.2 Look-up Table Methods
  2.3 Indirect Look-up Table Methods Classification
    2.3.1 Computed-Bound Methods
    2.3.2 Table-Bound Methods
    2.3.3 In-between Methods
  2.4 Function Approximation with Piecewise Table Method
    2.4.1 Definition of Function Approximation
    2.4.2 Calculating the Number of Subintervals
    2.4.3 Coefficients Generation
  2.5 Truncated Multiplier
    2.5.1 Error Correction Methods for Truncated Multipliers
      2.5.1.1 Variable Correction
      2.5.1.2 Constant Correction
    2.5.2 Tree Compression
    2.5.3 Squarer
    2.5.4 Cuber
  2.6 Error Analysis Methods
    2.6.1 Error Budget
Chapter 3. Architecture and Design of Cubic Interpolation Polynomials Using the Improved Horner's Rule
  3.1 Original Cubic Interpolator Architecture
  3.2 Original Horner's Rule Architecture
  3.3 Horner's Rule Transformation Architecture - I
  3.4 Horner's Rule Transformation Architecture - II
  3.5 Area and Speed Data for the Four Architectures
Chapter 4. Architecture and Design of Fixed-Point Multiple Precision
  4.1 Arithmetic and Table Sharing Design for Multi-Precision Hardware
    4.1.1 Arithmetic Circuit Sharing
    4.1.2 Table Sharing
  4.2 Multiple Precision Architecture with Original Cubic Interpolation
  4.3 Multiple Precision Architecture with Original Horner's Rule
  4.4 Multiple Precision Architecture with Horner's Rule Transformation I
  4.5 Multiple Precision Architecture with Horner's Rule Transformation II
Chapter 5. Comparison and Analysis of Experimental Results
  5.1 Comparison of Synthesis Data for the Four-Precision Architectures (8, 16, 24, 32 bits)
  5.2 Comparison of Synthesis Power Data for the Four-Precision Architectures (8, 16, 24, 32 bits)
Chapter 6. Conclusions and Future Work
  6.1 Conclusions
  6.2 Future Work
References

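Fulltext
This electronic full text is licensed only for personal, non-profit searching, reading, and printing for the purpose of academic research. Please comply with the Copyright Act of the Republic of China and do not reproduce, distribute, adapt, repost, or broadcast it without authorization. Thesis access permission: user-defined release time. Available: Campus: available; Off-campus: available. etd-0728117-224433.pdf
Printed copies
Public information about printed copies is relatively complete from Academic Year 102 onward. To inquire about printed copies from Academic Year 101 or earlier, please contact the printed thesis service desk of the Office of Library and Information Services. We apologize for any inconvenience. Available: available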

Office of Library and Information Services, National Sun Yat-sen University │ Contact Us : 2452 Thesis Format Review Team , Mail │ Development and operations : Knowledge Innovation Division, LIS