Title page for etd-0629117-231250
Title
採用捨棄式乘法器與平方器的硬體函數計算單元之設計最佳化
Design Optimization of Hardware Function Evaluation Units with Truncated Multipliers and Squarers
Department
Year, semester
Language
Degree
Number of pages
112
Author
Advisor
Convenor
Advisory Committee
Date of Exam
2017-07-25
Date of Submission
2017-07-30
Keywords
function evaluation, polynomial approximation, uniform segmentation, digital arithmetic, truncated multiplier, truncated squarer
Statistics
This thesis has been viewed 5665 times and downloaded 86 times.
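Chinese Abstract (English translation)
Function approximation plays an important role in arithmetic computation, for example in the special function units of graphics processors and in research on stereo vision and 3D image processing. Among hardware implementations of function evaluation, the most common architectures consist of lookup tables and arithmetic circuits, and the uniformly segmented polynomial approximation architecture is the most mature of these. This thesis proposes a new method that optimizes the uniformly segmented polynomial approximation architecture with truncated multipliers and truncated squarers. Previous methods usually determine coefficient bit widths and arithmetic unit sizes by allocating a predetermined share of the allowable error to each hardware component (such as the LUT and the arithmetic units). This thesis instead considers all error sources together, namely approximation error, quantization error, truncation error in the arithmetic units, and the final rounding error, so that the total error budget can be used more effectively to size each component, reducing overall circuit area and delay. The thesis also finds that an optimized table does not guarantee the smallest total circuit area, because the arithmetic circuits account for more than half of the total area. It therefore gives up some table area (from an error perspective, reducing the approximation and quantization errors so that more of the allowable error can be spent in the arithmetic units) in exchange for smaller arithmetic circuits, and searches for the balance point between the two to optimize the total area. The results also show that appropriately relaxing the table size gives better area and speed.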
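The abstract contrasts assigning a fixed error share to each hardware component with checking all error sources together. The difference can be illustrated with exact rational arithmetic. This is a minimal sketch under stated assumptions, not the thesis's algorithm. The output is assumed to have `n_out` fractional bits, and the target is faithful rounding, meaning the total error stays strictly below one ulp. The final rounding is assumed to be round-to-nearest, contributing at most half an ulp. The error bounds are made-up example values, and the function names `fits_fixed_budget` and `fits_combined_budget` are hypothetical.

```python
from fractions import Fraction


def fits_fixed_budget(n_out, errors, shares):
    """Per-component check (hypothetical helper): every error source must stay
    inside its own pre-assigned share of the pre-rounding error budget."""
    ulp = Fraction(1, 2 ** n_out)
    pre_round_budget = ulp / 2  # assumption: final round-to-nearest uses the other half ulp
    return all(err <= share * pre_round_budget for err, share in zip(errors, shares))


def fits_combined_budget(n_out, errors):
    """Combined check (hypothetical helper): only the sum of all error sources
    plus the final rounding error must stay strictly below one ulp."""
    ulp = Fraction(1, 2 ** n_out)
    final_rounding = ulp / 2  # assumed worst case of round-to-nearest
    return sum(errors) + final_rounding < ulp


# Made-up error bounds for a 10-bit output: approximation and coefficient
# quantization errors are small, so the truncated arithmetic may absorb more.
n_out = 10
half_ulp = Fraction(1, 2 ** (n_out + 1))
errors = (half_ulp / 10, half_ulp / 10, half_ulp * 6 / 10)  # approx, quant, trunc
equal_shares = (Fraction(1, 3),) * 3

print(fits_fixed_budget(n_out, errors, equal_shares))  # False
print(fits_combined_budget(n_out, errors))             # True
```

With equal fixed shares, the example design is rejected because the truncation error alone exceeds its one-third share. The total error (1.8 half-ulps including the final rounding) still leaves room for faithful rounding. The combined check accepts the design, so the truncated arithmetic units could be made smaller.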
Abstract
Function evaluation is an important operation in the design of special function units in graphics processing units (GPUs) and in other applications such as stereo vision and 3D image processing. Among the various hardware function evaluation methods, piecewise polynomial approximation (PPA) is the most popular approach. PPA is composed of a lookup table (LUT) and simple arithmetic components such as multipliers and adders. In this thesis, we present a new design that uses truncated multipliers and squarers to optimize the area of PPA with uniform segmentation. Previous designs determine the bit widths of the hardware components by assigning a separate error budget to each. We instead propose a combined error optimization method that jointly considers the different error sources, namely approximation error, quantization error, truncation error, and rounding error, so that area cost and delay can be further reduced. We also observe that minimizing the LUT size does not necessarily lead to the smallest total area. At low and medium precisions, the LUT takes only a small portion of the total area, while the arithmetic components take more than 50%. The experimental results show the trade-off between LUT area and arithmetic-component area, which allows us to find the optimized design with the smallest total area cost.
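Truncated multipliers trade accuracy for area by omitting the least significant partial-product columns. A minimal bit-level sketch follows. It assumes an unsigned n × n array multiplier that drops the k least significant columns and adds a constant correction. That constant is the expected value of the dropped bits, rounded to a multiple of 2^k, in the spirit of the constant-correction truncated multiplier of Section 3.1.1. The helper names are hypothetical, and this is not the thesis's exact circuit.

```python
from fractions import Fraction


def correction_constant(n, k):
    """Expected value of the omitted partial-product bits (each bit a_i*b_j is 1
    with probability 1/4 for uniform inputs), rounded to a multiple of 2**k."""
    expected = Fraction(0)
    for col in range(k):
        bits_in_col = sum(1 for i in range(n) for j in range(n) if i + j == col)
        expected += Fraction(bits_in_col, 4) * 2 ** col
    step = 2 ** k
    return round(expected / step) * step  # only the kept columns receive the constant


def truncated_product(a, b, n, k, correct=True):
    """Unsigned n x n array product that drops partial-product columns 0..k-1."""
    total = 0
    for i in range(n):
        for j in range(n):
            if i + j >= k and (a >> i) & 1 and (b >> j) & 1:
                total += 1 << (i + j)
    return total + (correction_constant(n, k) if correct else 0)


def mean_error(n, k, correct):
    """Average of (exact - truncated) over all 2**(2n) input pairs."""
    diffs = [a * b - truncated_product(a, b, n, k, correct)
             for a in range(2 ** n) for b in range(2 ** n)]
    return Fraction(sum(diffs), len(diffs))


n, k = 6, 6
print(correction_constant(n, k))       # 64
print(float(mean_error(n, k, False)))  # 80.25
print(float(mean_error(n, k, True)))   # 16.25
```

Without correction, the truncated product always underestimates, on average by the expected value of the omitted bits (80.25 in this 6-bit example). The constant removes most of that bias. The residual 16.25 comes from rounding the constant to a multiple of 2^k.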
Table of Contents
Thesis Approval Form i
Abstract (Chinese) ii
Abstract iii
Table of Contents iv
List of Figures vi
List of Tables viii
Chapter 1. Introduction 1
1.1 Motivation 1
1.2 Thesis Organization 2
Chapter 2. Background and Related Work 3
2.1 Classification of Function Approximation Methods 3
2.2 Table-Lookup Methods 4
2.3 Classification of Indirect Table-Lookup Methods 6
2.3.1 Computed-Bound Methods 6
2.3.2 Table-Bound Methods 6
2.3.3 In-Between Methods 10
2.4 Polynomial Function Approximation 12
2.4.1 Investigated Functions 12
2.4.2 Partitioning 17
2.4.3 Determining the Partitioning Interval 19
2.4.4 Calculating the Coefficients 20
2.4.5 Faithful Rounding and Exact Rounding 22
2.5 Error Analysis in Piecewise Polynomials 24
2.5.1 Error Budget 25
2.6 Comparison of Coefficient Optimization Methods 29
Chapter 3. Applications of Truncated Multipliers and Squarers in Function Evaluation 32
3.1 Error Correction Methods for Truncated Multipliers 34
3.1.1 Constant Correction Truncated Multiplier 35
3.1.2 Variable Correction Truncated Multiplier 36
3.2 Error Correction Methods for Truncated Squarers 37
3.2.1 Constant Correction Truncated Squarer 39
3.2.2 Variable Correction Truncated Squarer 40
3.3 Tree Reduction of Parallel Multipliers and Squarers 41
3.4 Function Evaluation Architecture Using Truncated Multipliers and Squarers 43
Chapter 4. Combined Error Method 46
4.1 Method Description 46
4.2 Implementation 47
4.2.1 Optimizing Using Truncated-Matrix Units 52
4.2.2 Coefficient Adjustment and Design Optimization 56
4.2.3 Combined Error and Exhaustive Search 62
4.3 Optimization Objectives 63
4.3.1 Total Table Size Optimization 64
4.3.2 Total Area Optimization 65
4.4 Algorithm Flow 71
Chapter 5. Experimental Results and Comparisons 74
5.1 Comparison of Table Size Optimization and Total Area Optimization 75
5.2 Results for Each Function 82
5.3 Comparison with Results in the Literature 90
Chapter 6. Conclusions and Future Work 97
6.1 Conclusions 97
6.2 Future Work 98
References 99
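Full Text
This electronic full text is licensed only for personal, non-profit searching, reading, and printing for academic research purposes. Please observe the relevant provisions of the Copyright Act of the Republic of China (Taiwan), and do not reproduce, distribute, adapt, repost, or broadcast it without authorization.
Thesis access permission: user-defined release date
Available:
Campus: available
Off-campus: available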


Printed copies
Availability: available
