National Sun Yat-sen University, thesis/dissertation, Hierarchical Multipartite Function Evaluation

Title	Hierarchical Multipartite Function Evaluation (分層的表格為主函數近似方法)
Department	Department of Computer Science and Engineering
Year, semester	Fall semester of Academic Year 105	Language	Chinese
Degree	Master	Number of pages	67
Author	Yi-Hau Chen (陳奕豪)
Advisor	Shen-Fu Hsiao (蕭勝夫)
Convenor	Tso-Bing Juang (莊作彬)
Advisory Committee	Jih-Ching Chiu (邱日清); Kun-Chih Chen (陳坤志); Yun-Nan Chang (張雲南)
Date of Exam	2016-08-30	Date of Submission	2016-09-01
Keywords	bipartite table methods, lossless compression, arithmetic units, table-based function evaluation, VLSI, multipartite table methods
Statistics	The thesis/dissertation has been browsed 5651 times and downloaded 43 times.

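Chinese Abstract
Many signal processing applications, such as the special function units of graphics processors, need to evaluate complicated functions, which is an important arithmetic computation. Hardware that computes function approximations usually consists of table lookups and a few simple multiplier or adder units. The tables sometimes account for a large share of the total area, especially when high precision is required or when several functions share the arithmetic units but each still keeps its own table. This thesis addresses table-and-addition function evaluation and proposes an optimized table decomposition. The tables in such methods fall into two classes: tables of initial values (TI) and tables of offsets (TO). In the literature, the multipartite table method (MP) achieves a smaller table area than the earlier symmetric bipartite table method (SBTM) and symmetric table addition method (STAM) in low-to-medium precision applications. This thesis proposes a generalized MP method, called hierarchical MP (HMP), which uses multiple levels of MP table decomposition to find the decomposition with the smallest total table area. Combined with the proposed joint error analysis, it finds optimized bit widths and yields the most area-efficient hardware design. In addition, this thesis improves a recently published lossless table compression method and applies it to the TI table, further reducing the total table area without adding hardware area or delay. ASIC and FPGA experiments show that the proposed table-and-addition function evaluation unit effectively reduces the total hardware area.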
Abstract
Function evaluation is an important arithmetic computation in many signal processing applications, such as the special function units in modern graphics processing units (GPUs). Hardware implementations of function evaluation usually consist of lookup tables (LUTs) and simple arithmetic units such as multipliers and/or adders. The LUTs usually take up a significant portion of the total area, especially when a function evaluator computes several different arithmetic functions with shared arithmetic units but each function needs its own LUT. In this thesis, we focus on table-lookup-and-addition (TA) function evaluators, which are composed of two types of LUT, a table of initial values (TI) and tables of offset values (TO), followed by a multi-operand adder. The multipartite table method (MP) has been shown to improve significantly on similar prior designs, such as the symmetric bipartite table method (SBTM) and the symmetric table addition method (STAM), for applications with low-to-medium precision requirements. This thesis presents an extension of MP, called hierarchical multipartite (HMP), which further reduces the total table size by applying several levels of table decomposition. Furthermore, we optimize bit widths by jointly considering the impact of all error sources during the search for the best table decomposition, leading to a more efficient hardware design. In addition, a new lossless decomposition of TI is presented, yielding additional savings in table size without incurring any extra error. Experimental results show that the proposed design efficiently reduces the total area cost in ASIC and FPGA implementations.
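The table-lookup-and-addition structure described in the abstract can be illustrated with a minimal sketch of the classic bipartite scheme (one TI plus one TO). This is not the thesis's HMP design. The target function 1/(1+x), the 4/4/4 split of a 12-bit input, and the names `build_bipartite` and `evaluate` are assumptions chosen for illustration. Table entries are kept as floats, so quantization and final rounding errors are not modeled here.

```python
def build_bipartite(f, df, n0, n1, n2):
    """Build TI and TO for f on [0, 1) with an (n0+n1+n2)-bit input.

    The input word is split into fields x0 (n0 MSBs), x1 (n1 bits), x2 (n2 LSBs).
    """
    n = n0 + n1 + n2
    mid1 = (2**n1 - 1) / 2**(n0 + n1 + 1)  # midpoint of the x1 field's range
    mid2 = (2**n2 - 1) / 2**(n + 1)        # midpoint of the x2 field's range
    # TI is addressed by (x0, x1): function value at the centre of the x2 range.
    ti = [f(a / 2**(n0 + n1) + mid2) for a in range(2**(n0 + n1))]
    # TO is addressed by (x0, x2): first-order correction using one slope
    # per x0 segment, sampled at the centre of that segment.
    to = []
    for i0 in range(2**n0):
        slope = df(i0 / 2**n0 + mid1 + mid2)
        for i2 in range(2**n2):
            to.append(slope * (i2 / 2**n - mid2))
    return ti, to


def evaluate(ti, to, n0, n1, n2, i):
    """Approximate f(i / 2**(n0+n1+n2)) with two lookups and one addition."""
    i2 = i & (2**n2 - 1)   # LSB field
    i01 = i >> n2          # concatenated (x0, x1) fields
    i0 = i01 >> n1         # MSB field
    return ti[i01] + to[(i0 << n2) | i2]


# Hypothetical example: reciprocal-like function on [0, 1), 12-bit input.
f = lambda x: 1.0 / (1.0 + x)
df = lambda x: -1.0 / (1.0 + x) ** 2
N0, N1, N2 = 4, 4, 4

ti, to = build_bipartite(f, df, N0, N1, N2)
max_err = max(
    abs(evaluate(ti, to, N0, N1, N2, i) - f(i / 2**(N0 + N1 + N2)))
    for i in range(2**(N0 + N1 + N2))
)
print("TI entries:", len(ti), "TO entries:", len(to))
print("direct-table entries:", 2**(N0 + N1 + N2))
print("max approximation error:", max_err)
```

With this split the two tables hold 2^8 entries each, compared with 2^12 entries for a single direct lookup table. That saving is what makes the choice of decomposition and bit widths worth searching over.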

Table of Contents
Thesis Oral Examination Committee Approval
Chinese Abstract
Abstract
Table of Contents
List of Figures
List of Tables
Chapter 1 Introduction
  1.1 Motivation
  1.2 Thesis Organization
Chapter 2 Background and Related Work
  2.1 Classification of Function Approximation Methods
  2.2 Table-Lookup-and-Addition (TA)
    2.2.1 Bipartite Table Methods (BP)
    2.2.2 Symmetric Bipartite Table Methods (SBTM)
    2.2.3 Symmetric Table Addition Methods (STAM)
    2.2.4 Multipartite Table Methods (MP)
  2.3 Piecewise Polynomial Approximation (PPA)
  2.4 Combined Error Methods
  2.5 Lossless ROM Compression
    2.5.1 Two-Table Decomposition Scheme
    2.5.2 Three-Table Decomposition Scheme
Chapter 3 Hierarchical Multipartite (HMP)
  3.1 Function Domain and Range
  3.2 MP Sampling Method and Error Budget
    3.2.1 Approximation Error
    3.2.2 Quantization Error and Final Round Error
  3.3 Overview of the HMP Method
  3.4 Combined Error and Exhaustive Search
  3.5 Low-Cost Lossless Compression Method
Chapter 4 Experimental Results and Comparison
Chapter 5 Conclusion and Future Work
  5.1 Future Work
References


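Fulltext
This electronic full text is licensed to users only for personal, non-profit retrieval, reading, and printing for the purpose of academic research. Please comply with the Copyright Act of the Republic of China. Do not reproduce, distribute, adapt, repost, or broadcast it without authorization, to avoid breaking the law.
Thesis access permission: user-defined availability date
Available: Campus: available; Off-campus: available
etd-0801116-135632.pdf
Printed copies
Public-access information for printed theses is relatively complete from Academic Year 102 onward. To look up public-access information for printed theses from Academic Year 101 or earlier, please contact the printed thesis service desk of the Office of Library and Information Services. We apologize for any inconvenience.
Available: available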
