博碩士論文 etd-0801116-135632 詳細資訊
Title page for etd-0801116-135632
論文名稱
Title
分層的表格為主函數近似方法
Hierarchical Multipartite Function Evaluation
系所名稱
Department
畢業學年期
Year, semester
語文別
Language
學位類別
Degree
頁數
Number of pages
67
研究生
Author
指導教授
Advisor
召集委員
Convenor
口試委員
Advisory Committee
口試日期
Date of Exam
2016-08-30
繳交日期
Date of Submission
2016-09-01
關鍵字
Keywords
無損壓縮、VLSI、算數運算單元、表格為主函數近似方法、bipartite table methods、multipartite table methods
bipartite table methods, lossless compression, arithmetic units, table-based function evaluation, VLSI, multipartite table methods
統計
Statistics
本論文已被瀏覽 5651 次,被下載 43
The thesis/dissertation has been browsed 5651 times, has been downloaded 43 times.
中文摘要
在許多信號處理應用,如繪圖處理器的特殊函數單元,往往需要計算複雜函
數值此種重要的算術計算,以硬體實現函數近似值的計算通常包含查表和一些
簡單的乘法或是加法器單元,其中表格面積有時佔了整體面積的很大比例,尤
其是當高精確度或是多種函數共用算術單元但是仍有個別自己的表格時。本論
文主要是針對表格和加法之函數值計算方法,提出表格最佳化之分解。此種方
法的表格可分成兩大類:存初始值的表格(Table of Initial values) 和存位移值的表
格(Tables of Offset)。在過去的文獻中,Multipartite table method (MP) 相較於更早
提出的symmetric bipartite table methods (SBTM) 和symmetric table addition method
(STAM) 方法,在中低精確度的應用上,有更小的表格面積。本論文將提出一個
廣義的MP 法,稱為階層式(Hierarchical) MP(HMP),經由多層的MP 表格分解,
能找出最省整體表格面積的表格分解方式,並且搭配提出的誤差綜合考量方法,
找出最佳化的位元寬度,達到最省面積的硬體設計。此外,本論文也改善最近發
表的無失真表格壓縮的方法,套用在TI 表格,在不增加額外的硬體面積和時間
延遲的情況下,更進一步的降低整體表格面積。ASIC 和FPGA 實驗證明,本論文
提出的表格和加法的函數求值計算單元設計,可有效率的降低整體硬體面積。
Abstract
Function evaluation is an important arithmetic computation in many signal processing
applications, such as the special function units in modern graphics processing units (GPUs).
Hardware implementations of function evaluation usually consist of lookup tables (LUTs)
and simple arithmetic units such as multipliers and/or adders. The LUTs usually take a significant
portion of the total area cost, especially when a function evaluator computes
several different arithmetic functions with shared arithmetic units but each
function needs its own LUTs. In this thesis, we focus on the category of table-lookup-and-addition
(TA) function evaluators, which are composed of two types of LUT, a table of initial
values (TI) and a table of offset values (TO), followed by a multi-operand adder. It has
been shown that the multipartite table method (MP) improves significantly on prior
similar designs, such as the symmetric bipartite table method (SBTM) and the symmetric table
addition method (STAM), for applications with low-to-medium precision requirements.
This thesis presents an extension of MP, called hierarchical multipartite (HMP), which
further reduces the total table size by applying several levels of table decomposition.
Furthermore, we perform bit-width optimization by jointly considering the impact of all
error sources during the search for the best table decomposition, leading to a more efficient
hardware design. In addition, a new lossless decomposition of the TI is presented, resulting
in additional savings in table size without incurring any extra error. Experimental results
show that the proposed design can efficiently reduce the total area cost in ASIC and FPGA
implementations.
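As a rough illustration of the table-lookup-and-addition idea described in the abstract, the following Python sketch builds the simplest member of this family: a bipartite table with one TI and one TO, here for sin(x) on [0, 1). The function names, the field widths n0, n1, n2, and the use of unquantized floating-point table entries are assumptions made for illustration. This is not the HMP method of the thesis, which adds further decomposition levels, bit-width optimization, and lossless TI compression.

```python
import math

def build_bipartite(f, df, n0, n1, n2):
    """Build TI and TO for an (n0+n1+n2)-bit input x in [0, 1).

    The input is split into fields x0 (top n0 bits), x1 (n1 bits), x2 (n2 bits).
    Entries are kept as floats; real hardware would round them to fixed point.
    """
    scale = 2.0 ** -(n0 + n1 + n2)      # weight of one input LSB
    mid1 = (2 ** n1 - 1) / 2            # centre of the x1 field, in field units
    mid2 = (2 ** n2 - 1) / 2            # centre of the x2 field, in field units
    # TI: indexed by (x0, x1); f is sampled with x2 fixed at its centre.
    ti = [f(((hi << n2) + mid2) * scale) for hi in range(2 ** (n0 + n1))]
    # TO: indexed by (x0, x2); one slope per x0 segment, taken at its centre.
    to = []
    for x0 in range(2 ** n0):
        centre = (((x0 << n1) + mid1) * 2 ** n2 + mid2) * scale
        slope = df(centre)
        to.append([slope * (x2 - mid2) * scale for x2 in range(2 ** n2)])
    return ti, to

def evaluate(ti, to, n1, n2, x):
    """Approximate f at the integer input x with two lookups and one addition."""
    hi = x >> n2                        # (x0, x1) concatenated: TI address
    x0 = hi >> n1                       # top field: selects the TO row
    x2 = x & ((1 << n2) - 1)            # low field: selects the TO column
    return ti[hi] + to[x0][x2]

def max_error(f, ti, to, n0, n1, n2):
    """Largest absolute error over every representable input."""
    n = n0 + n1 + n2
    return max(abs(evaluate(ti, to, n1, n2, x) - f(x * 2.0 ** -n))
               for x in range(2 ** n))

if __name__ == "__main__":
    ti, to = build_bipartite(math.sin, math.cos, 4, 4, 4)
    entries = len(ti) + sum(len(row) for row in to)
    print("table entries:", entries, "of", 2 ** 12, "for a direct table")
    print("max abs error:", max_error(math.sin, ti, to, 4, 4, 4))
```

With n0 = n1 = n2 = 4, the two tables together hold 256 + 256 entries in place of the 4096 entries of a direct 12-bit lookup, at the price of a small approximation error and one addition.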
目次 Table of Contents
Thesis Oral Defense Committee Certification
Chinese Abstract
Abstract
Table of Contents
List of Figures
List of Tables
Chapter 1 Introduction
1.1 Motivation
1.2 Thesis Organization
Chapter 2 Background and Related Work
2.1 Classification of Function Approximation Methods
2.2 Table-Lookup-and-Addition (TA)
2.2.1 Bipartite Table Methods (BP)
2.2.2 Symmetric Bipartite Table Methods (SBTM)
2.2.3 Symmetric Table Addition Methods (STAM)
2.2.4 Multipartite Table Methods (MP)
2.3 Piecewise Polynomial Approximation (PPA)
2.4 Combined Error Methods
2.5 Lossless ROM Compression
2.5.1 Two-Table Decomposition Scheme
2.5.2 Three-Table Decomposition Scheme
Chapter 3 Hierarchical Multipartite (HMP)
3.1 Domain and Range of the Function
3.2 MP Sampling Method and Error Budget
3.2.1 Approximation Error
3.2.2 Quantization Error and Final Round Error
3.3 Overview of the HMP Method
3.4 Combined Error and Exhaustive Search
3.5 Low-Cost Lossless Compression Method
Chapter 4 Experimental Results and Comparison
Chapter 5 Conclusions and Future Work
5.1 Future Work
References
參考文獻 References
[1] F. de Dinechin and A. Tisserand, “Multipartite table methods,” IEEE Transactions
on Computers, vol. 54, pp. 319–330, March 2005.
[2] Y. J. Kim, H. E. Kim, S. H. Kim, J. S. Park, S. Paek, and L. S. Kim, “Homogeneous
stream processors with embedded special function units for high-utilization
programmable shaders,” IEEE Transactions on Very Large Scale Integration (VLSI)
Systems, vol. 20, pp. 1691–1704, Sept 2012.
[3] D. D. Caro, N. Petra, and A. G. M. Strollo, “Reducing lookup-table size in direct digital
frequency synthesizers using optimized multipartite table method,” IEEE Transactions
on Circuits and Systems I: Regular Papers, vol. 55, pp. 2116–2127, Aug
2008.
[4] B. G. Nam, H. Kim, and H. J. Yoo, “Power and area-efficient unified computation
of vector and elementary functions for handheld 3D graphics systems,” IEEE Transactions
on Computers, vol. 57, pp. 490–504, April 2008.
[5] D. D. Caro, N. Petra, and A. G. M. Strollo, “High-performance special function
unit for programmable 3-d graphics processors,” IEEE Transactions on Circuits and
Systems I: Regular Papers, vol. 56, pp. 1968–1978, Sept 2009.
[6] D. D. Caro, N. Petra, and A. G. M. Strollo, “Direct digital frequency synthesizer
using nonuniform piecewise-linear approximation,” IEEE Transactions on Circuits
and Systems I: Regular Papers, vol. 58, pp. 2409–2419, Oct 2011.
[7] J. A. Pineiro, S. F. Oberman, J. M. Muller, and J. D. Bruguera, “High-speed function
approximation using a minimax quadratic interpolator,” IEEE Transactions on
Computers, vol. 54, pp. 304–318, March 2005.
[8] D. U. Lee, R. Cheung, W. Luk, and J. Villasenor, “Hardware implementation tradeoffs
of polynomial approximations and interpolations,” IEEE Transactions on Computers,
vol. 57, pp. 686–701, May 2008.
[9] D. U. Lee and J. D. Villasenor, “Optimized custom precision function evaluation for
embedded processors,” IEEE Transactions on Computers, vol. 58, pp. 46–59, Jan
2009.
[10] D. U. Lee, R. C. C. Cheung, W. Luk, and J. D. Villasenor, “Hierarchical segmentation
for hardware function evaluation,” IEEE Transactions on Very Large Scale
Integration (VLSI) Systems, vol. 17, pp. 103–116, Jan 2009.
[11] T. Sasao, S. Nagayama, and J. T. Butler, “Numerical function generators using LUT
cascades,” IEEE Transactions on Computers, vol. 56, pp. 826–838, June 2007.
[12] S. F. Hsiao, H. J. Ko, Y. L. Tseng, W. L. Huang, S. H. Lin, and C. S. Wen, “Design
of hardware function evaluators using low-overhead nonuniform segmentation
with address remapping,” IEEE Transactions on Very Large Scale Integration (VLSI)
Systems, vol. 21, pp. 875–886, May 2013.
[13] A. G. M. Strollo, D. D. Caro, and N. Petra, “Elementary functions hardware implementation
using constrained piecewise-polynomial approximations,” IEEE Transactions
on Computers, vol. 60, pp. 418–432, March 2011.
[14] S. F. Hsiao, H. J. Ko, and C. S. Wen, “Two-level hardware function evaluation based
on correction of normalized piecewise difference functions,” IEEE Transactions on
Circuits and Systems II: Express Briefs, vol. 59, pp. 292–296, May 2012.
[15] M. Chaudhary and P. Lee, “An improved two-step binary logarithmic converter
for FPGAs,” IEEE Transactions on Circuits and Systems II: Express Briefs, vol. 62,
pp. 476–480, May 2015.
[16] D. D. Sarma and D. W. Matula, “Faithful bipartite ROM reciprocal tables,” in Proceedings
of the 12th Symposium on Computer Arithmetic, pp. 17–28, Jul 1995.
[17] M. J. Schulte and J. E. Stine, “Approximating elementary functions with symmetric
bipartite tables,” IEEE Transactions on Computers, vol. 48, pp. 842–847, Aug 1999.
[18] J. E. Stine and M. J. Schulte, “The symmetric table addition method for accurate
function approximation,” Journal of VLSI signal processing systems for signal, image
and video technology, vol. 21, no. 2, pp. 167–177, 1999.
[19] J.-M. Muller, “A few results on table-based methods,” Reliable Computing, vol. 5,
no. 3, pp. 279–288, 1999.
[20] P. K. Meher, “LUT optimization for memory-based computation,” IEEE Transactions
on Circuits and Systems II: Express Briefs, vol. 57, pp. 285–289, April 2010.
[21] W. F. Wong and E. Goto, “Fast evaluation of the elementary functions in single precision,”
IEEE Transactions on Computers, vol. 44, pp. 453–457, Mar 1995.
[22] J. Y. L. Low and C. C. Jong, “A memory-efficient tables-and-additions method for
accurate computation of elementary functions,” IEEE Transactions on Computers,
vol. 62, pp. 858–872, May 2013.
[23] D. Wang, J. M. Muller, N. Brisebarre, and M. D. Ercegovac, “(m,p,k)-friendly
points: A table-based method to evaluate trigonometric function,” IEEE Transactions
on Circuits and Systems II: Express Briefs, vol. 61, pp. 711–715, Sept 2014.
[24] S. F. Hsiao, P. H. Wu, C. S. Wen, and P. K. Meher, “Table size reduction methods
for faithfully rounded lookup-table-based multiplierless function evaluation,” IEEE
Transactions on Circuits and Systems II: Express Briefs, vol. 62, pp. 466–470, May
2015.
[25] J.-M. Muller, Elementary Functions: Algorithms and Implementation, 2nd ed.
Birkhäuser, 2006.
[26] M. D. Ercegovac and T. Lang, Digital Arithmetic. Morgan Kaufmann Pub, 2004.
[27] B. Parhami, Algorithms and Design Methods for Digital Computer Arithmetic, International
2nd ed. Oxford University Press, 2012.
[28] S.-F. Hsiao, P.-C. Wei, and C.-P. Lin, “An automatic hardware generator for special
arithmetic functions using various ROM-based approximation approaches,” in Proceedings
of the IEEE International Symposium on Circuits and Systems (ISCAS 2008), pp. 468–471,
May 2008.
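[29] 曾于玲, “Design of an automatic generator for table-lookup function evaluation units
using bit truncation,” Master's thesis, Department of Computer Science and Engineering,
National Sun Yat-sen University, 2011.
[30] 吳柏翰, “Table reduction and optimization for multiplierless table-lookup function
evaluation designs,” Master's thesis, Department of Computer Science and Engineering,
National Sun Yat-sen University, 2013.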
[31] S. F. Hsiao, C. S. Wen, Y. H. Chen, and K. C. Huang, “Hierarchical multipartite
function evaluation,” IEEE Transactions on Computers, vol. PP, no. 99, pp. 1–1,
2016.
電子全文 Fulltext
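This electronic full text is licensed to users only for personal, non-profit retrieval, reading, and printing for the purpose of academic research. Please comply with the relevant provisions of the Copyright Act of the Republic of China, and do not reproduce, distribute, adapt, repost, or broadcast it without authorization, so as to avoid breaking the law.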
論文使用權限 Thesis access permission:自定論文開放時間 user define
開放時間 Available:
校內 Campus: 已公開 available
校外 Off-campus: 已公開 available


紙本論文 Printed copies
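Public-availability information for printed theses is relatively complete from academic year 102 onward. To look up this information for printed theses from academic year 101 or earlier, please contact the printed thesis service desk of the Office of Library and Information Services. We apologize for any inconvenience.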
開放時間 available 已公開 available
