Title page for thesis/dissertation etd-0730116-171534 (detailed record)
Title
支援雙重精確度之特殊函數運算單元設計及應用
Design and Application of Special Function Unit in Dual-Precision
Department
Year, semester
Language
Degree
Number of pages
96
Author
Advisor
Convenor
Advisory Committee
Date of Exam
2016-07-26
Date of Submission
2016-08-30
Keywords
function evaluation, polynomial approximation, dual-precision, table lookup methods, uniform segmentation, special function unit, multipartite table methods, hierarchical multipartite table methods
Statistics
This thesis/dissertation has been viewed 5705 times and downloaded 25 times.
Abstract
Most previous research on special function evaluation has focused on table content generation, area, and delay, producing dedicated hardware for one specific function at one specific accuracy. Such designs achieve small area and short delay, but only for that function and accuracy. As power budgets for recent technology products grow stricter, each accuracy requirement calls for its own individually optimized hardware unit to keep power consumption low, which in practice means two separate copies of the hardware. Yet arithmetic units with the same function, such as squarers and multipliers, can be shared by partitioning their input bits, and low-precision operations can reuse the contents of the high-precision lookup table instead of requiring a dedicated low-precision table. Based on this idea, this thesis proposes several dual-precision hardware architectures for special function evaluation that support two different accuracy requirements. Compared with a design built from individually optimized function units for each accuracy, the proposed designs significantly reduce power consumption in the low-precision operation mode with only a small area overhead.
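As a rough illustration of the table-sharing idea in the abstract, the following Python sketch models a uniformly segmented piecewise polynomial evaluator whose single coefficient table serves two precision modes. It is not the architecture proposed in the thesis: the bit widths, the segment count, the use of Taylor coefficients at segment midpoints, the choice to drop the quadratic term in low-precision mode, and all function names are assumptions made for illustration only.

```python
import math

# All widths below are illustrative assumptions, not values from the thesis.
FRAC_BITS_HI = 16   # fractional bits of the result in high-precision mode
FRAC_BITS_LO = 8    # fractional bits of the result in low-precision mode
SEG_BITS = 6        # 2**6 uniform segments over the input interval
GUARD_HI = 4        # extra guard bits stored in the shared table
GUARD_LO = 2        # guard bits read from the table in low-precision mode


def quantize(value, frac_bits):
    """Truncate value to a fixed-point grid with frac_bits fractional bits."""
    scale = 1 << frac_bits
    return math.floor(value * scale) / scale


def build_table(f, df, d2f, lo=1.0, hi=2.0):
    """Store degree-2 Taylor coefficients per segment, once, at high precision."""
    n = 1 << SEG_BITS
    width = (hi - lo) / n
    table = []
    for i in range(n):
        mid = lo + (i + 0.5) * width
        coeffs = (f(mid), df(mid), d2f(mid) / 2.0)
        table.append(tuple(quantize(c, FRAC_BITS_HI + GUARD_HI) for c in coeffs))
    return table, lo, width


def evaluate(table, lo, width, x, low_precision=False):
    """Evaluate the approximation from the shared table in either mode."""
    index = min(int((x - lo) / width), len(table) - 1)
    dx = x - (lo + (index + 0.5) * width)
    c0, c1, c2 = table[index]
    if low_precision:
        # Reuse the high-precision entries, reading only their leading bits,
        # narrow the operand, and skip the quadratic term entirely.
        bits = FRAC_BITS_LO + GUARD_LO
        c0, c1, dx = quantize(c0, bits), quantize(c1, bits), quantize(dx, bits)
        return quantize(c0 + c1 * dx, FRAC_BITS_LO)
    # High-precision mode: full quadratic in Horner form.
    return quantize(c0 + dx * (c1 + c2 * dx), FRAC_BITS_HI)
```

For example, with the reciprocal on [1, 2), `table, lo, width = build_table(lambda x: 1 / x, lambda x: -1 / x**2, lambda x: 2 / x**3)` builds the shared table, and `evaluate(table, lo, width, 1.5)` or `evaluate(table, lo, width, 1.5, low_precision=True)` selects the mode. Under these assumed widths the results should land within roughly 2^-15 and 2^-7 of 1/x, respectively. The point of the sketch is only that the low-precision path needs no second table; the power savings reported in the thesis come from its hardware designs, which this software model does not capture.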
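Table of Contents
Thesis Approval Form i
Chinese Abstract ii
English Abstract iii
Contents iv
List of Figures vi
List of Tables ix
Chapter 1. Introduction 1
1.1 Research Motivation 1
1.2 Thesis Organization 2
Chapter 2. Background and Related Work 3
2.1 Floating-Point Standard (IEEE-754 Standard) 3
2.2 Table-lookup Methods 5
2.3 Classification of Indirect Table-lookup Methods 7
2.3.1 Computed-Bound Methods 7
2.3.2 Table-Bound Methods 7
2.3.3 In-between Methods 11
2.4 Piecewise Table Methods 12
2.4.1 Coefficient Generation 13
2.4.2 Uniform Piecewise Methods 14
2.4.3 Non-uniform Piecewise Methods 15
2.5 Error Analysis 17
2.5.1 Error Budget Assignment 17
2.5.2 Combined Error Method 19
Chapter 3. Fixed-Point Dual-Precision Architecture and Design 21
3.1 PPA Dual-Precision Architecture-Ⅰ 22
3.2 PPA Dual-Precision Architecture-Ⅱ 24
3.2.1 Table Part 25
3.2.2 Arithmetic Part 26
3.3 PPA Dual-Precision Architecture-Ⅲ 28
3.3.1 Table Part 28
3.3.2 Arithmetic Part 32
3.4 HMP Dual-Precision Architecture-Ⅳ 34
3.4.1 Table Part 35
3.4.2 Arithmetic Part 39
Chapter 4. Floating-Point Dual-Precision Architecture Design and Application in GPUs 41
4.1 Floating-Point Dual-Precision Architecture: Design of the Fraction Part 41
4.2 Special Function Unit Architecture Design in GPUs 43
4.2.1 Handling the Floating-Point Formats of Various Functions 44
4.2.2 Arithmetic Unit Sharing 46
Chapter 5. Experimental Results and Comparisons 49
5.1 Synthesis Results and Comparison of PPA Dual-Precision Architectures (16 & 8-bit) 49
5.2 Synthesis Results and Comparison of PPA Dual-Precision Architectures (24 & 10-bit) 59
5.3 Synthesis Results and Comparison of the Floating-Point Dual-Precision Architecture Applied to the SFU in a GPU 70
5.4 Synthesis Results and Comparison of the HMP Dual-Precision Architecture 72
Chapter 6. Conclusions and Future Work 81
6.1 Conclusions 81
6.2 Future Work 82
References 83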
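Electronic full text
This electronic full text is licensed to users solely for personal, non-profit searching, reading, and printing for the purpose of academic research. Please comply with the Copyright Act of the Republic of China and do not reproduce, distribute, adapt, repost, or broadcast it without authorization, to avoid breaking the law.
Thesis access permission: user-defined release date
Available:
Campus: available
Off-campus: available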


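Printed copies
Availability information for printed theses is relatively complete from academic year 102 (ROC calendar) onward. To look up availability information for printed theses from academic year 101 or earlier, please contact the printed thesis service desk of the Library and Information Office. We apologize for any inconvenience.
Availability: available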
