National Sun Yat-sen University, thesis/dissertation, GPU Acceleration of Eigenface of the Face Recognition System

Title	GPU Acceleration of Eigenface of the Face Recognition System (GPU加速特徵臉之人臉辨識系統)
Department	Department of Computer Science and Engineering
Year, semester	The fall semester of Academic Year 106	Language	Chinese
Degree	Master	Number of pages	80
Author	An-Ti Tsai (蔡安廸)
Advisor	Chung-Nan Lee (李宗南)
Convenor	Thou-Ho Chen (陳昭和)
Advisory Committee	Chia-Ping Chen (陳嘉平); Tain-Chi Lu (盧天麒); Chuan-Kang Ting (丁川康)
Date of Exam	2017-08-17	Date of Submission	2017-09-14
Keywords	Face recognition, Eigenface, CUDA, GPU parallel computing, GPGPU
Statistics	The thesis/dissertation has been browsed 5749 times and downloaded 476 times.

Abstract
GPGPU acceleration plays an important role in many real-time applications. In this thesis we apply GPGPU to speed up a face recognition system. Eigenface is one of the appearance-based approaches commonly used for face recognition. As the training data grows, both the training module and the testing module take longer to run. We use Nvidia's CUDA parallel computing architecture to implement a GPU-accelerated eigenface algorithm. The effectiveness of GPU parallel computing depends on the hardware specifications, on the complexity and parallelizability of the algorithm itself, and on how the programmer parallelizes the work on the GPU. We implement GPU acceleration at every step of the eigenface algorithm, and in specific steps we apply different acceleration methods and compare the results. We evaluate the performance of our implementation against an existing implementation on two different GPU devices. Compared with the Intel® Core™ i7-5960X, the GTX1060 achieves an average speedup of about 71.7× in the training module and about 34.1× in the testing module.
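For readers unfamiliar with the method named in the abstract, the following is a minimal CPU-only sketch of an eigenface training and testing pipeline in plain Python. It is not the thesis implementation: it assumes the standard formulation with the small M×M matrix built from M centered training images, a cyclic Jacobi eigen-solver, and nearest-neighbour matching by Euclidean distance. The function names (`jacobi_eigh`, `train`, `project`, `recognize`) and the toy data are hypothetical and chosen only for illustration.

```python
import math

def jacobi_eigh(a, sweeps=50, tol=1e-12):
    """Cyclic Jacobi for a small symmetric matrix (list of lists).
    Returns (eigenvalues, V) where column k of V is the k-th eigenvector."""
    n = len(a)
    a = [row[:] for row in a]
    v = [[float(i == j) for j in range(n)] for i in range(n)]
    for _ in range(sweeps):
        off = sum(a[p][q] ** 2 for p in range(n) for q in range(p + 1, n))
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                # Rotation angle that zeroes the (p, q) entry.
                if a[p][p] == a[q][q]:
                    theta = math.pi / 4
                else:
                    theta = 0.5 * math.atan(2 * a[p][q] / (a[q][q] - a[p][p]))
                c, s = math.cos(theta), math.sin(theta)
                for k in range(n):  # A <- A J (columns p and q)
                    x, y = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * x - s * y, s * x + c * y
                for k in range(n):  # A <- J^T A (rows p and q)
                    x, y = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * x - s * y, s * x + c * y
                for k in range(n):  # V <- V J (accumulate eigenvectors)
                    x, y = v[k][p], v[k][q]
                    v[k][p], v[k][q] = c * x - s * y, s * x + c * y
    return [a[i][i] for i in range(n)], v

def project(img, mean, faces):
    """Weights of a centered image along each eigenface."""
    return [sum(f[j] * (img[j] - mean[j]) for j in range(len(img))) for f in faces]

def train(images, k):
    """Training module: mean face, top-k eigenfaces, training weights.
    k should be at most len(images) - 1 (rank of the centered data)."""
    m, n = len(images), len(images[0])
    mean = [sum(img[j] for img in images) / m for j in range(n)]
    a = [[img[j] - mean[j] for j in range(n)] for img in images]
    # Small M x M matrix A A^T instead of the N x N covariance matrix.
    small = [[sum(x * y for x, y in zip(a[i], a[j])) for j in range(m)]
             for i in range(m)]
    vals, vecs = jacobi_eigh(small)
    order = sorted(range(m), key=lambda i: -vals[i])[:k]
    faces = []
    for idx in order:
        f = [sum(vecs[i][idx] * a[i][j] for i in range(m)) for j in range(n)]
        norm = math.sqrt(sum(x * x for x in f))
        faces.append([x / norm for x in f])
    weights = [project(img, mean, faces) for img in images]
    return mean, faces, weights

def recognize(probe, model, labels):
    """Testing module: label of the nearest training image in eigenface space."""
    mean, faces, weights = model
    w = project(probe, mean, faces)
    dists = [sum((x - y) ** 2 for x, y in zip(w, tw)) for tw in weights]
    return labels[dists.index(min(dists))]

# Toy data: four 6-pixel "images" of two people (hypothetical values).
images = [[10, 10, 0, 0, 5, 5], [9, 11, 1, 0, 5, 4],
          [0, 0, 10, 10, 5, 5], [1, 0, 9, 11, 4, 5]]
labels = ["A", "A", "B", "B"]
model = train(images, k=2)
print(recognize([10, 9, 0, 1, 5, 5], model, labels))
```

In this sketch the steps that dominate the run time as the training set grows are the construction of the small matrix, the eigen-decomposition, and the projections. These are matrix-style computations of the kind that map naturally to GPU kernels, although how the thesis actually parallelizes each step is described only in its full text.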

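Table of Contents
Thesis Approval Form
Acknowledgements
Chinese Abstract
English Abstract
Table of Contents
List of Figures
List of Tables
Chapter 1 Introduction
  1.1 Research Motivation and Objectives
  1.2 Contributions
  1.3 Thesis Organization
Chapter 2 Background and Literature Review
  2.1 Background
    2.1.1 Compute Unified Device Architecture
      2.1.1.1 CUDA Programming Model
      2.1.1.2 CUDA Hardware Model
      2.1.1.3 CUDA Memory Model
      2.1.1.4 Example: Parallel Matrix Multiplication in CUDA
      2.1.1.5 Optimal Parallelization and Limitations in CUDA
      2.1.1.6 Single Precision and Double Precision
    2.1.2 Eigenface
      2.1.2.1 Training Module Algorithm
      2.1.2.2 Testing Module Algorithm
    2.1.3 Computing Eigenvectors and Eigenvalues
      2.1.3.1 Classical Jacobi Algorithm
      2.1.3.2 Cyclic Jacobi Algorithm
      2.1.3.3 Parallel Cyclic Jacobi Algorithm
  2.2 Literature Review
    2.2.1 Eigenvector Retention in Eigenface
    2.2.2 Related Work on GPU-Accelerated Eigenface
Chapter 3 Methodology
  3.1 Training Module
  3.2 Testing Module
Chapter 4 Experimental Results and Analysis
  4-1 Experimental Environment and Method
  4-2 Training Module
  4-3 Testing Module
Chapter 5 Conclusion and Future Work
References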

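Fulltext
This electronic full text is licensed to users solely for personal, non-profit searching, reading, and printing for academic research. Please comply with the Copyright Act of the Republic of China; do not reproduce, distribute, adapt, repost, or broadcast it without authorization. Thesis access permission: user-defined availability. Available: Campus: available; Off-campus: available. etd-0615117-002830.pdf
Printed copies
Public-availability information for printed copies is relatively complete from Academic Year 102 onward. To inquire about printed copies from Academic Year 101 or earlier, please contact the printed thesis service desk of the Office of Library and Information Services. Availability: available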
