Title page for etd-0719111-152616
Title
Distributed Algorithms for SVD-based Least Squares Estimation
Department
Year, semester
Language
Degree
Number of pages
82
Author
Advisor
Convenor
Advisory Committee
Date of Exam
2011-07-05
Date of Submission
2011-07-19
Keywords
Matrix decomposition, SVD, least-squares estimation, distributed systems, large-scale datasets, parallel processing, MapReduce, CUDA
Statistics
This thesis has been viewed 5758 times and downloaded 1880 times.
Abstract (Chinese)
Singular value decomposition (SVD) is often used to solve least-squares estimation problems, but obtaining the least-squares solution via SVD is very costly in both time and memory. This thesis therefore proposes an iterative divide-and-merge algorithm (IDMSVD) to reduce the time and memory consumed by SVD-based parameter estimation. The idea of IDMSVD is to first shrink the data through several levels of SVD-based reduction, and then estimate the parameters from the reduced data with SVD. Each reduction level consists of three steps: the input data are first partitioned into many blocks; each block is then decomposed by SVD; finally, the decomposition results are merged to form the input matrix of the next level. These three steps are repeated until the reduced data are small enough, at which point the SVD-based least-squares estimator is applied to obtain the least-squares solution. For large datasets, the running time of IDMSVD still leaves room for improvement: each level processes its data blocks sequentially, yet the blocks are mutually independent, so processing them simultaneously can save considerable time. This thesis therefore proposes two accelerated versions of IDMSVD, implemented on two distributed systems: the Hadoop cloud-computing platform and NVIDIA graphics processing units (GPUs). The algorithm implemented with MapReduce on the Hadoop platform is called the distributed IDMSVD algorithm, and the one implemented on the GPU is called the parallel IDMSVD algorithm. Experimental results show that IDMSVD effectively reduces the time and memory needed to obtain least-squares solutions via SVD, and that the distributed and parallel IDMSVD algorithms further improve the running time of IDMSVD.
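To make the three reduction steps concrete, the following is a minimal single-machine sketch in Python/NumPy (an illustration under our own naming, not the thesis's actual code; the functions idmsvd, reduce_block, and svd_lstsq are hypothetical). Each row block A_i of the system min ||Ax - b|| is decomposed as A_i = U_i S_i V_i^T, and the pair (S_i V_i^T, U_i^T b_i) replaces the block; this preserves the least-squares objective up to a constant independent of x.

    import numpy as np

    def svd_lstsq(A, b, rcond=1e-12):
        """Least-squares solution of min ||Ax - b|| via the SVD pseudoinverse."""
        U, s, Vt = np.linalg.svd(A, full_matrices=False)
        keep = s > rcond * s[0]              # drop negligible singular values
        return Vt[keep].T @ ((U[:, keep].T @ b) / s[keep])

    def reduce_block(Ai, bi):
        """Replace a row block (Ai, bi) by its SVD-reduced equivalent:
        ||Ai x - bi||^2 = ||S Vt x - U^T bi||^2 + const, so the pair
        (S Vt, U^T bi) carries all least-squares information of the block."""
        U, s, Vt = np.linalg.svd(Ai, full_matrices=False)
        return s[:, None] * Vt, U.T @ bi

    def idmsvd(A, b, block_rows=256):
        """Iterative divide-and-merge SVD least squares (single-machine sketch)."""
        assert A.shape[1] < block_rows, "blocks need more rows than columns"
        while A.shape[0] > block_rows:       # one reduction level per iteration
            blocks = [reduce_block(A[i:i + block_rows], b[i:i + block_rows])
                      for i in range(0, A.shape[0], block_rows)]
            A = np.vstack([Ar for Ar, _ in blocks])        # merge step
            b = np.concatenate([br for _, br in blocks])
        return svd_lstsq(A, b)               # final small solve via SVD

On a tall matrix, idmsvd(A, b) should agree with np.linalg.lstsq(A, b, rcond=None)[0] up to numerical error, while each SVD is applied only to small blocks.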
Abstract
Singular value decomposition (SVD) is a popular decomposition method for solving least-squares estimation problems. For large datasets, however, SVD is very time-consuming and memory-demanding in obtaining least-squares solutions. In this thesis, we propose a least-squares estimator based on an iterative divide-and-merge scheme for large-scale estimation problems. The estimator proceeds in several levels. At each level, the input matrix is subdivided into submatrices, each submatrix is decomposed by SVD, and the results are merged into smaller matrices that become the input of the next level. The process is iterated until the resulting matrices are small enough to be solved directly and efficiently by SVD. However, the iterative divide-and-merge algorithm is still time-demanding on large-scale datasets when executed on a single machine. We therefore propose two distributed algorithms that overcome this shortcoming by letting several processing units perform the decomposition and merging of the submatrices at each level in parallel. The first is implemented in MapReduce on the Hadoop distributed platform, which runs the tasks in parallel on a collection of computers. The second is implemented in CUDA, which runs the tasks in parallel on NVIDIA GPUs. Experimental results demonstrate that the proposed distributed algorithms greatly reduce the time required to solve large-scale least-squares problems.
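Because the blocks within a level are mutually independent, the per-level work parallelizes directly. The sketch below mimics that structure with a Python process pool; it is a stand-in for the thesis's Hadoop/MapReduce and CUDA implementations, which are not reproduced here, and the names parallel_level and _reduce_pair are our own.

    import numpy as np
    from concurrent.futures import ProcessPoolExecutor

    def _reduce_pair(pair):
        """Map step: SVD-reduce one (block, rhs) pair independently."""
        Ai, bi = pair
        U, s, Vt = np.linalg.svd(Ai, full_matrices=False)
        return s[:, None] * Vt, U.T @ bi

    def parallel_level(A, b, block_rows=256, workers=4):
        """One divide-and-merge level with all blocks reduced concurrently."""
        pairs = [(A[i:i + block_rows], b[i:i + block_rows])
                 for i in range(0, A.shape[0], block_rows)]
        with ProcessPoolExecutor(max_workers=workers) as pool:
            reduced = list(pool.map(_reduce_pair, pairs))
        # Merge step: stack reduced blocks as the next level's input.
        return (np.vstack([Ar for Ar, _ in reduced]),
                np.concatenate([br for _, br in reduced]))

The same map/merge split is what makes a MapReduce formulation natural: mappers decompose blocks independently, and the merge corresponds to collecting their outputs as the next level's input.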
Table of Contents
Thesis Approval Form i
Acknowledgements iii
Abstract (Chinese) iv
Abstract v
Chapter 1 Introduction 1
1.1 Research Motivation and Literature Review 1
1.2 Thesis Organization 3
Chapter 2 Least Squares Estimation 4
2.1 The Least Squares Problem 4
2.2 SVD-based Least Squares Estimation 5
2.3 Recursive SVD-based Least Squares Estimation 7
2.4 Least Squares Estimation Examples 9
2.4.1 SVD-based Least Squares Estimation 10
2.4.2 Recursive SVD-based Least Squares Estimation 11
Chapter 3 The Iterative Divide-and-Merge Algorithm 13
3.1 Iterative Divide-and-Merge SVD-based Least Squares Estimation 13
3.2 Complexity Analysis and Comparison 19
3.3 Example of Iterative Divide-and-Merge SVD-based Least Squares Estimation 21
Chapter 4 The Distributed Iterative Divide-and-Merge Algorithm 25
4.1 Hadoop 25
4.1.1 Job Distribution in Hadoop 26
4.1.2 Hadoop Cluster Architecture 26
4.1.3 HDFS 28
4.2 MapReduce 28
4.2.1 The MapReduce Programming Model 28
4.3 Distributed Iterative Divide-and-Merge SVD-based Least Squares Estimation 32
Chapter 5 The Parallel Iterative Divide-and-Merge Algorithm 36
5.1 General-Purpose GPUs 36
5.2 CUDA 36
5.2.1 CUDA Architecture 38
5.2.2 The CUDA Memory Model 40
5.3 Parallel Iterative Divide-and-Merge SVD-based Least Squares Estimation 41
Chapter 6 Experimental Results 44
6.1 Distributed Iterative Divide-and-Merge SVD-based Least Squares Estimation 44
6.1.1 Experimental Data 44
6.1.2 Experimental Environment 44
6.1.3 MapReduce Experiment 1 45
6.1.4 MapReduce Experiment 2 50
6.1.5 MapReduce Experiment 3 52
6.2 Parallel Iterative Divide-and-Merge SVD-based Least Squares Estimation 53
6.2.1 Experimental Data 54
6.2.2 Experimental Environment 54
6.2.3 GPU Experiment 1 54
6.2.4 GPU Experiment 2 61
Chapter 7 Conclusions and Future Work 67
7.1 Conclusions 67
7.2 Future Work 68
References 69
References
[1] G. H. Golub and C. Reinsch, “Singular value decomposition and least squares solutions,” Numerische Mathematik, vol. 14, no. 6, pp. 403–420, April 1970.
[2] G. H. Golub and C. F. Van Loan, Matrix Computations, 3rd ed. Baltimore, MD, USA: The Johns Hopkins University Press, October 1996.
[3] D. C. Montgomery, E. A. Peck, and G. G. Vining, Introduction to Linear Regression Analysis, 4th ed. Hoboken, N.J., USA: Wiley-Interscience, July 2006.
[4] R. H. Myers, D. C. Montgomery, G. G. Vining, and T. J. Robinson, Generalized Linear Models: with Applications in Engineering and the Sciences, 2nd ed. Hoboken, N.J., USA: Wiley-Interscience, March 2010.
[5] O. Bretscher, Linear Algebra With Applications, 3rd ed. Upper Saddle River, N.J., USA: Prentice Hall, July 2004.
[6] A. Bjorck, Numerical Methods for Least Squares Problems, 1st ed. Philadelphia, PA, USA: SIAM: Society for Industrial and Applied Mathematics, December 1996.
[7] S. S. Niu, L. Ljung, and A. Bjorck, “Decomposition methods for solving least-squares parameter estimation,” IEEE Transactions on Signal Processing, vol. 44, no. 11, pp. 2847–2862, November 1996.
[8] A. Bjorck and J. Y. Yuan, “Preconditioners for least squares problems by LU factorization,” Electronic Transactions on Numerical Analysis, vol. 8, pp. 26–36, November 1999.
[9] S.-J. Lee and C.-S. Ouyang, “A neuro-fuzzy system modeling with self-constructing rule generation and hybrid SVD-based learning,” IEEE Transactions on Fuzzy Systems, vol. 11, no. 3, pp. 341–363, June 2003.
[10] L. V. Foster, “Solving rank-deficient and ill-posed problems using UTV and QR factorizations,” SIAM Journal on Matrix Analysis and Applications, vol. 26, no. 2, pp. 582–600, February 2003.
[11] C. B. Moler, Numerical Computing with MATLAB. Philadelphia, PA, USA: Society for Industrial Mathematics, January 2004.
[12] W. H. Press, B. P. Flannery, S. A. Teukolsky, and W. T. Vetterling, Numerical Recipes in C: The Art of Scientific Computing, 2nd ed. Cambridge, UK: Cambridge University Press, October 1992.
[13] L. Giraud, S. Gratton, and J. Langou, “A rank-k update procedure for reorthogonalizing the orthogonal factor from modified Gram-Schmidt,” SIAM Journal on Matrix Analysis and Applications, vol. 26, no. 4, pp. 1163–1177, April 2004.
[14] V. Hari, “Accelerating the SVD block-Jacobi method,” Computing, vol. 76, no. 1, pp. 27–63, March 2006.
[15] Y. Yamamoto, T. Fukaya, T. Uneyama, M. Takata, K. Kimura, M. Iwasaki, and Y. Nakamura, “Accelerating the singular value decomposition of rectangular matrices with the CSX600 and the integrable SVD,” in Lecture Notes in Computer Science, vol. 4671, 2007, pp. 340–346.
[16] S. Lahabar and P. J. Narayanan, “Singular value decomposition on GPU using CUDA,” in Proceedings of the 2009 IEEE International Symposium on Parallel and Distributed Processing, 2009, pp. 1–10.
[17] T. Konda and Y. Nakamura, “A new algorithm for singular value decomposition and its parallelization,” Parallel Computing, vol. 36, no. 6, pp. 331–344, June 2009.
[18] M. Bečka, G. Okša, M. Vajteršic, and L. Grigori, “On iterative QR pre-processing in the parallel block-Jacobi SVD algorithm,” Parallel Computing, vol. 36, no. 5-6, pp. 297–307, June 2009.
[19] H. Ltaief, J. Kurzak, and J. Dongarra, “Parallel two-sided matrix reduction to band bidiagonal form on multicore architectures,” IEEE Transactions on Parallel and Distributed Systems, vol. 21, no. 4, pp. 417–423, April 2010.
[20] D. Peleg, Distributed Computing – A Locality-Sensitive Approach, Society for Industrial and Applied Mathematics (SIAM), Philadelphia, 2000.
[21] S. Ghosh, Distributed Systems – An Algorithmic Approach, Chapman & Hall/CRC, 2006.
[22] J. Dean and S. Ghemawat, “MapReduce: simplified data processing on large clusters,” Communications of the ACM, vol. 51, no. 1, pp. 107–113, January 2008.
[23] C.-T. Chu, S. K. Kim, Y.-A. Lin, Y. Yu, G. Bradski, A. Y. Ng, and K. Olukotun, “Map-reduce for machine learning on multicore,” in Advances in Neural Information Processing Systems, 2007, pp. 281–288.
[24] W. Zhao, H. Ma, and Q. He, “Parallel k-means clustering based on MapReduce,” in Lecture Notes in Computer Science, vol. 5931, 2009, pp. 674–679.
[25] A. Verma, X. Llora, D. E. Goldberg, and R. H. Campbell, “Scaling genetic algorithms using MapReduce,” in Proceedings of the 9th International Conference on Intelligent Systems Design and Applications, 2009, pp. 13–18.
[26] J. Cohen, “Graph twiddling in a MapReduce world,” Computing in Science & Engineering, vol. 11, no. 4, pp. 29–41, January 2009.
[27] S. J. Matthews and T. L. Williams, “MrsRF: an efficient MapReduce algorithm for analyzing large collections of evolutionary trees,” BMC Bioinformatics, vol. 11, Suppl. 1, January 2010.
[28] W. Fang, B. He, Q. Luo, and N. K. Govindaraju, “Mars: accelerating MapReduce with graphics processors,” IEEE Transactions on Parallel and Distributed Systems, vol. 22, no. 4, pp. 608–620, April 2011.
[29] P. Harish and P. J. Narayanan. “Accelerating Large Graph Algorithms on the GPU Using CUDA,” in Proceedings of the IEEE International Conference on High Performance Computing, December 2007.
[30] S. Ghemawat, H. Gobioff, and S.-T. Leung, “The Google File System,” in Proceedings of the 19th ACM Symposium on Operating Systems Principles, 2003.
[31] T. White, Hadoop: The Definitive Guide, Sebastopol, CA, USA: O’Reilly Media, 2009.
[32] http://hadoop.apache.org/
[33] http://developer.nvidia.com/
Fulltext
The electronic full text is authorized solely for personal, non-commercial retrieval, reading, and printing for academic research purposes. Please observe the relevant provisions of the Copyright Act of the Republic of China; do not reproduce, distribute, adapt, repost, or broadcast it without authorization.
Thesis access permission: open both on- and off-campus one year after submission (withheld until then)
Available:
Campus: available
Off-campus: available


Printed copies
Public access information for printed theses is relatively complete from academic year 102 (2013-2014) onward. For printed theses from academic year 101 or earlier, please contact the printed-thesis service counter of the Office of Library and Information Services. We apologize for any inconvenience.
Available: available
