Thesis/Dissertation etd-0902110-135238: Detailed Information
Title page for etd-0902110-135238
Title
GPU應用於圖演算法之計算效益分析
Performance Analysis of Graph Algorithms using Graphics Processing Units
Department
Year, semester
Language
Degree
Number of pages
75
Author
Advisor
Convenor
Advisory Committee
Date of Exam
2010-07-23
Date of Submission
2010-09-02
Keywords
GPU, parallel computing, multi-core
Statistics
This thesis/dissertation has been viewed 5647 times and downloaded 1791 times.
Abstract
In recent years, GPUs have significantly increased their computing power by increasing the number of cores.
The design of the GPU focuses on data-parallel processing,
so its computing power can be fully exploited only under certain constraints.
For example, computations on highly dependent data cannot be parallelized well
and consequently cannot benefit from the computing power of the GPU.

Most GPU research has focused on the improvement in computing power that GPUs provide.
We therefore study the cost-effectiveness of the GPU by comparing it with a multi-core CPU.
Taking advantage of the different hardware architectures of the GPU and the multi-core CPU,
we implement several typical algorithms on each and report the experimental results.
Furthermore, we present a cost-effectiveness analysis that covers both time and monetary cost.
Table of Contents
1 Introduction
1.1 Research Background
1.2 Research Motivation
1.3 Research Objectives
1.4 Thesis Organization
2 Background Knowledge
2.1 CUDA
2.1.1 Parallel Programming Model
2.1.2 Kernel
2.1.3 Device Memory
2.1.4 Shared Memory
2.1.5 Time Measurement
2.1.6 Texture Memory
2.2 GPU Hardware Architecture
2.2.1 GTX200 Overview
2.2.2 Streaming Multiprocessor
3 Parallelization and Implementation of the Algorithms
3.1 Bellman-Ford Algorithm
3.1.1 Definition
3.1.2 Algorithm
3.1.3 Parallelizing the Bellman-Ford Algorithm
3.1.4 Tree-Structured Convergence
3.1.5 GPU Implementation of the Parallel Bellman-Ford Algorithm
3.2 Dijkstra's Algorithm
3.2.1 Definition
3.2.2 Algorithm
3.2.3 Parallelizing Dijkstra's Algorithm
3.2.4 Parallel Minimum Search
3.2.5 GPU Implementation of the Parallel Dijkstra's Algorithm
3.3 Odd-Even Sorting Algorithm
3.3.1 Definition
3.3.2 Algorithm
3.3.3 Parallelizing the Odd-Even Sorting Algorithm
3.3.4 GPU Implementation of the Parallel Odd-Even Sorting Algorithm
4 Experiments and Analysis of Results
4.1 Experimental Method
4.2 Experimental Environment and Equipment
4.3 Program Performance Gain
4.3.1 CPU Program Performance Gain
4.3.2 GPU Program Performance Gain
4.4 Experimental Results for the Parallel Bellman-Ford Algorithm
4.5 Experimental Results for the Parallel Dijkstra's Algorithm
4.6 Experimental Results for the Parallel Odd-Even Sorting Algorithm
4.7 Analysis of Results
5 Conclusions and Future Work
References
[1] Sorting algorithm/network. cuda:sorting [GPU Compute]. http://www.smooth.url.tw/wiki/doku.php?id=cuda:sorting, 2010.
[2] Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest, and Clifford Stein. Introduction to Algorithms (3rd ed.). MIT Press.
[3] Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest, and Clifford Stein. Introduction to Algorithms (2nd ed.). MIT Press and McGraw-Hill, 2001.
[4] Pawan Harish and P. J. Narayanan. Accelerating large graph algorithms on the GPU using CUDA. In HiPC, volume 4873 of Lecture Notes in Computer Science, pages 197–208, 2007.
[5] Pawan Harish, Vibhav Vineet, and P. J. Narayanan. Large graph algorithms for massively multithreaded architectures. Technical Report IIIT/TR/2009/74, 2009.
[6] Mark Harris. Parallel prefix sum (Scan) with CUDA. http://developer.download.nvidia.com/compute/cuda/1_1/Website/projects/scan/doc/scan.pdf, 2007.
[7] Mark Harris. Optimizing CUDA. In SC07: High Performance Computing With CUDA, 2007.
[8] Pedro J. Martín, Roberto Torres, and Antonio Gavilanes. CUDA solutions for the SSSP problem. In ICCS (1), pages 904–913, 2009.
[9] NVIDIA. Technical Brief: NVIDIA GeForce GTX 200 GPU Architectural Overview. Second-Generation Unified GPU Architecture for Visual Computing, 2008.
[10] NVIDIA. NVIDIA CUDA Programming Guide 2.3.1. http://developer.download.nvidia.com/compute/cuda/2_3/toolkit/docs/NVIDIA_CUDA_Programming_Guide_2.3.pdf, 2009.
[11] NVIDIA. NVIDIA CUDA Reference Manual 2.3. http://developer.download.nvidia.com/compute/cuda/2_3/toolkit/docs/CUDA_Reference_Manual_2.3.pdf, 2009.
[12] NVIDIA. NVIDIA CUDA Programming Guide 3.1. http://developer.download.nvidia.com/compute/cuda/3_1/toolkit/docs/NVIDIA_CUDA_C_ProgrammingGuide_3.1.pdf, 2010.
[13] Vibhav Vineet, Pawan Harish, Suryakant Patidar, and P. J. Narayanan. Fast minimum spanning tree for large graphs on the GPU. In HPG '09: Proceedings of the Conference on High Performance Graphics 2009, pages 167–171. ACM, 2009.
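Full text
This electronic full text is licensed to users solely for personal, non-profit searching, reading, and printing for the purpose of academic research. Please comply with the Copyright Act of the Republic of China; do not reproduce, distribute, adapt, repost, or broadcast it without authorization, to avoid violating the law.
Thesis access permission: available on campus immediately; available off campus after one year
Available:
Campus: available
Off-campus: available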


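Printed copies
Public-access information for printed theses is relatively complete from academic year 102 (ROC calendar) onward. To look up public-access information for printed theses from academic year 101 or earlier, please contact the printed-thesis service desk of the Office of Library and Information Services. We apologize for any inconvenience.
Available: available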