博碩士論文 etd-0721105-234855 詳細資訊
Title page for etd-0721105-234855
論文名稱
Title
任意資料重分配的有效方法
Efficient Methods for Arbitrary Data Redistribution
系所名稱
Department
畢業學年期
Year, semester
語文別
Language
學位類別
Degree
頁數
Number of pages
92
研究生
Author
指導教授
Advisor
召集委員
Convenor
口試委員
Advisory Committee
口試日期
Date of Exam
2005-07-12
繳交日期
Date of Submission
2005-07-21
關鍵字
Keywords
資料重分配、本質週期計算、資料分配、MPI自訂型別、一般化基本週期計算
GBCC, ECC, Data Redistribution, MPI User-Defined Datatypes, Data Distribution
統計
Statistics
本論文已被瀏覽 5757 次,被下載 2454
The thesis/dissertation has been browsed 5757 times, has been downloaded 2454 times.
中文摘要
在分散式記憶體的多處理器電腦上,資料重分配技術常用來強化資料的空間局部性,以減少跨處理器的存取動作。對異質性環境而言,各處理器可用的資源可能會動態改變,非規律性的資料重分配技術可以用來調整工作負荷,以充分利用各處理器的資源。由於資料重分配的工作是在執行期進行,因此資料重分配技術的效能就顯的十分重要。
對於規律性的資料重分配動作,本論文從Indexing和Packing/Unpacking兩個角度加以探討。以Indexing角度而言,論文中提出了Generalized Basic-Cycle Calculation (GBCC)技術。GBCC技術可以有效地執行P個處理器上BLOCK-CYCLIC(s)分配方式到Q個處理器上BLOCK-CYCLIC(t)分配方式的資料重分配動作。從Packing/Unpacking角度而言,論文中使用MPI User-Defined Types (UDT)的方法來執行BLOCK-CYCLIC(s)分配方式到BLOCK-CYCLIC(t)分配方式的資料重分配動作。這種方法可以大幅降低重分配過程中所需的記憶體需求,並且減少不必要的資料搬移,以加快整個資料重分配動作的速度。至於非規律性的資料重分配,論文中提出了Essential Cycle Calculation (ECC)技術,可以有效執行任意處理器個數上任意資料分配方式之間的重分配動作。
本論文所提的各項重分配技術,除了運用在一維資料陣列上之外,也可以很容易地應用在多維資料陣列。方法是將演算法一一套用到各個維度,然後將各維度運算結果做交集,即可取得所需的結果。
Abstract
Many parallel programs require run-time data redistribution to enhance data locality and reduce remote memory accesses on distributed-memory multicomputers. In heterogeneous computing environments, irregular data redistribution can be used to adjust the data assignment. Because data redistribution is performed at run time, there is a performance trade-off between the efficiency of the new data distribution for a subsequent phase of an algorithm and the cost of redistributing the array among processors. Efficient methods for performing data redistribution are therefore of great importance in the development of distributed-memory compilers for data-parallel programming languages.
For regular data redistribution, this dissertation presents two approaches: an indexing approach and a packing/unpacking approach. In the indexing approach, we propose a generalized basic-cycle calculation (GBCC) technique that efficiently generates the communication sets for redistributing data from BLOCK-CYCLIC(s) over P processors to BLOCK-CYCLIC(t) over Q processors. In the packing/unpacking approach, we present a User-Defined Types (UDT) method that performs BLOCK-CYCLIC(s) to BLOCK-CYCLIC(t) redistribution using MPI user-defined datatypes. This method reduces the memory buffers required and avoids unnecessary data movement. For irregular data redistribution, this dissertation presents an Essential Cycle Calculation (ECC) method.
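To make the notion of a communication set concrete, the sketch below enumerates by brute force which global indices each source processor sends to each destination processor when a one-dimensional array of n elements moves from BLOCK-CYCLIC(s) over P processors to BLOCK-CYCLIC(t) over Q processors. It is an illustration only and is not the GBCC technique itself. The function names (owner, communication_sets, pattern_period) and the 0-based indexing are assumptions made for this example.

```python
from collections import defaultdict
from math import gcd


def owner(i, block, nprocs):
    """Processor owning global index i under BLOCK-CYCLIC(block) over nprocs processors."""
    return (i // block) % nprocs


def communication_sets(n, s, P, t, Q):
    """Map (source processor, destination processor) -> sorted global indices to transfer.

    Brute force over all n indices; hypothetical helper for illustration only.
    """
    sets = defaultdict(list)
    for i in range(n):
        sets[(owner(i, s, P), owner(i, t, Q))].append(i)
    return dict(sets)


def pattern_period(s, P, t, Q):
    """Number of elements after which the (source, destination) ownership pattern repeats.

    Source ownership has period s*P and destination ownership has period t*Q,
    so the pair repeats with their least common multiple.
    """
    a, b = s * P, t * Q
    return a * b // gcd(a, b)
```

For example, with n = 12, s = 2, P = 2, t = 3, and Q = 2, source processor 0 sends global indices 0, 1, and 8 to destination processor 0, and the ownership pattern repeats every 12 elements. Because the pattern is periodic, the sets need only be computed for one period and then shifted, which is why brute force over the whole array is wasteful for large n.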
These methods were originally developed for one-dimensional arrays. However, multi-dimensional arrays can also be redistributed by applying the methods dimension by dimension, starting from the first dimension if the array is stored in column-major order or from the last dimension if it is stored in row-major order.
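The dimension-by-dimension idea can be sketched for a two-dimensional array whose rows and columns are each distributed block-cyclically over a processor grid. The per-dimension sets are computed independently, and the set for a pair of grid processors is the combination (Cartesian product) of the matching row and column sets, which corresponds to intersecting the per-dimension results as the Chinese abstract describes. This is a brute-force illustration under stated assumptions, not the dissertation's algorithm: the names dim_sets and sets_2d, the (block, nprocs) parameter tuples, and the 0-based indexing are all hypothetical.

```python
from itertools import product


def owner(i, block, nprocs):
    """Processor owning index i under BLOCK-CYCLIC(block) over nprocs processors."""
    return (i // block) % nprocs


def dim_sets(n, s, P, t, Q):
    """Per-dimension map (source proc, destination proc) -> indices along that dimension."""
    out = {}
    for i in range(n):
        out.setdefault((owner(i, s, P), owner(i, t, Q)), []).append(i)
    return out


def sets_2d(shape, src, dst):
    """2-D communication sets built from the two 1-D results.

    src and dst each hold one (block, nprocs) tuple per dimension.
    Keys are ((src_row_proc, src_col_proc), (dst_row_proc, dst_col_proc));
    values are lists of (row, col) element coordinates.
    """
    rows = dim_sets(shape[0], *src[0], *dst[0])
    cols = dim_sets(shape[1], *src[1], *dst[1])
    result = {}
    for ((ps0, pd0), r), ((ps1, pd1), c) in product(rows.items(), cols.items()):
        result[((ps0, ps1), (pd0, pd1))] = list(product(r, c))
    return result
```

Every element of the array lands in exactly one set, so the sizes of all sets sum to the number of array elements.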
目次 Table of Contents
1 Introduction
2 Related Work
3 Regular Data Redistribution - Indexing Approach
3.1 Preliminaries
3.2 The GBCC Method for Data Redistribution
3.2.1 Send phase
3.2.2 Receive phase
3.3 The GBCC Method for Multi-Dimensional Data Redistribution
3.4 Performance Evaluation and Experimental Results
3.4.1 Cost Models
3.4.2 Experimental Results for One-Dimensional Data Redistributions
3.4.3 Experimental Results for Two-Dimensional Data Redistributions
4 Regular Data Redistribution - Packing/Unpacking Approach
4.1 Processes and Algorithms for Data Redistribution
4.2 Optimal Method using MPI User-Defined Datatypes
4.2.1 Computation and Communication Models
4.3 Experimental Results
5 Irregular Data Redistribution
5.1 Preliminaries
5.2 The ECC Method for Irregular Data Redistribution
5.2.1 Send phase
5.2.2 Receive phase
5.2.3 Algorithm
5.3 Performance Evaluation and Experimental Results
5.3.1 Cost Models
5.3.2 Experimental Results
6 Conclusions
Bibliography
參考文獻 References
[1] S. Chatterjee, J. R. Gilbert, F. J. E. Long, R. Schreiber, and S.-H. Teng, “Generating Local Address and Communication Sets for Data Parallel Programs,” Journal of Parallel and Distributed Computing, Vol. 26, pp. 72-84, 1995.
[2] Y.-C. Chung, C.-H. Hsu, and S.-W. Bai, “A Basic-Cycle Calculation Technique for Efficient Dynamic Data Redistribution,” IEEE Transactions on Parallel and Distributed Systems, Vol. 9, No. 4, pp. 359-377, April 1998.
[3] G. Fox, S. Hiranandani, K. Kennedy, C. Koelbel, U. Kremer, C.-W. Tseng, and M. Wu, “Fortran-D Language Specification,” Technical Report TR-91-170, Dept. of Computer Science, Rice University, Dec. 1991.
[4] M. Guo, L. Nakata, and Y. Yamashita, “Contention-Free Communication Scheduling for Data redistribution,” Parallel Computing, Vol. 26, pp. 1325-1343, 2000.
[5] S. K. S. Gupta, S. D. Kaushik, C.-H. Huang, and P. Sadayappan, “On the Generation of Efficient Data Communication for Distributed-Memory Machines,” Proc. of Intl. Computing Symposium, pp. 504-513, 1992.
[6] S. K. S. Gupta, S. D. Kaushik, C.-H. Huang, and P. Sadayappan, “On Compiling Array Expressions for Efficient Execution on Distributed-Memory Machines,” Journal of Parallel and Distributed Computing, Vol. 32, pp. 155-172, 1996.
[7] High Performance Fortran Forum, “High Performance Fortran Language Specification (version 1.1),” Rice University, November 1994.
[8] S. Hiranandani, K. Kennedy, J. Mellor-Crummey, and A. Sethi, “Compilation method for BLOCK-CYCLIC distribution,” In Proc. ACM Intl. Conf. on Supercomputing, pp. 392-403, July 1994.
[9] C.-H. Hsu, S.-W. Bai, Y.-C. Chung, and C.-S. Yang, “A Generalized Basic-Cycle Calculation Method for Efficient Data redistribution,” IEEE Transactions on Parallel and Distributed Systems, Vol. 11, No. 12, pp. 1201-1216, Dec. 2000.
[10] C.-H. Hsu, D.-L. Yang, Y.-C. Chung, and C.-R. Dow, “A Generalized Processor Mapping Technique for Data redistribution,” IEEE Transactions on Parallel and Distributed Systems, Vol. 12, No. 7, pp. 743-757, July 2001.
[11] E. T. Kalns and L. M. Ni, “Processor Mapping Method Toward Efficient Data Redistribution,” IEEE Transactions on Parallel and Distributed Systems, Vol. 6, No. 12, December 1995.
[12] E. T. Kalns and L. M. Ni, “DaReL: A portable data redistribution library for distributed-memory machines,” in Proceedings of the 1994 Scalable Parallel Libraries Conference II, Oct. 1994.
[13] S. D. Kaushik, C. H. Huang, R. W. Johnson, and P. Sadayappan, “An Approach to communication efficient data redistribution,” In Proceeding of International Conf. on Supercomputing, pp. 364-373, July 1994.
[14] S. D. Kaushik, C. H. Huang, J. Ramanujam, and P. Sadayappan, “Multi-phase data redistribution: Modeling and evaluation,” In Proceeding of International Parallel Processing Symposium, pp. 441-445, 1995.
[15] S. D. Kaushik, C. H. Huang, and P. Sadayappan, “Efficient Index Set Generation for Compiling HPF Array Statements on Distributed-Memory Machines,” Journal of Parallel and Distributed Computing, Vol. 38, pp. 237-247, 1996.
[16] K. Kennedy, N. Nedeljkovic, and A. Sethi, “Efficient address generation for BLOCK-CYCLIC distribution,” In Proceeding of International Conf. on Supercomputing, Barcelona, pp. 180-184, July 1995.
[17] C. Koelbel, “Compiler-time generation of communication for scientific programs,” In Supercomputing ’91, pp. 101-110, Nov. 1991.
[18] P.-Z. Lee and W. Y. Chen, “Compiler methods for determining data distribution and generating communication sets on distributed-memory multicomputers,” 29th IEEE Hawaii Intl. Conf. on System Sciences, Maui, Hawaii, pp. 537-546, Jan. 1996.
[19] Y. W. Lim, P. B. Bhat, and V. K. Prasanna, “Efficient Algorithms for BLOCK-CYCLIC Redistribution of Arrays,” Proceedings of the Eighth IEEE Symposium on Parallel and Distributed Processing, pp. 74-83, 1996.
[20] Y. W. Lim, N. Park, and V. K. Prasanna, “Efficient Algorithms for Multi-Dimensional Block-Cyclic Redistribution of Arrays,” Proceedings of the 26th International Conference on Parallel Processing, pp. 234-241, 1997.
[21] Message Passing Interface Forum, “MPI: A Message-Passing Interface Standard,” 12 Oct. 1998.
[22] Message Passing Interface Forum, “MPI-2: Extensions to the Message-Passing Interface,” 20 May 1998.
[23] L. Prylli and B. Tourancheau, “Fast Runtime Block Cyclic Data Redistribution on Multiprocessors,” Journal of Parallel and Distributed Computing, Vol. 45, pp. 63-72, Aug. 1997.
[24] S. Ramaswamy and P. Banerjee, “Automatic generation of efficient data redistribution routines for distributed memory multicomputers,” Frontier ’95: The Fifth Symposium on the Frontiers of Massively Parallel Computation, McLean, VA, pp. 342-349, Feb. 1995.
[25] S. Ramaswamy, B. Simons, and P. Banerjee, “Optimization for Efficient Data redistribution on Distributed Memory Multicomputers,” Journal of Parallel and Distributed Computing, Vol. 38, pp. 217-228, 1996.
[26] P. Shivam, P. Wyckoff, and D. Panda, “EMP: Zero-copy OS-bypass NIC-driven Gigabit Ethernet Message Passing,” SC2001, Nov. 2001.
[27] J. M. Stichnoth, D. O’Hallaron, and T. R. Gross, “Generating communication for array statements: Design, implementation, and evaluation,” Journal of Parallel and Distributed Computing, Vol. 21, pp. 150-159, 1994.
[28] R. Thakur, A. Choudhary, and G. Fox, “Runtime data redistribution in HPF programs,” Proc. 1994 Scalable High Performance Computing Conf., pp. 309-316, May 1994.
[29] R. Thakur, A. Choudhary, and J. Ramanujam, “Efficient Algorithms for Data redistribution,” IEEE Transactions on Parallel and Distributed Systems, Vol. 7, No. 6, pp. 587-594, June 1996.
[30] A. Thirumalai and J. Ramanujam, “HPF array statements: Communication generation and optimization,” 3rd Workshop on Languages, Compilers and Run-time Systems for Scalable Computers, Troy, NY, May 1995.
[31] A. Thirumalai and J. Ramanujam, “Efficient Computation of Address Sequences in Data Parallel Programs Using Closed Forms for Basis Vectors,” Journal of Parallel and Distributed Computing, Vol. 38, pp. 188-203, 1996.
[32] V. Van Dongen, C. Bonello, and C. Freehill, “High Performance C - Language Specification Version 0.8.9,” Technical Report CRIM-EPPP-94/04-12, 1994.
[33] C. Van Loan, “Computational Frameworks for the Fast Fourier Transform,” SIAM, 1992.
[34] D. W. Walker and S. W. Otto, “Redistribution of BLOCK-CYCLIC Data Distributions Using MPI,” Concurrency: Practice and Experience, Vol. 8, No. 9, pp. 707-728, Nov. 1996.
[35] A. Wakatani and M. Wolfe, “A New Approach to Data redistribution: Strip Mining Redistribution,” Proceeding of Parallel Architectures and Languages Europe, July 1994.
[36] A. Wakatani and M. Wolfe, “Optimization of Data redistribution for Distributed Memory Multicomputers,” Parallel Computing, Vol. 21, No. 9, 1995.
電子全文 Fulltext
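本電子全文僅授權使用者為學術研究之目的,進行個人非營利性質之檢索、閱讀、列印。請遵守中華民國著作權法之相關規定,切勿任意重製、散佈、改作、轉貼、播送,以免觸法。
This electronic full text is licensed to users only for personal, non-profit searching, reading, and printing for the purpose of academic research. Please comply with the Copyright Act of the Republic of China and do not reproduce, distribute, adapt, repost, or broadcast it without authorization.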
論文使用權限 Thesis access permission:校內校外完全公開 unrestricted
開放時間 Available:
校內 Campus: 已公開 available
校外 Off-campus: 已公開 available


紙本論文 Printed copies
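紙本論文的公開資訊在102學年度以後相對較為完整。如果需要查詢101學年度以前的紙本論文公開資訊,請聯繫圖資處紙本論文服務櫃台。如有不便之處敬請見諒。
Availability information for printed theses is relatively complete from academic year 102 onward. To check the availability of printed theses from academic year 101 or earlier, please contact the printed thesis service desk of the Office of Library and Information Services. We apologize for any inconvenience.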
開放時間 available 已公開 available
