Thesis/Dissertation etd-0213103-150004: detailed record
Title page for etd-0213103-150004
Title
資料平行程式中產生本地記憶體存取序列與通訊集合之研究
A Study on the Generation of Local Memory Access Sequences and Communication Sets for Data-Parallel Programs
Department
Year, semester
Language
Degree
Number of pages
112
Author
Advisor
Convenor
Advisory Committee
Date of Exam
2003-01-14
Date of Submission
2003-02-13
Keywords
coupled subscript, local memory access sequence, communication set, multiple induction variable
Statistics
The thesis/dissertation has been browsed 5726 times and downloaded 2221 times.
中文摘要 Chinese Abstract
Distributed-memory machines deliver the high computational performance needed by many scientific applications, but programs written for such machines in conventional languages cannot be expected to perform well. Data-parallel languages present the programmer with a single global memory space, removing the time-consuming and error-prone chore of inserting inter-processor communication by hand; that work is taken over by the data-parallel compiler. Typical data-parallel languages provide alignment and distribution directives: array elements are first aligned to a template and then distributed across the processors, allowing the programmer to place data in each processor's local memory.
Parallelizing compilers generally follow the owner-computes rule: a statement is executed by the processor that owns its left-hand-side element. After data distribution, each processor owns only a fraction of the data, and not every element it owns is an active element, that is, an element actually accessed during execution. Testing every element one by one for whether it will be accessed is very inefficient, so correctly generating the local memory access sequence is a key problem that directly affects the efficiency of the generated code. Most prior work on memory access sequences handles only simple subscripts; when the subscript form is complex, computing the active elements also becomes complicated. A simple-subscript method can sometimes be applied repeatedly to handle a complex subscript, but this does not suit every form, so we study memory access methods in detail for two common kinds of complex subscript: coupled subscripts and multiple induction variables. Furthermore, since a processor owns only part of the data, any right-hand-side element of a statement it must execute that is not in its local memory has to be fetched from another processor through communication. As is well known, obtaining non-local data via inter-processor communication takes roughly 10 to 100 times as long as a local memory access, so efficiently generating communication sets, so that the non-local data each processor needs is delivered efficiently, is an extremely important problem.
For the four topics above, namely local memory access sequences, memory access methods for coupled subscripts and for multiple induction variables, and communication set generation, this thesis reviews the related techniques and then presents our approaches based on the concepts of block compression/decompression, smaller tables, course distance, and local block distance, supported by experimental results that demonstrate the advantages of the proposed methods.
Abstract
Distributed-memory multiprocessors offer the very high levels of performance required to solve scientific applications. A traditional programming language cannot be expected to yield good performance when used to program such machines. Data-parallel languages provide programmers with a global memory and relieve them from the burden of inserting time-consuming, error-prone inter-processor communication; the compilers of these languages perform this task. Data-parallel languages also provide alignment and distribution directives, which specify the type of data parallelism and the mapping of data to the underlying parallel architecture.
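As a concrete illustration of what the distribution directives describe, the following minimal Python sketch (an assumption for exposition, not code from the thesis) maps a global array index to its owning processor and local index under the two standard HPF-style distributions, BLOCK and CYCLIC(k), over P processors:

```python
# Hypothetical sketch: how a compiler maps a global array index to
# (owning processor, local index) under BLOCK and CYCLIC(k) distributions.

def block_owner(i, n, p):
    """BLOCK: contiguous chunks of size ceil(n/p) per processor."""
    b = -(-n // p)            # ceil(n / p), the block size
    return i // b, i % b      # (processor, local index)

def cyclic_owner(i, k, p):
    """CYCLIC(k): blocks of k elements dealt round-robin to processors."""
    course, offset = divmod(i, k)        # which block of k, position inside it
    proc = course % p                    # round-robin owner of that block
    local = (course // p) * k + offset   # local index on that processor
    return proc, local

# Example: first 8 elements under cyclic(2) over 4 processors
print([cyclic_owner(i, 2, 4) for i in range(8)])
# → [(0, 0), (0, 1), (1, 0), (1, 1), (2, 0), (2, 1), (3, 0), (3, 1)]
```

The "course" here is the block number within the round-robin deal; the thesis's course-distance concept builds on this block-cyclic layout.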
Parallelizing compilers distribute data and generate code according to the owner-computes rule when compiling an array statement. The array elements a processor owns are only a fraction of all the array elements, and not all of the elements it owns are active elements, so determining the local memory access sequence is important. Generating local memory access sequences becomes rather complicated, however, when the array references involve complex subscripts. This study considers two types of complex subscript: coupled subscripts and multiple induction variables. A processor may also refer to rhs (right-hand side) array elements owned by other processors, so data movement is inevitable. The overhead of accessing non-local data through inter-processor communication may be around 10 to 100 times the cost of accessing local data, so efficiently generating communication sets is important.
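The problem statement above can be made concrete with a naive reference sketch (an illustrative assumption, deliberately not the thesis's algorithm, which avoids this element-by-element scan): for a statement A(a*i + b) = ..., i = 0 .. t-1, with A distributed CYCLIC(k) over P processors, list one processor's active local elements, i.e. its local memory access sequence:

```python
# Naive scan over all iterations: each processor keeps the iterations whose
# lhs element A(a*i + b) it owns (owner-computes rule) and records the local
# address of that active element. Efficient methods derive this sequence
# in closed form instead of scanning.

def local_access_sequence(a, b, t, k, P, my_pid):
    seq = []
    for i in range(t):
        g = a * i + b                     # global index written by iteration i
        course, offset = divmod(g, k)     # block number and offset within block
        if course % P == my_pid:          # owner-computes: this processor owns A(g)
            local = (course // P) * k + offset
            seq.append(local)             # local address of the active element
    return seq

# e.g. A(3*i + 1), 20 iterations, cyclic(2) over 4 processors, processor 0:
print(local_access_sequence(3, 1, 20, 2, 4, 0))   # → [1, 4, 7, 10, 13]
```

Only a fraction of processor 0's local elements appear in the sequence; the non-active elements are exactly the ones a naive test would waste time rejecting.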
This thesis introduces the concepts of block compression/decompression, smaller iteration tables, course distance, and local block distance to solve the problems of local memory access sequences, coupled subscripts, MIV subscripts, and communication set generation. Related work on these problems is reviewed, and experimental results are presented to demonstrate the benefits of the proposed methods.
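To illustrate what a communication set is, here is a small Python sketch (again an illustrative assumption, not the thesis's generation algorithm) of the send phase for a shifted assignment A(i) = B(i + c), with both arrays distributed CYCLIC(k) over P processors: each processor lists, per destination, the local B elements it must send to the processors that execute the corresponding iterations:

```python
# Send-phase communication set for A(i) = B(i + c), i = 0 .. t-1.
# The processor executing iteration i (owner of A(i), by owner-computes)
# needs B(i + c); if another processor owns that rhs element, the owner
# must send it.

from collections import defaultdict

def owner(g, k, P):
    """Owning processor and local index of global element g under cyclic(k)."""
    course, offset = divmod(g, k)
    return course % P, (course // P) * k + offset

def send_sets(c, t, k, P, my_pid):
    out = defaultdict(list)               # destination processor -> local B indices
    for i in range(t):
        lhs_proc, _ = owner(i, k, P)      # who executes A(i) = ...
        rhs_proc, rhs_local = owner(i + c, k, P)
        if rhs_proc == my_pid and lhs_proc != my_pid:
            out[lhs_proc].append(rhs_local)
    return dict(out)

# cyclic(2), 4 processors, 16 iterations, shift c = 3; processor 1 sends:
print(send_sets(3, 16, 2, 4, 1))          # → {0: [1, 3], 3: [2, 4]}
```

Grouping the indices per destination is what makes message vectorization possible: each pair of processors exchanges one message instead of one per element, which matters given the 10 to 100 times cost ratio cited above.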

目次 Table of Contents
ACKNOWLEDGEMENTS I
ABSTRACT II
摘要 III
CHAPTER 1 INTRODUCTION 1
1.1. PARALLEL COMPUTERS 1
1.2. DATA-PARALLEL LANGUAGES 3
1.2.1. PROCESSORS DIRECTIVE 6
1.2.2. TEMPLATE DIRECTIVE 6
1.2.3. ALIGNMENT DIRECTIVE 7
1.2.4. DISTRIBUTE DIRECTIVE 7
1.3. PARALLELIZING COMPILERS 9
1.4. MOTIVATIONS 14
1.5. ORGANIZATION OF THE DISSERTATION 16
CHAPTER 2 LOCAL MEMORY ACCESS SEQUENCES GENERATION USING PERMUTATION 18
2.1. PROBLEM STATEMENT 18
2.2. RELATED WORK 20
2.3. CONCEPT OF COMPRESSION AND DECOMPRESSION 21
2.3.1. BLOCK COMPRESSION 22
2.3.2. BLOCK DECOMPRESSION 25
2.4. LOCAL BLOCK SEQUENCE GENERATION 27
2.5. ALGORITHMS 31
2.6. EXPERIMENTAL RESULTS 32
2.7. CHAPTER SUMMARY 38
CHAPTER 3 COUPLED SUBSCRIPTS 41
3.1. PROBLEM STATEMENT 41
3.2. RELATED WORK 44
3.3. THE PROPOSED METHOD 45
3.4. ALGORITHMS 51
3.5. EXPERIMENTAL RESULTS 53
3.6. CHAPTER SUMMARY 54
CHAPTER 4 MULTIPLE INDUCTION VARIABLES 55
4.1. PROBLEM STATEMENT 55
4.2. RELATED WORK 57
4.3. THE PROPOSED METHOD 58
4.3.1. BRIEF REVIEW OF MEMORY ACCESS SEQUENCE BY PERMUTATION 58
4.3.2. THE COURSE DISTANCE 61
4.4. THE ALGORITHM 67
4.5. EXPERIMENTAL RESULTS 69
4.6. CHAPTER SUMMARY 71
CHAPTER 5 COMMUNICATION SETS GENERATION 73
5.1. PROBLEM STATEMENT 73
5.2. RELATED WORK 76
5.3. COMMUNICATION SET GENERATION 77
5.3.1. THE SENDING PHASE ALGORITHM 77
5.3.2. THE RECEIVE - EXECUTE PHASE ALGORITHM 81
5.4. EXPERIMENTAL RESULTS 85
5.5. CHAPTER SUMMARY 88
CHAPTER 6 CONCLUSIONS AND FUTURE WORK 89
6.1. CONCLUSIONS 89
6.2. FUTURE WORK 91
REFERENCES 92
VITA 99
PUBLICATIONS 100

電子全文 Fulltext
The electronic full text is licensed only for personal, non-commercial retrieval, reading, and printing for the purpose of academic research. Please comply with the relevant provisions of the Copyright Act of the Republic of China; do not reproduce, distribute, adapt, repost, or broadcast it, to avoid infringement.
論文使用權限 Thesis access permission: unrestricted (campus and off-campus)
開放時間 Available:
校內 Campus: available
校外 Off-campus: available


紙本論文 Printed copies
Availability information for printed copies is relatively complete for ROC academic year 102 and later. To inquire about the availability of printed copies from academic year 101 or earlier, please contact the printed-thesis service desk of the Office of Library and Information Services. We apologize for any inconvenience.
開放時間 Available: available
