Title page for etd-0820109-103248
Title: The Optimal Design for Motion Estimation Algorithm on Cell Processor Architecture
Original title (Chinese): 動作估測演算法在Cell處理器架構上之最佳化設計
Department:
Year, semester:
Language:
Degree:
Number of pages: 67
Author:
Advisor:
Convenor:
Advisory Committee:
Date of exam: 2009-07-24
Date of submission: 2009-08-20
Keywords: Cell processor, Motion Estimation algorithm
Statistics: This thesis has been viewed 5660 times and downloaded 0 times.
Chinese Abstract (English translation)
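With the growth of the Internet and advances in video transmission technology, networked multimedia applications have become indispensable. Video encoding is a key part of video transmission, and motion estimation is one of the most important steps in the encoding process. Common video compression standards such as MPEG-2, H.263, and H.264 all use motion estimation. Because motion estimation processes large amounts of raw data, performing the computation in parallel can shorten encoding time, which makes implementation on embedded systems more practical.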
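The record does not say which matching criterion the thesis uses. As a minimal sketch, assuming the sum of absolute differences (SAD) over 16x16 macroblocks (a common choice, not confirmed by the source), the C code below shows why motion estimation touches so much raw data: an exhaustive (full) search evaluates every displacement in a +/-range window, and each evaluation reads 256 pixel pairs. The names sad16 and full_search, and the range parameter, are illustrative assumptions.

```c
#include <limits.h>
#include <stdint.h>
#include <stdlib.h>

#define MB 16 /* assumed 16x16 macroblocks */

/* Sum of absolute differences between a 16x16 block of the current frame
   and a candidate block of the reference frame. Both pointers address the
   block's top-left pixel; stride is the frame width in bytes. */
static int sad16(const uint8_t *cur, const uint8_t *ref, int stride)
{
    int sad = 0;
    for (int y = 0; y < MB; y++)
        for (int x = 0; x < MB; x++)
            sad += abs((int)cur[y * stride + x] - (int)ref[y * stride + x]);
    return sad;
}

/* Exhaustive search over a +/-range window for the block whose top-left
   corner is (bx, by). Candidates that would leave the frame are skipped.
   Stores the best motion vector in (*mvx, *mvy) and returns its SAD;
   ties keep the first candidate in scan order. */
static int full_search(const uint8_t *cur, const uint8_t *ref, int width, int height,
                       int bx, int by, int range, int *mvx, int *mvy)
{
    int best = INT_MAX;
    *mvx = 0;
    *mvy = 0;
    for (int dy = -range; dy <= range; dy++) {
        for (int dx = -range; dx <= range; dx++) {
            int rx = bx + dx, ry = by + dy;
            if (rx < 0 || ry < 0 || rx + MB > width || ry + MB > height)
                continue;
            int s = sad16(cur + by * width + bx, ref + ry * width + rx, width);
            if (s < best) {
                best = s;
                *mvx = dx;
                *mvy = dy;
            }
        }
    }
    return best;
}
```

For example, with range = 7 a single macroblock needs (2*7+1)^2 = 225 SAD evaluations, i.e. 225 x 256 = 57,600 absolute differences, and a CIF frame contains (352/16) x (288/16) = 22 x 18 = 396 macroblocks.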
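Common motion estimation algorithms include Full Search, Three-Step Search, and Diamond Search. This thesis optimizes two of them: Three-Step Search and Diamond Search. The IBM Cell processor consists of one PPE and eight SPEs, forming a heterogeneous multi-core system that supports a high degree of parallelism at both the thread level and the data level. It also has a fast memory system that addresses the sharp rise in bandwidth demand that parallel data processing creates. In addition, IBM provides a convenient virtual platform, so we chose Cell as the platform for our implementation and study.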
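The following is a sketch of the classic three-step search (Koga et al.), not the thesis's SPE implementation. It assumes 16x16 blocks, SAD as the cost, and a +/-7 search window, so the step sizes are 4, 2 and 1; instead of the 225 candidates an exhaustive +/-7 search would test, it evaluates at most 1 + 3 x 8 = 25 points. The helper names sad16, cost, and three_step_search are hypothetical.

```c
#include <limits.h>
#include <stdint.h>
#include <stdlib.h>

#define MB 16 /* assumed 16x16 macroblocks */

/* Sum of absolute differences over one 16x16 block (stride = frame width). */
static int sad16(const uint8_t *cur, const uint8_t *ref, int stride)
{
    int sad = 0;
    for (int y = 0; y < MB; y++)
        for (int x = 0; x < MB; x++)
            sad += abs((int)cur[y * stride + x] - (int)ref[y * stride + x]);
    return sad;
}

/* Matching cost of displacement (dx, dy) for the block at (bx, by);
   INT_MAX when the candidate block would leave the reference frame. */
static int cost(const uint8_t *cur, const uint8_t *ref, int w, int h,
                int bx, int by, int dx, int dy)
{
    int rx = bx + dx, ry = by + dy;
    if (rx < 0 || ry < 0 || rx + MB > w || ry + MB > h)
        return INT_MAX;
    return sad16(cur + by * w + bx, ref + ry * w + rx, w);
}

/* Three-step search over a +/-7 window: step sizes 4, 2, 1. At each step
   the 8 neighbours of the current centre are tested and the centre moves
   to the best one (ties keep the earlier candidate, centre first). */
static int three_step_search(const uint8_t *cur, const uint8_t *ref, int w, int h,
                             int bx, int by, int *mvx, int *mvy)
{
    int cx = 0, cy = 0;
    int best = cost(cur, ref, w, h, bx, by, 0, 0);
    for (int step = 4; step >= 1; step /= 2) {
        int nx = cx, ny = cy;
        for (int j = -1; j <= 1; j++) {
            for (int i = -1; i <= 1; i++) {
                if (i == 0 && j == 0)
                    continue; /* centre cost is already held in best */
                int s = cost(cur, ref, w, h, bx, by, cx + i * step, cy + j * step);
                if (s < best) {
                    best = s;
                    nx = cx + i * step;
                    ny = cy + j * step;
                }
            }
        }
        cx = nx;
        cy = ny;
    }
    *mvx = cx;
    *mvy = cy;
    return best;
}
```

Because each step only follows the locally best neighbour, the search can settle in a local minimum when the error surface is not unimodal; that is the accuracy it trades for speed.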
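The optimizations in this thesis build on characteristics of the Cell processor, namely its high-speed data channels, vector instructions, and heterogeneous NUMA (Non-Uniform Memory Access) multi-core architecture. We design a multiple-buffering DMA (Direct Memory Access) data-access mechanism, replace the original scalar computation with SIMD vector computation, reduce the number of branch instructions to avoid branch-miss penalties, and devise a suitable data-partitioning and scheduling scheme to accelerate motion estimation, with the goal of an optimal motion estimation architecture for a heterogeneous multi-core environment. With these acceleration mechanisms, our motion estimation implementation processes each frame in 13.26 ms in experiments on CIF (352x288) video with five reference frames.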
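Multiple buffering lets an SPE compute on one local-store buffer while the DMA engine fills another. The sketch below shows the simplest case, double buffering, in portable C. On a real SPE the transfer would be issued asynchronously with the Cell SDK's MFC DMA calls (for example mfc_get followed by a wait on the transfer's tag group); here dma_get_async and dma_wait are hypothetical stand-ins built on memcpy so the control flow runs anywhere, process_chunk is a placeholder for the per-block SAD work, and CHUNK is an assumed transfer size.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define CHUNK 256 /* bytes per transfer (assumed) */

/* Hypothetical stand-in for an asynchronous DMA get into local store.
   Here the copy completes immediately; on Cell it would only be queued. */
static void dma_get_async(uint8_t *local, const uint8_t *main_mem, size_t n, int tag)
{
    (void)tag; /* on real hardware the tag names the transfer group to wait on */
    memcpy(local, main_mem, n);
}

/* Hypothetical stand-in for waiting on a tag group; a no-op here. */
static void dma_wait(int tag) { (void)tag; }

/* Placeholder kernel: sum of bytes (stands in for SAD computation). */
static uint64_t process_chunk(const uint8_t *buf, size_t n)
{
    uint64_t s = 0;
    for (size_t i = 0; i < n; i++)
        s += buf[i];
    return s;
}

/* Double buffering: fetch chunk k+1 into one buffer before processing
   chunk k from the other, so transfer and computation can overlap. */
static uint64_t process_double_buffered(const uint8_t *src, size_t nchunks)
{
    static uint8_t local[2][CHUNK]; /* two fixed "local-store" buffers */
    uint64_t total = 0;
    int cur = 0;
    if (nchunks == 0)
        return 0;
    dma_get_async(local[cur], src, CHUNK, cur); /* prefetch the first chunk */
    for (size_t k = 0; k < nchunks; k++) {
        int next = cur ^ 1;
        if (k + 1 < nchunks) /* issue the next transfer before computing */
            dma_get_async(local[next], src + (k + 1) * CHUNK, CHUNK, next);
        dma_wait(cur); /* wait only for the chunk about to be used */
        total += process_chunk(local[cur], CHUNK);
        cur = next;
    }
    return total;
}
```

The important ordering is: issue the fetch for chunk k+1, wait only for chunk k, then compute. With genuinely asynchronous transfers, the latency of fetching k+1 is hidden behind the work on k.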
Abstract
Networked multimedia has become an integral part of daily life, owing to the development of networks and the evolution of video transmission technology. The motion estimation algorithm is an important part of video transmission technology. If the computation can be parallelized, its efficiency will improve enough to make implementation on embedded systems practical.
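The record does not describe how the thesis divides work among the SPEs. One simple scheme, assumed here purely for illustration, splits a frame's macroblocks in raster order into eight contiguous, nearly equal ranges, one per SPE. For CIF (352x288) with 16x16 macroblocks there are 22 x 18 = 396 macroblocks, so four SPEs receive 50 and four receive 49. mb_range and partition_macroblocks are hypothetical names.

```c
#define MB 16      /* assumed 16x16 macroblocks */
#define NUM_SPES 8 /* the Cell processor has eight SPEs */

/* A contiguous run of macroblocks (raster order) assigned to one SPE. */
typedef struct {
    int first_mb;
    int count;
} mb_range;

/* Split the frame's macroblocks into NUM_SPES contiguous, nearly equal
   ranges. Assumes width and height are multiples of 16. The first
   (total % NUM_SPES) SPEs each receive one extra macroblock. */
static void partition_macroblocks(int width, int height, mb_range out[NUM_SPES])
{
    int total = (width / MB) * (height / MB);
    int base = total / NUM_SPES;
    int extra = total % NUM_SPES;
    int next = 0;
    for (int i = 0; i < NUM_SPES; i++) {
        out[i].first_mb = next;
        out[i].count = base + (i < extra ? 1 : 0);
        next += out[i].count;
    }
}
```

Contiguous ranges tend to keep neighbouring search windows on the same SPE, which suits DMA transfers of reference data; with five reference frames, each SPE would search its range against all five.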
At present, common motion estimation algorithms include Full Search, Three-Step Search, and Diamond Search. We optimize Three-Step Search and Diamond Search. The IBM Cell platform, with one PPE and eight SPEs, is a heterogeneous multi-core system. It supports a high degree of thread-level and data-level parallel processing, and its fast memory system addresses the increased data-bandwidth demand of parallel processing. In addition, IBM provides a very convenient virtual platform, so we chose Cell as the platform for this study.
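A minimal sketch of the diamond search of Zhu and Ma follows, again assuming 16x16 blocks and SAD as the cost; it is not the thesis's optimized code. The large diamond search pattern (LDSP, eight points at city-block distance 2 around the centre) is repeated until the centre itself is the best point, after which one small diamond search pattern (SDSP, the four direct neighbours) refines the result. Each move strictly lowers the best SAD, so the loop always terminates. Function names are hypothetical.

```c
#include <limits.h>
#include <stdint.h>
#include <stdlib.h>

#define MB 16 /* assumed 16x16 macroblocks */

/* Sum of absolute differences over one 16x16 block (stride = frame width). */
static int sad16(const uint8_t *cur, const uint8_t *ref, int stride)
{
    int sad = 0;
    for (int y = 0; y < MB; y++)
        for (int x = 0; x < MB; x++)
            sad += abs((int)cur[y * stride + x] - (int)ref[y * stride + x]);
    return sad;
}

/* Cost of displacement (dx, dy); INT_MAX if the candidate leaves the frame. */
static int cost(const uint8_t *cur, const uint8_t *ref, int w, int h,
                int bx, int by, int dx, int dy)
{
    int rx = bx + dx, ry = by + dy;
    if (rx < 0 || ry < 0 || rx + MB > w || ry + MB > h)
        return INT_MAX;
    return sad16(cur + by * w + bx, ref + ry * w + rx, w);
}

/* Large diamond: 8 points at city-block distance 2; small diamond: 4 neighbours. */
static const int LDSP[8][2] = {{0, -2}, {1, -1}, {2, 0}, {1, 1},
                               {0, 2}, {-1, 1}, {-2, 0}, {-1, -1}};
static const int SDSP[4][2] = {{0, -1}, {1, 0}, {0, 1}, {-1, 0}};

static int diamond_search(const uint8_t *cur, const uint8_t *ref, int w, int h,
                          int bx, int by, int *mvx, int *mvy)
{
    int cx = 0, cy = 0;
    int best = cost(cur, ref, w, h, bx, by, 0, 0);
    for (;;) { /* LDSP phase: move while some outer point beats the centre */
        int nx = cx, ny = cy;
        for (int k = 0; k < 8; k++) {
            int s = cost(cur, ref, w, h, bx, by, cx + LDSP[k][0], cy + LDSP[k][1]);
            if (s < best) { best = s; nx = cx + LDSP[k][0]; ny = cy + LDSP[k][1]; }
        }
        if (nx == cx && ny == cy)
            break; /* centre is the minimum: switch to SDSP */
        cx = nx;
        cy = ny;
    }
    int fx = cx, fy = cy; /* SDSP refinement around the final centre */
    for (int k = 0; k < 4; k++) {
        int s = cost(cur, ref, w, h, bx, by, cx + SDSP[k][0], cy + SDSP[k][1]);
        if (s < best) { best = s; fx = cx + SDSP[k][0]; fy = cy + SDSP[k][1]; }
    }
    *mvx = fx;
    *mvy = fy;
    return best;
}
```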
In this thesis, the optimization is based on characteristics of the Cell processor such as its high-speed data channels, vector instructions, and heterogeneous NUMA (Non-Uniform Memory Access) multi-core architecture. We design a multiple-buffering DMA (Direct Memory Access) mechanism, use SIMD vector computation in place of the original scalar computation, and reduce the number of branch instructions to avoid branch-miss penalties. With these acceleration mechanisms, our motion estimation design achieves a processing time of 13.26 ms per frame in experiments on CIF-size images with five reference frames.
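Reducing branch instructions matters on the SPU because it relies on static branch hints rather than dynamic branch prediction, so a mispredicted branch is costly. One common way to remove the data-dependent branch in the "keep the smaller SAD" step is a compare-and-select with a bit mask, sketched below in portable C (on the SPU, intrinsics such as spu_cmpgt and spu_sel express the same idea on whole vectors). This illustrates the general technique and is not taken from the thesis; update_best and select_best are hypothetical names.

```c
#include <stdint.h>

/* Keep the smaller SAD and its motion vector without an if statement:
   mask is all ones when s is strictly better, otherwise zero, and each
   field is blended from the new or old value through the mask. */
static void update_best(int32_t s, int32_t mvx, int32_t mvy,
                        int32_t *best, int32_t *bx, int32_t *by)
{
    int32_t mask = -(int32_t)(s < *best); /* -1 (all ones) or 0 */
    *best = (s & mask) | (*best & ~mask);
    *bx = (mvx & mask) | (*bx & ~mask);
    *by = (mvy & mask) | (*by & ~mask);
}

/* Pick the lowest-SAD candidate among n evaluated search points.
   The loop body has no branch on the comparison result; ties keep
   the earlier candidate because the test is strict. */
static int32_t select_best(const int32_t *sad, const int32_t mv[][2], int n,
                           int32_t *bx, int32_t *by)
{
    int32_t best = INT32_MAX;
    *bx = 0;
    *by = 0;
    for (int k = 0; k < n; k++)
        update_best(sad[k], mv[k][0], mv[k][1], &best, bx, by);
    return best;
}
```

Whether the masked form is actually faster depends on the target and compiler; on a processor without dynamic prediction, trading a few extra logic operations for an unpredictable branch is usually worthwhile.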
Table of Contents
Abstract
Table of Contents
List of Figures
List of Tables
Chapter 1 Introduction
1-1 Research Motivation
Chapter 2 Related Work
2-1 Cell B.E. Processor Architecture
2-2 Motion Estimation Algorithms
2-2-1 Full Search Algorithm
2-2-2 Three-Step Search Algorithm
2-2-3 Diamond Search Algorithm
2-3 Methods for Accelerating Computation on the Cell B.E. Platform
2-3-1 Acceleration through Parallel Computation on Multiple SPEs
2-3-2 SIMD Computation
2-3-3 Multiple Buffering
2-3-4 Reducing the Number of Branch Instructions
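Chapter 3 Design of Motion Estimation Algorithms on the Cell Platform
3-1 Implementing Three-Step Search on the Cell Platform
3-2 Acceleration through Parallel Computation on 8 SPEs
3-3 Acceleration through SIMD Computation
3-4 Acceleration through Multiple Buffering
3-5 Acceleration by Reducing Branch Instructions
3-6 Implementing Diamond Search on the Cell Platform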
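Chapter 4 Simulation and Analysis
4-1 Simulation Platform
4-2 Simulation Results for Each Acceleration Mechanism
4-2-1 Running on a Single SPE
4-2-2 Running on 8 SPEs
4-2-3 Running on 8 SPUs with SIMD Computation
4-2-4 8 SPUs with SIMD Computation, Reduced Branch Instructions, and Multiple-Buffered DMA Transfers
4-2-5 8 SPUs with SIMD Computation and Reduced Branch Instructions
4-2-6 Performance Analysis
4-2-7 Diamond Search Performance Analysis
4-2-8 Performance Analysis with Different Source Videos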
Chapter 5 Conclusion
References
References
[1] http://www.ibm.com/developerworks/power/cell/
[2] http://www.research.ibm.com/cell/heterogeneousCMP.html
[3] J. Kahle (IBM Fellow), "Cell Architecture."
[4] J. A. Kahle, M. N. Day, H. P. Hofstee, C. R. Johns, T. R. Maeurer, and D. Shippy, "Introduction to the Cell Multiprocessor," IBM Journal of Research and Development, vol. 49, no. 4/5, July/Sept. 2005.
[5] P. Hofstee, Ph.D. (Cell Chief Scientist and Chief Architect), "Cell Today and Tomorrow."
[6] IBM, "Synergistic Processor Unit Instruction Set Architecture," Version 1.2.
[7] IBM Redbooks, "Programming the Cell Broadband Engine™ Architecture: Examples and Best Practices," 8 Aug. 2008.
[8] M. Gschwind, H. P. Hofstee, B. Flachs, M. Hopkins (IBM), Y. Watanabe (Toshiba), and T. Yamazaki (Sony Computer Entertainment), "Synergistic Processing in Cell's Multicore Architecture," IEEE Computer Society.
[9] T. Koga, K. Iinuma, A. Hirano, Y. Iijima, and T. Ishiguro, "Motion-Compensated Interframe Coding for Video Conferencing," in Proc. NTC 81, New Orleans, LA, Nov./Dec. 1981.
[10] R. Li, B. Zeng, and M. L. Liou, "A New Three-Step Search Algorithm for Block Motion Estimation," IEEE Transactions on Circuits and Systems for Video Technology, vol. 4, no. 4, Aug. 1994.
[11] J. Y. Tham, S. Ranganath, M. Ranganath, and A. A. Kassim, "A Novel Unrestricted Center-Biased Diamond Search Algorithm for Block Motion Estimation," IEEE Transactions on Circuits and Systems for Video Technology, vol. 8, no. 4, Aug. 1998.
[12] S. Zhu and K.-K. Ma, "A New Diamond Search Algorithm for Fast Block-Matching Motion Estimation," IEEE Transactions on Image Processing, vol. 9, no. 2, Feb. 2000.
[13] C.-W. Lam, L.-M. Po, and C. H. Cheung, "A New Cross-Diamond Search Algorithm for Fast Block Matching Motion Estimation."
[14] P. Trancoso, D. Othonos, and A. Artemiou, "Data Parallel Acceleration of Decision Support Queries Using Cell/BE and GPUs," in Proc. CF '09, Ischia, Italy, May 18-20, 2009.
[15] T. Liu, H. Lin, T. Chen, J. K. O'Brien, and L. Shao, "DBDB: Optimizing DMA Transfer for the Cell BE Architecture," in Proc. ICS '09.
[16] K. Daloukas, C. D. Antonopoulos, and N. Bellas, "Implementation of a Wide-Angle Lens Distortion Correction Algorithm on the Cell Broadband Engine," in Proc. ICS '09.
[17] D. P. Scarpazza and G. F. Russell, "High-Performance Regular Expression Scanning on the Cell/B.E. Processor," in Proc. ICS '09.
[18] S. Momcilovic and L. Sousa, "A Parallel Algorithm for Advanced Video Motion Estimation on Multicore Architectures," International Conference on Complex, Intelligent and Software Intensive Systems, 2008.
[19] S. Momcilovic and L. Sousa, "Parallel Advanced Video Coding: Motion Estimation on Multi-Cores," SCPE, 2008.
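Full text
This electronic full text is licensed only for personal, non-profit searching, reading, and printing for academic research. Please observe the Copyright Act of the Republic of China (Taiwan); do not reproduce, distribute, adapt, repost, or broadcast it without authorization.
Thesis access permission: not available on or off campus
Available:
Campus: never available
Off-campus: never available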

Printed copies
Availability: publicly available
