National Sun Yat-sen University (國立中山大學), thesis/dissertation

Title	Video Transcoding Algorithm through Visual Attention Model Analysis for H.264/AVC (應用於H.264國際視訊編碼標準基於視覺注目性分析之視訊轉換編碼演算法)
Department	Department of Electrical Engineering
Year, semester	Spring semester of Academic Year 96 (ROC calendar, 2007-2008)
Language	English
Degree	Master
Number of pages	99
Author	Shih-meng Chen (陳世孟)
Advisor	Chia-Hung Yeh (葉家宏)
Convenor	Ming-Sui Lee (李明穗)
Advisory Committee	C. H. Kuo (郭致宏); Hsuan-Ting Chang (張軒庭); Jih-Ching Chiu (邱日清)
Date of Exam	2008-06-13
Date of Submission	2008-07-24
Keywords	visual attention, transcoding

Chinese Abstract
none
Abstract
The proposed transcoding system combines spatial-resolution reduction and temporal-resolution reduction, both driven by visual attention model analysis. In the spatial domain, the visual attention model identifies the visual attention region of each frame. Because this region conveys the same concept as the original frame, extracting it and transcoding only that region reduces the bitrate. In the temporal domain, a frame-skipping algorithm reduces the temporal resolution to fit the target bitrate of the channel. The visual attention model measures the complexity of each frame to decide whether the frame should be skipped, so that significant frames are preserved and jerky playback is avoided. Combined with a motion vector composition algorithm, the approach speeds up the transcoding process with only slight quality degradation.
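The temporal-domain idea in the abstract can be illustrated with a minimal sketch. This is not the thesis's algorithm: the full text is not available here, so the per-frame scores, the fixed window length, and the function name `select_kept_frames` are all assumptions made for illustration. The sketch assumes each frame already has an attention-based complexity score and keeps the highest-scoring frames in each window, skipping the rest.

```python
from typing import List, Sequence


def select_kept_frames(scores: Sequence[float], window: int, keep: int) -> List[int]:
    """Return the indices of frames to keep (all other frames are skipped).

    Hypothetical sketch: `scores` stands in for a per-frame complexity
    measure derived from a visual attention model. Frames are grouped
    into consecutive windows of `window` frames, and the `keep` frames
    with the highest scores in each window are preserved.
    """
    if window <= 0 or not 0 < keep <= window:
        raise ValueError("require window > 0 and 0 < keep <= window")

    kept: List[int] = []
    for start in range(0, len(scores), window):
        indices = range(start, min(start + window, len(scores)))
        # Rank by descending score; ties go to the earlier frame.
        best = sorted(indices, key=lambda i: (-scores[i], i))[:keep]
        # Restore temporal order so the output is a valid frame sequence.
        kept.extend(sorted(best))
    return kept


if __name__ == "__main__":
    # Assumed scores for 8 frames; keep 2 of every 4 frames.
    scores = [0.9, 0.1, 0.2, 0.8, 0.3, 0.7, 0.05, 0.6]
    print(select_kept_frames(scores, window=4, keep=2))  # [0, 3, 5, 7]
```

Keeping a fixed number of frames per window is the simplest way to bound the output frame rate. The table of contents lists a window length decision step and an adaptive sliding window, which suggests the thesis varies the window length rather than fixing it as done here.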

Table of Contents
CHAPTER 1 Introduction
  1.1 Overview of Video Coding
  1.2 Overview of the H.264/AVC Video Coding Standard
  1.3 Motivation
  1.4 The Organization of the Thesis
CHAPTER 2 Backgrounds Review
  2.1 Previous Works in Video Transcoding
  2.2 A Generic Framework of User Attention Model and Its Application in Video Summarization [21]
  2.3 Variable Frame Rate Transcoding Considering Motion Information [25]
  2.4 Motion Vector Composition in Video Transcoding
    2.4.1 Bilinear Interpolation [26]
    2.4.2 Forward Dominant Vector Selection (FDVS) [26]
    2.4.3 Activity-Dominant Vector Selection (ADVS) [26]
    2.4.4 Comparison of Motion Composition Algorithms
CHAPTER 3 Proposed Video Transcoding Algorithm
  3.1 Visual Attention Model
    3.1.1 Color quantization
    3.1.2 Color space transformation
    3.1.3 Contrast value calculation
  3.2 Proposed Video Transcoding in The Spatial Domain
    3.2.1 Visual attention region extraction
    3.2.2 Proposed spatial resolution reduction
  3.3 Proposed Video Transcoding in The Frequency Domain
    3.3.1 H.264 rate control mechanism
    3.3.2 Window length decision
    3.3.3 Non-skipping frame selection
    3.3.4 Frame Skipping Operation
  3.4 Motion Vector Composition
CHAPTER 4 Experimental Results
  4.1 Experimental Results of Spatial Resolution Reduction
  4.2 Experimental Results of Frame Skipping Algorithm
    4.2.1 PSNR Comparison
    4.2.2 Frame Rate Comparison
  4.3 Experimental Results of Frame Skipping Transcoding with Motion Vector Composition
    4.3.1 PSNR Comparison
    4.3.2 Encoding Time Comparison
CHAPTER 5 Conclusions and Future Work
  5.1 Conclusions
  5.2 Future Work
Bibliography

Fig. 1-1 Architecture of video transcoding
Fig. 1-2 Transcoding scheme
Fig. 1-3 Details of transcoder
Fig. 1-4 Detailed scheme of (a) Encoder and (b) Decoder
Fig. 1-5 The typical video coding and decoding chain
Fig. 1-6 Macroblock partitions: 16x16, 8x16, 16x8, 8x8, and 8x8, 4x8, 8x4, 4x4
Fig. 1-7 Concept of multiple reference frames
Fig. 1-8 Concept of spatial resolution reducing method
Fig. 1-9 Temporal resolution reduction methods (a) Original frame structure, (b) Regular frame skipping, and (c) Dynamic frame skipping
Fig. 2-1 Architecture of user attention model
Fig. 2-2 Motion change analysis (a) non frame skipping (b) frame skipping
Fig. 2-3 Predicted window length
Fig. 2-4 Motion vector composition scheme
Fig. 2-5 Interpolation of motion vector
Fig. 2-6 Forward dominant vector selection composition scheme
Fig. 2-7 Concept of the ADVS algorithm
Fig. 3-1 Block diagram of proposed video transcoding algorithm
Fig. 3-2 Contrast behind color, texture, shape perception
Fig. 3-3 Example of color quantization
Fig. 3-4 Color quantization for (a) FantasticFour, (b) BaseballGame, (c) LakePlacid, and (d) KungFu
Fig. 3-5 Process of color space transformation
Fig. 3-6 Chromaticity diagram of XYZ color space
Fig. 3-7 Chromaticity diagram of L.u.v color space
Fig. 3-8 Results of saliency map (a) FantasticFour, (b) BaseballGame, (c) LakePlacid, and (d) KungFu
Fig. 3-9 Results of visual attention region extraction: (a) FantasticFour, (b) BaseballGame, (c) LakePlacid, and (d) KungFu
Fig. 3-10 Comparisons of the original frame and the visual attention: (a) Original frame and (b) Visual attention region
Fig. 3-11 Proposed spatial-resolution reduction in different channel conditions
Fig. 3-12 Comparisons of regular and dynamic frame skipping methods for #1 – #7 of Foreman video sequence
Fig. 3-13 Adaptive sliding window length
Fig. 3-14 Frame skipping operation
Fig. 3-15 MV composition
Fig. 4-1 Extracted region by the proposed algorithm for Hairspary
Fig. 4-2 Extracted region by the proposed algorithm for FantasticFour
Fig. 4-3 Extracted region by the proposed algorithm for LakePlacid
Fig. 4-4 Extracted region by the proposed algorithm for KungFu
Fig. 4-5 Extracted region by the proposed algorithm for BaseballGame

Table 2-1 Relation between frame rate, GOP length, predicted window length, and coded frame number
Table 4-1 Parameter setting in the reference software JM12.3
Table 4-2 Specification of the test platform
Table 4-3 Bit rate comparison between original and transcoded sequences
Table 4-4 Bit rate comparison between original and transcoded sequences
Table 4-5 Bit rate comparison between original and transcoded sequences
Table 4-6 PSNR comparison for CIF size transcoded from 512kbps to 256kbps
Table 4-7 PSNR comparison for CIF size transcoded from 512kbps to 170kbps
Table 4-8 PSNR comparison for QCIF size transcoded from 128kbps to 64kbps
Table 4-9 Frame rate comparison for CIF size transcoded from 512kbps to 256kbps
Table 4-10 Frame rate comparison for CIF size transcoded from 512kbps to 170kbps
Table 4-11 Frame rate comparison for QCIF size transcoded from 128kbps to 64kbps
Table 4-12 PSNR comparison for CIF size transcoded from 512kbps to 256kbps
Table 4-13 PSNR comparison for CIF size transcoded from 512kbps to 170kbps
Table 4-14 PSNR comparison for QCIF size transcoded from 128kbps to 64kbps
Table 4-15 Encoding time comparison for CIF size transcoded from 512kbps to 256kbps
Table 4-16 Total encoding time comparison for CIF size transcoded from 512kbps to 170kbps
Table 4-17 Total encoding time comparison for QCIF size transcoded from 128kbps to 64kbps

References
[1] A. Vetro, C. Christopoulos, and H. Sun, “Video transcoding architectures and techniques: an overview,” IEEE Signal Processing Magazine, vol. 20, pp. 18-29, March 2003.
[2] J. Xin, C.-W. Lin, and M.-T. Sun, “Digital video transcoding,” Proceedings of the IEEE, vol. 93, pp. 84-97, January 2005.
[3] I. Ahmad, X. Wei, Y. Sun, and Y.-Q. Zhang, “Video transcoding: an overview of various techniques and research issues,” IEEE Transactions on Multimedia, vol. 7, pp. 793-804, October 2005.
[4] Iain E. G. Richardson, “H.264 and MPEG-4 video compression: video coding for next-generation multimedia,” Wiley.
[5] Joint Video Team (JVT) of ISO/IEC MPEG and ITU-T VCEG, “Draft ITU-T Recommendation and Final Draft International Standard of Joint Video Specification (ITU-T Rec. H.264 | ISO/IEC 14496-10 AVC),” document JVT-G050d35.doc, March 2003.
[6] W. James, The Principles of Psychology. Cambridge, MA: Harvard Univ. Press, 1890.
[7] K. Lee, H.-S. Chang, S.-S. Chun, H. Choi, and S. Sull, “Perception-based image transcoding for universal multimedia access,” in Proceedings of 8th Int. Conference Image Process, vol. 2, pp. 475-478, 2001.
[8] L.-Q. Chen, X. Xie, X. Fan, W.-Y. Ma, H.-J. Zhang, and H.-Q. Zhou, “A visual attention model for adapting images on small displays,” Multimedia Systems, vol. 9, no. 4, pp. 353-364, October 2003.
[9] L. Itti, C. Koch, and E. Niebur, “A model of saliency-based visual attention for rapid scene analysis,” IEEE Transactions Pattern Analysis Machine Intelligence, vol. 20, no. 11, pp. 1254-1259, 1998.
[10] C.-M. Privitera and L.-W. Stark, “Algorithms for defining visual regions-of-interest: Comparison with eye fixations,” IEEE Transactions on Pattern Analysis Machine Intelligence, vol. 22, no. 9, pp. 970-982, 2000.
[11] W.-H. Cheng, W.-T. Chu, and J.-L. Wu, “A visual attention based region-of-interest determination framework for video sequences,” IEICE Transactions on Information System, vol. E-88D, no. 7, pp. 1578-1586, July 2005.
[12] C.-W. Lin, Y.-C. Chen, and M.-T. Sun, “Dynamic region of interest transcoding for multipoint video conferencing,” IEEE Transactions on Circuits System Video Technology, vol. 13, no. 10, pp. 982-992, Oct. 2003.
[13] C.-C. Ho, J.-L. Wu, and W.-H. Cheng, “A practical foveation-based rate-shaping mechanism for MPEG videos,” IEEE Transactions on Circuits System Video Technology, vol. 15, no. 11, pp. 1365-1372, Nov. 2005.
[14] M.-M. Hannuksela, Y.-K. Wang, and M. Gabbouj, “Isolated regions in video coding,” IEEE Transactions on Multimedia, vol. 6, no. 2, pp. 259-267, Apr. 2004.
[15] J.-N. Hwang and T.-D. Wu, “Motion vector re-estimation and dynamic frame-skipping for video transcoding,” in Proceedings of IEEE International Conference on Signals, System and Computer, vol. 2, pp. 1606-1610, 1998.
[16] H.-M. Hang and J.-J. Chen, “Source model for transform video coder and its application-part ii: variable frame rate coding,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 7, pp. 299-311, April 1997.
[17] J.-R. Corbera and S. Lei, “Rate control in DCT video coding for low-delay communications,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 9, pp. 172-185, February 1999.
[18] H. Song and C.-C. Jay Kuo, “Rate control for low-bit-rate video via variable-encoding frame rates,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 11, pp. 512-521, April 2001.
[19] A. Vetro, Y. Wang, and H. Sun, “Estimating distortion of coded and non-coded frames for frameskip-optimized video coding,” in Proceedings of IEEE International Conference on Multimedia and Expo, vol. 1, pp. 419-422, August 2001.
[20] T.-Y. Kuo, Y. Liang, and C.-C. Chu, “Variable frame skipping scheme based on estimated quality of non-coded frames at decoder for real-time block-based video coding,” in Proceedings of IEEE International Conference on Multimedia and Expo, vol. 2, pp. 1127-1130, June 2004.
[21] Y.-F. Ma, X.-S. Hua, L. Lu, and H.-J. Zhang, “A generic framework of user attention model and its application in video summarization,” IEEE Transactions on Multimedia, vol. 7, no. 5, pp. 907-919, October 2005.
[22] S.-Z. Li et al., “Statistical learning of multi-view face detection,” in Proceedings of European Conference Computer Vision, vol. 4, pp. 67-81, May 27-Jun. 2, 2002.
[23] D.-J. Lan, Y.-F. Ma, and H.-J. Zhang, “A novel motion-based representation for video mining,” in Proceedings of IEEE International Conference Multimedia and Expo, vol. 3, pp. 469-472, Jul. 6-9, 2003.
[24] Y.-F. Ma, L. Lu, H.-J. Zhang, and M.-J. Li, “A user attention model for video summarization,” in Proceedings of Association for Computing Machinery Multimedia conference, pp. 533-542, 2002.
[25] H. Shu and L.-P. Chau, “Variable frame rate transcoding considering motion information,” in Proceedings of IEEE International Symposium on Circuits and Systems, vol. 3, pp. 2144-2147, May 2005.
[26] I. Ahmad, X. Wei, Y. Sun, and Y.-Q. Zhang, “Video transcoding: an overview of various techniques and research issues,” IEEE Transactions on Multimedia, vol. 7, pp. 793-804, October 2005.
[27] Y.-F. Ma and H.-J. Zhang, “Contrast-based image attention analysis by using fuzzy growing,” in Proceedings of Association for Computing Machinery Multimedia conference, pp. 374-381, 2003.
[28] V. Patil and R. Kumar, “An effective motion re-estimation in frame-skipping video transcoding,” in Proceedings of International Conference on Computing: Theory and Applications, 2007.
[29] C.-Y. Chen, C.-T. Hsu, C.-H. Yeh, and M.-J. Chen, “Arbitrary frame skipping transcoding through spatial-temporal complexity analysis,” in Proceedings of TENCON 2007, 2007.
[30] L. Zusne, “Contemporary theory of visual form perception: III,” The global Theories, chapter 4, pp. 108-174, 1970.

Fulltext
Thesis access permission: not available, either on campus or off campus. The electronic full text is not released for download.
Printed copies
Available (publicly released).
