國立中山大學,National Sun Yat-sen University,學位論文,thesis/dissertation,基於人臉辨識的視訊摘要系統,Video Summarization based on Face Recognition

論文名稱 Title	基於人臉辨識的視訊摘要系統 Video Summarization based on Face Recognition
系所名稱 Department	資訊工程學系 Department of Computer Science and Engineering
畢業學年期 Year, semester	106 學年度第 1 學期 The fall semester of Academic Year 106	語文別 Language	中文 Chinese
學位類別 Degree	碩士 Master	頁數 Number of pages	61
研究生 Author	林瑞嚴 Reui-yan Lin
指導教授 Advisor	李宗南 Chung-Nan Lee
召集委員 Convenor	郭耀煌 Yau-Hwang Kuo
口試委員 Advisory Committee	柯秀民, 陳志成 Shiu-Ming Ko; Jyh-Cheng Chen
口試日期 Date of Exam	2017-10-29	繳交日期 Date of Submission	2017-12-10
關鍵字 Keywords	卷積神經網路、人臉校正、人臉辨識、人臉偵測、視訊摘要 Convolutional Neural Network, Face Alignment, Face Recognition, Face Detection, Video Summarization
統計 Statistics	本論文已被瀏覽 5686 次，被下載 78 次 The thesis/dissertation has been browsed 5686 times, has been downloaded 78 times.

中文摘要
近年來資安意識逐漸抬頭，各大公司都不願見到自家的機密資料遭到其他公司盜竊，因此多數公司會設置出入口管控系統及監視攝影機，但是過度龐大的監視攝影機數量，導致事件發生時，需要花費大量人力與時間搜尋全體員工資料才有辦法找出嫌疑人資訊。有鑑於此，本篇論文提出一套基於人臉識別的視訊摘要系統，我們使用人臉偵測、人臉辨識搜尋目標人物出現的時間與地點，並使用視訊摘要記錄這些資訊並整理出供使用者快速瀏覽的摘要影片，除此之外，在偵測以及辨識方面，我們採用深度學習的方式對物件進行分類與辨識，根據實驗結果，我們的系統在人臉偵測得到81%的準確度，而在人臉辨識方面，分別測試LFW資料庫得到96.82%的準確度，在YTF資料庫得到99.16%的準確度，並且結合各模組的視訊摘要效果達到92.45%之辨識準確與98.98%的召回率。
Abstract
In recent years, video surveillance for person identification has attracted increasing attention. Most companies want to prevent misbehavior by employees or intruders, such as the theft of confidential data, so they usually install access control systems and surveillance cameras at entrances and exits. When an incident occurs, however, identifying a suspect from the footage of a large number of cameras can take considerable manpower and time. To address this, this thesis presents a video summarization system based on face recognition. The system uses face detection and face recognition to find the times and places at which a target person appears, then records this information and compiles a summary video for quick review. To improve accuracy, we use the YOLO algorithm for face detection and the VGG-Face algorithm for face recognition. Experimental results show that the proposed system achieves 81% accuracy in face detection. In face recognition, it achieves 96.82% accuracy on the LFW dataset and 99.16% accuracy on the YTF dataset. Finally, the complete system achieves 92.45% precision and 98.98% recall in video summarization.
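The summarization step and the reported precision and recall can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the names `Segment`, `merge_hits`, `precision_recall`, and the `max_gap` parameter are hypothetical, and the sketch assumes the recognition stage has already produced a list of (camera, timestamp) hits for the target person.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """One continuous appearance of the target person on one camera."""
    camera: str
    start: float  # seconds
    end: float    # seconds


def merge_hits(hits, max_gap=1.0):
    """Group per-frame recognition hits into appearance segments.

    hits: iterable of (camera, timestamp_seconds) pairs (assumed input format).
    max_gap: hypothetical threshold; hits on the same camera that are at most
             this many seconds apart are merged into one segment.
    """
    segments = []
    for camera, t in sorted(hits):
        last = segments[-1] if segments else None
        if last is not None and last.camera == camera and t - last.end <= max_gap:
            last.end = t  # extend the current segment
        else:
            segments.append(Segment(camera, t, t))  # start a new segment
    return segments


def precision_recall(tp, fp, fn):
    """Standard definitions: precision = TP/(TP+FP), recall = TP/(TP+FN)."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall
```

For example, `merge_hits([("gate", 0.0), ("gate", 0.5), ("gate", 5.0)])` yields two segments on the hypothetical camera "gate", because the third hit is more than `max_gap` seconds after the second. The resulting segments would be what a summary video is cut from.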

目次 Table of Contents
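論文審定書 Thesis Approval Form i
誌謝 Acknowledgements ii
摘要 Abstract (Chinese) iii
Abstract iv
目錄 Table of Contents v
圖目錄 List of Figures vii
表目錄 List of Tables viii
第一章 簡介 Chapter 1 Introduction 1
1.1 論文概述 Overview 1
1.2 論文貢獻 Contributions 3
1.3 論文架構 Organization 4
第二章 文獻探討 Chapter 2 Literature Review 5
2.1 人臉偵測 Face Detection 5
2.2 人臉辨識 Face Recognition 12
2.3 人臉追蹤 Face Tracking 13
2.4 視訊摘要 Video Summarization 14
第三章 研究方法 Chapter 3 Methodology 17
3.1 人臉偵測 Face Detection 19
3.1.1 YOLO 20
3.1.2 神經網路架構 Neural Network Architecture 21
3.1.3 訓練資料庫 Training Dataset 22
3.2 人臉校正 Face Alignment 22
3.2.1 PDM 23
3.2.2 LNF 24
3.3 人臉辨識 Face Recognition 25
3.3.1 卷積神經網路 Convolutional Neural Network 25
3.3.2 神經網路架構 Neural Network Architecture 28
3.4 人臉追蹤 Face Tracking 30
第四章 實驗結果 Chapter 4 Experimental Results 32
4.1 人臉偵測 Face Detection 33
4.2 人臉校正 Face Alignment 35
4.3 人臉辨識 Face Recognition 37
4.4 視訊摘要 Video Summarization 40
4.5 產品比較 Product Comparison 44
4.5.1 辨識準確度 Recognition Accuracy 44
4.5.2 產品的應用與辨識結果呈現 Product Applications and Presentation of Recognition Results 45
4.5.3 辨識速度 Recognition Speed 47
第五章 結論 Chapter 5 Conclusion 48
參考文獻 References 49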

電子全文 Fulltext
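This electronic full text is licensed to users only for personal, non-profit searching, reading, and printing for academic research. Please comply with the Copyright Act of the Republic of China and do not reproduce, distribute, adapt, repost, or broadcast it without permission. 論文使用權限 Thesis access permission: 自定論文開放時間 user define. 開放時間 Available: 校內 Campus: 已公開 available; 校外 Off-campus: 已公開 available. etd-1110117-150128.pdf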
紙本論文 Printed copies
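Availability information for printed theses is relatively complete from academic year 102 onward. For printed theses from academic year 101 or earlier, please contact the printed thesis service desk of the Office of Library and Information Services. 開放時間 Available: 已公開 available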
