Title page for etd-1130114-154343
Title
一個新的方法用以處理包含遺失資料的時間序列
A Novel Forecasting Method for Time Series Data with Missing Values
Department
Year, semester
Language
Degree
Number of pages
62
Author
Advisor
Convenor
Advisory Committee
Date of Exam
2014-12-10
Date of Submission
2014-12-30
Keywords
Time series prediction, missing values, local time index (LTI), temporal information, least squares support vector machine (LSSVM)
Statistics
The thesis has been viewed 5693 times and downloaded 94 times.
Abstract
Time series prediction has attracted growing attention in applications such as weather forecasting, control engineering, financial analysis, and industrial process monitoring. Real-world data often contain missing values caused by sensor malfunctions or human error. Traditionally, missing values are either omitted or filled in by an imputation method. Omitting them breaks the temporal continuity of the series, which is unfavorable for analysis and prediction. Imputation, on the other hand, alters the original time series; if it relies on a particular estimation mechanism, the forecasting result largely depends on the quality of that mechanism. In this study, we propose a novel forecasting method based on the least squares support vector machine (LSSVM). We define a local time index (LTI) to represent temporal information and include it in the input patterns of the forecasting system. The kernel function therefore takes into account not only the values of the input samples but also the temporal relationships among them. We run experiments on several data sets and compare the forecasting performance of our method with that of other imputation methods. The proposed method has both strengths and limitations, but its overall performance is very good, and it is promising for further investigation.
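The idea in the abstract can be illustrated with a minimal sketch. This record does not give the thesis's exact LTI definition, so the code below assumes one plausible reading: each input pattern holds the most recent observed values together with their time offsets to the target, so gaps left by missing values remain visible to the kernel. The function names (`build_patterns`, `rbf_kernel`, `lssvm_fit`, `lssvm_predict`), the RBF kernel, and the hyperparameters `gamma` and `sigma` are illustrative choices, not taken from the thesis.

```python
import numpy as np

def build_patterns(series, d):
    """Build (input, target) pairs from a series where NaN marks a missing value.

    Each input holds the d most recent observed values before the target,
    followed by their time offsets to the target (the assumed local time index).
    """
    observed = np.flatnonzero(~np.isnan(series))
    X, y = [], []
    for k in range(d, len(observed)):
        t = observed[k]                # time step of the target
        past = observed[k - d:k]       # time steps of the d previous observations
        X.append(np.concatenate([series[past], t - past]))
        y.append(series[t])
    return np.array(X, dtype=float), np.array(y, dtype=float)

def rbf_kernel(A, B, sigma):
    """Gaussian kernel matrix between the rows of A and the rows of B."""
    sq = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2 * A @ B.T
    return np.exp(-np.maximum(sq, 0.0) / (2 * sigma**2))

def lssvm_fit(X, y, gamma, sigma):
    """Solve the standard LSSVM regression linear system for bias b and weights alpha."""
    n = len(y)
    A = np.zeros((n + 1, n + 1))
    A[0, 1:] = 1.0
    A[1:, 0] = 1.0
    A[1:, 1:] = rbf_kernel(X, X, sigma) + np.eye(n) / gamma
    sol = np.linalg.solve(A, np.concatenate([[0.0], y]))
    return sol[0], sol[1:]

def lssvm_predict(X_train, b, alpha, X_new, sigma):
    """Evaluate f(x) = sum_i alpha_i K(x, x_i) + b for each row of X_new."""
    return rbf_kernel(X_new, X_train, sigma) @ alpha + b

# Toy data: a sine wave with 40 of 200 samples removed at random.
rng = np.random.default_rng(0)
series = np.sin(0.1 * np.arange(200))
series[rng.choice(200, size=40, replace=False)] = np.nan

X, y = build_patterns(series, d=4)
X_train, y_train = X[:-20], y[:-20]
X_test, y_test = X[-20:], y[-20:]

b, alpha = lssvm_fit(X_train, y_train, gamma=100.0, sigma=2.0)
pred = lssvm_predict(X_train, b, alpha, X_test, sigma=2.0)
rmse = float(np.sqrt(np.mean((pred - y_test) ** 2)))
print(f"test RMSE: {rmse:.4f}")
```

Because the time offsets sit in the same input vector as the values, a single kernel compares both at once, and no imputed value is ever written into the series. In practice the two parts would likely need separate scaling, which this sketch omits.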
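Table of Contents
Acknowledgments iii
Abstract (in Chinese) iv
Abstract (in English) v
List of Figures ix
List of Tables x
Chapter 1 Introduction 1
1.1 Research Background 1
1.2 Problem Definition 4
1.3 Research Objectives 5
1.4 Thesis Organization 5
Chapter 2 Literature Review 7
2.1 Time Series Problems 7
2.2 Imputation Methods for Missing Values 8
2.2.1 Listwise Deletion 9
2.2.2 Mean Imputation 9
2.2.3 Nearest-Neighbor Method 10
2.2.4 Linear Interpolation 10
2.2.5 Autoregressive Least Squares Method 11
2.2.6 Brief Overview of Multiple Imputation and Maximum Likelihood Estimation 12
2.3 Least Squares Support Vector Machine 13
2.4 Confidence Interval Computation for the Least Squares Support Vector Machine 15
2.5 Gradient-Enhanced Least Squares Support Vector Machine 17
Chapter 3 Methodology 20
3.1 Motivation 20
3.2 Proposed Method 21
3.2.1 Value-First Selection (backward) 23
3.2.2 Time-First Selection (forward) 23
3.4 Algorithm Procedure 25
3.4.1 LTI-b Algorithm 26
3.4.2 LTI-f Algorithm 27
3.5 Worked Examples 28
3.5.1 Univariate Example 28
3.5.2 Multivariate Example 29
Chapter 4 Experimental Results and Analysis 31
4.1 Experiment 1: Forecasting Function-Generated Time Series 32
4.1.1 Sine Function Data Set 32
4.1.2 Sinc Function Data Set 34
4.1.3 Mackey-Glass Chaotic Time Series Data Set 34
4.2 Experiment 2: Forecasting Real-World Data Sets 36
4.2.1 Laser Data Set 36
4.2.2 Poland Electricity Load Data Set 36
4.2.3 Sunspot Data Set 37
4.2.4 Jenkins-Box Data Set 37
4.2.5 EUNITE Competition Data Set 39
4.3 Experiment 3: Evaluation of Forecasting Results 40
4.3.1 Effect of LTI on the Forecasting Model 40
4.3.3 Confidence Intervals of the Forecasting Model 42
Chapter 5 Conclusions and Future Work 45
5.1 Conclusions 45
5.2 Future Research Directions 45
References 46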
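Fulltext
This electronic full text is licensed to users solely for personal, non-profit searching, reading, and printing for the purpose of academic research. Please comply with the Copyright Act of the Republic of China; do not reproduce, distribute, adapt, repost, or broadcast it without authorization, to avoid violating the law.
Thesis access permission: user-defined release date
Available:
On campus: available
Off campus: available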


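Printed copies
Public information on printed theses is relatively complete from academic year 102 (ROC calendar) onward. To look up public information on printed theses from academic year 101 or earlier, please contact the printed thesis service desk of the Office of Library and Information Services. We apologize for any inconvenience.
Available: available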
