Title page for etd-0721117-114750
Title
Analysis and prediction of time series big data by using machine learning
Chinese title: 以機器學習分析與預測大數據時間序列資料
Department
Academic year and semester
Language
Degree
Number of pages
82
Author
Advisor
Convenor
Oral Examination Committee
Date of Exam
2017-08-11
Date of Submission
2017-08-26
Keywords
RStudio, R language, Auto-Regressive Integrated Moving Average Model, Cross-Correlation, Time Series
Statistics
This thesis has been viewed 5694 times and downloaded 18 times.
Abstract (Chinese version, translated)
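This study analyzes and forecasts National Health Insurance data, financial stock-market data, and climate and environmental data. Building on an existing data-processing platform (Hadoop) and a storage design for static (i.e., historical) data, it combines diseases with their related factors, analyzes the stock market, and makes predictions for both.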
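Most previous studies could only analyze historical information and produce simulated estimates, ignoring how earlier observations influence later ones. We instead exploit the properties of time series (stationarity, trend, seasonality) and combine them with highly correlated factors to propose a model for both analysis and forecasting. For different situations involving external factors such as disease, climate, or region, we design the model, test its accuracy, and predict future trends.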
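The cross-correlation step (Section 3.1 of the outline) looks for the lag at which an external factor is most strongly correlated with the target series. Below is a minimal Python sketch of the sample cross-correlation function; the thesis itself works in R, and the simulated factor/target pair is an illustrative assumption, not the thesis data. The lag convention follows R's `ccf(x, y)`, where lag k estimates the correlation between `x[t+k]` and `y[t]`, so a peak at a negative k means the factor x leads the target y.

```python
import math
import random


def cross_correlation(x, y, max_lag):
    """Sample cross-correlation r(k) = corr(x[t+k], y[t]) for k in -max_lag..max_lag.

    Cross-covariances are divided by n (the usual biased estimator),
    so values shrink toward 0 at large |k|.
    """
    n = len(x)
    if n != len(y):
        raise ValueError("x and y must have the same length")
    mx, my = sum(x) / n, sum(y) / n
    sx = math.sqrt(sum((v - mx) ** 2 for v in x) / n)
    sy = math.sqrt(sum((v - my) ** 2 for v in y) / n)
    r = {}
    for k in range(-max_lag, max_lag + 1):
        cov = sum((x[t + k] - mx) * (y[t] - my)
                  for t in range(n) if 0 <= t + k < n) / n
        r[k] = cov / (sx * sy)
    return r


# Illustrative data only: a factor x (e.g. temperature) and a target y
# (e.g. admissions) that simply copies x two periods later.
rng = random.Random(0)
x = [rng.gauss(0.0, 1.0) for _ in range(200)]
y = [0.0, 0.0] + x[:-2]

r = cross_correlation(x, y, max_lag=5)
best_lag = max(r, key=lambda k: abs(r[k]))
# x leads y by two periods, so the peak should be at k = -2.
print(best_lag, round(r[best_lag], 2))
```

In practice the candidate lags found this way would then be carried into the forecasting model as lagged regressors.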
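We first identify relevant factors to explore preliminary characteristics and examine how these factors relate to the research target. Next, we test whether the target time series meets the model's requirements, use the ACF and PACF to find the best order combination, and apply it to an ARIMA (Auto-Regressive Integrated Moving Average) model, making a regression correction in each case. Finally, we verify the model's reliability so that it can be used for subsequent forecasting and analysis. We also present the various corrected models produced by the system and the resulting revised forecasts.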
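Checking whether the target series is suitable for an ARIMA model means, above all, checking stationarity; the outline lists the Augmented Dickey-Fuller test (Section 3.2) for this. The Python sketch below (the thesis uses R) computes only the plain Dickey-Fuller statistic, without the lagged-difference terms of the augmented version, and compares it with the approximate large-sample 5% critical value of -2.86 for a regression with a constant. The simulated series are assumptions for illustration; for real analyses a full implementation such as R's `tseries::adf.test` or Python's `statsmodels.tsa.stattools.adfuller` is the safer choice.

```python
import math
import random


def dickey_fuller_t(y):
    """t-statistic for gamma in  dy[t] = alpha + gamma * y[t-1] + e[t].

    Plain Dickey-Fuller regression with a constant. The *augmented* test
    adds lagged differences dy[t-1..t-p] as extra regressors; they are
    omitted here to keep the sketch short.
    """
    dy = [y[t] - y[t - 1] for t in range(1, len(y))]
    lag = y[:-1]
    n = len(dy)
    mx, my = sum(lag) / n, sum(dy) / n
    sxx = sum((v - mx) ** 2 for v in lag)
    sxy = sum((a - mx) * (b - my) for a, b in zip(lag, dy))
    gamma = sxy / sxx                      # OLS slope
    alpha = my - gamma * mx                # OLS intercept
    sse = sum((b - alpha - gamma * a) ** 2 for a, b in zip(lag, dy))
    se_gamma = math.sqrt(sse / (n - 2) / sxx)
    return gamma / se_gamma


# Approximate large-sample 5% critical value (constant, no trend).
# Under the unit-root null the statistic does NOT follow a Student t law.
DF_CRIT_5PCT = -2.86

rng = random.Random(1)
walk, ar1 = [0.0], [0.0]
for _ in range(500):
    e = rng.gauss(0.0, 1.0)
    walk.append(walk[-1] + e)          # unit root: non-stationary
    ar1.append(0.5 * ar1[-1] + e)      # stationary AR(1)

diffed_walk = [walk[t] - walk[t - 1] for t in range(1, len(walk))]
for name, series in [("random walk", walk), ("AR(1)", ar1),
                     ("differenced walk", diffed_walk)]:
    t_stat = dickey_fuller_t(series)
    verdict = "stationary" if t_stat < DF_CRIT_5PCT else "unit root not rejected"
    print(f"{name}: t = {t_stat:.2f} -> {verdict}")
```

The random walk will usually fail to reject the unit root, which is the signal to difference it (the "I" in ARIMA) before fitting.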
Abstract
This study analyzes and predicts health insurance data, financial stock-market data, and climate data. Building on a Hadoop data platform and on the storage of static (i.e., historical) data, we analyze and attempt to predict the correlations among diseases, related factors, and the stock market.
Past studies focused on historical information and simulated estimates but ignored the influence of time. If we instead exploit the characteristics of time series (stationarity, trend, seasonality) together with highly correlated factors, we can build a model that supports both analysis and prediction. For different situations involving external factors such as disease, climate, or region, the model's accuracy can be checked and future trends predicted.
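Trend and seasonality are commonly removed by differencing before an ARIMA or seasonal ARIMA model is fitted: a seasonal difference (1 - B^s) followed by an ordinary difference (1 - B), where B is the backshift operator. The Python sketch below (in R, `diff(y, lag = 12)` does the same job) applies both to a synthetic monthly series; the slope, the 12-month pattern, and the period s = 12 are made-up illustration values, not data from the thesis.

```python
def difference(x, lag=1):
    """Return the lag-`lag` difference x[t] - x[t-lag]; the result is `lag` shorter."""
    return [x[t] - x[t - lag] for t in range(lag, len(x))]


# Synthetic monthly series: linear trend (slope 2) + fixed 12-month pattern.
pattern = [5, 3, 0, -2, -4, -6, -5, -3, 0, 2, 4, 6]
y = [2.0 * t + pattern[t % 12] for t in range(48)]

first_only = difference(y, lag=1)       # trend removed, seasonality still present
seasonal = difference(y, lag=12)        # pattern cancels; trend becomes the constant 24
both = difference(seasonal, lag=1)      # the remaining constant differences to 0

print(sorted(set(first_only)))          # several distinct values: month-to-month effect
print(set(seasonal))                    # {24.0}
print(set(both))                        # {0.0}
```

Real series will not collapse to constants like this; the point is only that each differencing operator targets a different kind of non-stationarity.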
First, we identify the relevant factors to explore their initial characteristics and their correlation with the target. Second, we test whether the target time series meets the model requirements and use the ACF and PACF to find the best order combination. This combination is then used in an ARIMA (Auto-Regressive Integrated Moving Average) model, with a regression correction applied in each case.
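Reading the ACF and PACF is the usual Box-Jenkins way to pick ARIMA orders: for a pure AR(p) process the PACF cuts off after lag p while the ACF decays, and for a pure MA(q) process the roles are reversed. Below is a minimal Python sketch (not the thesis's R code) of the sample ACF and of the PACF computed with the Durbin-Levinson recursion, run on a simulated AR(1) series with coefficient 0.7; the data and the 1.96/sqrt(n) significance band are illustrative assumptions.

```python
import math
import random


def acf(x, max_lag):
    """Sample autocorrelations r_0..r_max_lag (r_0 == 1)."""
    n = len(x)
    m = sum(x) / n
    denom = sum((v - m) ** 2 for v in x)
    return [sum((x[t] - m) * (x[t - k] - m) for t in range(k, n)) / denom
            for k in range(max_lag + 1)]


def pacf(x, max_lag):
    """Partial autocorrelations for lags 1..max_lag via Durbin-Levinson.

    Element i of the returned list is the PACF at lag i + 1.
    """
    r = acf(x, max_lag)
    phi = []   # phi[j] holds phi_{k-1, j+1} from the previous step
    out = []
    for k in range(1, max_lag + 1):
        num = r[k] - sum(phi[j] * r[k - 1 - j] for j in range(k - 1))
        den = 1.0 - sum(phi[j] * r[j + 1] for j in range(k - 1))
        phi_kk = num / den
        # Update phi_{k, j} = phi_{k-1, j} - phi_kk * phi_{k-1, k-j}
        phi = [phi[j] - phi_kk * phi[k - 2 - j] for j in range(k - 1)] + [phi_kk]
        out.append(phi_kk)
    return out


# Illustrative AR(1) series with coefficient 0.7 (simulated, not thesis data).
rng = random.Random(2)
x = [0.0]
for _ in range(2000):
    x.append(0.7 * x[-1] + rng.gauss(0.0, 1.0))

band = 1.96 / math.sqrt(len(x))   # approximate 95% band for white noise
r = acf(x, 5)
p = pacf(x, 5)
print("ACF :", [round(v, 2) for v in r[1:]])   # decays roughly like 0.7**k
print("PACF:", [round(v, 2) for v in p])        # typically only lag 1 is large
print("PACF lags outside band:", [k + 1 for k, v in enumerate(p) if abs(v) > band])
```

Here the PACF pattern points to p = 1; on real data several neighbouring orders are usually compared by an information criterion before one is chosen.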
Finally, we test the reliability of the model again. The thesis also presents a variety of corrected-model cases and the revised predictions.
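The reliability check named in the outline is the Ljung-Box test (Section 3.6), which asks whether a fitted model's residuals still contain autocorrelation. The Python sketch below fits an ARIMA(1,1,0) by conditional least squares to a simulated series, computes Q = n(n+2) Σ_{k=1..h} r_k² / (n - k) on the residuals, and produces a few forecasts. The model order, the true coefficient 0.6, h = 10, and the cut-off 16.919 (the 95% quantile of χ² with 10 - 1 = 9 degrees of freedom, one AR term fitted) are assumptions chosen for illustration; in R the usual tools are `arima()` and `Box.test(..., type = "Ljung-Box", fitdf = ...)`.

```python
import random


def fit_arima_110(y):
    """Conditional least-squares fit of ARIMA(1,1,0) without a constant.

    Model: w[t] = y[t] - y[t-1],  w[t] = phi * w[t-1] + e[t].
    Returns phi and the one-step residuals e[t].
    """
    w = [y[t] - y[t - 1] for t in range(1, len(y))]
    num = sum(w[t] * w[t - 1] for t in range(1, len(w)))
    den = sum(w[t - 1] ** 2 for t in range(1, len(w)))
    phi = num / den
    resid = [w[t] - phi * w[t - 1] for t in range(1, len(w))]
    return phi, resid


def forecast_arima_110(y, phi, steps):
    """Iterate the fitted AR(1) on the differences, then undo the differencing."""
    level, diff = y[-1], y[-1] - y[-2]
    out = []
    for _ in range(steps):
        diff = phi * diff
        level += diff
        out.append(level)
    return out


def ljung_box_q(resid, h):
    """Ljung-Box statistic Q = n(n+2) * sum_{k=1..h} r_k^2 / (n - k)."""
    n = len(resid)
    m = sum(resid) / n
    denom = sum((e - m) ** 2 for e in resid)
    total = 0.0
    for k in range(1, h + 1):
        r_k = sum((resid[t] - m) * (resid[t - k] - m) for t in range(k, n)) / denom
        total += r_k ** 2 / (n - k)
    return n * (n + 2) * total


# Simulated ARIMA(1,1,0) series with phi = 0.6 (illustrative, not thesis data).
rng = random.Random(3)
y, w = [0.0], 0.0
for _ in range(2000):
    w = 0.6 * w + rng.gauss(0.0, 1.0)
    y.append(y[-1] + w)

phi, resid = fit_arima_110(y)
H = 10
CHI2_95_DF9 = 16.919   # 95% quantile of chi-square, H - 1 = 9 df
q_good = ljung_box_q(resid, H)
# Under-fitted ARIMA(0,1,0): its residuals are just the first differences.
q_bad = ljung_box_q([y[t] - y[t - 1] for t in range(1, len(y))], H)

print(f"phi = {phi:.3f}")
print(f"Q(1,1,0) = {q_good:.1f}, Q(0,1,0) = {q_bad:.1f}, cut-off = {CHI2_95_DF9}")
print("next 3 forecasts:", [round(v, 2) for v in forecast_arima_110(y, phi, 3)])
```

A correctly specified model should usually leave Q below the cut-off, while the under-fitted ARIMA(0,1,0) leaves strong lag-1 autocorrelation and a very large Q, which is the kind of signal that prompts a model correction.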
Table of Contents
Thesis Approval Form
Acknowledgments
Abstract (Chinese)
Abstract
Table of Contents
List of Figures
List of Tables
Chapter 1 Introduction
1.1 Research Motivation and Objectives
1.2 Thesis Organization
Chapter 2 Research Background
2.1 Related Work
2.2 Introduction to R language and RStudio
2.2.1 R language
2.2.2 RStudio
2.3 Data Description
Chapter 3 Research Methods
3.1 Cross-Correlation Function
3.2 Augmented Dickey-Fuller Test
3.3 ARIMA Model
3.4 Seasonal ARIMA Model
3.5 Time Series Models
3.5.1 SMA Model and EMA Model
3.5.2 VAR Model and TAR Model
3.5.3 RNN Model
3.6 Ljung-Box Test
3.7 Forecast Performance Evaluation
Chapter 4 Forecasting Model Experiments
4.1 Experiment Overview
4.2 Experimental Workflow
4.3 Experimental Design 1
4.4 Experimental Design 2
4.5 Experimental Design 3
Chapter 5 Conclusions and Future Work
References
Appendix
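Sections 3.5.1 and 3.7 of the outline cover moving-average baselines and forecast performance evaluation. The abstract does not say which error measures the thesis reports, so the Python sketch below assumes two common ones, RMSE and MAPE, and uses them to score one-step-ahead SMA and EMA forecasts on a toy series. The window length, the smoothing factor, the EMA initialisation s[0] = x[0], and the data are illustrative assumptions.

```python
import math


def sma_forecast(x, window):
    """One-step-ahead simple moving average: forecast of x[t] = mean(x[t-window:t])."""
    return [sum(x[t - window:t]) / window for t in range(window, len(x))]


def ema_forecast(x, alpha):
    """One-step-ahead exponential moving average.

    Smoothed value s[t] = alpha * x[t] + (1 - alpha) * s[t-1], with s[0] = x[0];
    the forecast of x[t] is s[t-1]. Returns forecasts for x[1:].
    """
    s = float(x[0])
    out = []
    for value in x[1:]:
        out.append(s)
        s = alpha * value + (1 - alpha) * s
    return out


def rmse(actual, pred):
    """Root mean squared error."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, pred)) / len(actual))


def mape(actual, pred):
    """Mean absolute percentage error in percent (undefined if any actual value is 0)."""
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, pred)) / len(actual)


x = [10, 12, 11, 13, 12, 14]          # toy series, not thesis data
sma = sma_forecast(x, window=2)       # forecasts for x[2:]
ema = ema_forecast(x, alpha=0.5)      # forecasts for x[1:]
print(sma)                            # [11.0, 11.5, 12.0, 12.5]
print(ema)                            # [10.0, 11.0, 11.0, 12.0, 12.0]
print(round(rmse(x[2:], sma), 3), round(mape(x[2:], sma), 2))   # 1.061 5.56
```

RMSE is in the units of the data and penalises large misses heavily, while MAPE is scale-free and therefore easier to compare across the health-insurance, stock, and climate series, as long as none of the actual values is zero.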
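Full text
This electronic full text is licensed to users only for personal, non-profit searching, reading, and printing for academic research. Please comply with the Copyright Act of the Republic of China (Taiwan) and do not reproduce, distribute, adapt, repost, or broadcast it without authorization.
Thesis access permission: user-defined release date
Availability:
On campus: available
Off campus: available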


Printed copies
Availability: available
