Title page for etd-0617117-113628
論文名稱 Title
迴歸模型具有時間序列干擾項的統計推論—逆自我共變異數矩陣估計與高維度選模
Inference for regression models with time series errors — Inverse autocovariance matrix estimation and high dimensional model selection
系所名稱 Department
畢業學年期 Year, semester
語文別 Language
學位類別 Degree
頁數 Number of pages
112
研究生 Author
指導教授 Advisor
召集委員 Convenor
口試委員 Advisory Committee
口試日期 Date of Exam
2017-07-03
繳交日期 Date of Submission
2017-07-17
關鍵字 Keywords
modified Cholesky decomposition, long-memory processes, heteroscedasticity, location-dispersion model, orthogonal greedy algorithm
統計 Statistics
The thesis/dissertation has been browsed 5716 times and downloaded 26 times.
中文摘要 Chinese Abstract
In practical applications, linear regression is a common method for modeling the relationship between explanatory variables and a response. This dissertation studies statistical inference for regression models with time series errors and consists of two parts. In the first part, we investigate estimation of the inverse autocovariance matrix of a long-memory process. Using a modified Cholesky decomposition together with an autoregressive model of increasing order, we propose an estimate of the inverse autocovariance matrix and prove that this estimate is consistent. We then consider linear regression models with long-memory time series errors. Since the errors are unobservable, we estimate them by the least squares residuals and use these residuals to estimate the inverse autocovariance matrix of the error process. We show that the resulting estimate remains consistent, and we apply it to estimating the coefficients of regression models with long-memory errors. Finally, simulation studies are used to examine the derived theoretical properties.

In the second part of this dissertation, we study model selection for high-dimensional sparse regression models. When the regression errors are independent and identically distributed, the literature offers a variety of consistent model selection methods. However, few studies address model selection when the errors exhibit both heteroscedasticity and serial dependence. The main goal of this work is to provide a consistent model selection method for such models. We consider a high-dimensional sparse regression model with time series errors (covering both short-memory and long-memory processes), which includes the location-dispersion model. We sequentially select explanatory variables with the orthogonal greedy algorithm and then remove irrelevant ones with a high-dimensional information criterion, thereby achieving selection consistency. Simulation studies examine the performance of the proposed method. As an empirical application, we apply the method to wafer acceptance test data to identify the problematic tools responsible for defective wafers.
Abstract
Linear regression is a well-known method for establishing the relationship between responses and explanatory variables, and it has been used extensively in practical applications. This dissertation consists of two parts, both focusing on statistical inference for linear regression models with time series errors. The first part concerns the problem of estimating the inverse autocovariance matrices of long-memory processes admitting a linear representation. A modified Cholesky decomposition and an autoregressive model of increasing order are adopted to construct the inverse autocovariance matrix estimate. We show that the proposed estimate is consistent in spectral norm. We further extend the result to linear regression models with long-memory time series errors. In particular, the same approach still works well based on the least squares residuals when our goal is to consistently estimate the inverse autocovariance matrix of the error process. Applications of this result to estimating the unknown parameters of the aforementioned regression model are also given. Simulation studies are performed to confirm the theoretical results.
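The construction in the first part can be illustrated numerically. The following Python sketch combines a Levinson-Durbin fit of autoregressive models of increasing order with the modified Cholesky decomposition; the function name, the user-supplied banding parameter `k`, and the plain sample autocovariances are illustrative choices, not the thesis's exact procedure (which also covers residual-based estimation and data-driven selection of the banding parameter).

```python
import numpy as np

def inverse_autocov_estimate(x, k):
    """Banded modified-Cholesky estimate of the inverse autocovariance
    matrix of a stationary series x, built from AR fits of order up to k.

    Uses Gamma^{-1} ~= T' D^{-1} T, where row j of the unit lower-
    triangular T carries the negated AR(min(j, k)) coefficients and D
    holds the corresponding innovation variances.
    """
    n = len(x)
    xc = x - x.mean()
    # sample autocovariances gamma_0, ..., gamma_k
    gamma = np.array([xc[: n - h] @ xc[h:] / n for h in range(k + 1)])
    # Levinson-Durbin recursion: AR coefficients and innovation
    # variances for every order 0, ..., k
    phis, sigma2 = [np.array([])], [gamma[0]]
    for m in range(1, k + 1):
        prev = phis[-1]
        kappa = (gamma[m] - prev @ gamma[1:m][::-1]) / sigma2[-1]
        phis.append(np.concatenate([prev - kappa * prev[::-1], [kappa]]))
        sigma2.append(sigma2[-1] * (1.0 - kappa ** 2))
    # assemble the banded Cholesky factor T and the diagonal D
    T = np.eye(n)
    d = np.empty(n)
    for j in range(n):
        m = min(j, k)
        T[j, j - m:j] = -phis[m][::-1]
        d[j] = sigma2[m]
    return T.T @ np.diag(1.0 / d) @ T
```

For consistency in the thesis's sense the banding parameter must grow with the sample size at a suitable rate; here it is simply fixed by the caller.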

In the second study of this dissertation, we consider model selection in sparse high-dimensional regression. High-dimensional model selection with independent and identically distributed errors is a much-studied problem; however, little attention has been paid to heteroscedasticity and time series errors. This work aims to provide a consistent model selection procedure for high-dimensional sparse regression models with time series errors. We propose a high-dimensional sparse regression model with short- or long-range dependent errors, which includes the location-dispersion model as a special case. The first step of our model selection procedure is to sequentially select predictors via an orthogonal greedy algorithm (OGA). To achieve selection consistency, we then use a high-dimensional information criterion (HDIC) to remove irrelevant predictors. Simulation studies are conducted to illustrate our theoretical findings. In addition, we apply the approach to wafer acceptance test (WAT) data to identify problematic tools.
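The two-step procedure (OGA screening followed by HDIC pruning) can be sketched as follows. The penalty form `c * log(p)`, the `Kmax` rule, and the omission of the final trimming stage are illustrative simplifications assuming i.i.d.-style errors; the thesis's HDIC is adapted to heteroscedastic and serially dependent errors, which this sketch does not capture.

```python
import numpy as np

def oga_hdic(X, y, Kmax=None, c=2.0):
    """Sketch of OGA + HDIC model selection; returns indices of the
    selected predictors (names and constants are illustrative)."""
    n, p = X.shape
    if Kmax is None:
        Kmax = int(min(p, 5 * np.sqrt(n / np.log(p))))
    Xc = X - X.mean(0)
    yc = y - y.mean()
    Q = np.zeros((n, Kmax))   # orthonormalized selected predictors
    r = yc.copy()
    path, hdic = [], []
    for k in range(Kmax):
        # pick the predictor most correlated with the current residual
        scores = np.abs(Xc.T @ r) / np.linalg.norm(Xc, axis=0)
        scores[path] = -np.inf
        j = int(np.argmax(scores))
        path.append(j)
        # orthogonalize column j against previously chosen columns
        q = Xc[:, j] - Q[:, :k] @ (Q[:, :k].T @ Xc[:, j])
        q /= np.linalg.norm(q)
        Q[:, k] = q
        r = r - q * (q @ r)
        # HDIC for the model with k + 1 predictors
        sigma2 = (r @ r) / n
        hdic.append(n * np.log(sigma2) + (k + 1) * c * np.log(p))
    k_star = int(np.argmin(hdic)) + 1
    return path[:k_star]
```

Unlike plain forward stepwise selection, OGA orthogonalizes each selected column against its predecessors, so each residual update is a single projection.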
目次 Table of Contents
論文審定書 Thesis approval form i
誌謝 Acknowledgements ii
摘要 Chinese abstract iii
Abstract iv
1 Introduction 1
2 Estimation of inverse autocovariance matrices for long-memory processes 4
1 Introduction 4
2 Long-memory model and our proposed estimate 6
2.1 Long-memory model 6
2.2 Our proposed estimate 7
3 Main results 9
3.1 Bias analysis of banded Cholesky factors 10
3.2 Estimation analysis of banded Cholesky factors 13
4 Some extensions 16
4.1 The proposed estimate based on the least squares residuals 17
4.2 The finite predictor coefficients 19
4.3 The rate of convergence of the FGLSE 20
5 Simulation study 21
5.1 Selection of the banding parameter 22
5.2 Finite sample performance 22
6 Concluding remarks 29
3 Model selection for high-dimensional sparse regression model with time series errors 30
1 Introduction 30
2 Model and methodology 32
2.1 Regression model with time series errors 32
2.2 Methodology 33
3 Theoretical results 35
3.1 Selection consistency for our proposed method 35
3.2 Extension to location-dispersion model 37
4 Simulation study 40
4.1 Parameter setting 40
4.2 Simulation examples 40
5 Real example 47
6 Concluding remarks 48
4 Future work 49
References 53
Appendices 56
A Proofs in Chapter 2 56
B Proofs in Chapter 3 72
參考文獻 References
Basu, S., Das, S., Michailidis, G. and Purnanandam, A. K. (2017). A system-wide approach to measure connectivity in the financial sector. Available at SSRN: https://ssrn.com/abstract=2816137 or http://dx.doi.org/10.2139/ssrn.2816137.
Beran, J. (1994). Statistics for long-memory processes. New York: Chapman and Hall.
Beran, J., Feng, Y., Ghosh, S. and Kulik, R. (2013). Long-memory processes: probabilistic properties and statistical methods. New York: Springer.
Berk, K. N. (1974). Consistent autoregressive spectral estimates. The Annals of Statistics, 2, 489–502.
Bickel, P. J. and Gel, Y. R. (2011). Banded regularization of autocovariance matrices in application to parameter estimation and forecasting of time series. Journal of the Royal Statistical Society, Series B, 73, 711–728.
Bickel, P. J. and Levina, E. (2008). Regularized estimation of large covariance matrices. The Annals of Statistics, 36, 199–227.
Brockwell, P. J. and Davis, R. A. (1991). Time series: theory and methods, 2nd edition. New York: Springer.
Caraux, G. and Gascuel, O. (1992). Bounds on distribution functions of order statistics for dependent variates. Statistics & Probability Letters, 14, 103–105.
Chen, J. and Chen, Z. (2008). Extended Bayesian information criteria for model selection with large model spaces. Biometrika, 95, 759–771.
Cheng, T.-C. F., Ing, C.-K. and Yu, S.-H. (2015a). Inverse moment bounds for sample autocovariance matrices based on detrended time series and their applications. Linear Algebra and Its Applications, 473, 180–201.
Cheng, T.-C. F., Ing, C.-K. and Yu, S.-H. (2015b). Toward optimal model averaging in regression models with time series errors. Journal of Econometrics, 189, 321–334.
Daye, J., Chen, J. and Li, H. (2012). High-dimensional heteroscedastic regression with an application to eQTL data analysis. Biometrics, 68, 316–326.
Fan, J. and Lv, J. (2008). Sure independence screening for ultra-high dimensional feature space (with discussion). Journal of the Royal Statistical Society, Series B, 70, 849–911.
Fan, J. and Lv, J. (2010). A selective overview of variable selection in high dimensional feature space. Statistica Sinica, 20, 101–148.
Findley, D. F. and Wei, C.-Z. (1993). Moment bounds for deriving time series CLT’s and model selection procedures. Statistica Sinica, 3, 453–480.
Godet, F. (2010). Prediction of long memory processes on same-realisation. Journal of Statistical Planning and Inference, 140, 907–926.
Ing, C.-K. and Lai, T. L. (2011). A stepwise regression method and consistent model selection for high-dimensional sparse linear models. Statistica Sinica, 21, 1473–1513.
Ing, C.-K. and Wei, C.-Z. (2003). On same-realization prediction in an infinite-order autoregressive process. Journal of Multivariate Analysis, 85, 130–155.
Ing, C.-K. and Wei, C.-Z. (2006). A maximal moment inequality for long range dependent time series with applications to estimation and model selection. Statistica Sinica, 16, 721–740.
Inoue, A. and Kasahara, Y. (2006). Explicit representation of finite predictor coefficients and its applications. The Annals of Statistics, 34, 973–993.
Javanmard, A. and Montanari, A. (2014). Confidence intervals and hypothesis testing for high-dimensional regression. The Journal of Machine Learning Research, 15, 2869–2909.
McMurry, T. L. and Politis, D. N. (2010). Banded and tapered estimates for autocovariance matrices and the linear process bootstrap. Journal of Time Series Analysis, 31, 471–482.
Palma, W. and Pourahmadi, M. (2012). Banded regularization and prediction of long-memory time series. Working Paper.
Shibata, R. (1980). Asymptotically efficient selection of the order of the model for estimating parameters of a linear process. The Annals of Statistics, 8, 147–164.
Wei, C.-Z. (1987). Adaptive prediction by least squares predictors in stochastic regression models with application to time series. The Annals of Statistics, 15, 1667–1682.
Wu, W. B., Michailidis, G. and Zhang, D. (2004). Simulating sample paths of linear fractional stable motion. IEEE Transactions on Information Theory, 50, 1086–1096.
Wu, W. B. and Pourahmadi, M. (2003). Nonparametric estimation of large covariance matrices of longitudinal data. Biometrika, 90, 831–844.
Wu, W. B. and Pourahmadi, M. (2009). Banding sample covariance matrices of stationary processes. Statistica Sinica, 19, 1755–1768.
Xiao, H. and Wu, W. B. (2012). Covariance matrix estimation for stationary time series. The Annals of Statistics, 40, 466–493.
Yajima, Y. (1988). On estimation of a regression model with long-memory stationary errors. The Annals of Statistics, 16, 791–807.
Yajima, Y. (1991). Asymptotic properties of the LSE in a regression model with long-memory stationary errors. The Annals of Statistics, 19, 158–177.
Yang, C.-Y. (2012). Estimation of linear regression models with serially correlated errors. Journal of Data Science, 10, 723–755.
Zhao, P. and Yu, B. (2006). On model selection consistency of Lasso. Journal of Machine Learning Research, 7, 2541–2563.
Zhu, X., Pan, R., Li, G., Liu, Y. and Wang, H. (2017). Network vector autoregression. The Annals of Statistics, to appear.
Zou, H. (2006). The adaptive Lasso and its oracle properties. Journal of the American Statistical Association, 101, 1418–1429.
電子全文 Fulltext
The electronic full text is authorized for personal, non-profit retrieval, reading, and printing for academic research purposes only. Please comply with the relevant provisions of the Copyright Act of the Republic of China, and do not reproduce, distribute, adapt, repost, or broadcast it without authorization.
論文使用權限 Thesis access permission: user-defined release date
開放時間 Available:
校內 Campus: 已公開 available
校外 Off-campus: 已公開 available


紙本論文 Printed copies
Public-access information for printed theses is relatively complete from academic year 102 onward. To inquire about printed theses from academic year 101 or earlier, please contact the printed-thesis service desk of the Office of Library and Information Services. We apologize for any inconvenience.
開放時間 Available: 已公開 available
