Title page for etd-0127111-184625
Title
Optimal designs for statistical inferences in nonlinear models with bivariate response variables (二元反應變數非線性模型下有關統計推論之最適設計研究)
Department
Year, semester
Language
Degree
Number of pages
129
Author
Advisor
Convenor
Advisory Committee
Date of Exam
2010-12-29
Date of Submission
2011-01-27
Keywords
sequential procedure, simplex dispersion model, rational approximation, optimal design, bivariate binary data, proportional data, Clayton copula model
Statistics
This thesis/dissertation has been browsed 5715 times and downloaded 998 times.
Abstract
Bivariate or multivariate correlated data may be collected on a sample of units in many applications. When experimenters are concerned with the failure times of two related subjects, for example paired organs or two chronic diseases, the exact failure times often cannot be observed, and bivariate binary data are acquired instead. This type of data consists of an observation point x and indicators that record whether each failure time occurred before or after the observation point. In this work, the observed bivariate data are written in the form {x, δ1=I(X1≤ x), δ2=I(X2≤ x)}. The corresponding optimal design problems for parameter estimation under this type of bivariate data are discussed.

For multivariate responses of this kind with explanatory variables, the marginal distributions may come from different families. A copula model is a way to formulate the joint distribution of these responses and the association between pairs of responses. Copula models for bivariate binary data are considered useful in practice because of their flexibility. In this dissertation, the marginal distributions for bivariate binary data are assumed to be exponential or Weibull, and two assumptions about the joint distribution of the variables, independence or correlation, are considered. When the bivariate binary data are assumed to be correlated, the Clayton copula model is used as the joint cumulative distribution function.
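
As a rough illustration of this setup, the following Python sketch computes the four cell probabilities of (δ1, δ2) at an observation point x and simulates such data. It assumes exponential marginals and the standard Clayton form C(u, v) = (u^(-θ) + v^(-θ) - 1)^(-1/θ) with θ > 0 applied to the marginal distribution functions; the function names, the parameter values, and the conditional-inversion sampler are illustrative choices, not taken from the thesis.

```python
import math
import random

def clayton_cdf(u, v, theta):
    """Clayton copula C(u, v) = (u^-theta + v^-theta - 1)^(-1/theta), theta > 0."""
    if u <= 0.0 or v <= 0.0:
        return 0.0
    return (u ** -theta + v ** -theta - 1.0) ** (-1.0 / theta)

def exp_cdf(x, rate):
    """Exponential marginal CDF F(x) = 1 - exp(-rate * x)."""
    return -math.expm1(-rate * x)

def cell_probs(x, rate1, rate2, theta):
    """Probabilities of (delta1, delta2) at observation point x."""
    f1, f2 = exp_cdf(x, rate1), exp_cdf(x, rate2)
    p11 = clayton_cdf(f1, f2, theta)  # both failures occurred by x
    return {(1, 1): p11, (1, 0): f1 - p11,
            (0, 1): f2 - p11, (0, 0): 1.0 - f1 - f2 + p11}

def simulate(x, rate1, rate2, theta, n, seed=0):
    """Draw n records (x, delta1, delta2) by conditional inversion (assumed sampler)."""
    rng = random.Random(seed)
    f1, f2 = exp_cdf(x, rate1), exp_cdf(x, rate2)
    data = []
    for _ in range(n):
        u1 = 1.0 - rng.random()  # uniform on (0, 1]
        w = 1.0 - rng.random()
        # Solve dC/du1 = w for u2 to get the second uniform given the first.
        u2 = (u1 ** -theta * (w ** (-theta / (1.0 + theta)) - 1.0) + 1.0) ** (-1.0 / theta)
        # X_j <= x is equivalent to U_j <= F_j(x), so no inversion is needed.
        data.append((x, int(u1 <= f1), int(u2 <= f2)))
    return data
```

For θ > 0 the Clayton copula has positive dependence, so the probability of both failures by x is at least the product of the two marginal probabilities.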

Few works have addressed optimal design problems for bivariate binary data with copula models. D-optimal designs aim to minimize the volume of the confidence ellipsoid for the unknown parameters, including the association parameter in bivariate copula models, and are used to determine the best observation points. Ds-optimal designs, in turn, are used mainly for estimating the important association parameter in the Clayton model.

The D- and Ds-optimal designs for the above copula model are found through the general equivalence theorem together with a numerical algorithm. The numerical results show that, under different model assumptions, the number of support points of a D-optimal design is at most the number of model parameters. When both the difference between the marginal distributions and the association are significant, the association becomes an influential factor that increases the number of support points.
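
To make the equivalence-theorem check concrete, here is a minimal Python sketch for the simplest case only: two independent exponential margins with rates λ1 and λ2. The information matrix is then diagonal, and the information about a rate from one indicator I(X ≤ x) is x²/(exp(λx) − 1). A design ξ is D-optimal when the sensitivity function d(x, ξ) = tr(M(ξ)⁻¹ I(x)) never exceeds the number of parameters (here 2). The function names, the grid, and the rate values are assumptions made for illustration; the thesis's own algorithm and its copula-based information matrices are not reproduced here.

```python
import math

def info_exp(x, rate):
    """Fisher information about `rate` from one indicator I(X <= x), X ~ Exp(rate)."""
    return x * x / math.expm1(rate * x)

def optimal_t():
    """Solve 2 * (1 - exp(-t)) = t on [1, 2]; t = rate * x maximises info_exp."""
    lo, hi = 1.0, 2.0
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if 2.0 * (1.0 - math.exp(-mid)) - mid > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def sensitivity(x, design, rates):
    """d(x, xi) = sum_j i_j(x) / M_jj(xi) for a diagonal information matrix."""
    total = 0.0
    for rate in rates:
        m_jj = sum(w * info_exp(pt, rate) for pt, w in design)
        total += info_exp(x, rate) / m_jj
    return total

def max_sensitivity(design, rates, upper=20.0, steps=4000):
    """Largest sensitivity over a grid on (0, upper]."""
    grid = [upper * (k + 1) / steps for k in range(steps)]
    return max(sensitivity(x, design, rates) for x in grid)

t_star = optimal_t()
one_point = [(t_star, 1.0)]  # all weight on a single observation point
print(max_sensitivity(one_point, (1.0, 1.0)))  # stays at the bound 2: optimal
print(max_sensitivity(one_point, (1.0, 5.0)))  # exceeds 2: not optimal
```

With equal rates a single support point passes the check, while with clearly different rates the same one-point design fails it, so more support points are needed, in line with the observation above about differing marginals.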

Simulation studies show that estimation based on the optimal designs performs reasonably well. In survival experiments, the experimenter customarily takes trials at specific points such as the 25th, 50th, and 75th percentiles of the distributions. Hence, we consider the design efficiencies when the design points are placed at three or four particular percentiles. Although it is common in practice to take trials at several quantiles, the proportion of the sample allocated to each point also has a great influence on the efficiency.
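
The effect of allocation can be illustrated with a small Python sketch under strong simplifying assumptions: two independent exponential margins sharing a common rate, for which the locally D-optimal design puts all weight at x = t*/rate with t* ≈ 1.5936 (the root of 2(1 − exp(−t)) = t). The D-efficiency (det M(ξ) / det M(ξ*))^(1/2) then reduces to a ratio of scalar informations. The names, weights, and the common-rate assumption are illustrative and do not come from the thesis.

```python
import math

T_OPT = 1.5936  # approximate root of 2 * (1 - exp(-t)) = t (assumed reference point)

def info_exp(x, rate):
    """Fisher information about `rate` from one indicator I(X <= x), X ~ Exp(rate)."""
    return x * x / math.expm1(rate * x)

def percentile(q, rate):
    """q-th quantile of the exponential distribution with the given rate."""
    return -math.log(1.0 - q) / rate

def d_efficiency(points, weights, rate):
    """D-efficiency of a design relative to the one-point design at T_OPT / rate."""
    m = sum(w * info_exp(x, rate) for x, w in zip(points, weights))
    m_opt = info_exp(T_OPT / rate, rate)
    # Both diagonal entries of the 2x2 information matrix are equal here,
    # so (det M / det M_opt) ** (1 / 2) reduces to m / m_opt.
    return m / m_opt

rate = 1.0
points = [percentile(q, rate) for q in (0.25, 0.50, 0.75)]
equal = d_efficiency(points, (1 / 3, 1 / 3, 1 / 3), rate)
skewed = d_efficiency(points, (0.1, 0.2, 0.7), rate)
print(round(equal, 3), round(skewed, 3))
```

In this toy setting the same three percentiles give noticeably different efficiencies depending on how the sample is split among them.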

To use a locally optimal design in practice, prior information about the model or parameters is needed. When there is not enough prior knowledge about the model or parameters, it is more flexible to use sequential experiments that obtain information in several stages. Hence, with robustness in mind, a sequential procedure is proposed that combines D- and Ds-optimal designs under independent or correlated distributions in different stages of the experiment. The simulation results based on the sequential procedure are compared with those from one-step procedures. When the optimal designs are obtained from incorrect prior parameter values or distributions, the one-step procedures may have poor efficiencies. The sample means of the estimators and the corresponding optimal designs obtained from the sequential procedure are close to the true values, and the corresponding efficiencies are close to 1.
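
The idea of updating the design stage by stage can be sketched in Python for a much simpler problem than the one treated in the thesis: a single exponential margin with unknown rate, where each stage observes a batch at the locally optimal point for the current estimate and then re-estimates the rate by maximum likelihood. Everything here (the one-parameter model, the stage sizes, the starting guess, the bisection-based estimator, the function names) is an assumption for illustration; the thesis's procedure combines D- and Ds-optimal designs for bivariate models.

```python
import math
import random

T_OPT = 1.5936  # approximate root of 2 * (1 - exp(-t)) = t (assumed reference point)

def score_term(x, failed, rate):
    """Contribution of one record (x, failed) to the score for `rate`."""
    if not failed:
        return -x
    z = rate * x
    return x / math.expm1(z) if z < 700.0 else 0.0  # guard against overflow

def mle_rate(data, lo=1e-3, hi=1e3):
    """MLE of the rate; the score is decreasing in rate, so bisect on a log scale."""
    for _ in range(80):
        mid = math.sqrt(lo * hi)
        if sum(score_term(x, d, mid) for x, d in data) > 0.0:
            lo = mid
        else:
            hi = mid
    return math.sqrt(lo * hi)

def sequential_design(true_rate, guess, stages=4, n_per_stage=500, seed=1):
    """Return [(design point, updated estimate)] for each stage."""
    rng = random.Random(seed)
    data, estimate, history = [], guess, []
    for _ in range(stages):
        x = T_OPT / estimate  # locally optimal point for the current estimate
        for _ in range(n_per_stage):
            failure_time = rng.expovariate(true_rate)
            data.append((x, failure_time <= x))
        estimate = mle_rate(data)  # re-estimate from all data collected so far
        history.append((x, estimate))
    return history

for x, est in sequential_design(true_rate=1.0, guess=5.0):
    print(round(x, 3), round(est, 3))
```

Even with a poor starting guess, the design point used in later stages moves toward the point that would be optimal under the true rate.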

Huster et al. (1989) analyzed modeling problems for paired survival data and applied their methods to the Diabetic Retinopathy Study. They considered the exponential and Weibull distributions as possible marginal distributions and the Clayton model as the joint function for the Diabetic Retinopathy data. This study was conducted by the National Eye Institute to assess the effectiveness of laser photocoagulation in delaying the onset of blindness in patients with diabetic retinopathy. It can be viewed as a prior experiment and provides the experimenter with useful guidelines for collecting data in future studies. As an application to the Diabetic Retinopathy Study, we develop optimal designs for collecting suitable data and information for estimating the unknown model parameters.

In the second part of this work, optimal design problems for parameter estimation are considered for proportional data. The dispersion model of Jørgensen (1997), a nonlinear model that provides a flexible class of non-normal distributions, is considered in this research. It can be applied to binary or count responses as well as proportional outcomes. For continuous proportional data, where responses are confined to the interval (0,1), the simplex dispersion model is considered here. D-optimal designs are obtained through the corresponding equivalence theorem, and the numerical results are presented. In the development of classical optimal design theory, weighted polynomial regression models with variance functions that depend on the explanatory variable have played an important role. The problem of constructing locally D-optimal designs for the simplex dispersion model can be viewed as the corresponding problem for a weighted polynomial regression model with a specific variance function. Because the weight function in the information matrix is a rational function of complex form, an approximation of the weight function is obtained, and the corresponding optimal designs are derived for different parameter values. These optimal designs are compared with those obtained using the original weight function.
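
For readers unfamiliar with the simplex dispersion model, the following Python sketch evaluates the standard simplex density on (0,1), with unit deviance d(y; μ) = (y − μ)² / (y(1 − y)μ²(1 − μ)²) and dispersion σ², together with a logit-linear mean of the kind listed in the table of contents. The function names and the parameter values in the usage lines are illustrative assumptions; the information-matrix weight function and its rational approximation from the thesis are not reproduced here.

```python
import math

def unit_deviance(y, mu):
    """Unit deviance of the simplex distribution for y, mu in (0, 1)."""
    return (y - mu) ** 2 / (y * (1.0 - y) * mu ** 2 * (1.0 - mu) ** 2)

def simplex_pdf(y, mu, sigma2):
    """Simplex density with mean mu and dispersion sigma2."""
    norm = math.sqrt(2.0 * math.pi * sigma2 * (y * (1.0 - y)) ** 3)
    return math.exp(-unit_deviance(y, mu) / (2.0 * sigma2)) / norm

def logit_mean(x, beta0, beta1):
    """Mean response under a two-parameter linear logit link."""
    return 1.0 / (1.0 + math.exp(-(beta0 + beta1 * x)))

def integrate(func, steps=20000):
    """Midpoint rule on (0, 1); avoids evaluating at the endpoints."""
    h = 1.0 / steps
    return h * sum(func((k + 0.5) * h) for k in range(steps))

mu = logit_mean(x=0.5, beta0=-1.0, beta1=0.3)  # hypothetical parameter values
print(integrate(lambda y: simplex_pdf(y, mu, 0.5)))      # close to 1
print(integrate(lambda y: y * simplex_pdf(y, mu, 0.5)))  # close to mu
```

The numerical integrals serve only as a sanity check that the density is properly normalized and centered at μ.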
Table of Contents
Thesis approval form
Chinese abstract i
Abstract iii
1 Introduction 1
1.1 Literature reviews 3
2 Preliminary 7
2.1 Copula models 7
2.2 Dispersion models 10
2.3 Score function and Fisher information matrix 11
2.4 Algorithms 14
3 Optimal designs for bivariate binary data 17
3.1 Introduction 17
3.1.1 General information matrix for bivariate binary data 19
3.2 Optimal designs for bivariate binary data 21
3.2.1 Under independence assumption 22
3.2.2 Under association assumption 26
3.2.3 Ds-optimal designs for association parameter 27
3.3 Simulation study 28
3.3.1 Generating the bivariate binary data based on a copula model 28
3.3.2 Simulation studies about model robustness of D-optimal designs 29
3.3.3 Efficiency study under D-optimal designs 32
3.3.4 Sequential procedure 33
3.4 Case study of diabetic retinopathy 37
4 Optimal designs for continuous percentage data 41
4.1 Introduction 41
4.1.1 Information matrix for simplex dispersion model 42
4.2 Optimal designs for linear logit link function with two parameters 44
4.2.1 Numerical results for D-optimal designs 45
4.2.2 Rational approximation for D-optimal designs 53
5 Conclusion and discussion 57
Reference 64
A Summary of D-optimal designs with independence assumption 65
B Summary of optimal designs under association assumption 69
C Simulation results 75
C.1 Tables: under independent assumption 76
C.2 Tables: under association assumption 79
C.3 Tables: Ds-criterion for association parameter 83
C.4 Tables: efficiency studies 84
C.5 Tables: robustness study 86
D Figures 89
D.1 Figures: under independent assumption 89
D.2 Figures: under association assumption 89
D.3 Figures: Ds-optimal designs for association parameter 91
D.4 Figures: efficiency study 91
D.5 Figures: simulation study 91
D.6 Figures: rational approximation in SDM 92
References
Abdelbasit, K. M. and Plackett, R. L. (1983). Experimental design for binary data. Journal of the American Statistical Association, 78:90–98.
Antille, G., Dette, H., and Weinberg, A. (2003). A note on optimal designs in weighted polynomial regression for the classical efficiency functions. Journal of Statistical Planning and Inference, 113:285–292.
Atkinson, A. C. and Donev, A. N. (1989). Optimum Experimental Designs. Oxford University Press, USA.
Biedermann, S., Dette, H., and Zhu, W. (2006). Geometric construction of optimal designs for dose-response models with two parameters. Journal of the American Statistical Association, 101:747–759.
Box, G. and Lucas, H. (1959). Design of experiments in non-linear situations. Biometrika, 46:77–90.
Chang, F. C. (2005). D-optimal designs for weighted polynomial regression: a functional-algebraic approach. Statistica Sinica, 15:154–163.
Chang, F. C. and Lin, G. C. (1997). D-optimal designs for weighted polynomial regression. Journal of Statistical Planning and Inference, 62:317–331.
Chernoff, H. (1972). Sequential Analysis and Optimal Design. Society for Industrial Mathematics.
Clayton, D. G. (1978). A model for association in bivariate life tables and its application in epidemiological studies of familial tendency in chronic disease incidence. Biometrika, 65:141–151.
Cook, R. D. and Nachtsheim, C. J. (1980). A comparison of algorithms for constructing exact D-optimal designs. Technometrics, 22:315–324.
Curtis, A. and Osborne, M. R. (1966). The construction of minimax rational approximations to functions. The Computer Journal, 9:286–293.
Dette, H. and Haines, L. M. (1994). E-optimal designs for linear and nonlinear models with two parameters. Biometrika, 81:739–754.
Dette, H., Haines, L. M., and Imhof, L. (1999). Optimal designs for rational models and weighted polynomial regression. Annals of Statistics, 27:1272–1293.
Dette, H. and Melas, V. (2003). Optimal designs for estimating individual coefficients in Fourier regression models. Annals of Statistics, 31:1669–1692.
Dette, H., Melas, V., and Pepelyshev, A. (2002). D-optimal designs for trigonometric regression models on a partial circle. Annals of the Institute of Statistical Mathematics, 54:945–959.
Dette, H., Melas, V., and Pepelyshev, A. (2004). Optimal designs for estimating individual coefficients in polynomial regression–a functional approach. Journal of Statistical Planning and Inference, 118:201–209.
Dette, H. and Melas, V. B. (2002). E-optimal designs in Fourier regression models on a partial circle. Mathematical Methods of Statistics, 11:259–296.
Dette, H. and Trampisch, M. (2010). A general approach to D-optimal designs for weighted univariate polynomial regression models. Journal of the Korean Statistical Society, 39:1–26.
Draper, N. R. and Hunter, W. G. (1966). Design of experiments for parameter estimation in multiresponse situations. Biometrika, 53:525–533.
Elfving, G. (1952). Optimum allocation in linear regression theory. The Annals of Mathematical Statistics, 23:255–262.
Fedorov, V. V. (1972). Theory of Optimal Experiments. Academic Press, New York.
Ford, I. (1992). The use of a canonical form in the construction of locally optimal designs for non-linear problems. Journal of the Royal Statistical Society, Series B, 54:569–583.
Ford, I., Titterington, D. M., and Kitsos, C. P. (1989). Recent advances in nonlinear experimental design. Technometrics, 31:49–60.
Fornius, E. F. and Nyquist, H. (2009). Using the canonical design space to obtain c-optimal designs for the quadratic logistic model. Communications in Statistics-Theory and Methods, 39:144–157.
Genest, C., Ghoudi, K., and Rivest, L. P. (1995). A semiparametric estimation procedure for dependence parameters in multivariate families of distributions. Biometrika, 82:543–552.
Genest, C. and MacKay, R. J. (1986a). Archimedean copulas and bivariate families with continuous marginals. The Canadian Journal of Statistics, 14:145–159.
Genest, C. and MacKay, R. J. (1986b). The joy of copulas: bivariate distributions with uniform marginals. American Statistician, 40:280–283.
Giovagnoli, A., Pukelsheim, F., and Wynn, H. (1987). Group invariant orderings and experimental designs. Journal of Statistical Planning and Inference, 17:159–171.
Haines, L. M. (1987). The application of the annealing algorithm to the construction of exact optimal designs for linear-regression models. Technometrics, 29:439–447.
Hedayat, A. S., Zhong, J., and Nie, L. (2004). Optimal and efficient designs for 2-parameter nonlinear models. Journal of Statistical Planning and Inference, 124:205–217.
Heise, M. A. and Myers, R. H. (1996). Optimal designs for bivariate logistic regression. Biometrics, 52:613–624.
Huang, M.-N. L., Chang, F. C., and Wong, W. K. (1995). D-optimal designs for polynomial regression without an intercept. Statistica Sinica, 5:441–458.
Huster, W. J., Brookmeyer, R., and Self, S. G. (1989). Modelling paired survival data with covariates. Biometrics, 45:145–156.
Imhof, L., Krafft, O., and Schaefer, M. (1998). D-optimal designs for polynomial regression with weight function x/(1 + x). Statistica Sinica, 8:1271–1274.
Jørgensen, B. (1997). The Theory of Dispersion Models. Chapman and Hall, London.
Karlin, S. and Studden, W. J. (1966). Optimal experimental designs. The Annals of Mathematical Statistics, 37:783–815.
Khan, M. K. and Yazdi, A. A. (1988). On D-optimal designs for binary data. Journal of Statistical Planning and Inference, 18:83–91.
Kiefer, J. (1959). Optimum experimental designs. Journal of the Royal Statistical Society, Series B, 21:272–319.
Kiefer, J. (1961). Optimal designs in regression problems, II. The Annals of Mathematical Statistics, 32:298–325.
Kpamegan, E. (1998). D-optimal designs given a bivariate probit response function. Lecture Notes-Monograph Series, 34:62–72.
Liang, K. Y., Self, S. G., and Chang, Y. C. (1993). Modelling marginal hazards in multivariate failure time data. Journal of the Royal Statistical Society, Series B, 55:441–453.
Lin, D. Y. (1994). Cox regression analysis of multivariate failure time data: the marginal approach. Statistics in Medicine, 13:2233–2247.
Mathew, T. and Sinha, B. K. (2001). Optimal designs for binary data under logistic regression. Journal of Statistical Planning and Inference, 93:295–307.
Maxim, L. D., Hendrickson, A. D., and Cullen, D. E. (1977). Experimental design for sensitivity testing: the Weibull model. Technometrics, 19:405–412.
Melas, V. (1978). Optimal designs for exponential regression. Statistics, 9:45–59.
Melas, V. (2000). Analytic theory of E-optimal designs for polynomial regression. Advances in Stochastic Simulation Methods.
Melas, V. (2001). Analytical properties of locally D-optimal designs for rational models. In: Atkinson, A. C., Hackl, P., and Müller, W. G. (Eds.), MODA6–Advances in Model-Oriented Design and Analysis, Physica-Verlag, Heidelberg: 210–210.
Melas, V. (2005). On the functional approach to optimal designs for nonlinear models. Journal of Statistical Planning and Inference, 132:93–116.
Minkin, S. (1987). Optimal designs for binary data. Journal of the American Statistical Association, 82:1098–1103.
Mitchell, T. J. (1974). An algorithm for the construction of D-optimal experimental designs. Technometrics, 16:203–211.
Nelder, J. A. and Wedderburn, R. W. M. (1972). Generalized linear models. Journal of the Royal Statistical Society, Series A, 135:370–384.
Nelsen, R. B. (1999). An Introduction to Copulas. Springer, New York.
Oakes, D. (1989). Bivariate survival models induced by frailties. Journal of the American Statistical Association, 84:487–493.
Qiu, Z., Song, P.-K., and Tan, M. (2008). Simplex mixed-effects models for longitudinal proportional data. Scandinavian Journal of Statistics, 35:577–596.
Schweizer, B. and Wolff, E. F. (1981). On nonparametric measure of dependence for random variables. Annals of Statistics, 9:879–885.
Shih, J. H. and Louis, T. A. (1995). Inferences on the association parameter in copula models for bivariate survival data. Biometrics, 51:1384–1399.
Sibson, R. (1972). Contribution to discussion of ‘results in the theory and construction of D-optimum experimental designs’ by H.P. Wynn. Journal of the Royal Statistical Society, Series B, 34:181–183.
Silvey, S. D. (1972). Contribution to discussion of ‘results in the theory and construction of D-optimum experimental designs’ by H.P. Wynn. Journal of the Royal Statistical Society, Series B, 34:174–175.
Silvey, S. D. (1980). Optimal Design: An Introduction to the Theory for Parameter Estimation. Chapman and Hall, London.
Sitter, R. R. and Forbes, B. E. (1997). Optimal two-stage designs for binary response experiments. Statistica Sinica, 7:941–955.
Sitter, R. R. and Torsney, B. (1995). Optimal designs for binary response experiments with two design variables. Statistica Sinica, 5:405–419.
Song, P.-K. and Tan, M. (2000). Marginal models for longitudinal continuous proportional data. Biometrics, 56:496–502.
Song, P. X.-K. (2007). Correlated Data Analysis: Modeling, Analytics and Applications. Springer, New York.
Studden, W. J. (1982). Some robust-type D-optimal designs in polynomial regression. Journal of the American Statistical Association, 77:916–921.
Studden, W. J. (2005). Elfving’s theorem revisited. Journal of Statistical Planning and Inference, 130:85–94.
Torsney, B. and Mandal, S. (2006). Two classes of multiplicative algorithms for constructing optimizing distributions. Computational Statistics and Data Analysis, 51:1591–1601.
Torsney, B. and Martín-Martín, R. (2009). Multiplicative algorithms for computing optimum designs. Journal of Statistical Planning and Inference, 139:3947–3961.
Uciński, D. and Bogacka, B. (2005). T-optimal design for discrimination between two multiresponse dynamic models. Journal of the Royal Statistical Society, Series B, 67:3–18.
Wang, W. and Ding, A. A. (2000). On assessing the association for bivariate current status data. Biometrika, 87:879–893.
He, W. and Lawless, J. F. (2003). Flexible maximum likelihood methods for bivariate proportional hazards models. Biometrics, 59:837–848.
White, L. V. (1973). An extension of the general equivalence theorem to nonlinear models. Biometrika, 60:345–348.
Whittle, P. (1973). Some general points in the theory of optimal experimental design. Journal of the Royal Statistical Society, Series B, 35:123–130.
Wu, C. F. J. and Wynn, H. P. (1978). The convergence of general step-length algorithms for regular optimal design criteria. Annals of Statistics, 6:1273–1285.
Wynn, H. P. (1970). The sequential generation of D-optimum experimental designs. The Annals of Mathematical Statistics, 41:1655–1664.
Wynn, H. P. (1972). Results in the theory and construction of D-optimum experimental designs. Journal of the Royal Statistical Society, Series B, 34:133–147.
Yang, B. M. and Stufken, J. (2009). Support points of locally optimal designs for nonlinear models with two parameters. Annals of Statistics, 37:518–541.
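Fulltext
This electronic full text is licensed to users only for personal, non-profit searching, reading, and printing for the purpose of academic research. Please comply with the relevant provisions of the Copyright Act of the Republic of China and do not reproduce, distribute, adapt, repost, or broadcast it without authorization, so as to avoid breaking the law.
Thesis access permission: open to on-campus and off-campus users after one year (withheld)
Available:
Campus: available
Off-campus: available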

