論文使用權限 Thesis access permission:自定論文開放時間 user define
開放時間 Available:
校內 Campus: 已公開 available
校外 Off-campus: 已公開 available
論文名稱 Title: 以交叉分類多層次試題反應模式探究差異試題功能成因 (Investigating the Sources of Differential Item Functioning by Using Cross-Classification Multilevel Item Response Model)
系所名稱 Department:
畢業學年期 Year, semester:
語文別 Language:
學位類別 Degree:
頁數 Number of pages: 87
研究生 Author:
指導教授 Advisor:
召集委員 Convenor:
口試委員 Advisory Committee:
口試日期 Date of Exam: 2018-01-31
繳交日期 Date of Submission: 2018-02-13
關鍵字 Keywords: differential facet functioning, differential item functioning, multilevel model, item response theory, sources of differential item functioning, cross-classification multilevel item response model
統計 Statistics: The thesis has been browsed 5648 times and downloaded 5 times.
中文摘要 Abstract in Chinese
The assessment of differential item functioning (DIF) has become indispensable in education, psychometrics, and related fields. Thinking about DIF research has evolved over time, and in recent years researchers have increasingly turned to investigating the sources of DIF (Zumbo, 2007). Among multilevel models, the cross-classification multilevel item response model (CCMIRT) can detect differential facet functioning (DFF); moreover, because items are specified as random effects, it has been proposed that DIF effects can likewise be specified as random (Van den Noortgate & De Boeck, 2005), which yields more information from the analysis. The greatest benefit is that the variance of the random DIF effects can be observed as fixed effects are added. When item characteristics are used to investigate the sources of DIF, the random DIF effects can be combined with DFF detection. Because in practice several item characteristics are often related to item difficulty at the same time, this study argues that the main effects of several item characteristics should be considered simultaneously in the analysis model. This study therefore proposes a model that, under the CCMIRT with random DIF effects, can analyze several item characteristics, and manipulates conditions related to DIF detection in a simulation, in order to understand how well the variables in this framework explain the sources of DIF and what effects they may produce.
Abstract
Differential item functioning (DIF) analyses are important for test fairness and test validity. As Zumbo (2007) states, "third generation DIF" is best characterized by a subtle but extremely important change in how we think of DIF: the desire to know why DIF occurs is an early sign of the third generation of DIF analyses. To detect differential facet functioning (DFF) in such a multilevel setting, the cross-classification multilevel item response model (CCMIRT) can be adapted. When the group main effect and item-by-group interaction effects are included in the CCMIRT, the random effects of group over items represent the residual DIF. The CCMIRT can be further extended by adding item-characteristic predictors to explain the DIF. The purpose of this study is to investigate the sources of DIF by using the CCMIRT combined with DFF detection. In the simulation study, variables related to the DIF effect were manipulated to better understand the performance of the CCMIRT. To reflect real testing situations, a model including multiple item properties is suggested.
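The extension described in the abstract can be written out in equation form. Below is a minimal Rasch-type sketch in the spirit of Van den Noortgate and De Boeck (2005); the notation (θ, β, γ, δ, G, X) is illustrative and not necessarily the thesis's own symbols:

```latex
% Illustrative Rasch-type CCMIRT with random item and random DIF effects
% (notation is illustrative, not necessarily the thesis's)
\begin{aligned}
\operatorname{logit}\Pr(Y_{pi}=1)
  &= \theta_p + \beta_i + \gamma\,G_p + \delta_i\,G_p,\\
\theta_p &\sim N(0,\sigma_\theta^2),\qquad
\beta_i \sim N(\mu_\beta,\sigma_\beta^2),\qquad
\delta_i \sim N(0,\sigma_\delta^2).
\end{aligned}
```

Here persons $p$ and items $i$ enter as crossed random effects, $G_p$ indicates the focal group, $\gamma$ is the group main effect (impact), and the random slopes $\delta_i$ are the item-specific DIF effects. To explain DIF from item characteristics $X_{ik}$, fixed interaction terms $\sum_k \lambda_k X_{ik} G_p$ are added to the predictor; the amount by which the residual DIF variance $\sigma_\delta^2$ shrinks then indicates how much of the DIF those characteristics account for.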
目次 Table of Contents
論文審定書 (Thesis Approval) i
謝辭 (Acknowledgments) ii
摘要 (Abstract in Chinese) iii
Abstract iv
目錄 (Table of Contents) v
圖次 (List of Figures) vi
表次 (List of Tables) vii
Chapter 1 Introduction 1
  1.1 Research Background and Motivation 1
  1.2 Research Purposes 5
Chapter 2 Literature Review 7
  2.1 Differential Item Functioning and Detection Methods 7
  2.2 Sources of Differential Item Functioning 11
  2.3 Multilevel Item Response Models 17
  2.4 Cross-Classification Multilevel Item Response Models 22
Chapter 3 Research Methods and Design 28
  3.1 Research Methods 28
  3.2 Research Design 35
Chapter 4 Results 41
  4.1 DIF and DFF Detection Results 41
  4.2 Examining the Explanatory Power for DIF Effects via Explained-Variance Coefficients 53
  4.3 Identifying DIF-Explanation Models via Model-Fit Indices 57
Chapter 5 Conclusions and Suggestions 61
  5.1 Conclusions 61
  5.2 Suggestions for Future Research 72
References 74
參考文獻 References |
Abbott, M. L. (2007). A confirmatory approach to differential item functioning on an ESL reading assessment. Language Testing, 24, 7–36.
Abedi, J. (2002). Standardized achievement tests and English language learners: Psychometric issues. Educational Assessment, 8, 231–257.
Abedi, J., Bailey, A., Butler, F., Castellon-Wellington, M., Leon, S., & Mirocha, J. (2005). The validity of administering large-scale content assessments to English language learners: An investigation from three perspectives (CSE Report 663). National Center for Research on Evaluation, Standards, and Student Testing (CRESST).
Abedi, J., Lord, C., & Plummer, J. R. (1997). Final report of language background as a variable in NAEP mathematics performance. Los Angeles: University of California, Center for the Study of Evaluation, National Center for Research on Evaluation, Standards, and Student Testing.
Adams, R. J., & Wilson, M. (1996). Formulating the Rasch model as a mixed coefficients multinomial logit. In G. Engelhard & M. Wilson (Eds.), Objective measurement: Theory into practice (Vol. 3, pp. 143–166). Norwood, NJ: Ablex.
Adams, R. J., Wilson, M., & Wu, M. (1997). Multilevel item response models: An approach to errors in variables regression. Journal of Educational and Behavioral Statistics, 22, 47–76.
Albano, A. D., & Rodriguez, M. C. (2013). Examining differential math performance by gender and opportunity to learn. Educational and Psychological Measurement, 73, 836–856.
American Educational Research Association, American Psychological Association, & National Council on Measurement in Education. (1999). Standards for educational and psychological testing. Washington, DC: American Psychological Association.
Angoff, W. H. (1993). Perspectives on differential item functioning methodology. In P. W. Holland & H. Wainer (Eds.), Differential item functioning (pp. 3–23). Hillsdale, NJ: Lawrence Erlbaum.
Banks, K. (2009). Using DDF in a post hoc analysis to understand sources of DIF. Educational Assessment, 14, 103–118.
Bates, D. M. (2010). lme4: Mixed-effects modeling with R. http://lme4.r-forge.r-project.org/book
Bates, D., Maechler, M., Bolker, B., & Walker, S. (2015). lme4: Linear mixed-effects models using Eigen and S4 (R package version 1.1-7). http://CRAN.R-project.org/package=lme4
Beretvas, S. N., Cawthon, S. W., Lockhart, L. L., & Kaye, A. D. (2012). Assessing impact, DIF, and DFF in accommodated item scores: A comparison of multilevel measurement model parameterizations. Educational and Psychological Measurement, 72(5), 754–773.
Beretvas, S. N., & Walker, C. M. (2012). Distinguishing differential testlet functioning from differential bundle functioning using the multilevel measurement model. Educational and Psychological Measurement, 72(2), 200-2
Beretvas, S. N., & Williams, N. J. (2004). The use of HGLM as an item dimensionality assessment. Journal of Educational Measurement, 41, 379–395.
Bolt, D. (2002). Studying the potential of nuisance dimensions using bundle DIF and multidimensional IRT analyses. Paper presented at the annual meeting of the National Council on Measurement in Education, New Orleans, LA.
Cai, L. (2015). Examining sources of gender DIF using cross-classification multilevel IRT models (Unpublished master's thesis). University of Nebraska-Lincoln.
Camilli, G., & Shepard, L. A. (1994). Methods for identifying biased test items. Thousand Oaks, CA: Sage.
Cheong, Y. F. (2001). Detecting ethnic differences in externalizing problem behavior items via multilevel and multidimensional Rasch models. Paper presented at the annual meeting of the American Educational Research Association, Seattle, WA.
Cheong, Y. F., & Raudenbush, S. W. (2000). Measurement and structural models for children's problem behaviors. Psychological Methods, 5, 477–495.
Chu, K. L., & Kamata, A. (2004). Test equating in the presence of DIF items. Journal of Applied Measurement, 6(3), 342–354.
Clauser, B. E., & Mazor, K. M. (1998). Using statistical procedures to identify differentially functioning test items. Educational Measurement: Issues and Practice, 17(1), 31–44.
Cohen, A. S., & Bolt, D. M. (2005). A mixture model analysis of differential item functioning. Journal of Educational Measurement, 42(2), 133–148.
De Ayala, R. J., Kim, S. H., Stapleton, L. M., & Dayton, C. M. (2002). Differential item functioning: A mixture distribution conceptualization. International Journal of Testing, 2(3-4), 243–276.
De Boeck, P. (2008). Random item IRT models. Psychometrika, 73, 533–559.
De Boeck, P., Cho, S.-J., & Wilson, M. (2011). Explanatory secondary dimension modeling of latent differential item functioning. Applied Psychological Measurement, 35, 583–603.
Dorans, N. J., & Holland, P. W. (1993). DIF detection and description: Mantel-Haenszel and standardization. In P. W. Holland & H. Wainer (Eds.), Differential item functioning (pp. 35–66). Hillsdale, NJ: Lawrence Erlbaum Associates.
Dorans, N. J., & Kulick, E. (1986). Demonstrating the utility of the standardization approach to assessing unexpected differential item performance on the Scholastic Aptitude Test. Journal of Educational Measurement, 23, 355–368.
Douglas, J. A., Roussos, L. A., & Stout, W. (1996). Item-bundle DIF hypothesis testing: Identifying suspect bundles and assessing their differential functioning. Journal of Educational Measurement, 33(4), 465–484.
Engelhard, G. (1992). The measurement of writing ability with a many-faceted Rasch model. Applied Measurement in Education, 5, 171–191.
Ercikan, K. (1998). Translation effects in international assessments. International Journal of Educational Research, 29(6), 543–553.
Ercikan, K. (2002). Disentangling sources of differential item functioning in multilanguage assessments. International Journal of Testing, 2(3-4), 199–215.
Ercikan, K., Arim, R. G., Law, D. M., Lacroix, S., Gagnon, F., & Domene, J. F. (2010). Application of think-aloud protocols in examining sources of differential item functioning. Educational Measurement: Issues and Practice, 29(2), 24–35.
Ercikan, K., Gierl, M. J., McCreith, T., Puhan, G., & Koh, K. (2004). Comparability of bilingual versions of assessments: Sources of incomparability of English and French versions of Canada's national achievement tests. Applied Measurement in Education, 17, 301–321.
Ercikan, K., & Lyons-Thomas, J. (2013). Adapting tests for use in other languages and cultures. In K. Geisinger (Ed.), APA handbook of testing and assessment in psychology (Vol. 3, pp. 545–569). Washington, DC: American Psychological Association.
Fischer, G. H. (1973). The linear logistic test model as an instrument in educational research. Acta Psychologica, 37, 359–374.
Fox, J.-P., & Glas, C. A. W. (1998). Multi-level IRT with measurement error in the predictor variables (Research Report 98-16). The Netherlands: University of Twente.
Fox, J.-P., & Glas, C. A. W. (2001). Bayesian estimation of a multilevel IRT model using Gibbs sampling. Psychometrika, 66, 269–286.
Fox, J.-P., & Glas, C. A. W. (2003). Bayesian modeling of measurement error in predictor variables using item response theory. Psychometrika, 68, 169–191.
Geranpayeh, A., & Kunnan, A. J. (2007). Differential item functioning in terms of age in the Certificate in Advanced English examination. Language Assessment Quarterly, 4(2), 190–222.
Gierl, M. J., Bisanz, J., Bisanz, G. L., & Boughton, K. A. (2003). Identifying content and cognitive skills that produce gender differences in mathematics: A demonstration of the multidimensionality-based DIF analysis paradigm. Journal of Educational Measurement, 40(4), 281–306.
Gierl, M. J., & Bolt, D. M. (2001). Illustrating the use of nonparametric regression to assess differential item and bundle functioning among multiple groups. International Journal of Testing, 1(3-4), 249–270.
Gierl, M. J., & Khaliq, S. N. (2001). Identifying sources of differential item and bundle functioning on translated achievement tests: A confirmatory analysis. Journal of Educational Measurement, 38(2), 164–187.
Goldstein, H. (1987). Multilevel models in educational and social research. London: Griffin.
Green, B. F., Crone, C. R., & Folk, V. G. (1989). A method for studying differential distractor functioning. Journal of Educational Measurement, 26(2), 147–160.
Holland, P. W., & Thayer, D. T. (1988). Differential item performance and the Mantel-Haenszel procedure. In H. Wainer & H. I. Braun (Eds.), Test validity (pp. 129–145). Hillsdale, NJ: Erlbaum.
Kamata, A. (1998). Some generalizations of the Rasch model: An application of the hierarchical generalized linear model (Unpublished doctoral dissertation). Michigan State University.
Kamata, A. (2001). Item analysis by the hierarchical generalized linear model. Journal of Educational Measurement, 38, 79–93.
Lepik, M. (1990). Algebraic word problems: Role of linguistic and structural variables. Educational Studies in Mathematics, 21(1), 83–90.
Lord, F. M. (1980). Applications of item response theory to practical testing problems. Routledge.
Luppescu, S. (2002). DIF detection in HLM item analysis. Paper presented at the annual meeting of the American Educational Research Association, New Orleans, LA.
Mazor, K. M., Clauser, B. E., & Hambleton, R. K. (1992). The effect of sample size on the functioning of the Mantel-Haenszel statistic. Educational and Psychological Measurement, 52(2), 443–451.
Mendes-Barnett, S., & Ercikan, K. (2006). Examining sources of gender DIF in mathematics assessments using a confirmatory multidimensional model approach. Applied Measurement in Education, 19(4), 289–304.
Meulders, M., & Xie, Y. (2004). Person-by-item predictors. In P. De Boeck & M. Wilson (Eds.), Explanatory item response models (pp. 213–240). New York, NY: Springer.
Nandakumar, R. (1993). Simultaneous DIF amplification and cancellation: Shealy-Stout's test for DIF. Journal of Educational Measurement, 30(4), 293–311.
Navas-Ara, M. J., & Gómez-Benito, J. (2002). Effects of ability scale purification on identification of DIF. European Journal of Psychological Assessment, 18, 9–15.
Oliveri, M. E., & Ercikan, K. (2011). Do different approaches to examining construct comparability lead to similar conclusions? Applied Measurement in Education, 24, 1–18.
Oliveri, M. E., Ercikan, K., & Zumbo, B. (2013). Analysis of sources of latent class differential item functioning in international assessments. International Journal of Testing, 13(3), 272–293.
Pae, T. I. (2004). DIF for examinees with different academic backgrounds. Language Testing, 21(1), 53–73.
Plake, B. S. (1981). An ANOVA methodology to identify biased test items that takes instructional level into account. Educational and Psychological Measurement, 41, 365–368.
R Core Team (2015). R: A language and environment for statistical computing. Vienna, Austria: R Foundation for Statistical Computing. https://www.R-project.org/
Raju, N. S. (1988). The area between two item characteristic curves. Psychometrika, 53, 495–502.
Raudenbush, S. W. (1993). A crossed random effects model for unbalanced data with applications in cross-sectional and longitudinal research. Journal of Educational and Behavioral Statistics, 18(4), 321–349.
Ravand, H. (2015). Assessing testlet effect, impact, differential testlet, and item functioning using cross-classified multilevel measurement modeling. SAGE Open, 5(2), 2158244015585607.
Roth, W.-M., Ercikan, K., Simon, M., & Fola, R. (2015). The assessment of mathematical literacy of linguistic minority students: Results of a multi-method investigation. The Journal of Mathematical Behavior, 40, 88–105.
Roth, W.-M., Oliveri, M. E., Sandilands, D., Lyons-Thomas, J., & Ercikan, K. (2013). Investigating sources of differential item functioning using expert think-aloud protocols. International Journal of Science Education, 35, 546–576.
Roussos, L., & Stout, W. (1996). A multidimensionality-based DIF analysis paradigm. Applied Psychological Measurement, 20, 355–371.
Scheuneman, J. (1979). A method of assessing bias in test items. Journal of Educational Measurement, 16(3), 143–152.
Shealy, R., & Stout, W. F. (1993a). An item response theory for test bias. In P. W. Holland & H. Wainer (Eds.), Differential item functioning (pp. 197–239). Hillsdale, NJ: Erlbaum.
Shealy, R., & Stout, W. F. (1993b). A model-based standardization approach that separates true bias/DIF from group differences and detects test bias/DTF as well as item bias/DIF. Psychometrika, 58, 159–194.
Sireci, S. G., Fitzgerald, C., & Xing, D. (1998). Adapting credentialing examinations for international uses (Laboratory of Psychometric and Evaluative Research Report No. 329). Amherst: University of Massachusetts, School of Education.
Skrondal, A., & Rabe-Hesketh, S. (2004). Generalized latent variable modeling: Multilevel, longitudinal, and structural equation models. CRC Press.
Swaminathan, H., & Rogers, H. J. (1990). Detecting differential item functioning using logistic regression procedures. Journal of Educational Measurement, 27(4), 361–370.
Swanson, D. B., Clauser, B. E., Case, S. M., Nungester, R. J., & Featherman, C. (2002). Analysis of differential item functioning using hierarchical logistic regression models. Journal of Educational and Behavioral Statistics, 27, 53–75.
Thissen, D., Steinberg, L., & Wainer, H. (1988). Use of item response theory in the study of group differences in trace lines. In H. Wainer & H. I. Braun (Eds.), Test validity (pp. 147–169). Hillsdale, NJ: Lawrence Erlbaum.
Van den Noortgate, W., & De Boeck, P. (2005). Assessing and explaining differential item functioning using logistic mixed models. Journal of Educational and Behavioral Statistics, 30, 443–464.
Van den Noortgate, W., De Boeck, P., & Meulders, M. (2003). Cross-classification multilevel logistic models in psychometrics. Journal of Educational and Behavioral Statistics, 28(4), 369–386.
Walker, C. M. (2011). Why the DIF? Why differential item functioning analyses are an important part of instrument development and validation. Journal of Psychoeducational Assessment, 29, 364–376.
Wang, W.-C., & Su, Y.-H. (2004). Effect of average signed area between two item characteristic curves and test purification procedures on the DIF detection via the Mantel-Haenszel method. Applied Measurement in Education, 17(2), 113–144.
Williams, N. J., & Beretvas, S. N. (2006). DIF identification using HGLM for polytomous items. Applied Psychological Measurement, 30, 22–42.
Xie, Y., & Wilson, M. (2008). Investigating DIF and extensions using an LLTM approach and also an individual differences approach: An international testing context. Psychology Science, 50(3), 403.
Zumbo, B. D. (1999). A handbook on the theory and methods of differential item functioning (DIF). Ottawa, Ontario, Canada: Directorate of Human Resources Research and Evaluation, Department of National Defense.
Zumbo, B. D. (2007). Three generations of DIF analyses: Considering where it has been, where it is now, and where it is going. Language Assessment Quarterly, 4, 223–233.
Zumbo, B. D., & Gelin, M. N. (2005). A matter of test bias in educational policy research: Bringing the context into picture by investigating sociological/community moderated (or mediated) test and item bias. Journal of Educational Research & Policy Studies, 5(1), 1–23.
Zumbo, B. D., Liu, Y., Wu, A. D., Shear, B. R., Olvera Astivia, O. L., & Ark, T. K. (2015). A methodology for Zumbo's third generation DIF analyses and the ecology of item responding. Language Assessment Quarterly, 12(1), 136-1.
電子全文 Fulltext
This electronic full text is licensed to users only for personal, non-commercial searching, reading, and printing for academic research purposes. Please comply with the relevant provisions of the Copyright Act of the Republic of China (Taiwan); do not reproduce, distribute, adapt, repost, or broadcast it without authorization.
紙本論文 Printed copies
Public-availability information for printed theses is relatively complete from academic year 102 onward. To look up availability information for printed theses from academic year 101 or earlier, please contact the printed-thesis service counter of the Office of Library and Information Services. We apologize for any inconvenience.
開放時間 Available: 已公開 available