Title page for etd-0113118-145229
論文名稱 Title
以交叉分類多層次試題反應模式探究差異試題功能成因
Investigating the Sources of Differential Item Functioning by Using Cross-Classification Multilevel Item Response Model
系所名稱 Department
畢業學年期 Year, semester
語文別 Language
學位類別 Degree
頁數 Number of pages
87
研究生 Author
指導教授 Advisor
召集委員 Convenor
口試委員 Advisory Committee
口試日期 Date of Exam
2018-01-31
繳交日期 Date of Submission
2018-02-13
關鍵字 Keywords
交叉分類多層次試題反應模式、多層次模式、試題反應理論、差異試題功能成因、差異層面功能、差異試題功能
differential facet functioning, differential item functioning, multilevel model, item response theory, sources of differential item functioning, cross-classification multilevel item response model
統計 Statistics
The thesis/dissertation has been browsed 5648 times and downloaded 5 times.
中文摘要 Chinese Abstract
The assessment of differential item functioning (DIF) has become an indispensable part of educational and psychological measurement. Thinking about DIF research has also evolved over time, and in recent years researchers have increasingly turned to investigating the sources of DIF (Zumbo, 2007). Among multilevel models, the cross-classification multilevel item response model (CCMIRT) can detect differential facet functioning (DFF); moreover, because items are specified as random effects, it has been proposed that DIF effects can also be specified as random (Van den Noortgate & De Boeck, 2005), which yields additional information in the analysis. The greatest benefit is that one can observe how the variance of the random DIF effects changes as fixed effects are added to the model.
When item characteristics are used to investigate the sources of DIF, random DIF effects can be combined with DFF detection. Because in practice several item characteristics are often related to item difficulty simultaneously, this study argues that the main effects of several item characteristics should be considered jointly in the analysis model. This study therefore proposes a model that combines the CCMIRT with random DIF effects and accommodates several item characteristics at once, and manipulates conditions relevant to DIF detection in order to understand how effectively the variables in this framework explain the sources of DIF, and what side effects they may produce.
Abstract
Differential item functioning (DIF) analyses are important in terms of test fairness and test validity. As Zumbo (2007) states, "Third Generation DIF" is best characterized by a subtle but extremely important change in how we think of DIF: the desire to know why DIF occurs is an early hallmark of the third generation of DIF research.
To detect differential facet functioning (DFF) in such a multilevel setting, the cross-classification multilevel item response model (CCMIRT) can be adapted. When the group main effect and item-by-group interaction effects are included in the CCMIRT, the random effects of group over items represent the DIF residual. The CCMIRT can be further extended by adding item-characteristic predictors to explain the DIF.
The purpose of this study is to investigate the sources of DIF by using the CCMIRT combined with DFF detection. In the simulation study, variables related to the DIF effect were manipulated to better understand the performance of the CCMIRT. To better reflect real testing situations, a model including multiple item properties is proposed.
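The model structure the abstract describes can be sketched in equation form. The following is a non-authoritative sketch based on the logistic mixed-model formulation of Van den Noortgate and De Boeck (2005), which the abstract cites; the exact parameterization used in the thesis may differ. For person $p$ answering item $i$, with group indicator $G_p$ (e.g., 0 = reference, 1 = focal):

```latex
\operatorname{logit} P(Y_{pi}=1)
  = \theta_p + \beta_i + \gamma\, G_p + \delta_i\, G_p,
\qquad
\theta_p \sim N(0,\sigma^2_{\theta}),\quad
\beta_i \sim N(0,\sigma^2_{\beta}),\quad
\delta_i \sim N(0,\sigma^2_{\delta}).
```

Here $\theta_p$ is person ability and $\beta_i$ a random item effect; persons and items are crossed rather than nested, hence "cross-classification". The term $\gamma$ is the group main effect (impact), and $\delta_i$ is a random item-by-group interaction, i.e., random DIF. Adding fixed effects of item characteristics (for example, terms of the form $\lambda_k X_{ik} G_p$ for hypothetical item properties $X_{ik}$) and observing how much $\sigma^2_{\delta}$ shrinks indicates how well those characteristics explain the DIF, which is the explanatory strategy the abstract describes.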
目次 Table of Contents
Thesis Certification i
Acknowledgements ii
Chinese Abstract iii
Abstract iv
Table of Contents v
List of Figures vi
List of Tables vii
Chapter 1 Introduction 1
Section 1 Research Background and Motivation 1
Section 2 Research Purposes 5
Chapter 2 Literature Review 7
Section 1 Differential Item Functioning and Its Detection Methods 7
Section 2 Sources of Differential Item Functioning 11
Section 3 Multilevel Item Response Models 17
Section 4 Cross-Classification Multilevel Item Response Models 22
Chapter 3 Methods and Research Design 28
Section 1 Research Methods 28
Section 2 Research Design 35
Chapter 4 Results 41
Section 1 DIF and DFF Detection Results 41
Section 2 Explanatory Power for DIF Effects via Explained-Variance Coefficients 53
Section 3 Identifying DIF-Explanation Models via Model-Fit Indices 57
Chapter 5 Conclusions and Suggestions 61
Section 1 Conclusions 61
Section 2 Suggestions for Future Research 72
References 74
參考文獻 References
Abbot, M. L. (2007). A confirmatory approach to differential item functioning on an
ESL reading assessment. Language Testing, 24, 7–36.
Abedi, J. (2002). Standardized achievement tests and English language learners:
Psychometric issues. Educational Assessment, 8, 231-257.
Abedi, J., Bailey, A., Butler, F., Castellon-Wellington, M., Leon, S., & Mirocha, J.
(2005). The Validity of Administering Large-Scale Content Assessments to English Language Learners: An Investigation from Three Perspectives. CSE Report 663. National Center for Research on Evaluation, Standards, and Student Testing (CRESST).
Abedi, J., Lord, C., & Plummer, J. R. (1997). Final report of language background as a
variable in NAEP mathematics performance. Center for the Study of Evaluation, National Center for Research on Evaluation, Standards, and Student Testing, Graduate School of Education & Information Studies, University of California, Los Angeles.
Adams, R. J., & Wilson, M. (1996). Formulating the Rasch model as a mixed
coefficients multinomial logit. In G. Engelhard & M. Wilson (Eds.), Objective measurement: Theory into practice (Vol. 3, pp. 143-166). Norwood, NJ: Ablex.
Adams, R. J., Wilson, M., & Wu, M. (1997). Multilevel item response models: An
approach to errors in variables regression. Journal of Educational and Behavioral
Statistics, 22, 47–76.
Albano, A. D., & Rodriguez, M. C. (2013). Examining differential math performance
by gender and opportunity to learn. Educational and Psychological Measurement,
73, 836–856.
American Educational Research Association, American Psychological Association, &
National Council on Measurement in Education. (1999). Standards for educational
and psychological testing. Washington, DC: American Psychological Association.
Angoff, W. H. (1993). Perspectives on differential item functioning methodology. In P.
W. Holland & H. Wainer (Eds.), Differential item functioning (pp. 3-23). Hillside,
NJ: Lawrence Erlbaum.
Banks, K. (2009). Using DDF in a post hoc analysis to understand sources of DIF.
Educational Assessment, 14, 103–118.
Bates, D. M. (2010). lme4: Mixed-effects modeling with R. Retrieved from http://lme4.r-forge.r-project.org/book
Bates, D., Maechler, M., Bolker, B., & Walker, S. (2015). lme4: Linear mixed-effects models using Eigen and S4. R package version 1.1-7. http://CRAN.R-project.org/package=lme4
Beretvas, S. N., Cawthon, S. W., Lockhart, L. L., & Kaye, A. D. (2012). Assessing impact, DIF, and DFF in accommodated item scores: A comparison of multilevel measurement model parameterizations. Educational and Psychological Measurement, 72(5), 754-773.
Beretvas, S. N., & Walker, C. M. (2012). Distinguishing differential testlet functioning
from differential bundle functioning using the multilevel measurement model. Educational and Psychological Measurement, 72(2), 200-2
Beretvas, S. N., & Williams, N. J. (2004). The use of HGLM as an item dimensionality
assessment. Journal of Educational Measurement, 41, 379-395.
Bolt, D. (2002). Studying the potential of nuisance dimensions using bundle DIF and multidimensional IRT analyses. Paper presented at the annual meeting of the National Council on Measurement in Education, New Orleans, LA.
Cai, L. (2015). Examining sources of gender DIF using cross-classification multilevel
IRT models. Unpublished master's thesis, University of Nebraska-Lincoln.
Camilli, G.L., & Shepard, L.A. (1994). Methods for identifying biased test items.
Thousand Oaks, CA: Sage.
Cheong, Y. F. (2001). Detecting ethnic differences in externalizing problem behavior
items via multilevel and multidimensional Rasch models. Paper presented at the annual meeting of the American Educational Research Association, Seattle, WA.
Cheong, Y. F., & Raudenbush, S. W. (2000). Measurement and structural models for
children’s problem behaviors. Psychological Methods, 5, 477-495.
Chu, K. L., & Kamata, A. (2004). Test equating in the presence of DIF items. Journal of Applied Measurement, 6(3), 342-354.
Clauser, B. E., & Mazor, K. M. (1998). Using statistical procedures to identify
differentially functioning test items. Educational Measurement: Issues and Practice, 17(1), 31-44.
Cohen, A. S., & Bolt, D. M. (2005). A mixture model analysis of differential item
functioning. Journal of Educational Measurement, 42(2), 133-148.
De Ayala, R. J., Kim, S. H., Stapleton, L. M., & Dayton, C. M. (2002). Differential item
functioning: A mixture distribution conceptualization. International Journal of Testing, 2(3-4), 243-276.
De Boeck, P. (2008). Random item IRT models. Psychometrika, 73, 533–559.
De Boeck, P., Cho, S.-J., & Wilson, M. (2011). Explanatory secondary dimension
modeling of latent differential item functioning. Applied Psychological
Measurement, 38, 583–603.
Dorans, N. J., & Holland, P. W. (1993). DIF detection and description: Mantel-
Haenszel and standardization. In P. W. Holland & H. Wainer (Eds.), Differential
item functioning (pp. 35-66). Hillsdale, NJ: Lawrence Erlbaum Associates.
Dorans, N. J., & Kulick, E. (1986). Demonstrating the utility of the standardization
approach to assessing unexpected differential item performance on the scholastic
aptitude test. Journal of Educational Measurement, 23, 355–368.
Douglas, J. A., Roussos, L. A., & Stout, W. (1996). Item-bundle DIF hypothesis testing: Identifying suspect bundles and assessing their differential functioning. Journal of Educational Measurement, 33(4), 465-484.
Engelhard, G. (1992). The measurement of writing ability with a many-faceted Rasch
model. Applied Measurement in Education, 5, 171-191.
Ercikan, K. (1998). Translation effects in international assessments. International
Journal of Educational Research, 29(6), 543-553.
Ercikan, K. (2002). Disentangling sources of differential item functioning in
multilanguage assessments. International Journal of Testing, 2(3-4), 199-215.
Ercikan, K., Arim, R. G., Law, D. M., Lacroix, S., Gagnon, F., & Domene, J. F. (2010).
Application of think-aloud protocols in examining sources of differential item functioning. Educational Measurement: Issues and Practice, 29(2), 24–35.
Ercikan, K., Gierl, M. J., McCreith, T., Puhan, G., & Koh, K. (2004). Comparability of
bilingual versions of assessments: Sources of incomparability of English and French versions of Canada’s national achievement tests, Applied Measurement in Education, 17, 301–321.
Ercikan, K., & Lyons-Thomas, J. (2013). Adapting tests for use in other languages and
cultures. In K. Geisinger (Ed.), APA handbook of testing and assessment in psychology (Vol. 3, pp. 545–569). Washington, DC: American Psychological Association.
Fischer, G. H. (1973). The linear logistic test model as an instrument in educational
research. Acta Psychologica, 37, 359-374.
Fox, J. P. & Glas, C. A. W. (1998). Multi-level IRT with measurement error in the
predictor variables. Research Report 98-16, University of Twente: The Netherlands.
Fox, J.-P., & Glas, C.A.W. (2001). Bayesian estimation of a multilevel IRT model using
Gibbs sampling. Psychometrika, 66, 269–286.
Fox, J.-P., & Glas, C.A.W. (2003). Bayesian modeling of measurement error in
predictor variables using item response theory. Psychometrika, 68, 169–191.
Geranpayeh, A., & Kunnan, A. J. (2007). Differential item functioning in terms of age in the Certificate in Advanced English examination. Language Assessment Quarterly, 4(2), 190-222.
Gierl, M. J., Bisanz, J., Bisanz, G. L., & Boughton, K. A. (2003). Identifying content and cognitive skills that produce gender differences in mathematics: A demonstration of the multidimensionality-based DIF analysis paradigm. Journal of Educational Measurement, 40(4), 281-306.
Gierl, M. J., & Bolt, D. M. (2001). Illustrating the use of nonparametric regression to
assess differential item and bundle functioning among multiple groups. International Journal of Testing, 1(3-4), 249-270.
Gierl, M. J., & Khaliq, S. N. (2001). Identifying sources of differential item and bundle
functioning on translated achievement tests: A confirmatory analysis. Journal of Educational Measurement, 38(2), 164-187.
Goldstein, H. (1987). Multilevel models in educational and social research. London:
Griffin.
Green, B. F., Crone, C. R., & Folk, V. G. (1989). A method for studying differential
distractor functioning. Journal of Educational Measurement, 26(2), 147-160.
Holland, P. W., & Thayer, D. T. (1988). Differential item performance and the Mantel-Haenszel procedure. In H. Wainer & H. I. Braun (Eds.), Test validity (pp. 129-145). Hillsdale, NJ: Erlbaum.
Kamata, A. (1998). Some generalizations of the Rasch model: an application of the
hierarchical generalized linear model. Unpublished doctoral dissertation. Michigan State University.
Kamata, A. (2001). Item analysis by the hierarchical generalized linear model. Journal
of Educational Measurement, 38, 79-93.
Lepik, M. (1990). Algebraic word problems: Role of linguistic and structural variables.
Educational Studies in Mathematics, 21(1), 83-90.
Lord, F. M. (1980). Applications of item response theory to practical testing problems.
Routledge.
Luppescu, S. (2002). DIF detection in HLM item analysis. Paper presented at the
annual meeting of the American Educational Research Association, New Orleans.
Mazor, K. M., Clauser, B. E., & Hambleton, R. K. (1992). The effect of sample size on
the functioning of the Mantel-Haenszel statistic. Educational and Psychological Measurement, 52(2), 443-451.
Mendes-Barnett, S., & Ercikan, K. (2006). Examining sources of gender DIF in
mathematics assessments using a confirmatory multidimensional model approach. Applied Measurement in Education, 19(4), 289-304.
Meulders, M., & Xie, Y. (2004). Person-by-item predictors. In Explanatory item
response models (pp. 213-240). Springer New York.
Nandakumar, R. (1993). Simultaneous DIF amplification and cancellation: Shealy-
Stout's test for DIF. Journal of Educational Measurement, 30(4), 293-311.
Navas-Ara, M. J., & Gómez-Benito, J. (2002). Effects of ability scale purification on identification of DIF. European Journal of Psychological Assessment, 18, 9-15.
Oliveri, M. E., & Ercikan, K. (2011). Do different approaches to examining construct
comparability lead to similar conclusions? Applied Measurement in Education, 24, 1–18.
Oliveri, M. E., Ercikan, K., & Zumbo, B. (2013). Analysis of sources of latent class
differential item functioning in international assessments. International Journal of
Testing, 13(3), 272-293.
Pae, T. I. (2004). DIF for examinees with different academic backgrounds. Language
testing, 21(1), 53-73.
Plake, B.S. (1981). An ANOVA methodology to identify biased test items that takes
instructional level into account. Educational and Psychological Measurement, 41, 365-368.
R Core Team (2015). R: A language and environment for statistical computing. R
Foundation for Statistical Computing, Vienna, Austria. URL https://www.R-project.org/.
Raju, N. S. (1988). The area between two item characteristic curves. Psychometrika, 53,
495–502.
Raudenbush, S. W. (1993). A crossed random effects model for unbalanced data with
applications in cross-sectional and longitudinal research. Journal of Educational and Behavioral Statistics, 18(4), 321-349.
Ravand, H. (2015). Assessing testlet effect, impact, differential testlet, and item functioning using cross-classified multilevel measurement modeling. SAGE Open, 5(2), 2158244015585607.
Roth, W. M., Ercikan, K., Simon, M., & Fola, R. (2015). The assessment of
mathematical literacy of linguistic minority students: Results of a multi-method investigation. The Journal of Mathematical Behavior, 40, 88-105.
Roth, W.-M., Oliveri, M. E., Sandilands, D., Lyons-Thomas, J., & Ercikan, K. (2013).
Investigating sources of differential item functioning using expert think-aloud protocols. International Journal of Science Education, 35, 546–576.
Roussos, L., & Stout, W. (1996). A multidimensionality-based DIF analysis paradigm.
Applied Psychological Measurement, 20, 355-371.
Scheuneman, J. (1979). A method of assessing bias in test items. Journal of Educational
Measurement, 16(3), 143-152.
Shealy, R., & Stout, W.F. (1993a). An item response theory for test bias. In P.W.
Holland & H. Wainer (Eds.), Differential item functioning (pp. 197-239). Hillsdale,
NJ:Erlbaum.
Shealy, R., & Stout, W. F. (1993b). A model-based standardization approach that
separates true bias/DIF from group differences and detects test bias/DTF as well as item bias/DIF. Psychometrika, 58, 159-194.
Sireci, S. G., Fitzgerald, C., & Xing, D. (1998). Adapting credentialing examinations
for international uses. Laboratory of Psychometric and Evaluative Research report No. 329. Amherst: University of Massachusetts, School of Education
Skrondal, A., & Rabe-Hesketh, S. (2004). Generalized latent variable modeling: Multilevel, longitudinal, and structural equation models. CRC Press.
Swaminathan, H., & Rogers, H. J. (1990). Detecting differential item functioning using
logistic regression procedures. Journal of Educational Measurement, 27(4), 361-370.
Swanson, D. B., Clauser, B. E., Case, S. M., Nungester, R. J., & Featherman, C.
(2002). Analysis of differential item functioning using hierarchical logistic regression models. Journal of Educational and Behavioral Statistics, 27, 53–75.
Thissen, D., Steinberg, L., & Wainer, H. (1988). Use of item response theory in the study of group differences in trace lines. In H. Wainer & H. I. Braun (Eds.), Test validity (pp. 147–169). Hillsdale, NJ: Lawrence Erlbaum.
Van den Noortgate, W., & De Boeck, P. (2005). Assessing and explaining differential
item functioning using logistic mixed models. Journal of Educational and Behavioral Statistics, 30, 443–464.
Van den Noortgate, W., De Boeck, P., & Meulders, M. (2003). Cross-classification
multilevel logistic models in psychometrics. Journal of Educational and Behavioral Statistics, 28(4), 369-386.
Walker, C. M. (2011). Why the DIF? Why differential item functioning analyses are an important part of instrument development and validation. Journal of Psychoeducational Assessment, 29, 364-376.
Wang, W.-C., & Su, Y.-H. (2004). Effect of average signed area between two item characteristic curves and test purification procedures on the DIF detection via the Mantel-Haenszel method. Applied Measurement in Education, 17(2), 113-144.
Williams, N. J., & Beretvas, N. S. (2006). DIF identification using HGLM for
polytomous items. Applied Psychological Measurement, 30, 22–42.
Xie, Y., & Wilson, M. (2008). Investigating DIF and extensions using an LLTM
approach and also an individual differences approach: an international testing context. Psychology Science, 50(3), 403.
Zumbo, B. D. (1999). A handbook on the theory and methods of differential item
functioning (DIF). Ottawa, Ontario, Canada: Directorate of Human Resources Research and Evaluation, Department of National Defense.
Zumbo, B. D. (2007). Three generations of DIF analyses: Considering where it has been, where it is now, and where it is going. Language Assessment Quarterly, 4, 223-233.
Zumbo, B. D., & Gelin, M. N. (2005). A Matter of Test Bias in Educational Policy
Research: Bringing the Context into Picture by Investigating Sociological/Community Moderated (or Mediated) Test and Item Bias. Journal of Educational Research & Policy Studies, 5(1), 1-23.
Zumbo, B. D., Liu, Y., Wu, A. D., Shear, B. R., Olvera Astivia, O. L., & Ark, T. K.
(2015). A methodology for Zumbo’s third generation DIF analyses and the ecology of item responding. Language Assessment Quarterly, 12(1), 136-1.
電子全文 Fulltext
This electronic full text is licensed only for personal, non-profit searching, reading, and printing by users for the purpose of academic research. Please observe the relevant provisions of the Copyright Act of the Republic of China; do not reproduce, distribute, adapt, repost, or broadcast it without authorization, to avoid infringement.
論文使用權限 Thesis access permission: 自定論文開放時間 user-defined release time
開放時間 Available:
校內 Campus: 已公開 available
校外 Off-campus: 已公開 available


紙本論文 Printed copies
Availability information for printed theses is relatively complete from academic year 102 onward. To inquire about the availability of printed theses from academic year 101 or earlier, please contact the printed-thesis service desk of the Office of Library and Information Services. We apologize for any inconvenience.
開放時間 Available: 已公開 available
