Adapting cognitive diagnosis computerized adaptive testing item selection rules to traditional item response theory

Authors: Miguel A. Sorrel ^aff001; Juan R. Barrada ^aff002; Jimmy de la Torre ^aff003; Francisco José Abad ^aff001
Author affiliations: Department of Social Psychology and Methodology, Universidad Autónoma de Madrid, Spain ^aff001; Department of Psychology and Sociology, Universidad de Zaragoza, Spain ^aff002; Faculty of Education, The University of Hong Kong, Hong Kong ^aff003
Published in: PLoS ONE 15(1)
Category: Research Article
DOI: https://doi.org/10.1371/journal.pone.0227196

Abstract

Currently, there are two predominant approaches in adaptive testing. One, referred to as cognitive diagnosis computerized adaptive testing (CD-CAT), is based on cognitive diagnosis models; the other, traditional CAT, is based on item response theory. The present study evaluates the performance of two item selection rules (ISRs) originally developed in the CD-CAT framework, the double Kullback-Leibler information (DKL) and the generalized deterministic inputs, noisy “and” gate model discrimination index (GDI), in the context of traditional CAT. The accuracy and test security associated with these two ISRs are compared to those of the point Fisher information and weighted KL using a simulation study. The impact of the trait level estimation method is also investigated. The results show that the new ISRs, particularly DKL, could be used to improve the accuracy of CAT. The better accuracy of DKL is achieved at the expense of a higher item overlap rate. Differences among the item selection rules become smaller as the test gets longer. The two CD-CAT ISRs select different types of items: DKL selects items with the highest possible a parameter, and GDI selects items with the lowest possible c parameter. Regarding the trait level estimator, the expected a posteriori method is generally better in the first stages of the CAT and converges with the maximum likelihood method once a medium to large number of items has been administered. The use of DKL can be recommended in low-stakes settings where test security is less of a concern.
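To make the baseline rule concrete, the following is a minimal sketch of item selection by point Fisher information, the rule against which DKL and GDI are compared. It assumes a three-parameter logistic (3PL) model without the 1.7 scaling constant, which the a and c parameters mentioned above suggest but the abstract does not state. The function names, the item bank, and its parameter values are hypothetical illustrations, not the authors' simulation code.

```python
import math

def p_3pl(theta, a, b, c):
    """Probability of a correct response under an assumed 3PL model."""
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))

def fisher_info(theta, a, b, c):
    """Item Fisher information at trait level theta for the 3PL model."""
    p = p_3pl(theta, a, b, c)
    return a ** 2 * ((1.0 - p) / p) * ((p - c) / (1.0 - c)) ** 2

def select_item(theta_hat, bank, administered):
    """Return the index of the unused item with maximum information
    at the current trait estimate (point Fisher information rule)."""
    candidates = [i for i in range(len(bank)) if i not in administered]
    return max(candidates, key=lambda i: fisher_info(theta_hat, *bank[i]))

# Hypothetical item bank: (a, b, c) = (discrimination, difficulty, guessing).
bank = [
    (1.0, 0.0, 0.2),
    (2.0, 0.0, 0.2),
    (2.0, 0.0, 0.0),
    (1.5, 2.0, 0.1),
]

first = select_item(0.0, bank, set())     # most informative item at theta = 0
second = select_item(0.0, bank, {first})  # next best once `first` is used
```

In this toy bank, information at a given trait level rises with a and falls with c, which is the same pair of item properties the abstract reports DKL and GDI as favoring. Because the rule always picks the single most informative unused item, a few highly discriminating items tend to be reused across examinees, which is the source of the item overlap concern.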

Keywords:

Simulation and modeling – Algorithms – Probability distribution – Personality traits – Psychometrics – Bayesian method – Personality tests – Monte Carlo method
