Factoring a 2 × 2 contingency table
Authors:
Stanley Luck
Author affiliations:
Science, Technology and Research Institute of Delaware, Wilmington, DE, United States of America
Published in:
PLoS ONE 14(10)
Category:
Research Article
DOI:
https://doi.org/10.1371/journal.pone.0224460
Summary
We show that a two-component proportional representation provides the necessary framework to account for the properties of a 2 × 2 contingency table. This corresponds to the factorization of the table as a product of proportion and diagonal row or column sum matrices. The row and column sum invariant measures for proportional variation are obtained. Geometrically, these correspond to displacements of two point vectors in the standard one-simplex, which are reduced to a center-of-mass coordinate representation, (δ, μ) ∈ ℝ². Then, effect size measures, such as the odds ratio and relative risk, correspond to different perspective functions for the mapping of (δ, μ) to ℝ¹. Furthermore, variations in δ and μ will be associated with different cost-benefit trade-offs for a given application. Therefore, pure mathematics alone does not provide the specification of a general form for the perspective function. This implies that the question of the merits of the odds ratio versus relative risk cannot be resolved in a general way. Expressions are obtained for the marginal sum dependence and the relations between various effect size measures, including the simple matching coefficient, odds ratio, relative risk, Yule’s Q, ϕ, and Goodman and Kruskal’s τ_{c|r}. We also show that Gini information gain (IG_G) is equivalent to ϕ² in the classification and regression tree (CART) algorithm. Then, IG_G can yield misleading results due to the dependence on marginal sums. Monte Carlo methods facilitate the detailed specification of stochastic effects in the data acquisition process and provide a practical way to estimate the confidence interval for an effect size.
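A minimal Python sketch of the factorization and of several effect sizes named in the summary follows. It is an illustration under stated assumptions, not the paper's implementation: the function names are hypothetical, and δ is taken here as the difference of the two row proportions with μ as their mean, which may not match the paper's exact definitions. The Gini identity checked at the end (gain equals parent impurity times the square of ϕ) follows from the algebra for a binary split with a binary class; the paper may state the equivalence with a different normalization.

```python
from math import sqrt


def factor_table(table):
    """Split a 2 x 2 count table into (row_sums, row_proportions).

    The table is then diag(row_sums) @ row_proportions.
    """
    (a, b), (c, d) = table
    r1, r2 = a + b, c + d
    proportions = [[a / r1, b / r1], [c / r2, d / r2]]
    return (r1, r2), proportions


def rebuild_table(row_sums, proportions):
    """Multiply diag(row_sums) by the proportion matrix."""
    return [[r * p for p in row] for r, row in zip(row_sums, proportions)]


def effect_sizes(table):
    """Effect sizes for a 2 x 2 table; assumes all four cells are positive."""
    (a, b), (c, d) = table
    (r1, r2), P = factor_table(table)
    p1, p2 = P[0][0], P[1][0]      # first-column proportion in each row
    n = r1 + r2
    c1, c2 = a + c, b + d          # column sums
    det = a * d - b * c

    # Gini impurity of the parent node and of the two children (rows).
    q = c1 / n
    parent_gini = 2 * q * (1 - q)
    child_gini = (r1 / n) * 2 * p1 * (1 - p1) + (r2 / n) * 2 * p2 * (1 - p2)

    return {
        "delta": p1 - p2,          # ASSUMED definition of delta
        "mu": (p1 + p2) / 2,       # ASSUMED definition of mu
        "odds_ratio": (a * d) / (b * c),
        "relative_risk": p1 / p2,
        "yule_q": det / (a * d + b * c),
        "phi": det / sqrt(r1 * r2 * c1 * c2),
        "simple_matching": (a + d) / n,
        "parent_gini": parent_gini,
        "gini_gain": parent_gini - child_gini,
    }


if __name__ == "__main__":
    base = [[30, 10], [20, 40]]
    scaled = [[300, 100], [20, 40]]   # first row multiplied by 10
    for name, table in (("base", base), ("scaled", scaled)):
        es = effect_sizes(table)
        print(name, {k: round(v, 4) for k, v in es.items()})
        # Gini gain should match parent impurity times phi squared.
        print("  gain vs parent * phi^2:",
              round(es["gini_gain"], 6),
              round(es["parent_gini"] * es["phi"] ** 2, 6))
```

Scaling one row leaves the row proportions unchanged, so in this sketch δ, the odds ratio, and the relative risk stay the same, while ϕ and the Gini gain shift with the marginal sums.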
Keywords:
Algorithms – Normal distribution – Data acquisition – Nursing homes – Decision trees – Linkage disequilibrium – Contingency tables
Article published in
PLOS One
2019, Issue 10