A non-parametric significance test to compare corpora
Authors:
Alexander Koplenig
Author affiliations:
Leibniz Institute for the German Language (IDS), Mannheim, Germany
Published in:
PLoS ONE 14(9)
Category:
Research Article
DOI:
https://doi.org/10.1371/journal.pone.0222703
Abstract
Classical null hypothesis significance tests are not appropriate in corpus linguistics, because the randomness assumption underlying these testing procedures is not fulfilled. Nevertheless, there are numerous scenarios where it would be beneficial to have some kind of test in order to judge the relevance of a result (e.g. a difference between two corpora) by answering the question whether the attribute of interest is pronounced enough to warrant the conclusion that it is substantial and not due to chance. In this paper, I outline such a test.
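The abstract does not describe the proposed procedure itself, so the following is only a minimal sketch of a generic Monte Carlo permutation test, one common non-parametric way to ask whether a difference between two corpora could plausibly be due to chance. It is not the author's method. The function name `permutation_test`, the use of a difference in means as the test statistic, and the per-text values in `corpus_a` and `corpus_b` are all illustrative assumptions.

```python
import random
from statistics import mean


def permutation_test(a, b, n_perm=5000, seed=0):
    """Two-sided Monte Carlo permutation test for a difference in means.

    a, b: per-text values of some attribute of interest, one value per text
    (hypothetical example: the relative frequency of a word in each text).
    Returns an estimated p-value.
    """
    rng = random.Random(seed)          # fixed seed for reproducibility
    observed = abs(mean(a) - mean(b))  # test statistic on the real grouping
    pooled = list(a) + list(b)
    n_a = len(a)
    extreme = 0
    for _ in range(n_perm):
        # Randomly reassign texts to the two groups, keeping group sizes fixed.
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            extreme += 1
    # Add-one correction so the estimate is never exactly zero.
    return (extreme + 1) / (n_perm + 1)


# Made-up illustrative data, not taken from the paper.
corpus_a = [0.012, 0.015, 0.011, 0.014, 0.013, 0.016]
corpus_b = [0.021, 0.019, 0.024, 0.020, 0.022, 0.023]
p_value = permutation_test(corpus_a, corpus_b)
print(f"estimated p-value: {p_value:.4f}")
```

Shuffling whole texts rather than individual words is a deliberate choice in this sketch: it avoids treating the words within a text as independent draws, which is the randomness assumption the abstract calls into question.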
Keywords:
Biology and life sciences – Physical sciences – Research and analysis methods – Neuroscience – Cognitive science – Cognitive psychology – Psychology – Social sciences – Mathematics – Probability theory – Statistics – Mathematical and statistical techniques – Statistical methods – Statistical data – Discrete mathematics – Combinatorics – Permutation – Language – Linguistics – Semantics – Test statistics – Statistical distributions – Statistical inference – Sociolinguistics
This article was published in
PLOS One
2019, Issue 9