A non-parametric significance test to compare corpora


Author: Alexander Koplenig
Author affiliation: Leibniz Institute for the German Language (IDS), Mannheim, Germany
Published in: PLoS ONE 14(9)
Category: Research Article
DOI: https://doi.org/10.1371/journal.pone.0222703

Summary

Classical null hypothesis significance tests are not appropriate in corpus linguistics, because the randomness assumption underlying these testing procedures is not fulfilled. Nevertheless, there are numerous scenarios where it would be beneficial to have some kind of test in order to judge the relevance of a result (e.g. a difference between two corpora) by answering the question whether the attribute of interest is pronounced enough to warrant the conclusion that it is substantial and not due to chance. In this paper, I outline such a test.

Keywords:

Biology and life sciences – Physical sciences – Research and analysis methods – Neuroscience – Cognitive science – Cognitive psychology – Psychology – Social sciences – Mathematics – Probability theory – Statistics – Mathematical and statistical techniques – Statistical methods – Statistical data – Discrete mathematics – Combinatorics – Permutation – Language – Linguistics – Semantics – Test statistics – Statistical distributions – Statistical inference – Sociolinguistics

