Characterizing the Randot Preschool stereotest: Testability, norms, reliability, specificity and sensitivity in children aged 2-11 years

Authors: Jenny C. A. Read ^aff001; Sheima Rafiq ^aff001; Jess Hugill ^aff001; Therese Casanova ^aff001; Carla Black ^aff001; Adam O’Neill ^aff001; Vicente Puyat ^aff001; Helen Haggerty ^aff002; Kathryn Smart ^aff002; Christine Powell ^aff002; Kate Taylor ^aff002; Michael P. Clarke ^aff001; Kathleen Vancleef ^aff001
Authors place of work: Institute of Neuroscience, Newcastle University, Newcastle upon Tyne, England, United Kingdom ^aff001; Newcastle Eye Centre, Royal Victoria Infirmary, Newcastle upon Tyne Hospitals NHS Trust, Newcastle upon Tyne, England, United Kingdom ^aff002
Published in the journal: PLoS ONE 14(11)
Category: Research Article
doi: https://doi.org/10.1371/journal.pone.0224402

Summary

Purpose

To comprehensively assess the Randot Preschool stereo test in young children, including testability, normative values, test/retest reliability and sensitivity and specificity for detecting binocular vision disorders.

Methods

We tested 1005 children aged 2–11 years with the Randot Preschool stereo test, plus a cover/uncover test to detect heterotropia. Monocular visual acuity was assessed in both eyes with the Keeler Crowded LogMAR visual acuity test in children aged 4 and over.

Results

Testability was very high: 65% in two-year-olds, 92% in three-year-olds and ~100% in older children. Normative values: in 389 children aged 2–5 with apparently normal vision, 6% scored nil (stereoblind). In those who obtained a threshold, the mean log threshold was 2.06 log₁₀ arcsec, corresponding to 114 arcsec, and the median threshold was 100 arcsec. Most older children scored 40 arcsec, the best available score. We found a small sex difference, with girls scoring slightly but significantly better. Test/retest reliability: ~99% agreement on obtaining any score vs nil. Agreement between stereo thresholds is poor in children aged 2–5: the 95% limit of agreement is 0.7 log₁₀ arcsec, so a five-fold change in stereo threshold may occur without any change in vision. In children over 5, the test essentially acts only as a binary classifier, since almost all non-stereoblind children score 40 arcsec. Specificity (true negative rate): >95%. Sensitivity (true positive rate): poor, <50%, i.e. around half of children with a demonstrable binocular vision abnormality score well on the Randot Preschool.

Conclusions

The Randot Preschool is extremely accessible for even very young children, and is very reliable at classifying children into those who have any stereo vision vs those who are stereoblind. However, its ability to quantify stereo vision is limited by poor repeatability in children aged 5 and under, and a very limited range of scores relevant to children aged over 5.

Keywords:

Schools – children – Normal distribution – Eyes – Vision – Visual acuity – Research validity – Binocular vision

Introduction

Stereotests assess binocular visual function by measuring the smallest depth difference between two adjacent surfaces which a person can detect purely by using their stereoscopic vision. This stereo threshold is largely independent of viewing distance when expressed in angular terms [1]. It is usually expressed in seconds of arc (1 arcsec = 1/3600 deg), and is often referred to as stereoacuity.

Stereoacuity is clinically important because stereoscopic vision is considered the “gold standard” of binocular vision [2]. It requires good vision in both eyes and good oculomotor control, as well as cortical neurons that combine the two eyes’ inputs and extract disparity [3]. Accordingly, stereoacuity is a primary or secondary outcome measure in interventions for strabismus and amblyopia [2,4–7], and is routinely measured when children are referred to eye clinics with these conditions.

Several stereotests are commonly used in the clinic. These include the near Frisby, Frisby-Davis Distance, Lang, TNO, Randot and Preschool Randot stereotests, each with its own properties, advantages and disadvantages [3,8–10]. Of these, the Randot family of stereotests produced by the Stereo Optical Company (stereooptical.com) is the most widely used. The Randot stereotest is the clinical stereotest most often used in the USA and Canada [11]. A PubMed search indicates that Randot tests are also among the most commonly used in research. The search "testname"[Title/Abstract] AND ((stereo*) OR stereopsis OR amblyopia OR binocular OR strabismus), run on 26th March 2019, returned 165 results for testname = “Randot”, as opposed to 252 for “Titmus”, 143 for “TNO”, 88 for “Frisby” and 83 for “Lang”.

In using any stereotest, it is important to know:

testability, i.e. how many children at each age have the cognitive and other capacity to obtain a meaningful measure on the test.
the normative data, i.e. the distribution of values expected at different age-ranges for visually normal individuals.
the test/retest reliability, effectively a measure of the “error” on the test. This is particularly important where one wishes to track changes in stereoacuity over time, e.g. as a result of treatment. One needs to know when a given difference in scores reflects real change, and when it is consistent with the measurement error.
the sensitivity and specificity with which the test detects binocular visual problems. This is important since stereotests are sometimes included in screening programmes. A child who fails a stereotest will usually be referred for further investigation for a binocular visual problem such as strabismus or amblyopia, while a child who passes the stereotest (and other tests) may be assumed to have no need of referral. Thus it is important to know how many children can be expected to be referred unnecessarily on this basis (the false positive rate, or 1 − specificity) and how many children with binocular visual problems will be missed (the false negative rate, or 1 − sensitivity).

Various studies, summarised in Table 1, have assessed these for the Randot Preschool Stereotest [8,12–19]. However, no study has assessed all of them together, and estimates of reliability, sensitivity and specificity are not available for all age ranges. In this paper, we report these values for the Randot Preschool Stereotest, from a cohort of around a thousand children tested in 2016 in North East England. Comparing them to results from previously published studies, we find generally good agreement, confirming the consistency of the Randot Preschool in different populations. A specific contribution of our paper is a set of equations for visually normal children. These describe the probability of obtaining a given Randot Preschool stereoacuity as a function of age in years, rather than giving a simple mean and standard deviation.

**Tab. 1. Summary of results from this and previous studies.**

Methods

Comparisons with previous studies

To compare our results with previous studies, we conducted a PubMed search for “Preschool Randot” on 6th March 2019. This returned 85 citations, which we reviewed manually to find those which contained relevant data. We excluded any studies which contained data solely for ages over 10 years. For normative values, sensitivity and specificity, we also excluded any studies reporting values solely in a clinical population. We did include reliability measures assessed in clinical populations of the relevant ages. The results of our analysis are summarised in Table 1.

Participants

A total of 1005 children (488 boys, 517 girls) participated in the study. They were aged between 2 and 12 years old (numbers in each age-group are provided in the Results sections below). The children were recruited through local primary schools, preschools, nurseries, personal contacts, and at local science centres. Testing took place in schools, nurseries, at Newcastle University and at local science centres in the city of Newcastle upon Tyne in North East England. Children were targeted in four UK school year-groups: Nursery (2- and 3-year-olds), Reception (4- and 5-year-olds), Year 2 (6- and 7-year-olds) and Year 6 (10- and 11-year-olds). UK school years include children born from September to August of the following year. The study included one 9-year-old, who was grouped with the 10- and 11-year-olds. For the Reception year group, our study was combined with the routine Orthoptic School Vision Screening programme, and only children who participated in the screening were eligible to participate. Participation in the Orthoptic School Vision Screening ensured that all children were screened for visual problems and that the recommended referral pathway was followed. For the other age groups, all children within the targeted ages were eligible to participate. Subsamples were used for the three studies reported below (reliability, validity, and normative data). The criteria and characteristics of these samples are detailed in the relevant sections.

Ethics

The study protocol was compliant with the Declaration of Helsinki and was approved by the Ethics Committee of the Newcastle University Faculty of Medical Sciences (approval number 01078). All parents received an information leaflet about the study. For most testing at schools and nurseries, we used opt-out consent, where parents could return a form withdrawing their child from participation. Opt-out consent was approved by our ethics committee in order to ensure a representative sample [21,22]. An opt-in consent procedure was used where the school or nursery requested it, and for testing sessions at Newcastle University and local science centres. Children were always asked for oral or non-verbal assent at the time of testing. Parents and children were informed of the results of the standard vision screening tests (visual acuity and cover test). Children who failed either test were referred to an optometrist or orthoptist.

Data analysis

Data analysis and statistics were carried out using R (version 3.5.2, "Eggshell Igloo") in RStudio (version 1.1.463), https://CRAN.R-project.org/. The R data files along with R markdown code to carry out all analysis and figures for this paper are available at https://doi.org/10.25405/data.ncl.9755045.

Study design and procedures

To evaluate validity and collect a normative data sample, we did four things. We collected a vision questionnaire, measured visual acuity with a crowded logMAR test, performed a cover test, and administered the Randot Preschool stereoacuity test. To evaluate reliability, a subsample of children was asked to take part in a second session, which included only the Randot Preschool stereotest.

Vision questionnaire

Parents of participating children were asked to provide information about their child’s eyesight. This included whether the child needed glasses for near and/or far vision tasks, whether they were receiving patching or atropine treatment for amblyopia, and any other vision problems. In the case of schools and nurseries, this questionnaire was sent home with the child and we asked for it to be returned completed. Questionnaires were returned for around half of children (numbers specified in Results sections). To avoid sample bias, we did not exclude children for whom questionnaires were not returned [21,22]. If indicated on the questionnaire, we asked children to wear their glasses during testing. The full questionnaire is available in the Supplementary Material.

Visual acuity

Visual acuity was measured in participants tested in non-nursery settings (thus in almost all participants aged 4 years and over). We used the Keeler Crowded LogMAR visual acuity test (Keeler Ltd, UK), which is the standard visual acuity test used across the UK [23]. In this test, participants identify or match letters of various sizes presented at a distance of 3 meters with one eye covered. First, a screening card is presented to the child and the letter size is reduced until an error is made. The examiner then starts the test card two sizes above the last correctly identified letter. If the child identifies 2 or more letters on a line, the next test card is presented. The examiner proceeds until the child cannot correctly identify 2 or more of the 4 letters on a line, then returns to the size above and completes that line. If the 0.800 letter was not seen at 3 meters, our protocol specified that the examiner would walk closer and add log units to correct for the change in distance. However, all children examined scored 0.75 logMAR or better in both eyes, so this part of the protocol was never needed.
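The distance correction mentioned above follows from the fact that letter size in logMAR scales with the logarithm of viewing distance. A minimal sketch follows, assuming the correction is log₁₀(standard distance / actual distance) added to the value read off the chart. The function and constant names are illustrative only, not part of the Keeler manual or our protocol documents.

```python
import math

# Standard test distance for the Keeler Crowded LogMAR chart in this study (meters).
STANDARD_DISTANCE_M = 3.0


def corrected_logmar(measured_logmar: float, test_distance_m: float) -> float:
    """Return the logMAR equivalent at 3 m for a value read at test_distance_m.

    Viewing from closer than 3 m makes each letter subtend a larger angle, so
    the child's true acuity is worse (a larger logMAR) by
    log10(STANDARD_DISTANCE_M / test_distance_m).
    """
    if test_distance_m <= 0:
        raise ValueError("test distance must be positive")
    return measured_logmar + math.log10(STANDARD_DISTANCE_M / test_distance_m)


# Example: the 0.8 line read only at 1.5 m corresponds to about 1.1 logMAR at 3 m.
print(round(corrected_logmar(0.8, 1.5), 2))
```

Halving the distance adds log₁₀ 2 ≈ 0.3 logMAR, i.e. three lines on the chart.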

Virtually all Reception-year children (4- and 5-year-olds) in our study had their acuity measured as part of the regional Orthoptic School Vision Screening programme. In that protocol, no threshold visual acuity was obtained: the best acuity that could be recorded was 0.2 logMAR. Children with a visual acuity worse than this (a logMAR value above 0.2) were referred to an optician or orthoptist.

Visual acuity measurement was not attempted in nursery settings, as the clinical co-authors advised that in their experience it was not feasible to obtain reliable results with children of this age (2 or 3 years) in such a setting in the available time.

Cover test

A cover test was performed to detect manifest strabismus [24]. In this test, one eye in turn is covered with an occluder for a short moment while the participant fixates a near object or penlight, or a distant object. The cover is then briefly removed and the eyes are observed to see whether they move as the occluded eye acquires fixation of the test object. Movement of the unoccluded eye to take up fixation while the other eye is covered indicates heterotropia. A cover test was performed at near and at distance. The near cover test was always performed at 33 centimetres. The distance for the distance cover test depended on the size of the room available. For 42 children this distance was not recorded. For the other children, the target for the distance cover test was shown at 3 to 6.9 meters (mean = 4.9 m, SD = 1.2 m). Both visual acuity and cover tests were carried out by a qualified orthoptist. This was either a member of the study team (SKR) or an orthoptist from the Newcastle upon Tyne Orthoptic School Vision Screening programme, which runs for the Reception year group (4–5 years old).

Randot Preschool stereotest

Researchers administered the Randot Preschool Stereoacuity Test. The test consists of 3 pages. On the left-hand side of each page, black-and-white silhouettes of everyday objects are presented. The right-hand side shows random-dot patterns. In each set of four random-dot patterns, one contains no object (is flat), while the remaining three contain disparity-defined objects matching silhouettes on the left. The objects are only visible when wearing 3D polarized glasses. The participant has to name the object in each random-dot pattern or point to the matching silhouette on the left page. The available levels are 800, 400, 200, 100, 60 and 40 arcsec. The test distance for the Randot Preschool is not specified in the test’s own manual, nor in the test protocol of the Pediatric Eye Disease Investigation Group (PEDIG; www.pedig.net). We followed previous authors [8,12,13] in performing it at 40 cm. After a non-stereo pre-test to check understanding and cooperation, stereopsis was tested at 800, 400, 200, 100, 60 and 40 arcsec, in that order. Note that we followed the protocol used in the PEDIG studies rather than that supplied with the test (ATS Miscellaneous Testing Procedures Manual, downloaded from www.pedig.net). A smaller disparity was shown only if the child correctly identified at least 2 out of 4 shapes at the previous level. The final score was the smallest disparity level at which 2 or more shapes were correctly identified.
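The descending presentation order and stopping rule can be sketched as follows. This illustrates the scoring logic as described above and is not the PEDIG protocol document itself. The function `randot_preschool_score` and its input format (number of shapes correct per level, in presentation order) are hypothetical.

```python
from typing import Optional, Sequence

# Disparity levels in presentation order (arcsec), as described above.
LEVELS_ARCSEC = (800, 400, 200, 100, 60, 40)
# Minimum number of shapes correctly identified to pass a level.
PASS_CRITERION = 2


def randot_preschool_score(correct_per_level: Sequence[int]) -> Optional[int]:
    """Return the final score in arcsec, or None for a 'nil' score.

    correct_per_level[i] is the number of shapes the child identified
    correctly at LEVELS_ARCSEC[i]. The sequence may stop early, because a
    smaller disparity is shown only after the previous level was passed.
    """
    score = None
    for level, n_correct in zip(LEVELS_ARCSEC, correct_per_level):
        if n_correct < PASS_CRITERION:
            break  # failed this level: testing stops here
        score = level  # passed: smallest disparity passed so far
    return score


print(randot_preschool_score([4, 3, 2, 1]))  # passes 800, 400, 200; fails 100
```

A child who fails the 800 arcsec level receives no score at all, which is the "nil" outcome analysed throughout.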

Converting between threshold and log threshold

We prefer to work in terms of log-threshold (specifically, the common or decadic logarithm of the threshold in arcsec, measured in log₁₀ arcsec), since the distribution of log-threshold is closer to normal than the distribution of threshold itself [25–28]. Other workers in the field have reported statistics on the threshold itself.

This raises problems when comparing results, since the logarithm of the mean threshold is not the same as the mean of the log-threshold, as illustrated in Fig 1. However, if we assume that log-threshold is indeed distributed normally, then it is possible to derive formulae for converting between statistics on the thresholds themselves and on the log-thresholds. These formulae are provided in Table 2. Note that they require us to know two statistics about the distribution, e.g. both the mean and the SD. In this way, we were able to estimate mean and SD of the log-threshold for previous studies which reported the mean and SD of the threshold in arcsec (Table 1). This was not possible for studies which reported only the mean threshold.
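Table 2 is not reproduced in this section. As a sketch, the conversions below use the standard lognormal moment relations. These hold under the stated assumption that log₁₀ threshold is normally distributed. We assume, but have not checked here, that they are equivalent to the formulae in Table 2. The function names are hypothetical.

```python
import math

LN10 = math.log(10)


def log_stats_to_arcsec(mean_log: float, sd_log: float) -> tuple[float, float]:
    """Mean and SD of threshold (arcsec) from mean and SD of log10(threshold).

    Assumes log10(threshold) ~ Normal(mean_log, sd_log**2), i.e. the threshold
    is lognormal with natural-log parameters (mean_log*ln10, sd_log*ln10).
    """
    s = sd_log * LN10
    mean_arcsec = math.exp(mean_log * LN10 + s**2 / 2)
    sd_arcsec = mean_arcsec * math.sqrt(math.exp(s**2) - 1)
    return mean_arcsec, sd_arcsec


def arcsec_stats_to_log(mean_arcsec: float, sd_arcsec: float) -> tuple[float, float]:
    """Inverse of log_stats_to_arcsec, under the same normality assumption."""
    cv2 = (sd_arcsec / mean_arcsec) ** 2  # squared coefficient of variation
    sd_log = math.sqrt(math.log1p(cv2)) / LN10
    mean_log = math.log10(mean_arcsec) - 0.5 * math.log10(1 + cv2)
    return mean_log, sd_log
```

Take a mean log-threshold of 2.06 and a hypothetical SD of 0.3 log₁₀ arcsec. `log_stats_to_arcsec(2.06, 0.3)` returns an arithmetic mean threshold above 10^2.06 arcsec. The gap grows with the SD, which is why both statistics are needed to convert between the two kinds of summary.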

Results and discussion

Selection bias with questionnaire

Most of our children were recruited via opt-out consent. Consent forms and study questionnaires were sent home with the children, but if they were not returned, children were still included in the study. We justified this procedure by the non-invasive nature of the study and the importance of avoiding selection bias. Our data-set permitted us to examine the likely effect of any such bias, by comparing children for whom questionnaires were or were not returned.

Sample

In this section, we used our whole cohort of 1005 children.

Results

Overall, questionnaires were returned for 54% of our children. Questionnaires were much more likely to be returned for younger children (e.g. 83% of two-year-olds vs 44% of 11-year-olds; p = 0.0001, logistic regression on age). There was no significant difference by sex. In every age-group, the mean visual acuity and stereoacuity were lower (i.e. better) in children for whom questionnaires were returned (Fig 2). This difference was significant in both cases (visual acuity: regression of visual acuity on log age with questionnaire as a categorical factor, p = 0.0008; stereoacuity: ordinal logistic regression (see below) with log age and questionnaire as factors, p = 0.04). For stereoacuity, the advantage of having parents who returned a questionnaire was equivalent to being 0.6 years older.

Discussion

We had imagined that parents might be more likely to return questionnaires for children who had a diagnosed eye condition such as amblyopia. With such an effect, vision would have tended to be worse in children with questionnaires. In fact, we find the opposite effect. Its cause is beyond the scope of our study. Nevertheless, these results suggest that our sample would have been biased towards children with better vision if we had used opt-in consent.

Testability with Randot Preschool stereotest

Sample

For the testability analysis, we included all 1005 children. The data is provided in data-file RandotPreschoolTestability.RData and the analysis in this section can be recreated by running R markdown file AnalyseTestabilityData.Rmd (see Supplementary Material).

Results

Fig 3 and Table 3 show the percentage of children who were testable in each age group. Testability rose from 65% in two-year-olds and 92% in three-year-olds to virtually 100% in older children. Slightly more girls than boys were testable in the youngest age-group, but this difference was not significant. Reasons for non-testability included failing the non-stereo pre-test (which requires naming simple luminance-defined black-on-white shapes) and not being willing to wear the 3D glasses (including one two-year-old who burst into tears after putting them on!). Some children did not understand what they were being asked to do.

**Fig. 3. Testability on the Randot Preschool, by age-group and sex.**

Discussion

Our results agree with previous studies in finding virtually 100% testability in children aged 4 and up [14–17,29], but we find substantially higher testability in two-year-olds than previous studies: 65% compared to 31% [16], 32% [17], 33% [14], or 47% [15].

Test/retest reliability of Randot Preschool stereotest

Sample

A subset of children aged three and older were retested on Preschool Randot in a separate session within three weeks of the first test (max 21, mean 16 days apart). We compared results on the two sessions to calculate the test/retest reliability. For this analysis, we did not exclude any children because of visual conditions, but we did exclude children who were recorded as having worn optical correction on one session but not on the other, since we did not want to confound poor test reliability with changes in optical correction. We also excluded children who did not understand the test the first time, as we wanted to understand repeatability of results independent of changes in understanding. In total, this sample consisted of 182 children from 3 to 11 years old.

Results

We initially looked at the reliability of a binary pass/fail classification. We define “failing” as not passing the 800 arcsec level. In the ideal case where the test is 100% reliable and independent of cognitive effects or motivation, children should either pass the test on both sessions or fail on both.

Fig 4 shows the proportion of children in each situation, classified by their age-group on the first session. Overall 96% of children passed Preschool Randot both times. Only 1% (2 children) failed on the second session having previously passed, and one of these, an 11-year-old, had obtained 800 arcsec on the first session. Thus, the Randot Preschool is extremely consistent at classifying children into those who do versus those who do not have any demonstrable stereo vision.
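The binary classification can be tabulated directly. A minimal sketch follows, assuming each session's score is stored in arcsec with `None` standing for nil (failing the 800 arcsec level). `passfail_agreement` is a hypothetical helper.

```python
from collections import Counter
from typing import Optional, Sequence


def passfail_agreement(session1: Sequence[Optional[int]],
                       session2: Sequence[Optional[int]]):
    """Cross-tabulate pass/fail on two sessions and return (counts, agreement).

    Scores are in arcsec, with None meaning nil, i.e. failing the 800 arcsec
    level. counts is keyed by (passed_session1, passed_session2).
    """
    counts = Counter((s1 is not None, s2 is not None)
                     for s1, s2 in zip(session1, session2))
    n = sum(counts.values())
    agreement = (counts[(True, True)] + counts[(False, False)]) / n
    return counts, agreement
```

The off-diagonal cells, (True, False) and (False, True), are the inconsistent classifications, which were rare in our data.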

**Fig. 4. 187 children were tested in two sessions (22 three-year-olds, 50 four-year-olds, 27 five-year-olds, 48 aged 6 or 7 and 40 aged 10 and 11), and on each session we classify them as passing or failing the stereotest.**

To assess agreement in more detail, we first examined the correlation between scores on the two tests. The Spearman correlation coefficient, which examines the ranking of scores, was 0.59. To compute the Pearson correlation, following previous workers [18], we first replaced a “nil” score with a notional level of 1600 arcsec, i.e. one log-level up from the highest available score of 800 arcsec. The Pearson correlation coefficient between the log-thresholds on the two sessions was 0.62. Both correlations were highly significant (p<10⁻¹⁰).
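The two correlation measures can be sketched as below, with nil replaced by the notional 1600 arcsec before taking log₁₀. Spearman's coefficient is computed as the Pearson correlation of ranks. Tied scores, which are very common on a test with only six levels, are given their average rank. The function names are hypothetical; in practice a statistics package would be used.

```python
import math
from typing import Optional, Sequence

NIL_ARCSEC = 1600  # notional score for nil: one level above 800 arcsec


def to_log(score: Optional[int]) -> float:
    """log10 threshold, mapping a nil score (None) to NIL_ARCSEC."""
    return math.log10(NIL_ARCSEC if score is None else score)


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def average_ranks(v: Sequence[float]) -> list[float]:
    """1-based ranks; tied values share the mean of the ranks they span."""
    order = sorted(range(len(v)), key=lambda i: v[i])
    ranks = [0.0] * len(v)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and v[order[j + 1]] == v[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman correlation as the Pearson correlation of average ranks."""
    return pearson(average_ranks(x), average_ranks(y))
```

Because nil is mapped to the largest value, it is also ranked as the worst score in the Spearman calculation. The choice of 1600 arcsec only affects the Pearson coefficient.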

We then carried out a Bland-Altman analysis, shown in Fig 5. Following previous workers [8,18,19], we analysed log-thresholds rather than thresholds, since these are closer to normally distributed. The mean difference between results on the two sessions was −0.114 log₁₀ arcsec, or a factor of 0.77, and this was significantly different from zero (p<10⁻⁵). Thus, children tended to obtain a better score on the second session, presumably due to practice effects.

**Fig. 5. Difference between log-thresholds in the two sessions (second minus first) plotted against means, for 187 children aged 3–11 years.**

Since this improvement is relatively small compared with the variability between the two sessions (Fig 5), we will neglect it in computing the reliability. We quantify reliability using the Bland-Altman 95% limit of agreement. A value of L means that one can be 95% confident that the result of a second test will lie within ± L of the first. For a normal distribution, this corresponds to 1.96 times the standard deviation of the differences. In our Preschool Randot data, this gives L = 0.63 log₁₀ arcsec, corresponding to a factor of 4.3 in thresholds. For example, if a child scores 200 arcsec in the first session, their score in the second session could be anywhere between 50 arcsec and 800 arcsec, without any change in their binocular vision.
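The limit of agreement, and its conversion into a multiplicative range of thresholds, can be sketched as follows. This assumes paired log₁₀ thresholds from the two sessions. Following the text, it reports L as 1.96 times the SD of the differences and neglects the small mean bias. The helper names are hypothetical.

```python
import statistics
from typing import Sequence


def limit_of_agreement(log_first: Sequence[float],
                       log_second: Sequence[float]) -> tuple[float, float]:
    """Return (bias, L) for paired log10 thresholds from two sessions.

    bias is the mean of (second - first); L = 1.96 * SD of the differences,
    which, as in the text, neglects the small bias.
    """
    diffs = [b - a for a, b in zip(log_first, log_second)]
    return statistics.mean(diffs), 1.96 * statistics.stdev(diffs)


def plausible_retest_range(first_arcsec: float, L: float) -> tuple[float, float]:
    """Thresholds within +/- L log10 units of a first-session score."""
    factor = 10 ** L
    return first_arcsec / factor, first_arcsec * factor
```

With L = 0.63, `plausible_retest_range(200, 0.63)` gives roughly 47 to 853 arcsec. This is the factor of about 4.3 either side quoted above, which spans most of the test's available levels.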

One can also compute the 95% confidence interval on the estimate L. We follow Bland & Altman’s (1986) recipe for this, estimating the standard error on the limit of agreement as √(3s²/n), where s is the standard deviation of the differences and n is the sample size. We then estimate the 95% confidence interval as the original estimate ± t times the standard error. Here t is the two-sided 95% critical value of the t distribution with n − 1 degrees of freedom, which approaches 1.96 for large samples. In this way we estimate a 95% confidence interval of 0.55 to 0.71 log₁₀ arcsec (factors of 3.5 to 5.2).
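The confidence-interval recipe can be written as a short function. This is a sketch that assumes the caller supplies the critical value t. It defaults to 1.96, the large-sample value; the t quantile with n − 1 degrees of freedom is slightly larger. `loa_confidence_interval` is a hypothetical name.

```python
import math


def loa_confidence_interval(L: float, n: int, t: float = 1.96) -> tuple[float, float]:
    """Approximate 95% CI on a Bland-Altman limit of agreement L = 1.96 * s.

    Uses the standard error sqrt(3 * s**2 / n) from Bland & Altman (1986).
    t is the two-sided 95% critical value; 1.96 is the large-sample value.
    """
    s = L / 1.96  # recover the SD of the differences
    se = math.sqrt(3 * s**2 / n)
    return L - t * se, L + t * se


lo, hi = loa_confidence_interval(0.63, 182)
print(f"{lo:.2f} to {hi:.2f} log10 arcsec")
```

With L = 0.63 and n = 182, this reproduces the interval of about 0.55 to 0.71 log₁₀ arcsec given above. Using the exact t quantile for 181 degrees of freedom changes it only in the third decimal place.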

Fig 6 shows the reliability as a function of age; values are given in Table 4. Reliability is similar across ages 3–7, but better in the oldest age-group (10- and 11-year-olds). A linear regression of the absolute difference in log-thresholds against age, with sex as a covariate, revealed a highly significant decrease in absolute difference with age (p<0.058). Absolute differences were slightly higher for boys, though this was not significant (offset = 0.074, p = 0.058).

**Fig. 6. Differences in Preschool Randot thresholds (second session minus first session) are plotted by age-group for 182 children.**

**Tab. 4. 95% limits of reliability and 95% confidence interval by age-group.**

Discussion

Our estimate of the reliability of the Preschool Randot is very similar to previous estimates, with the 95% limits of repeatability being ±0.64 log₁₀ arcsec, or a factor of 4.3 in threshold. Fawcett & Birch [18], in 102 children aged 2–12 years, obtained exactly the same value as us: 0.64 log₁₀ arcsec. Adams et al [8] report a similar value of ±0.60 log₁₀ arcsec (a factor of 4) in 19 children aged 7–18, while Smith et al obtained a slightly lower value, 0.46 log₁₀ arcsec (a factor of 3), in 47 people aged 3 to 80 years. The differences likely reflect an improvement in reliability with age.

A previous study [18] concluded there was no change in test/retest reliability over the range 3–12 years. However, this conclusion was based on a linear regression and t-test on the differences themselves, rather than the absolute difference. This is in fact testing for a change in the bias (i.e. the mean of the differences), rather than in the reliability. An increase in reliability with age would be expected to decrease the variance of the difference in the scores obtained on two sessions, without changing its mean. This is why we did our linear regression on the absolute values of the differences. An F-test confirms that there is a highly significant decrease in the variance of differences in log-thresholds with age. In our data, this variance is 2.38 times higher in children under 5 than in those aged 5 and over, and in study [18] the figure is actually 3.00 times higher (data read off from their Fig 3 [18]); p<10⁻³ for both.
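The variance comparison behind the F-test can be sketched as below. It assumes the session differences (log₁₀ arcsec) have already been split into a younger and an older group. A p-value would then come from the F distribution with the returned degrees of freedom, e.g. via `scipy.stats.f.sf` if SciPy is available. The function name is hypothetical.

```python
import statistics
from typing import Sequence


def variance_ratio(diffs_younger: Sequence[float],
                   diffs_older: Sequence[float]) -> tuple[float, int, int]:
    """F statistic comparing the variance of session differences (log10 arcsec)
    in a younger group with that in an older group, plus its degrees of freedom.

    Only the spread of the differences matters here: their mean (the bias)
    does not affect the sample variance.
    """
    f = statistics.variance(diffs_younger) / statistics.variance(diffs_older)
    return f, len(diffs_younger) - 1, len(diffs_older) - 1
```

This makes the point of the paragraph concrete. A test on the mean of the differences is blind to the spread, whereas the variance ratio responds to it directly.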

However, this apparent improvement in reliability with age may in fact reflect improved stereo thresholds combined with a floor effect in the scores available. As reported in the section on Normative values, almost all older children obtain the best possible threshold of 40 arcsec on the Randot Preschool. Suppose that the reliability is in fact a factor of 4 at all ages. A young child whose true threshold is 200 arcsec may obtain 400 arcsec on one session and 100 on the next. Yet an older child whose true threshold is 20 arcsec will obtain 40 arcsec on both sessions, simply because a score of 10 arcsec is not available. Reliability will appear higher for the older child but this will be a side-effect of their improved stereo, combined with the available scores. The Randot Preschool was not designed to assess genuine changes in the repeatability of stereo thresholds in general over this age range.
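The floor effect can be illustrated with an idealized, deterministic scoring rule. This sketch assumes a child sees every level at or above their threshold on that session and no level below it, so it ignores guessing. `observed_score` is a hypothetical helper.

```python
from typing import Optional

LEVELS_ARCSEC = (800, 400, 200, 100, 60, 40)


def observed_score(session_threshold_arcsec: float) -> Optional[int]:
    """Idealized score for a child whose threshold on this session is given.

    Assumes every level at or above the threshold is seen and none below it
    (no guessing). Scores are floored at 40 arcsec; above 800 gives nil (None).
    """
    seen = [lvl for lvl in LEVELS_ARCSEC if lvl >= session_threshold_arcsec]
    return min(seen) if seen else None
```

Consider the worked example in the text. `observed_score(400)` and `observed_score(100)` differ by log₁₀ 4 ≈ 0.6. By contrast, `observed_score(40)` and `observed_score(10)` both return 40, so the older child's apparent test/retest difference is zero.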

Normative values of Randot Preschool stereotest

Sample

We next investigated the distribution of stereo thresholds obtained with the Randot Preschool in children who, as far as we could tell, had normal vision. To this end, we excluded participants who failed a cover test, or for whom cover test data were not available. We also excluded participants whose parents reported that they were diagnosed or treated for amblyopia or strabismus, or under assessment in an eye clinic for suspected vision problems. Children who were not wearing glasses when tested, although their parents reported that they needed glasses, were also excluded. However, we did not exclude children for whom parental questionnaires were not available. For children aged 4 and over, we also excluded participants for whom visual acuity data were not available, whose visual acuity was worse than 0.2 logMAR in either eye, or whose interocular acuity difference exceeded 0.2 logMAR. Table 5 summarises the visual acuity of our normative sample. Visual acuity data were not available for the 2- and 3-year-olds, who were tested in nurseries, so our “normative” sample may include some children in this age-range with undiagnosed poor vision. We also excluded any children who were not testable with the Randot Preschool because they refused to wear the glasses or did not cooperate with the test in another way (see section on Testability for age breakdown). The remaining 826 participants were 402 boys and 424 girls aged between 2.00 and 11.6 years (Fig 7 and Table 5). Analysis code is provided in AnalyseNormativeData.Rmd using data-file RandotPreschoolNormative.RData (Supplementary Material).
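The exclusion criteria above can be expressed as a single predicate. This is a sketch only, and the `Child` record and its field names are hypothetical. As in the text, a missing questionnaire is treated as reporting no condition, and children who scored nil are kept because they were testable.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Child:
    """Hypothetical per-child record; field names are illustrative only."""
    age_years: float
    cover_test_passed: Optional[bool]       # None = no cover-test data
    reported_eye_condition: bool = False    # amblyopia/strabismus, treated, or under assessment
    glasses_needed_not_worn: bool = False   # parent reported glasses, none worn at test
    logmar_right: Optional[float] = None    # None = acuity not available
    logmar_left: Optional[float] = None
    testable: bool = True                   # wore the glasses and cooperated


def in_normative_sample(c: Child, max_logmar: float = 0.2, max_iod: float = 0.2) -> bool:
    """Apply the exclusion criteria described in the text to one child."""
    if c.cover_test_passed is not True:  # failed, or no data
        return False
    if c.reported_eye_condition or c.glasses_needed_not_worn or not c.testable:
        return False
    if c.age_years >= 4:
        if c.logmar_right is None or c.logmar_left is None:
            return False
        if max(c.logmar_right, c.logmar_left) > max_logmar:
            return False  # worse than 0.2 logMAR in either eye
        if abs(c.logmar_right - c.logmar_left) > max_iod:
            return False  # interocular difference too large
    return True
```

The acuity checks apply only from age 4, which mirrors the fact that acuity was not measured in the nursery groups.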

**Tab. 5. Visual acuity in the two older age-groups of our normative sample.**

Results

Fig 8 and Table 6 report the distribution of Randot Preschool scores in each age-group. There is a clear improvement in stereoacuity with age. Additionally, scores are consistently slightly better for girls. To quantify this, we carried out an ordinal logistic regression, using the function polr from the R package MASS [30]. The main effects of age and sex were both significant (p<10⁻¹⁰ for age and p = 0.002 for sex). The better performance of girls was most pronounced in the 2- and 3-year-olds, in whom visual acuity was not measured. It remained significant even when only children in the two older age-groups (over 5 years) were considered.
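To make the cumulative-logit structure of such a model concrete, the sketch below turns a set of cut-points and slopes into a probability for each outcome. It assumes the convention logit P(score equal to or better than level L) = α_L + βA + γM, with A = log₁₀ age and M = 1 for males. R's `MASS::polr` instead parameterizes logit P(Y ≤ k) = ζ_k − η. The signs of fitted coefficients therefore depend on the convention and on how the outcome factor is ordered. The function name and any parameter values passed to it are hypothetical.

```python
import math
from typing import Dict, Optional, Sequence

# The 7 possible outcomes, ordered from best to worst (None = nil).
OUTCOMES = (40, 60, 100, 200, 400, 800, None)


def logistic(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def outcome_probabilities(age_years: float, male: bool, alphas: Sequence[float],
                          beta: float, gamma: float) -> Dict[Optional[int], float]:
    """Probability of each Randot Preschool outcome under a cumulative-logit model.

    ASSUMED sign convention (check against the fitted model before use):
        logit P(score equal to or better than OUTCOMES[k]) = alphas[k] + beta*A + gamma*M
    with A = log10(age in years) and M = 1 for males, 0 for females.
    alphas holds the 6 cut-points and must be increasing.
    """
    A = math.log10(age_years)
    eta = beta * A + gamma * (1 if male else 0)
    cumulative = [logistic(a + eta) for a in alphas] + [1.0]  # P(<= nil) = 1
    probs = [cumulative[0]] + [cumulative[k] - cumulative[k - 1]
                               for k in range(1, len(cumulative))]
    return dict(zip(OUTCOMES, probs))
```

Each outcome's probability is the difference between successive cumulative probabilities. This is why the cut-points must be increasing for all probabilities to be non-negative.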

**Fig. 8. Distribution of normative Randot Preschool stereo thresholds in different age-groups, for children with normal vision, separated by sex.**

**Tab. 6. Randot Preschool stereo thresholds by age-group, for children with normal vision who were judged as being able to understand and cooperate with the test.**

In the ordinal logistic regression model, the probability of obtaining a stereo threshold equal to or better than a particular Randot Preschool level L is modelled as

where R is the score on the Randot Preschool stereotest and A is log age in log₁₀(years). M is a categorical variable specifying sex (M = 1 for males and M = 0 for females). β is a fitted parameter describing the effect of age A, γ is a fitted parameter describing the effect of sex, and the six fitted parameters α_L depend on the level L. For our data, the 8 fitted parameters are given in Table 7. These are the regression slopes for age and sex, plus 6 coefficients describing the transitions between the 7 possible Randot Preschool outcomes (nil, 800, 400, 200, 100, 60 and 40 arcsec). The predicted probability of obtaining each Randot Preschool score as a function of age is then:

Discussion

As it was designed to, the Randot Preschool contains a range of test values suitable for assessing threshold stereoacuity in pre-school children (aged 2–5 years). For older children, most visually normal children will obtain the best available score, and so the test is more appropriate as a screening tool.

Normative values from our study and others are reported in Table 1. Our thresholds are a little higher (i.e. worse) than in previous studies [12,13,15]. One potential reason is that, as described in the Methods, we followed the PEDIG protocol for the Randot Preschool rather than the manufacturer’s instructions. This means that we started the stereo part of the test at 800 arcsec and proceeded to smaller disparities, whereas the manufacturer instructs testers to begin at 200 arcsec and move up or down depending on the results. The latter increases the probability of obtaining a good score by chance. It also means that a child who stops responding after a few trials will obtain a better score, if poor motivation is mistaken for inability to do the task. However, previous studies using either the PEDIG [13] or the manufacturer’s [12] protocol have obtained similar results, suggesting that this is unlikely to be responsible. Perhaps the most plausible explanation is undetected poor acuity in our “normative” sample. Visual acuity was not measured in our 2- and 3-year-olds, so these groups likely included some children with undiagnosed poor vision. However, we also obtained higher scores in the older age-groups, where all children had visual acuity of 0.2 logMAR or better in both eyes. Additionally, an unknown fraction of our sample may not have been wearing the correct refractive correction. The previous studies performed more thorough screening for normal vision, e.g. performing cycloplegic refraction and excluding children with anisometropia >1 D [13]. However, anisometropia would be expected to affect stereoacuity via an effect on visual acuity. We still obtain slightly higher scores even with more stringent limits on visual acuity and interocular acuity differences (S1 Table in S1 File).

Surprisingly, we found a small but significant sex difference in stereoacuity, with girls scoring slightly better (i.e. lower) than boys, especially at younger ages. This has not previously been reported. Adult women have a smaller interpupillary distance than men, meaning that a given depth creates a smaller angular disparity at the retina. Thus, females might be expected to develop sensitivity to smaller disparities, i.e. better stereoacuity. To our knowledge, only one publication has reported an effect of interpupillary distance [31], and this was in the opposite direction: the study used the Frisby stereotest, which uses real depth, so observers with larger interpupillary distance should have experienced more disparity for a given test level, yet thresholds in fact increased with interpupillary distance. Other studies have found no sex difference in stereoacuity in adults [32,33], and no effect of interpupillary distance [33,34]. Moreover, sex differences in interpupillary distance become more pronounced during development [35,36], whereas the sex differences in stereoacuity we found are more pronounced in the younger age-groups. For these reasons, we think it unlikely that interpupillary distance accounts for the sex difference. It may instead be related to sex differences in binocular function reported very early in life [37–39]: stereopsis emerges earlier in girls than in boys, and girls’ vergence responses are more responsive to disparity. It may also reflect non-visual cognitive, social or developmental sex differences, e.g. in willingness to cooperate. Boys in our cohort also showed slightly lower testability and test/retest reliability (Fig 3, Fig 6), though these differences were not significant. Further work would be needed to establish whether this sex difference in stereoacuity is reproducible and, if so, what causes it.
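The interpupillary-distance argument rests on the standard small-angle relation between a depth step and the relative disparity it produces, roughly I·Δz/z² radians for interpupillary distance I, viewing distance z and depth separation Δz. The Python sketch below uses a hypothetical helper, `disparity_arcsec`, and illustrative interpupillary distances that are assumptions rather than measurements from this study. It shows that the same real-world depth step yields proportionally less disparity for a smaller interpupillary distance. Unlike the Frisby test, the Randot Preschool presents disparities through polarised images, so this point concerns visual experience during development rather than the test itself.

```python
import math

ARCSEC_PER_RADIAN = 180 / math.pi * 3600  # about 206265 arcsec per radian


def disparity_arcsec(ipd_m: float, distance_m: float, depth_m: float) -> float:
    """Approximate relative disparity (arcsec) produced by a real depth step.

    Small-angle approximation: disparity ~ I * dz / z**2 radians, where I is
    the interpupillary distance, z the viewing distance and dz the depth
    separation. Only valid when I and dz are much smaller than z.
    """
    return ipd_m * depth_m / distance_m**2 * ARCSEC_PER_RADIAN


# Hypothetical values for illustration only: a 1 mm depth step viewed at
# 40 cm by observers with 58 mm and 63 mm interpupillary distances.
smaller = disparity_arcsec(0.058, 0.40, 0.001)
larger = disparity_arcsec(0.063, 0.40, 0.001)
print(f"{smaller:.1f} arcsec vs {larger:.1f} arcsec")
```

Disparity scales linearly with interpupillary distance in this approximation, so the ratio of the two results equals the ratio of the two interpupillary distances.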

Sensitivity and specificity of Randot Preschool stereotest

Stereotests are commonly used in visual screening. The aim is to identify children with binocular vision problems as those who fail the stereotest. Here, we evaluate the sensitivity and specificity of the Randot Preschool used in this way. Sensitivity, or true positive rate, is the proportion of children with binocular vision problems who fail the stereotest. Specificity, or true negative rate, is the proportion of children without binocular vision problems who pass the stereotest (Fig 9).

**Fig. 9. Definition of sensitivity, specificity, and positive/negative predictive value, in the context of stereotests.**
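These definitions correspond to the cells of a 2×2 table, as in Fig 9. The Python below is a minimal sketch that computes sensitivity, specificity and positive/negative predictive values from such a table. The class name `ConfusionCounts` and the counts are hypothetical, chosen only to illustrate the arithmetic, and are not taken from Table 8.

```python
from dataclasses import dataclass


@dataclass
class ConfusionCounts:
    """Counts from cross-tabulating stereotest result against binocular status."""
    true_pos: int   # binocular vision problem, failed the stereotest
    false_neg: int  # binocular vision problem, passed the stereotest
    false_pos: int  # no problem identified, failed the stereotest
    true_neg: int   # no problem identified, passed the stereotest

    def sensitivity(self) -> float:
        # Proportion of children with a problem who fail the test.
        return self.true_pos / (self.true_pos + self.false_neg)

    def specificity(self) -> float:
        # Proportion of children without a problem who pass the test.
        return self.true_neg / (self.true_neg + self.false_pos)

    def ppv(self) -> float:
        # Proportion of children failing the test who do have a problem.
        return self.true_pos / (self.true_pos + self.false_pos)

    def npv(self) -> float:
        # Proportion of children passing the test who have no problem.
        return self.true_neg / (self.true_neg + self.false_neg)


# Hypothetical counts for illustration; these are not the study's data.
counts = ConfusionCounts(true_pos=8, false_neg=12, false_pos=5, true_neg=175)
print(f"sensitivity={counts.sensitivity():.2f}, specificity={counts.specificity():.3f}")
```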

Sample

For this analysis, we included children in whom we were able to perform the Randot Preschool (i.e. they were testable) and the cover test; in children over 4, we also required that visual acuity was measured in both eyes. This left 892 children, for 480 of whom parental questionnaires were available. Since binocular vision problems were rare (only 37 of the 892 children had one of the binocular visual problems defined in the next paragraph), for this analysis we combined the younger age-groups so that we had at least 200 children in each group. The analysis code is available in AnalyseValidityData.Rmd using data in RandotPreschoolValidation.RData.

Results

Fig 10 shows the percentage of children scoring at each Randot Preschool level, by age, color-coded to show whether their vision appeared entirely normal (green) or whether a binocular visual problem was identified. We classified children as having a problem likely to affect binocular vision if (a) their parental questionnaire (if available) clearly indicated one, e.g. “attends eye clinic for lazy eye” (purple, “Parent” in Fig 10); or (b) our cover test showed clear evidence of a problem, e.g. exotropia (red, “CTFAIL”); or (c) their visual acuity in the poorer eye exceeded 0.48 logMAR, the WHO threshold for “moderate visual impairment” (brown, “ModVI”); or (d) they had an interocular visual acuity difference greater than 0.2 logMAR (gold, “IADonly”). Note that each class may also contain children who meet the criteria of the classes to its right in the legend. Thus only one child appears in the graph purely because of a large interocular acuity difference (gold), but children who failed the cover test (red) may also have had a large interocular acuity difference. We included all four categories when classifying children as having a binocular vision problem.

**Fig. 10. Randot Preschool stereo thresholds by age-group, color-coded to indicate whether any binocular vision problems were identified.**
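The four criteria can be written as a small decision rule. The sketch below is a hypothetical Python helper, not the authors’ analysis code (which is in AnalyseValidityData.Rmd). It assumes that the legend order matches criteria (a) to (d), so that a child meeting several criteria receives the earliest label, and it treats missing questionnaire or acuity data as not indicating a problem.

```python
from typing import Optional

# Cut-offs stated in the text (logMAR; larger values mean poorer acuity).
MOD_VI_LOGMAR = 0.48
MAX_INTEROCULAR_DIFF = 0.2


def classify_binocular_problem(
    parent_reports_problem: Optional[bool],
    cover_test_failed: bool,
    va_right: Optional[float],
    va_left: Optional[float],
) -> str:
    """Hypothetical helper: return the first matching category, (a) to (d).

    Missing data (None) never triggers a category, and the acuity criteria
    are only checked when both eyes were measured. Returns "Normal" when no
    criterion is met.
    """
    if parent_reports_problem:                                # (a) questionnaire
        return "Parent"
    if cover_test_failed:                                     # (b) cover test
        return "CTFAIL"
    if va_right is not None and va_left is not None:
        if max(va_right, va_left) > MOD_VI_LOGMAR:            # (c) poorer eye
            return "ModVI"
        if abs(va_right - va_left) > MAX_INTEROCULAR_DIFF:    # (d) difference
            return "IADonly"
    return "Normal"


print(classify_binocular_problem(None, True, 0.0, 0.3))  # prints CTFAIL
```

Checking the criteria in a fixed order reproduces the “leftmost class wins” coloring in Fig 10, which is why a child with both a cover-test failure and a large interocular difference is labelled CTFAIL rather than IADonly.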

Table 8 shows the specificity, sensitivity and positive/negative predictive value of the Randot Preschool stereo test for detecting a binocular vision problem. In all age-groups, specificity is very high (roughly 90% or more) but sensitivity is poor. The high specificity means that nearly all children without binocular vision problems pass the Randot Preschool. However, the poor sensitivity means that many children with binocular vision problems also pass. The positive predictive value is fairly good, especially in older age-groups, meaning that most children over 5 who fail the Randot Preschool do have a binocular vision problem. The numbers in Table 8 reflect our fixed pass level of 800 arcsec, but as is apparent from Fig 10, no alternative criterion, even one varying by age, would permit higher sensitivity while retaining the high specificity. We also explored other definitions of “moderate visual impairment”, without finding a better criterion.

**Tab. 8. Predictive value of Randot Preschool in detecting binocular vision problems, by age-group, taking “pass” as a score of 800 arcsec or lower.**
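Whether a different pass criterion would do better can be explored by sweeping the criterion across the test levels and recomputing sensitivity and specificity at each level. The Python sketch below does this on made-up scores, not the study’s data, with a nil result coded as infinity. The helper name `sens_spec_at` is hypothetical. It illustrates the general trade-off: a stricter criterion catches more children with problems only at the cost of failing visually normal children who score 60 arcsec or worse.

```python
import math
from typing import Sequence, Tuple

# Randot Preschool stereo levels in arcsec; a "nil" result is coded as infinity.
LEVELS = [40, 60, 100, 200, 400, 800]
NIL = math.inf


def sens_spec_at(criterion: float,
                 problem_scores: Sequence[float],
                 normal_scores: Sequence[float]) -> Tuple[float, float]:
    """Sensitivity and specificity when 'pass' means score <= criterion."""
    fails_with_problem = sum(s > criterion for s in problem_scores)
    passes_without_problem = sum(s <= criterion for s in normal_scores)
    return (fails_with_problem / len(problem_scores),
            passes_without_problem / len(normal_scores))


# Hypothetical scores for illustration only; not the study's data.
problem = [NIL, NIL, 800, 400, 200, 100, 60, 40, 40, 40]
normal = [40] * 12 + [60] * 4 + [100] * 2 + [200, 400]

for crit in LEVELS:
    sens, spec = sens_spec_at(crit, problem, normal)
    print(f"pass if <= {crit:>3} arcsec: sensitivity {sens:.2f}, specificity {spec:.2f}")
```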

A possible concern is that some children for whom visual acuity or parental questionnaires were not available may have had visual problems that we did not detect. If such children failed the Randot Preschool, they would be erroneously classed as “false positives” rather than “true positives”. We therefore also examined the results when including only children for whom both visual acuity and questionnaires were available (S1 Fig in S1 File, S2 Table in S1 File); this did not change the conclusions. In fact, sensitivity was even poorer.

A further potential concern is crosstalk. The Randot stereotests use linear polarisation to separate the images for the two eyes. When used correctly, this gives extremely low crosstalk. It is, however, critical that patients do not tilt their heads or rotate the test book, as this introduces interocular crosstalk: each eye can then partly see the image intended for the other eye. Crosstalk generally weakens stereoscopic depth perception [40–42], but counter-intuitively it can help stereoblind observers pass the Randot Preschool test. This is because the two images overlap differently in the disparate region, producing a visible difference in the combined image that can be used to pass the larger-disparity levels of the test with one eye (cf Fig 11). Crosstalk is minimal when the interocular axis is parallel to the top edge of the book, and reaches 100% when the book is rotated through 45° relative to the interocular axis. Such crosstalk is a theoretical reason why a child without stereo vision could nevertheless achieve a measurable score on the Randot Preschool. While we cannot rule out that this contributed to our results, we did attempt to ensure that children viewed the stereotest correctly.

**Fig. 11. Randot Preschool test card, 400 arcsec level, photographed without polarising glasses (100% crosstalk).**
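The dependence of crosstalk on rotation can be illustrated with an idealized model. Assume perfect linear polarisers whose axes are orthogonal for the two eyes’ images. By Malus’s law, rotating the book by θ relative to the glasses transmits cos²θ of the intended image and sin²θ of the other eye’s image, so the ratio of unintended to intended light is tan²θ. This ratio is zero when the book and glasses are aligned and 100% at 45°, consistent with the description above. Real polarisers leak slightly even when aligned, so this Python sketch is an idealization, not a measurement of the Randot book; the function name is hypothetical.

```python
import math


def crosstalk_ratio(rotation_deg: float) -> float:
    """Idealized interocular crosstalk for crossed linear polarisers.

    Assumes perfect polarisers and Malus's law: rotating the test book by
    theta relative to the glasses transmits cos^2(theta) of the intended
    image and sin^2(theta) of the image meant for the other eye. Crosstalk
    is expressed as unintended / intended light, i.e. tan^2(theta).
    Only meaningful for rotations between 0 and 45 degrees.
    """
    theta = math.radians(rotation_deg)
    return math.sin(theta) ** 2 / math.cos(theta) ** 2


for angle in (0, 10, 20, 30, 45):
    print(f"{angle:>2} deg rotation: crosstalk {100 * crosstalk_ratio(angle):.0f}%")
```

Under this model, even a modest head tilt of about 20° lets through more than a tenth as much of the other eye’s image as of the intended one.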

Discussion

We find that the Randot Preschool has good specificity but poor sensitivity. This is in line with previous findings for the Randot Preschool and other stereotests. Previous results for the Randot Preschool are summarised in Table 1. Afsari et al [13] found that sensitivity was 0% when only nil stereoacuity counted as a “fail”; sensitivity varied between 9% and 27% when a score of 800 arcsec also counted as a fail. Birch et al [12] found a sensitivity of 24% for the same definition of fail. Some authors have suggested that random-dot stereotests are more sensitive to strabismus than to amblyopia, but neither our data nor those of Afsari et al [13] suggest a significant difference in sensitivity (S3 and S4 Tables in S1 File). Our data and previous studies indicate that around half of children with binocular vision problems can score well on the Randot Preschool, so the test cannot be relied upon as a screen for binocular vision problems.
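Bayes’ theorem explains why a test with high specificity but low sensitivity is informative when failed but not when passed. The Python sketch below computes predictive values from sensitivity, specificity and prevalence. The prevalence 37/892 is taken from our sample. The sensitivity (0.40) and specificity (0.97) are illustrative round numbers consistent with “<50%” and “>95%”, not values from Table 8, and the function name is hypothetical. With these inputs, failing raises the probability of a problem several-fold, whereas passing lowers it only modestly.

```python
from typing import Tuple


def predictive_values(sensitivity: float, specificity: float,
                      prevalence: float) -> Tuple[float, float]:
    """Positive and negative predictive values from Bayes' theorem."""
    tp = sensitivity * prevalence              # problem, fails test
    fp = (1 - specificity) * (1 - prevalence)  # no problem, fails test
    fn = (1 - sensitivity) * prevalence        # problem, passes test
    tn = specificity * (1 - prevalence)        # no problem, passes test
    return tp / (tp + fp), tn / (tn + fn)


# Illustrative inputs: prevalence 37/892 as in this sample; sensitivity and
# specificity are round numbers chosen to match "<50%" and ">95%".
prevalence = 37 / 892
ppv, npv = predictive_values(sensitivity=0.40, specificity=0.97, prevalence=prevalence)
print(f"prior {prevalence:.3f}, after fail {ppv:.2f}, after pass {1 - npv:.3f}")
```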

Conclusions

We have carried out a comprehensive analysis of the Randot Preschool stereo test in a thousand children aged 2–11 years, and compared our results with previous studies. The Randot Preschool can be successfully completed by most children from as young as 2 years old. It contains a limited range of possible stereo thresholds, which span the range of normal values in children between 2 and 5 years. These groups have a mean score of ~100 arcsec, or 2.1±0.35 log₁₀ arcsec (mean±SD). In older children and adults, most people will score the best possible value, 40 arcsec, making the Randot Preschool, as its name implies, unsuitable for investigating individual differences in vision in these age-groups. The Randot Preschool is extremely reliable at classifying children into those with and without any stereo vision; it is very rare for a child to pass it on one occasion and fail it on another. However, the reliability of the stereo thresholds themselves is poor in the youngest age-groups, with changes of up to ±0.7 log₁₀ arcsec, i.e. a factor of 5 in stereo threshold, occurring by chance. Reliability is high in clinical populations, where many patients fail every time, and in older age-groups, where many obtain the best score every time. Used as a screen for binocular vision abnormalities such as strabismus and/or amblyopia, the Randot Preschool has excellent specificity (true negative rate >95%), meaning that almost all children without a binocular vision abnormality pass the test. Thus, failing the Randot Preschool merits further investigation. However, its sensitivity (true positive rate) is poor, <50%, so passing the Randot Preschool, even with the best possible score, certainly does not mean that a binocular vision abnormality can be ruled out.
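Because thresholds are analysed in log₁₀ arcsec, differences in log units correspond to multiplicative factors in arcsec. The short Python sketch below, with hypothetical helper names, converts between the two and shows that a limit of agreement of ±0.7 log₁₀ arcsec corresponds to a factor of about 5. For an illustrative first score of 100 arcsec, a retest anywhere from about 20 to about 500 arcsec would fall within that limit.

```python
import math


def to_log(threshold_arcsec: float) -> float:
    """Stereo threshold in log10 arcsec."""
    return math.log10(threshold_arcsec)


def from_log(log_threshold: float) -> float:
    """Back-convert a log10 arcsec value to arcsec."""
    return 10 ** log_threshold


# A limit of agreement of +/-0.7 log10 arcsec is a multiplicative factor.
factor = from_log(0.7)
print(f"factor = {factor:.2f}")  # prints factor = 5.01

# Range of retest thresholds within that limit for an illustrative first
# score of 100 arcsec.
first = 100
low = from_log(to_log(first) - 0.7)
high = from_log(to_log(first) + 0.7)
print(f"retest could fall between {low:.0f} and {high:.0f} arcsec")
```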

A strength of our study is the relatively large sample and the completeness of the analysis; few previous studies have examined testability, reliability, normative values and validity together. A limitation is that visual acuity data were limited or lacking in the younger age-groups, and that we had no information on refraction. Cycloplegic refraction would have enabled us to exclude anisometropic children from the normative sample, but would have required opt-in consent, which, as we have seen, would probably have biased the sample towards children with better vision. Different design choices have different strengths and weaknesses, which is why it is valuable to have results from many studies with different designs.

Supporting information

S1 File [docx]
Word document containing details of analyses referred to in the main text: (1) Normative data with more stringent limits on visual acuity; (2) Sensitivity and specificity for limited data set where full data were available; (3) Sensitivity and specificity for strabismus vs amblyopia.

References

1. Bradshaw MF, Glennerster A. Stereoscopic acuity and observation distance. Spat Vis. 2006;19: 21–36. Available: http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Citation&list_uids=16411481 PMID: 16411481

2. Elliott S, Shafiq A. Interventions for infantile esotropia. Cochrane Database Syst Rev. 2013;7: CD004917. doi: 10.1002/14651858.CD004917.pub3 PMID: 23897277

3. Read JCA. Stereo vision and strabismus. Eye (Lond). 2015;29: 214–24. doi: 10.1038/eye.2014.279 PMID: 25475234

4. Li T, Shotton K. Conventional occlusion versus pharmacologic penalization for amblyopia. Cochrane Database Syst Rev. 2009; CD006460. doi: 10.1002/14651858.CD006460.pub2 PMID: 19821369

5. Rowe FJ, Noonan CP. Botulinum toxin for the treatment of strabismus. Cochrane Database Syst Rev. 2012;2: CD006499. doi: 10.1002/14651858.CD006499.pub3 PMID: 22336817

6. Stewart CE, Wallace MP, Stephens DA, Fielder AR, Moseley MJ, Cooperative M. The effect of amblyopia treatment on stereoacuity. J AAPOS. 2013;17: 166–173. doi: 10.1016/j.jaapos.2012.10.021 PMID: 23622448

7. Wong AM. Timing of surgery for infantile esotropia: sensory and motor outcomes. Can J Ophthalmol. 2008;43: 643–651. doi: 10.3129/i08-115 PMID: 19020629

8. Adams WE, Leske DA, Hatt SR, Holmes JM. Defining real change in measures of stereoacuity. Ophthalmology. 2009;116: 281–285. doi: 10.1016/j.ophtha.2008.09.012 PMID: 19091410

9. Broadbent H, Westall C. An evaluation of techniques for measuring stereopsis in infants and young children. Ophthalmic Physiol Opt. 1990;10: 3–7. Available: http://www.ncbi.nlm.nih.gov/pubmed/2184389 PMID: 2184389

10. Simons K. A comparison of the Frisby, Random-Dot E, TNO, and Randot circles stereotests in screening and office use. Arch Ophthalmol. 1981;99: 446–452. Available: http://www.ncbi.nlm.nih.gov/pubmed/7213163 doi: 10.1001/archopht.1981.03930010448011 PMID: 7213163

11. Vancleef K, Read JCA. Which Stereotest do You Use? A Survey Research Study in the British Isles, the United States and Canada. Br Ir Orthopt J. 2019;15: 15–24. doi: 10.22599/bioj.120

12. Birch E, Williams C, Drover J, Fu V, Cheng C, Northstone K, et al. Randot Preschool Stereoacuity Test: normative data and validity. J AAPOS. 2008;12: 23–26. doi: 10.1016/j.jaapos.2007.06.003 PMID: 17720573

13. Afsari S, Rose KA, Pai AS, Gole GA, Leone JF, Burlutsky G, et al. Diagnostic reliability and normative values of stereoacuity tests in preschool-aged children. Br J Ophthalmol. 2013;97: 308–313. doi: 10.1136/bjophthalmol-2012-302192 PMID: 23292927

14. Tarczy-Hornoch K, Lin J, Deneen J, Cotter SA, Azen SP, Borchert MS, et al. Stereoacuity Testability in African-American and Hispanic Pre-School Children. Optom Vis Sci. 2008;85: 158–163. doi: 10.1097/OPX.0b013e3181643ea7 PMID: 18317330

15. Yang JW, Son MH, Yun IH. A Study on the Clinical Usefullness of Digitalized Random-dot Stereoacuity Test. Korean J Ophthalmol. 2004;18: 154. doi: 10.3341/kjo.2004.18.2.154 PMID: 15635829

16. Pai AS, Rose KA, Samarawickrama C, Fotedar R, Burlutsky G, Varma R, et al. Testability of refraction, stereopsis, and other ocular measures in preschool children: the Sydney Paediatric Eye Disease Study. J AAPOS. 2012;16: 185–192. doi: 10.1016/j.jaapos.2011.09.017 PMID: 22525178

17. Trager MJ, Dirani M, Fan Q, Gazzard G, Selvaraj P, Chia A, et al. Testability of Vision and Refraction in Preschoolers: The Strabismus, Amblyopia, and Refractive Error Study in Singaporean Children. Am J Ophthalmol. 2009;148: 235–241.e6. doi: 10.1016/j.ajo.2009.02.037 PMID: 19426960

18. Fawcett SL, Birch EE. Interobserver test-retest reliability of the Randot preschool stereoacuity test. J AAPOS. 2000;4: 354–358. doi: 10.1067/mpa.2000.110340 PMID: 11124670

19. Smith SJ, Leske DA, Hatt SR, Holmes JM. Stereoacuity Thresholds before and after Visual Acuity Testing. Ophthalmology. 2012;119: 164–169. doi: 10.1016/j.ophtha.2011.06.041 PMID: 21924502

20. Bland JM, Altman DG. Statistical methods for assessing agreement between two methods of clinical measurement. Lancet. 1986;1: 307–310. Available: http://www.ncbi.nlm.nih.gov/pubmed/2868172 PMID: 2868172

21. Anderman C, Cheadle A, Curry S, Diehr P, Shultz L, Wagner E. Selection Bias Related To Parental Consent in School-Based Survey Research. Eval Rev. 1995;19: 663–674. doi: 10.1177/0193841X9501900604

22. Severson HH, Ary DV. Sampling bias due to consent procedures with adolescents. Addict Behav. 1983;8: 433–437. doi: 10.1016/0306-4603(83)90046-1 PMID: 6610283

23. Public Health England. Child vision screening: Service specification. Available: https://www.gov.uk/government/publications/child-vision-screening/service-specification

24. Stidwill D, Fletcher R. Normal Binocular Vision: Theory, Investigation and Practical Aspects. Wiley-Blackwell; 2011.

25. Vancleef K, Serrano-Pedraza I, Sharp C, Slack G, Black C, Casanova T, et al. ASTEROID: A New Clinical Stereotest on an Autostereo 3D Tablet. Transl Vis Sci Technol. 2019;8: 25. doi: 10.1167/tvst.8.1.25 PMID: 30834173

26. Serrano-Pedraza I, Herbert W, Villa-Laso L, Widdall M, Vancleef K, Read JCA. The stereoscopic anisotropy develops during childhood. Invest Ophthalmol Vis Sci. 2016;57. doi: 10.1167/iovs.15-17766 PMID: 26962692

27. Hess RF, Ding R, Clavagnier S, Liu C, Guo C, Viner C, et al. A Robust and Reliable Test to Measure Stereopsis in the Clinic. Invest Ophthalmol Vis Sci. 2016;57: 798–804. doi: 10.1167/iovs.15-18690 PMID: 26934135

28. Bosten JM, Goodbourn PT, Lawrance-Owen AJ, Bargary G, Hogg RE, Mollon JD. A population study of binocular function. Vision Res. 2015;110: 34–50. doi: 10.1016/j.visres.2015.02.017 PMID: 25771401

29. Schmidt P, Maguire M, Kulp MT, Dobson V, Quinn G. Random Dot E stereotest: testability and reliability in 3- to 5-year-old children. J AAPOS. 2006;10: 507–514. doi: 10.1016/j.jaapos.2006.08.019 PMID: 17189143

30. Venables WN, Ripley BD. Modern applied statistics with S. Available: https://cran.r-project.org/web/packages/MASS/citation.html

31. Shafiee D, Jafari AR, Shafiee AA. Correlation between Interpupillary Distance and stereo acuity. Bull Environ Pharmacol Life Sci. 2014;3: 26–33.

32. Zaroff CM, Knutelska M, Frumkes TE. Variation in stereoacuity: normative description, fixation disparity, and the roles of aging and gender. Invest Ophthalmol Vis Sci. 2003;44: 891–900. Available: http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Citation&list_uids=12556426 doi: 10.1167/iovs.02-0361 PMID: 12556426

33. Mai MN, Schlueter MA. The Relationship Between Pupillary Distance and Depth Perception in Humans. Invest Ophthalmol Vis Sci. 2010. p. 4359. Available: https://iovs.arvojournals.org/article.aspx?articleid=2372974

34. Eom Y, Song JS, Ahn SE, Kang SY, Suh YW, Oh J, et al. Effects of interpupillary distance on stereoacuity: the Frisby Davis distance stereotest versus a 3-dimensional distance stereotest. Jpn J Ophthalmol. 2013. doi: 10.1007/s10384-013-0253-9 PMID: 23828094

35. Pryor HB. Objective measurement of interpupillary distance. Pediatrics. 1969;44: 973–7. Available: http://www.ncbi.nlm.nih.gov/pubmed/5365062 PMID: 5365062

36. Fledelius HC, Stubgaard M. Changes in eye position during growth and adult life as based on exophthalmometry, interpupillary distance, and orbital distance measurements. Acta Ophthalmol. 1986;64: 481–6. Available: http://www.ncbi.nlm.nih.gov/pubmed/3492853

37. Horwood AM, Riddell PM. Gender differences in early accommodation and vergence development. Ophthalmic Physiol Opt. 2008;28: 115–126. doi: 10.1111/j.1475-1313.2008.00547.x PMID: 18339042

38. Gwiazda J, Bauer J, Held R. Binocular function in human infants: correlation of stereoptic and fusion-rivalry discriminations. J Pediatr Ophthalmol Strabismus. 1989;26: 128–132. Available: http://www.ncbi.nlm.nih.gov/pubmed/2723974 PMID: 2723974

39. Held R, Thorn F, Gwiazda J, Bauer J. Development of binocularity and its sexual differentiation. In: Vital-Durand F, Atkinson J, Braddick OJ, editors. Infant Vision. Oxford: OUP; 1996. doi: 10.1093/acprof:oso/9780198523161.003.0017

40. Seuntiëns P, Meesters L, IJsselsteijn W. Perceived quality of compressed stereoscopic images: effects of symmetric and asymmetric JPEG coding and camera separation. ACM Trans Appl Percept. 2000.

41. Yeh YY, Silverstein LD. Limits of fusion and depth judgment in stereoscopic color displays. Hum Factors. 1990;32: 45–60. Available: http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Citation&list_uids=2376407 doi: 10.1177/001872089003200104 PMID: 2376407

42. Tsirlin I, Wilcox LM, Allison RS. The Effect of Crosstalk on the Perceived Depth From Disparity and Monocular Occlusions. IEEE Trans Broadcast. 2011;57: 445–453. doi: 10.1109/TBC.2011.2105630

PLOS One