The CAST (Childhood Asperger Syndrome Test)

04 Williams (bc/t) 21/10/04 1:12 pm Page 45
autism © 2005
SAGE Publications and The National Autistic Society
Vol 9(1) 45–68; 049029

The CAST (Childhood Asperger Syndrome Test)
Test accuracy

SIMON BARON-COHEN
University of Cambridge, UK
The Childhood Asperger Syndrome Test (CAST) is a
parental questionnaire to screen for autism spectrum conditions. In
this validation study, the CAST was distributed to 1925 children aged
5–11 in mainstream Cambridgeshire schools. A sample of participants
received a full diagnostic assessment, conducted blind to screen status.
The sensitivity of the CAST, at a designated cut-point of 15, was 100
percent, the specificity was 97 percent and the positive predictive value
was 50 percent, using the group’s consensus diagnosis as the gold
standard. The accuracy indices varied with the case definition used. The
sensitivity of the accuracy statistics to case definition and to missing
data was explored. The CAST is useful as a screening test for autism
spectrum conditions in epidemiological research. There is not currently
enough evidence to recommend the use of the CAST as a screening test
within a public health screening programme in the general population.
Correspondence should be addressed to: JO WILLIAMS, Department of Public Health and Primary Care, Institute of Public Health, University Forvie Site, Robinson Way, Cambridge CB2 2SR, UK.
DOI: 10.1177/1362361305049029

Screening for autism spectrum conditions may be desirable as a public health service or as part of epidemiological research. Screening as a public health service is a means of actively identifying cases, whether or not there has previously been concern about a child's development. It has been shown that the mean age of diagnosis for typical autism is 5.5 years, and as late as 11 years for Asperger syndrome, in spite of much earlier parental worries (Howlin and Moore, 1997). Screening might lower the age at diagnosis and could also reassure the worried well. Earlier diagnosis may be desirable for a number of reasons: to allow time for genetic counselling, to initiate parental support, and to allow earlier intervention (Baird et al., 2001).
Currently there is insufficient evidence to recommend screening for
autism spectrum conditions as a public health service (National Screening
Committee Child Health Subgroup, 2001). One of the gaps in the evidence
is the lack of a screening test that has been fully validated and shown to be
effective in the general population. This article provides evidence relevant
to this gap.
An effective screening test for autism spectrum conditions would also be invaluable for epidemiological research. Because of the resources required, it would not be possible to carry out a detailed assessment of every child in a large population-based study. A screening test can be used in the first phase of an epidemiological survey to identify the children who require detailed assessment in a second phase, and so make large studies feasible.
The focus of this study is on primary-school-age children. Potential screening tests for typical autism in preschool children have been developed (Baird et al., 2000; Robins et al., 2001). It is also appropriate to develop a screening test for primary-school-age children, as many children with autism spectrum conditions are not identified before school entry. Coverage of preschool surveillance is incomplete, and the existence or severity of an autism spectrum condition may only become apparent in the new and demanding environment a child meets on entering school (Hall and Elliman, 2003).
Numerous screening tests have been developed that can be used with primary-school-age children. These include: the Australian Scale for Asperger Syndrome (Atwood, 2001); the Children's Social Behaviour Questionnaire (Luteijn et al., 2000); the Pervasive Developmental Disorders Questionnaire (Baird et al., 2000); the Asperger Syndrome Screening Questionnaire (Ehlers and Gillberg, 1993; Ehlers et al., 1999); the Autism Behaviour Checklist (Krug et al., 1980); the Gilliam Autism Rating Scale (Gilliam, 1995; South et al., 2002); and the Social Communication Questionnaire (Berument et al., 1999).
There are no published validation studies of the Australian Scale for Asperger Syndrome or the Pervasive Developmental Disorders Questionnaire. Studies of the Children's Social Behaviour Questionnaire and the Gilliam Autism Rating Scale do not report estimates of both sensitivity and specificity. The Social Communication Questionnaire has been validated in two studies (Berument et al., 1999; Bolte et al., 2000) and has demonstrated good sensitivity and specificity. However, both of these studies used clinical samples, and the test needs further validation in the general population. The Asperger Syndrome Screening Questionnaire has been validated in a clinical sample (Ehlers et al., 1999) and showed good sensitivity and specificity. Although it has been used in the general population (Ehlers et al., 1999), data on its sensitivity and specificity in this context are not available.
Many promising screening tests are being developed, but there is currently no screening test for autism spectrum conditions that has been fully validated in the general population, has been shown to be effective, and has validation data available in the public domain. The aim in further developing the Childhood Asperger Syndrome Test (CAST) was to validate a test for use in the general population rather than in clinical populations, and to develop a test that is sensitive to the full range of autism spectrum conditions, including pervasive developmental disorder not otherwise specified (PDD-NOS), and not just to typical autism.
The CAST is a 37-item parental self-completion questionnaire, shown in the Appendix. Its name needs some explanation. Strictly speaking, the CAST is not specific to Asperger syndrome: it was developed to be sensitive to autism spectrum conditions in the mainstream school population, and therefore predominantly for use with children whose cognitive ability is within the normal range. Consequently, many, though not all, of the children identified with an autism spectrum condition using the CAST will have Asperger syndrome. The name CAST is retained in this article for continuity with the test's previous publication (Scott et al., 2002a).
There is an ongoing debate over whether autism represents the extreme end of normal variation in behaviour or qualitatively different behaviour (Volkmar et al., 1997). The CAST was designed as a quantitative scale: it assumes that behaviours fall on a continuous distribution, and it is based on a dimensional conceptualization of autism spectrum conditions and related social and communication difficulties. It is possible, however, to impose arbitrary cut-points on this continuum to delineate categories of behaviour that are qualitatively different from normal behaviour, so the CAST is also compatible with a categorical conceptualization of autism.
Details of the development of the CAST have been published previously (Scott et al., 2002a), and two pilot studies have been reported (Scott et al., 2002a). The first pilot was in a small sample of known diagnostic status. It demonstrated that the CAST discriminates well between children with Asperger syndrome and normally developing children. A preliminary cut-point of 15 was chosen, as all the children with a diagnosis of Asperger syndrome scored 15 or above and none of the normally developing children scored above 15. The second pilot study was in a population-based sample of 1150 children in mainstream schools. The cut-point of 15 was used again, and the CAST showed good specificity at this point (98 percent). However, the response rate in the population sample was very low (17 percent), and sensitivity could not be calculated because children with a low score on the CAST were not given a full diagnostic assessment. The aims of this article are to validate the CAST further in a larger population sample, to improve the response rate, to generate sensitivity data, and to confirm a suitable cut-point for the CAST.
School selection and response
Six schools were selected to represent different geographical areas of Cambridgeshire: two in Cambridge city, one in North Fenlands, one in East Fenlands and two in West Fenlands. Large schools were selected for convenience. Each headteacher received a letter inviting the school to join the study. Each headteacher who was interested in taking part then met two members of the research team (FS, JW), to hear further details about the study and to ask questions. A training session for staff on Asperger syndrome was offered, and one school took up this offer. Five of the schools agreed to take part; one of the Cambridge city schools declined. The percentage of children on the special needs registers of the participating schools ranged from 18 to 66 percent (mean = 34 percent, SD = 19 percent) (Ofsted, 2003).
Questionnaire distribution
Each school was asked to distribute a copy of the CAST to every child in the school aged between 5 and 11. Questionnaires were delivered to the schools on 29–31 January 2001, and the schools distributed them during that week or the following one. Each child received an envelope containing the CAST, a covering letter and a Freepost envelope for returning the questionnaire. A total of 1925 questionnaires were distributed. To improve the response rate, four of the participating schools agreed to distribute a second batch of questionnaires. This reminder mailing was identical to the first except for an added note asking parents not to return the questionnaire if they had already returned the first one.
Returned questionnaires were excluded if the child was not in the specified age band, if the child was not at one of the schools approached, or if the questionnaire was blank or a whole page was missing. A few families returned a second questionnaire on their child following the reminder mailing; in these cases the second questionnaire was excluded.
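The exclusion rules amount to a simple filter over the returned questionnaires. The following is a minimal Python sketch under assumptions that are not from the study: the `Return` record and its fields are hypothetical, age is taken as completed years, and returns are processed in order of receipt so that the earlier of two questionnaires on the same child is the one kept.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Set


@dataclass
class Return:
    """One returned questionnaire (hypothetical record layout)."""
    child_id: str
    age_years: int                 # completed years, as reported by the parent
    school: str
    answers: List[Optional[bool]]  # None marks an unanswered item
    page_missing: bool = False     # True if a whole page was left uncompleted


def valid_returns(returns: Iterable[Return], sampled_schools: Set[str]) -> List[Return]:
    """Apply the exclusion rules to returns listed in order of receipt."""
    seen: Set[str] = set()
    kept: List[Return] = []
    for r in returns:
        if not 5 <= r.age_years <= 11:
            continue  # outside the 5-11 age band
        if r.school not in sampled_schools:
            continue  # not at one of the schools approached
        if r.page_missing or all(a is None for a in r.answers):
            continue  # whole page missing, or questionnaire blank
        if r.child_id in seen:
            continue  # a second questionnaire on the same child is dropped
        seen.add(r.child_id)  # only valid returns are recorded as seen
        kept.append(r)
    return kept
```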
Data entry and cleaning for the screen
The data were entered as the questionnaires were returned, keeping personal and identification data separate from the screen results. A 10 percent random sample of questionnaires was entered twice to audit the accuracy of data entry. There was 98.9 percent agreement between the two sets of entered data, and discrepancies were checked against the paper versions. The data were then cleaned, checking that each entry had a unique identifier. Single-item checks were carried out for each variable to ensure that the values entered were possible and were not missing if obligatory. Within-questionnaire checks were carried out to ensure that answers had not been given indiscriminately (e.g. all 'Yes', or alternately 'Yes' then 'No') and that whole pages of the questionnaire had not been omitted. Two members of the research team (FS and JW) checked the data in this way independently, and any data entry ambiguities were resolved by consensus.
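The double-entry audit and the within-questionnaire pattern checks are easy to express in code. In this minimal sketch, which is an illustration rather than the study's procedure, each questionnaire is a sequence of item answers coded "Y", "N" or None, agreement is counted field by field, and a questionnaire is flagged (for manual review, not automatic exclusion) if every given answer is "Yes" or the given answers strictly alternate. The coding scheme and function names are hypothetical.

```python
from typing import Optional, Sequence

Answer = Optional[str]  # "Y", "N", or None for an unanswered item


def entry_agreement(first: Sequence[Sequence[Answer]],
                    second: Sequence[Sequence[Answer]]) -> float:
    """Proportion of item fields that match between two independent entries.

    Assumes both entries list the same questionnaires, items in the same order.
    """
    matches = total = 0
    for q1, q2 in zip(first, second):
        for x, y in zip(q1, q2):
            total += 1
            matches += int(x == y)
    return matches / total


def suspicious_pattern(answers: Sequence[Answer]) -> bool:
    """Flag an all-'Yes' or strictly alternating Yes/No sequence (missing items skipped)."""
    given = [a for a in answers if a is not None]
    if len(given) < 2:
        return False
    if all(a == "Y" for a in given):
        return True
    return all(x != y for x, y in zip(given, given[1:]))
```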
Questionnaire scoring and sampling
The questionnaires were scored by unweighted addition of the endorsed scoring items, giving a total between 0 and 31. Scores were grouped into three bands: ≥ 15, 12–14 and < 12. A score of 15 was taken as the provisional cut-point for the screening instrument. All children scoring ≥ 15 or 12–14, and an unstratified random 5 percent sample of those scoring < 12, were invited for a detailed diagnostic assessment.
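The scoring and two-phase sampling rules can be sketched as follows. The CAST item key is not reproduced here: `scoring_key` is a hypothetical mapping from each scoring item to the answer that earns a point, with the actual items given in the Appendix. The 5 percent sample uses Python's `random.sample` with a fixed seed, which is an assumption made for reproducibility rather than a detail reported by the study.

```python
import random
from typing import Dict, List, Optional


def cast_score(answers: Dict[int, Optional[bool]], scoring_key: Dict[int, bool]) -> int:
    """Unweighted count of scoring items answered in the keyed direction.

    `scoring_key` (hypothetical) maps each scoring item number to the answer that
    earns a point (True = 'Yes'). Missing answers earn nothing, giving the observed score.
    """
    return sum(1 for item, keyed in scoring_key.items() if answers.get(item) == keyed)


def band(score: int) -> str:
    """Sampling band used for the second phase."""
    if score >= 15:
        return ">=15"
    if score >= 12:
        return "12-14"
    return "<12"


def invite_for_assessment(scores: Dict[str, int], fraction: float = 0.05,
                          seed: int = 0) -> List[str]:
    """Invite everyone scoring 12 or more plus a simple random sample of the rest."""
    high = sorted(cid for cid, s in scores.items() if s >= 12)
    low = sorted(cid for cid, s in scores.items() if s < 12)
    k = round(fraction * len(low))  # size of the random sample from the lowest band
    rng = random.Random(seed)       # seeded so the draw can be reproduced
    return high + sorted(rng.sample(low, k))
```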
Participants in the assessment sample were contacted by telephone to arrange the assessment or, where this was not possible, by post. Assessments took place between 11 and 15 months after the screen. Because of this long interval between screen and assessment, the screening test was administered again at the start of each assessment (CAST–R). Assessments were carried out in each participant's home.
Two instruments were used as a 'gold standard' for diagnostic assessment: the Autism Diagnostic Interview–Revised (ADI–R: Lord et al., 1994) and the Autism Diagnostic Observation Schedule–Generic (ADOS–G: Lord et al., 2000). Clinical judgement is usually considered to be the diagnostic gold standard, but these instruments have the advantage over clinical diagnosis of being standardized, and their reliability and validity have been shown to be good (Lord et al., 1994; 2000). None of the other diagnostic tools that could have been chosen had been validated with the same rigour as the ADI–R and the ADOS–G. The ADOS–G was designed to differentiate between autism, autism spectrum disorder (including PDD-NOS) and non-autism (Lord et al., 2000). It has also been shown to discriminate between children with pervasive developmental disorders and those with specific developmental disorders such as specific language impairment (Noterdaeme et al., 2002). Although the ADI–R and ADOS–G have often been used with strict criteria to select a conservative group of cases for genetic studies, the value of using these tools as continuous measures of the wider phenotype of autistic symptoms has been described (Lord et al., 2001).
Both the interview and the observation were carried out with the researcher blind to the CAST score. Usually, one researcher carried out both the interview and the observation. The order of the ADI–R and ADOS–G was not randomized: for practical reasons, the interview was done first, before the child came home from school.
Reliability of assessment
Inter-rater reliability on the ADI–R and ADOS–G was assessed. A sample of videos of interviews and observations was reviewed to arrive at consensus codes, and mean inter-rater reliability was calculated in two ways. First, each interviewer's codes were compared with the consensus codes, the mean agreement across all the codes in each interview or observation was taken, and the mean of these values across all the assessments reviewed was calculated. On this measure, inter-rater reliability across all codes was 90 percent for the ADI–R (based on ratings of one interview) and 87 percent for the ADOS–G (based on ratings of eight children observed). Second, weighted kappa statistics and multi-rater kappa statistics of inter-rater reliability were calculated for the ADOS–G observations using standard linear weights (Cohen, 1968; Fleiss, 1981, pp. 225–32): a weight of 1 for exact agreement, 0.5 for a difference of 1 in the rating, and 0 for a difference of 2. The mean weighted kappa for the ADOS–G ratings (based on four schedules) across all non-unique rater pairs was 0.59, and the multi-rater kappa statistic was 0.54. These values indicate moderate inter-rater reliability (Landis and Koch, 1977). Data were not available to calculate kappa statistics for the ADI–R.
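Both reliability measures can be illustrated for a pair of raters. This minimal Python sketch assumes ratings coded 0, 1 and 2, so that linear weights of 1 − |i − j|/2 reproduce the 1, 0.5 and 0 weights described above. It computes simple percent agreement and Cohen's linearly weighted kappa, and maps a kappa value onto the Landis and Koch bands. The multi-rater kappa reported in the study is not shown, and the function names are illustrative.

```python
from collections import Counter
from typing import Sequence


def percent_agreement(a: Sequence[int], b: Sequence[int]) -> float:
    """Share of items on which two raters gave the identical code."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def linear_weighted_kappa(a: Sequence[int], b: Sequence[int], k: int = 3) -> float:
    """Cohen's weighted kappa for codes 0..k-1 with weights 1 - |i - j| / (k - 1)."""
    n = len(a)
    w = [[1 - abs(i - j) / (k - 1) for j in range(k)] for i in range(k)]
    joint = Counter(zip(a, b))       # observed cell counts
    ma, mb = Counter(a), Counter(b)  # each rater's marginal counts
    p_obs = sum(w[i][j] * joint[(i, j)] for i in range(k) for j in range(k)) / n
    p_exp = sum(w[i][j] * ma[i] * mb[j] for i in range(k) for j in range(k)) / n ** 2
    return (p_obs - p_exp) / (1 - p_exp)


def landis_koch(kappa: float) -> str:
    """Verbal label for a kappa value (Landis and Koch, 1977)."""
    if kappa < 0:
        return "poor"
    for upper, label in ((0.20, "slight"), (0.40, "fair"), (0.60, "moderate"),
                         (0.80, "substantial")):
        if kappa <= upper:
            return label
    return "almost perfect"
```

On this scale, both reported values (0.59 and 0.54) fall in the "moderate" band.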
Assessment outcome and case definition
A case of autism spectrum condition was defined in two ways:
1. Assessment diagnosis. If a child scored above the cut-point for autism or
autism spectrum condition on both the ADI–R and the ADOS–G, or if
they had a previous clinical diagnosis of autism, Asperger syndrome or
another autism spectrum condition, they were recorded as a case of
autism spectrum condition.

2. Consensus diagnosis. There were several reasons for choosing a second case definition. A case definition for wider spectrum conditions, including Asperger syndrome and PDD-NOS, was required, and the ADI–R only provides a cut-point for autism. Some researchers consider the ADI–R and ADOS–G algorithms too stringent to include PDD-NOS. For example, one study defined the criteria for PDD-NOS as scoring above the cut-off in two of the three ADI–R domains, rather than in all three as the algorithm requires (Bishop and Norbury, 2002). Disagreement between the ADOS–G and the ADI–R, and between these tools and previous diagnoses, has also been observed (Bishop and Norbury, 2002).
For these reasons, some researchers have used clinical judgement, based on the results of the ADI–R and ADOS–G and on international diagnostic criteria, to make research diagnoses, in particular for autism spectrum conditions including PDD-NOS (e.g. Bolton et al., 1994). This approach was taken for the second case definition in this study, referred to as the consensus diagnosis. A child was given a consensus diagnosis if they had received an assessment diagnosis, or had fallen below the cut-point by no more than 2 points in only one of the algorithm domains on either instrument, and the research team agreed that they met ICD-10 research criteria (World Health Organization, 1993) for a diagnosis of atypical autism, Asperger syndrome or PDD-NOS. This judgement was made by consensus among three researchers (FS, CS, JW). In practice, the subgroups of autism were not differentiated, and a research diagnosis of autism spectrum condition was given.
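The two case definitions can be written as decision rules. This is a minimal sketch, not the study's procedure: `Domain` and its fields are hypothetical, a domain is treated as met when its score reaches the cut-off, the consensus rule is read as "(assessment diagnosis or near miss) and team agreement", and the three researchers' ICD-10 judgement is represented only as a boolean input because it cannot be automated.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Domain:
    """One algorithm domain on the ADI-R or ADOS-G (hypothetical layout)."""
    score: int
    cutoff: int  # assumed: the domain is met when score >= cutoff


def meets(domains: List[Domain]) -> bool:
    """Every domain of the instrument's algorithm reaches its cut-off."""
    return all(d.score >= d.cutoff for d in domains)


def near_miss(domains: List[Domain], max_shortfall: int = 2) -> bool:
    """Exactly one domain below its cut-off, by no more than max_shortfall points."""
    shortfalls = [d.cutoff - d.score for d in domains if d.score < d.cutoff]
    return len(shortfalls) == 1 and shortfalls[0] <= max_shortfall


def assessment_case(adi_r: List[Domain], ados_g: List[Domain], prior_asc: bool) -> bool:
    """Definition 1: meets both instruments, or has a previous clinical ASC diagnosis."""
    return prior_asc or (meets(adi_r) and meets(ados_g))


def consensus_case(adi_r: List[Domain], ados_g: List[Domain], prior_asc: bool,
                   team_agrees_icd10: bool) -> bool:
    """Definition 2, read here as (assessment case or near miss) AND team agreement."""
    eligible = (assessment_case(adi_r, ados_g, prior_asc)
                or near_miss(adi_r) or near_miss(ados_g))
    return eligible and team_agrees_icd10
```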
Referral of children to clinical services
Following the assessment, parents of children who received a research diag-
nosis were contacted to ask if they would like their assessment data to be
passed to a clinician in the research team for possible referral into clinical
services. In addition, where parents had substantial concerns about their
child’s development that were not related to autism spectrum conditions,
they were contacted to recommend that they see their GP.
The characteristics of responders and non-responders at the assessment stage, as recorded in the CAST questionnaires, were compared to assess whether non-response had introduced systematic bias. In addition, those invited and not invited for assessment in the lowest score group were compared. The following tests for differences between groups were used: the Mann–Whitney test for differences between medians, unpaired t-tests for differences between means, and chi-squared tests for differences between proportions. Where numbers were small, Fisher's exact test was used. It was not possible to assess the effect of non-response to the screen on the distribution of CAST scores, as no descriptive data on non-responders were available.
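Fisher's exact test for a 2 × 2 table can be computed directly from the hypergeometric distribution. The pure-Python sketch below is a generic illustration, not the study's Stata analysis: it returns the two-sided p-value obtained by summing the probabilities of all tables with the same margins that are no more likely than the observed one. For the classic tea-tasting table [[3, 1], [1, 3]] it gives 34/70 ≈ 0.486.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, col1, n = a + b, a + c, a + b + c + d
    total = comb(n, col1)

    def prob(x: int) -> float:
        # Hypergeometric probability that the top-left cell equals x, margins fixed.
        return comb(row1, x) * comb(n - row1, col1 - x) / total

    p_obs = prob(a)
    lo, hi = max(0, col1 - (n - row1)), min(row1, col1)
    # Sum every table no more probable than the observed one; the small
    # relative tolerance keeps tables tied with the observed one in the sum.
    p = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-7))
    return min(1.0, p)
```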
The CAST scores at the time of the screen were compared with the scores at the second administration during the assessment. For this comparison, each child's maximum score (the score they would have if every missing item were scored 1) was used in place of their observed score if it placed them in a different sampling group; otherwise, their observed score was used.
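The rule for choosing between the observed and maximum scores, which also identifies the children who cross a sampling boundary when missing items are filled in, can be sketched as follows. The sketch assumes each questionnaire has already been reduced to its scoring items coded 1 (scores a point), 0 (does not) or None (missing); the function names are hypothetical.

```python
from typing import List, Optional

BOUNDARIES = (12, 15)  # sampling bands: < 12, 12-14, >= 15


def observed_score(items: List[Optional[int]]) -> int:
    """Sum of scored items; missing items (None) count as 0."""
    return sum(x for x in items if x is not None)


def maximum_score(items: List[Optional[int]]) -> int:
    """Score if every missing item were replaced with 1."""
    return sum(1 if x is None else x for x in items)


def crosses_boundary(items: List[Optional[int]]) -> bool:
    """True if using the maximum score moves the child into a higher sampling band."""
    obs, mx = observed_score(items), maximum_score(items)
    return any(obs < b <= mx for b in BOUNDARIES)


def comparison_score(items: List[Optional[int]]) -> int:
    """Maximum score if it changes the sampling group, otherwise the observed score."""
    return maximum_score(items) if crosses_boundary(items) else observed_score(items)
```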
Indices of test accuracy (sensitivity, specificity and positive predictive value) were calculated based on the observed CAST score. Because a two-stage sampling strategy was employed, inverse probability weighting with sampling weights was used. The weights were empirical, defined as the inverse of the probability of being assessed given the child's score group, and so reflected both the sampling fraction and the response rate in each score group. Confidence intervals were calculated; where a proportion was 100 percent, a binomial exact confidence interval was calculated using the weighted count. Without weights, the positive and negative predictive values would simply have reflected the sampling strategy, which produced a higher prevalence of autism spectrum conditions in the assessment sample than would be found in the general population (Feinstein, 1977; O'Toole, 2000).
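The weighting can be illustrated with a small sketch. Each assessed child is given the weight (number screened in their score group) ÷ (number assessed from that group), which is the inverse of the empirical probability of assessment and so absorbs both the sampling fraction and non-response. Accuracy indices are then ratios of weighted counts. For a proportion of 100 percent, the exact (Clopper–Pearson) 95 percent interval runs from (0.025)^(1/n) to 1, and the sketch accepts a weighted count for n. The data structures and function names are hypothetical, and no study data are used.

```python
from dataclasses import dataclass
from typing import Dict, List

CUT_POINT = 15  # screen positive when the CAST score is 15 or more


@dataclass
class Assessed:
    group: str     # sampling band at screen, e.g. "<12", "12-14", ">=15"
    score: int     # observed CAST score at screen
    is_case: bool  # outcome under the chosen case definition


def group_weights(screened: Dict[str, int], assessed: List[Assessed]) -> Dict[str, float]:
    """Empirical inverse-probability weights: screened / assessed in each band."""
    counts: Dict[str, int] = {}
    for a in assessed:
        counts[a.group] = counts.get(a.group, 0) + 1
    return {g: screened[g] / m for g, m in counts.items()}


def _ratio(num: float, den: float) -> float:
    return num / den if den else float("nan")


def weighted_accuracy(screened: Dict[str, int], assessed: List[Assessed]) -> Dict[str, float]:
    """Sensitivity, specificity and predictive values from weighted 2x2 counts."""
    w = group_weights(screened, assessed)
    tp = fp = fn = tn = 0.0
    for a in assessed:
        positive = a.score >= CUT_POINT
        if positive and a.is_case:
            tp += w[a.group]
        elif positive:
            fp += w[a.group]
        elif a.is_case:
            fn += w[a.group]
        else:
            tn += w[a.group]
    return {"sensitivity": _ratio(tp, tp + fn), "specificity": _ratio(tn, tn + fp),
            "ppv": _ratio(tp, tp + fp), "npv": _ratio(tn, tn + fn),
            "weighted_cases": tp + fn}


def exact_lower_limit_at_100(n: float, alpha: float = 0.05) -> float:
    """Clopper-Pearson lower limit when all n trials succeed (upper limit is 1)."""
    return (alpha / 2) ** (1 / n)
```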
Apart from the exclusion of questionnaires that were blank or had whole pages missing, no questionnaires were omitted from the analyses because of missing data. Two sensitivity analyses were carried out to investigate the effect of missing responses on the CAST. First, the analysis was rerun using the maximum score. Second, the analyses were rerun excluding individuals who crossed a sampling boundary (from < 12 to ≥ 12, or from < 15 to ≥ 15) when their maximum score was used instead of their observed score.
All analyses were carried out using STATA version 7 (StataCorp, 2001).
Response rates
Response rates at each phase of the study are shown in Figure 1. Overall, the response rate for the screen was 26 percent; it ranged from 20 to 33 percent across the different schools, with a standard deviation across schools of 5.4 percent. There was an inverse relationship between the school response rate and the percentage of children on the special needs register, according to Ofsted reports (Ofsted, 2003). For example, the highest-responding school (33 percent response) had the lowest percentage of children on the special needs register (18 percent), and the lowest-responding school (20 percent) had the highest percentage of children on the special needs register (66 percent).

Figure 1 Response rates
CASTs sent, n = 1925
CASTs returned, n = 552
CASTs excluded, n = 52:
  n = 22, < 5 years or > 11 years
  n = 5, school not in sample
  n = 2, questionnaire blank
  n = 2, whole page missing
  n = 20, duplicate questionnaire on a child
  n = 1, all answers 'yes'
Valid CASTs, n = 500 (complete, n = 387)
  Invited for assessment, n = 65
    Accepted assessment, n = 40
    Refused assessment, n = 25
  Not invited for assessment, n = 435
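The counts in Figure 1 are internally consistent, and the overall screen response rate of 26 percent corresponds to the 500 valid questionnaires divided by the 1925 distributed. The short Python check below only restates the figure's numbers.

```python
sent, returned = 1925, 552
excluded = {
    "< 5 years or > 11 years": 22,
    "school not in sample": 5,
    "questionnaire blank": 2,
    "whole page missing": 2,
    "duplicate questionnaire on a child": 20,
    "all answers 'yes'": 1,
}
valid = returned - sum(excluded.values())  # questionnaires entering the analysis
invited, not_invited = 65, 435             # split of the valid returns
screen_response = valid / sent

print(f"excluded = {sum(excluded.values())}, valid = {valid}, "
      f"screen response = {screen_response:.1%}")
# excluded = 52, valid = 500, screen response = 26.0%
```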
The response rate for the assessment was 60 percent. The characteristics of those who accepted and refused assessment are shown in Table 1. Within score groups, responders and refusers were very similar in CAST score, age, gender and parental education. Significantly more families took part when parents reported that a teacher or a health visitor had expressed concern about the child's development (Fisher's exact test, p = 0.017), although this difference was not observed within the individual score groups. No other differences between responders and refusers were significant.

Table 1 A comparison of the characteristics of those who accepted and refused assessment

                                        Group 1: < 12 on CAST   Group 2: 12–14 on CAST   Group 3: ≥ 15 on CAST
                                        Accepted    Refused     Accepted    Refused      Accepted    Refused
n (%)                                   11 (55)     9 (45)      11 (55)     9 (45)       18 (72)     7 (28)
CAST score at screen, median (IQR)      4 (5)       5 (4)       13 (1)      12 (1)       18 (5)      18 (4)
Age (years, decimal), mean (SD)         7.9 (2.0)   6.9 (1.6)   8 (2.0)     8.3 (2.1)    8.2 (1.9)   7.4 (1.8)
Age parents left education, mean (SD)   17.2 (1.8)  18 (2.8)    17.2 (2.8)  16.4 (0.9)   17.8 (2.2)  17.4 (1.3)

Significant differences between responders and non-responders: see text.