The following is a transcript of the presentation video, edited for clarity. Presentation slides are available for download via the PDF button in the toolbar.
Originally presented at the ASHA Convention (November 2013) as part of the session Development of a Bilingual Test for Spanish-English Children: A Long and Winding Road. Videos in this series are:
Challenges in Assessing Bilingual Populations
(Elizabeth D. Peña, University of Texas at Austin)
Bilingual Phonology Assessment Design
(Brian A. Goldstein, La Salle University)
Bilingual Pragmatics Assessment Design
(Aquiles Iglesias, Temple University)
Bilingual Semantics Assessment Design
(Lisa M. Bedore, University of Texas at Austin)
Bilingual Morphosyntax Assessment Design
(Vera F. Gutierrez-Clellen)
Semantics, Primary Language Impairment, and Bilingualism
When we started working on the semantics test, not a lot was known about semantic development in bilingual children.
We started to think about what the relationship would be between semantics and primary language impairment (PLI). Some of the things that we know about vocabulary development in children are that children with PLI have weaker semantic representations than their typically developing peers, and they have less semantic depth.
We also know that they require more exposures to learn new words.
But we also know that if we give them a single-word vocabulary test, their vocabulary knowledge may fall within the low-normal range.
If we think about what’s going on with bilingual children, we know that they have normal language learning ability. There’s no empirical reason to believe they have greater risk for language impairment than their monolingual peers.
We also know that their experiences across their two languages are shared. So the concepts that they have are shared across their two languages.
Because their experiences come in two languages, they have divided input. So each of their experiences with their words will be less reinforced. They’ll have fewer opportunities to hear and use those words, and so they may need more time to learn the phonotactics or to learn those words.
However, again, if we give children a single-word vocabulary test, we know that they’ll score in the low-normal range.
Assessment Approach
Areas
We decided to take an approach of organizing our test items or test blueprint around several core areas that tap semantic knowledge across the children’s languages.
We drew this from the typically developing literature available for English-speakers at the time.
We used a number of different item types:
- Analogies. That is actually pretty challenging. For example, an item such as, “Hamburger is to plate as soup is to ___”
- Descriptions. We would ask children to tell us things about an object they might be familiar with. So something like, “Tell me three things about a school bus.” Or, “Tell me three things about a truck.”
- Category generation. We had items such as, “Tell me as many zoo animals as you can think of.” And we gave children a set amount of time to generate as many items as they could in that category.
- Similarities and differences. For example, we showed children pictures of cards or invitations and asked them to tell us what makes these kinds of things go together. They might be the same color, or the same shape.
- Item functions. We asked children to help us identify what we use different kinds of items for. “What is this pencil for?” or “What is a knife for?”
- Associations. We asked children to tell us items that go with other categories. For example, we’d say, “Tell me something that goes with bird.” And we would expect the children to tell us that birds can fly.
- Linguistic concepts. That’s an important school concept for children. Children might be asked to identify the color or shape of something like a balloon or box.
As we developed the actual items for the test, we focused on items we thought would be challenging enough to separate typically developing children from children with language impairment. For example, instead of having single-word items where children would just have to name or recognize, we had items children might look at and have to explain the difference to us.
“Here are two piñatas, they are different. Tell us what’s different about the piñatas.” And children would tell us something about the points. Or tell us there’s a different number of points, or there’s something different about them.
Psychometric Equivalence
Another thing we looked at when we were trying to develop our items was the possibility of psychometric equivalence across the items in English and Spanish.
When you ask children to do the exact same thing, as if you were directly translating a test, giving them the same item in English as in Spanish, they may take that repetition to indicate that they did something wrong the first time and change their answer. It’s not because they didn’t know what was different about the item or what you were asking. They just take the repetition as an opportunity to say something different, maybe get it right this time, and get you to stop asking.
This is a characteristic property item, where we were asking children to tell us about the features of these two items. So for example, in Spanish we might ask the child, “¿Cómo es la pelota?” [What’s the ball like?] and they can talk about the color or the shape or the features, and in English we can get at that same kind of item saying, “What’s this present like?” And again the child can talk about the color, the shape, and the features.
We addressed these different kinds of items with different questions, but they were psychometrically equivalent in terms of their difficulty.
Finally, we developed items that require semantic knowledge but that children could respond to using different kinds of vocabulary. In the same way that Aquiles was talking about with the pragmatics test, there are lots of different kinds of appropriate responses that can get at the key feature an item is targeting.
Here we’re asking about how these presents are similar. So children could talk about the red bows, the red ribbons. They could talk about red string. They could talk about the shape of the presents. But they would have to know, for example, that they’re not the same size.
As we scored these kinds of items, we provided sample responses that children might give in either language. One of the unique things about the semantics test is that we allow children to respond in either language. So, on the English test the most likely response is going to be the English response, but if the child responds in Spanish, we count that just the same as if they had responded in English. And they would get a 1 for either of the bolded responses to “What makes these gifts go together?”
We do also mark whether the children provide other language responses, just so we can keep track of what language children are responding to the test in.
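The either-language scoring rule just described can be sketched as a simple lookup. This is a hypothetical illustration; the item name and accepted-answer sets below are invented, not taken from the actual test:

```python
# Hypothetical sketch of the either-language scoring rule: a response
# earns 1 point if it matches an accepted answer in EITHER language.
# Item names and answer sets are invented for illustration.
ACCEPTED = {
    "gifts_go_together": {
        "english": {"red bows", "red ribbons", "red string"},
        "spanish": {"moños rojos", "listones rojos"},
    }
}

def score_response(item, response):
    """Return (score, language of the matching answer) for a response."""
    normalized = response.strip().lower()
    for language, answers in ACCEPTED[item].items():
        if normalized in answers:
            return 1, language   # full credit regardless of language
    return 0, None               # no accepted answer matched

# A Spanish response on the English test still earns full credit,
# and we record which language was used, as the transcript notes.
print(score_response("gifts_go_together", "red bows"))      # (1, 'english')
print(score_response("gifts_go_together", "moños rojos"))   # (1, 'spanish')
```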
Iterative Test Development and Item Analysis
So, we started off with a very large set of items. I think about 187 per language. It took a couple of days, at least, to provide this item set to the children. Not the whole day — just a couple of test sessions. Although, maybe the teachers felt like we were pulling these children out for the full day.
We started off with an equal number of items in each of the categories. We tested these out with 71 children in our local school districts in Texas. We tested 4-, 5-, and 6-year-old children. And we had a subset of 5-year-olds who either had nice, typical language skills or definitely had language impairment.
So we used these iterative approaches to giving the test to identify groups of items that would reflect good items for the children — items that reliably elicited the targets we were looking for, and that differentiated children with and without impairment.
What we can see is that as we progressed from our local set of kids to the larger set of children that we tested at different sites across the country, the number of items goes down, because we’re getting rid of items that aren’t so reliable and don’t differentiate children. And we also see that over time, the number of items in each of the languages varies.
We ended up with 24 items in Spanish and 24 items in English in our final version of the test. But you can see that, for example, we have many more Similarities and Differences items in English than we had in Spanish. But we had many more Functions of objects type questions in Spanish than in English. So the overall difficulty of the test is the same, but the configuration of items varies related to each of the languages.
In the final version of the test, what we end up with is, for 4-, 5-, and 6-year-olds, we see p values increasing by age. Remember that p values are the percentage of children who get an item correct at each age. So there are systematic increases in Spanish and in English for typically developing children, as they increase from about 55% correct to 80% correct. We also see that there is a progression for the language-impaired children. They go from about 26% to about 50% correct.
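Since a p value here is just the proportion of children answering an item correctly, the computation can be sketched in a few lines. The response data below are made up for illustration:

```python
# p value for an item = proportion of children at a given age who
# answered it correctly (1 = correct, 0 = incorrect). Made-up data.
def p_value(responses):
    return sum(responses) / len(responses)

# Hypothetical responses from ten 4-year-olds and ten 6-year-olds:
age4 = [1, 0, 1, 1, 0, 1, 0, 1, 0, 1]
age6 = [1, 1, 1, 0, 1, 1, 1, 1, 0, 1]

print(p_value(age4))  # 0.6
print(p_value(age6))  # 0.8; p values should rise with age
```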
We also took into account item discrimination. That’s the difference between how typically developing children and language-impaired children score. And here we see that all of these items cluster around 0.3. That’s an ideal discrimination value for differentiating children with and without language impairment; that’s the target value. We see that for both languages, across ages, we were able to hit that target through this iterative process of getting rid of items that don’t work.
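The discrimination index described here, the gap between the two groups' p values on an item, can be sketched as follows. The group response data are invented:

```python
# Item discrimination here = p value for typically developing (TD)
# children minus p value for children with language impairment (LI).
# Values near 0.3 were the target. Hypothetical response data.
def p_value(responses):
    return sum(responses) / len(responses)

def discrimination(td_responses, li_responses):
    return p_value(td_responses) - p_value(li_responses)

td = [1, 1, 0, 1, 1, 1, 0, 1]  # 6/8 correct -> 0.75
li = [1, 0, 0, 1, 0, 1, 0, 0]  # 3/8 correct -> 0.375
print(discrimination(td, li))  # 0.375, close to the 0.3 target
```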
The other thing that we see as we look at our semantics test is that it corresponds or correlates with related measures. For example, we see moderate correlations — but highly significant correlations — with language sample measures. We collected narrative samples on the children to whom we gave the semantics test and the other tests.
We see correlations in the 0.3 to 0.4 range. And we also looked at how our test correlated to the Expressive One Word Picture Vocabulary Test, which is a naming test. So looking at children’s single-word vocabulary. Again we see a moderate correlation there.
But I think the other important thing to think about here is that we don’t have correlations of 0.8 or 0.9. So our test isn’t doing the exact same thing as these other measures do. It speaks to the validity of the kinds of tasks we use, but it’s telling you that you’re going to get different information from this test than from these other kinds of measures you might routinely include in a large-scale assessment.
We finalized our analysis by looking at whether we were doing a good job. What’s our classification accuracy on the semantics test?
We used discriminant function analysis to set cut points, to decide where the ideal cut point was to differentiate children with and without language impairment. And what we see is that we get very nice sensitivity and specificity in Spanish. We do see a little challenge there with the five-year-olds, and I’ll talk about that when I get all the way through this.
We looked at the same thing in English. Again we see sensitivity and specificity at all ages, above 80%.
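Given a cut point, sensitivity and specificity reduce to two proportions over the two groups. A minimal sketch, with invented scores and an invented cut point:

```python
# Sensitivity = proportion of children WITH impairment who score at or
# below the cut point; specificity = proportion of typically developing
# children who score above it. All scores and the cut point are invented.
def sensitivity_specificity(li_scores, td_scores, cut):
    sensitivity = sum(s <= cut for s in li_scores) / len(li_scores)
    specificity = sum(s > cut for s in td_scores) / len(td_scores)
    return sensitivity, specificity

li = [5, 7, 6, 9, 4, 8, 6, 5]         # children with language impairment
td = [14, 12, 15, 10, 13, 11, 16, 9]  # typically developing children
sens, spec = sensitivity_specificity(li, td, cut=9)
print(sens, spec)  # 1.0 0.875
```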
One of the unique things we did with the semantics test, as with all of our tests, takes into account that the children are bilingual. We know, from the classification slides showing how children do relative to their language dominance, that sometimes children who are bilingual do better in their first language, because they have strong word knowledge there. Or we know that in typical development, children learn words before they learn grammar, and so maybe they’re doing better in their second language because that’s starting to take over.
So you can use an approach where you look at children’s best language to classify them. We see that generally speaking, classification values go up for the children.
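The best-language approach can be sketched as taking each child's higher score across the two languages before applying the cut point. The scores and cut point below are invented:

```python
# "Best language" classification: score each child on the better of
# their two language scores, then apply the cut point. Hypothetical data.
def classify_best_language(spanish, english, cut):
    best = max(spanish, english)
    return "impaired" if best <= cut else "typical"

# A Spanish-dominant child who would be flagged on the English test alone:
print(classify_best_language(spanish=14, english=7, cut=9))  # typical
# A child low in both languages remains flagged:
print(classify_best_language(spanish=6, english=8, cut=9))   # impaired
```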
One thing that we see in both Spanish and the best-language classification is that we’re right there, just about 70% to 75% sensitivity for five-year-olds. We attribute this to the demands of increased use of English at kindergarten, which puts pressure both on their English and on their ability to retain Spanish. So that one’s a little bit lower. Of course, this semantics test doesn’t stand by itself. It stands in conjunction with the other subtests that we developed, so you would need to combine it with morphosyntax to get really good classification of all of your children.