Bilingual Phonology Assessment Design

Third in the series on Development of a Bilingual Test for Spanish-English Children.

Brian A. Goldstein

DOI: 10.1044/cred-pvd-c13008

The following is a transcript of the presentation video, edited for clarity. Presentation slides are available for download via the PDF button in the toolbar.

Originally presented at the ASHA Convention (November 2013) as part of the session Development of a Bilingual Test for Spanish-English Children: A Long and Winding Road.

Weaknesses with Phonology Assessments c. 1997

We’re sort of starting in 1997, but my interest in the psychometric properties, or weaknesses, of phonological assessments actually started 11 years before that, in 1986. And it related not even really to bilingual kids, but to Spanish speakers. I’m pretty sure there are people in this room who were not born in 1986. I guess I’ve been a phonology geek since the late ’80s. At that time, we weren’t even talking about bilinguals. We were basically talking about Spanish monolinguals because we had no idea what to do with bilinguals.

Typically what you saw were tests that were translated from English into Spanish. Even for phonology. Now think about that. That’s where we were in the mid- to late-90s. Forget about the fact that nothing was standardized at that time in terms of phonological assessments.

The tests that were out there were all largely modeled on the Goldman-Fristoe. That is, they were the proverbial three-position test having each phoneme in the initial, medial, and final position. We won’t go into the limitations of tests that focus on that particular construct.

They also tended to be single versus multiple analyses. What I mean by that is, largely what you would do is you would score these tests, and you would get the number correct or the number incorrect. And that was it. Again, because they weren’t standardized, there really wasn’t much you could do with them. I’ll talk about that more in a second.

At the time, assessments were almost all based on the Mexican dialect of Spanish. That relates to two aspects. It is really imperative to understand dialect in Spanish in particular because of the nature of the dialect variation: it largely changes consonants in Spanish, as opposed to vowels in English, for example.

The other thing is that vocabulary items vary widely and wildly from dialect to dialect. As Aquiles can attest, we used to have a picture of a drinking straw in our lab. One of the things every new group of students had to do when they came in was write their word for straw on this piece of paper. By the time I left a couple years ago, I think there were either 10 or 11 different words in Spanish for “straw.” As in “you’re drinking from a straw right now.” I was trying to think, is there any word in English other than “straw” for straw? I don’t think so. In Spanish, it was either 10 or 11. So dialect is critically important from a lexical standpoint and from a phonological standpoint.

For me, what was more critically lacking in 1997 was internal validity in terms of the construction of these tests. I’m just giving you a couple of examples here, in terms of what were weaknesses or what was missing from those assessments.

One is, how many opportunities were there for each of the phonemes? Again, if your mindset is that you’re creating a three-position test, then you’re going to have s in initial position, so-called medial position, and final position, and that’s it, disregarding any other aspect of the phonology. At that time, there was little talk about phonological processes or phonological patterns. Barbara Hodson’s test was out at that particular time, but the relationship hadn’t really been translated into tests for non-English speakers.

And the phonotactic structure of the language was not taken into account. The idea was that these tests were constructed to be sure that the m sound, for example, was tested in initial, medial, and final position.

Spanish vs. English

Let me just give you a couple of examples of why this is critically important for languages like English and Spanish.

What I have here is a phonetic or phonemic inventory of Spanish versus English by sound class. I won’t go through all of this.

What you want to note is that part of the inventory is specific to each language and part is more or less common to both.

If you look at the stop series, for example, there’s a sharing of the stops between English and Spanish. They’re not exactly the same; there are differences in terms of manner and in terms of place that I won’t go into.

If you look at things in terms of nasals: Spanish has the palatal nasal ɲ, English does not. English has an ŋ, Spanish does not. Again, at the phonemic level. English has a wider variety of fricatives than does Spanish. English has the ʤ affricate as a phoneme, Spanish doesn’t. English has the ɹ sound, Spanish has the ɾ. They both have an l. The trill in Spanish is not a liquid, it is a separate kind of class, so you can’t clump them together.

I think of phonemes as content, if you will, and the syllable types as a kind of frame in which to slot the consonants and vowels. If you look at the frames in which we’re slotting the consonants, English has 14 different syllable types: everything from a single-vowel syllable, like “a,” to one that has an onset of three consonants, a vowel, and then a coda of four consonants (e.g., strengths). Spanish has 8 of those 14. So the point is, there are frames that have to occur in English that don’t necessarily have to occur in Spanish. And if you’re going to create an assessment, then you have to take all of these into account.
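To make the "frame" idea concrete, here is a minimal sketch in Python that reduces a syllable to its consonant-vowel shape. The vowel set, the function name, and the hand-written transcriptions are assumptions made for illustration; they are not taken from the test materials.

```python
# Illustrative vowel symbols only; this is not a complete inventory
# for either language.
VOWELS = {"a", "e", "i", "o", "u", "ɛ", "ɪ", "æ", "ʌ", "ə", "ɔ", "ʊ"}


def syllable_shape(segments):
    """Return the CV shape of one syllable, given a list of phoneme symbols."""
    return "".join("V" if seg in VOWELS else "C" for seg in segments)


# Hand-written transcriptions (assumed for illustration).
english_a = ["a"]                                            # single-vowel syllable
english_strengths = ["s", "t", "ɹ", "ɛ", "ŋ", "k", "θ", "s"]  # 3-consonant onset, 4-consonant coda
spanish_tren = ["t", "ɾ", "e", "n"]

print(syllable_shape(english_a))          # V
print(syllable_shape(english_strengths))  # CCCVCCCC
print(syllable_shape(spanish_tren))       # CCVC
```

Collecting the set of shapes that a candidate word list produces is one simple way to check whether the list covers the frames a language uses.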

Internal Validity: Phonotactic Structure

In creating the phonology subtest, then, we were guided by a number of operating or underlying principles or philosophies.

One is that we wanted to have a number of opportunities for each phoneme, including vowels. Knowing that at the end of the day, as Liz said, we were not going to end up with a four-hour test. Nobody would administer it, nor should anybody be administering a four-hour test. Especially these days, when if you have 60 to 90 minutes for an assessment, you’re in a really good district and you should never move.

So what we tried to do was internally structure the subtest so that the frequency of occurrence of sounds on the test had some relationship to their frequency of occurrence in the language. So there was a match, knowing we were not going to end up with a supreme phonology test that would dominate the entire assessment. Because, again, that’s not feasible, and it’s not efficient.

At the time, phonological patterns or processes were coming into vogue. Knowing this was a shorthand, descriptive way to look at error patterns, we wanted to make sure there were a number of opportunities for at least the common phonological patterns to be represented in the final word set so one could go beyond a mere score and do additional analyses. The idea was that you could go beyond the score if you had the time or were of the mindset to be able to do that.

English words are largely monosyllabic. I think something on the order of 60% of words are monosyllabic. Spanish is the reverse: the majority of words are multisyllabic. So we had to make sure those were controlled for.

Also, we wanted to be sure we had initial and internal consonant clusters.

I mentioned the dialect issue. We were going for, as much as possible, dialect neutral. And I should have had that in quotes, because there is really no such thing as dialect neutral, where there is no variation whatsoever. But we wanted a test that indicated to you, the end user, what those dialect effects would be.

Assessment Design

The design that we ended up with, which is similar across all these subtests, is separate subtests for Spanish and English. These are not mere translations of each other. They are interdependent notions of each other. We ended up with 28 words in Spanish and 31 in English. Frankly, I’m not sure it can be much more efficient than that.

We test all singleton consonants in Spanish and English, except for ʒ in English because it’s hard to find a picture of azure.

Each sound is targeted at least once, and most are targeted in both initial and final position if that’s appropriate for the language. And again, we’ve included commonly occurring clusters, both syllable-initial clusters and abutting consonant pairs.

We target all vowels. At the time we designed this, too, there was work being done by people like Martin Ball and Karen Pollock indicating that vowels were important to look at, particularly for English speakers. And I’ll show you an example of that in a minute.

Words of varying lengths. Spanish has one- to five-syllable words. English has one- to four-syllable words. And also, words of varying stress, which is important for the languages. So, words with stress on the antepenultimate, penultimate, and final syllables.


This just gives you a snapshot of what the score sheet looks like in Spanish. Again, if we want to go way, way back to a time when I know many of you were not born (that’s 1978), Aquiles had designed a very elegant test that he called the Assessment of Phonological Disabilities. I kind of took the format that he used for that test to have a model that was easy to administer, easy to score, and easy to transcribe. Because transcription is still extremely important when you’re doing phonology.

We have a stimulus item, with the transcription underneath it. Then there is the child’s production, which, again, hopefully one would transcribe. You circle the score, whether it’s correct or incorrect.

Then the elicitation. Hopefully you can get a spontaneous production: you show the kid the picture and ask, “What is this?” or “¿Qué es esto?” so that they can label it. If not, you can go for a function or get direct imitation.

We divided the words into what happens in syllable initial and final positions, so you can mark not only that there was an error, but what the error was, so you can come back and look at it in more detail, in both consonants and in vowels.

Finally, we give you the dialectal variation. Right on the score sheet, you have a sense as to whether that production is affected by dialect or not.

Here’s one in which the child’s production of [ten] for [tren] is not dialectal. For the initial consonant cluster the child produced only a [t], so there is one error in the word, the word is incorrect, and the score is 0.

If instead the child had produced the word with a dialect feature, that is, deletion of the final n with nasalization of the vowel, so that [tren] comes out as [tɾẽ], that would be correct according to the dialect. There is no error, so the kid gets a 1.
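The scoring rule in that example can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names and the idea of storing accepted dialect variants as a list are hypothetical, and the transcriptions are typed by hand with the tap ɾ used throughout.

```python
import unicodedata


def normalize(transcription):
    """Normalize Unicode so precomposed and combining diacritics compare equal."""
    return unicodedata.normalize("NFC", transcription)


def score_word(production, target, dialect_variants=()):
    """Return 1 if the production matches the target or an accepted
    dialect variant of it, otherwise 0 (whole-word scoring)."""
    accepted = {normalize(target)} | {normalize(v) for v in dialect_variants}
    return 1 if normalize(production) in accepted else 0


# Hypothetical score-sheet entry for the word discussed above.
target = "tɾen"
dialect_variants = ["tɾẽ"]  # final n deleted, vowel nasalized

print(score_word("ten", target, dialect_variants))  # 0: cluster reduced, not dialectal
print(score_word("tɾẽ", target, dialect_variants))  # 1: matches the dialect variant
```

Keeping the dialect variants next to the target mirrors the score sheet, where the examiner sees the expected dialectal forms alongside each item.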

The same thing works for English. Here’s an example for frog.

Scoring and Interpretation

So, what do you end up with?

Really, what you’re looking at in terms of differentiating typical from disordered is the number of consonants correct. But what we want to do, again, is be more comprehensive than simply saying how many are correct and how many are incorrect.

So you can look at percent correct in whole words. Initial versus final. Total consonants versus vowels, and all segments.
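As a rough sketch of how those percentages might be tallied, the Python below scores a handful of made-up segments. The `Segment` record, its field values, and the sample scores are all hypothetical; they are not the layout or data of the actual score sheet.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    kind: str      # "consonant" or "vowel"
    position: str  # "initial", "final", or "nucleus" (for vowels)
    correct: bool


def percent_correct(segments):
    """Percentage of segments scored correct; None if there is nothing to score."""
    if not segments:
        return None
    return 100.0 * sum(s.correct for s in segments) / len(segments)


# Made-up scores for illustration only.
scored = [
    Segment("consonant", "initial", False),
    Segment("consonant", "initial", True),
    Segment("vowel", "nucleus", True),
    Segment("consonant", "final", True),
    Segment("consonant", "final", False),
    Segment("vowel", "nucleus", True),
]

consonants = [s for s in scored if s.kind == "consonant"]
vowels = [s for s in scored if s.kind == "vowel"]
initial = [s for s in consonants if s.position == "initial"]
final = [s for s in consonants if s.position == "final"]

print(percent_correct(consonants))  # 50.0
print(percent_correct(vowels))      # 100.0
print(percent_correct(initial))     # 50.0
print(percent_correct(final))       # 50.0
print(round(percent_correct(scored), 1))  # 66.7
```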

This is what it ends up looking like when you complete the entire assessment. What we’ll focus on is what’s here in red. For this sample child, their total consonants correct is about 56% in English and 57% in Spanish. When you have a kid who truly has a disorder, this is not uncommon. That is, both languages will be low, and they will both be somewhat similar to each other.

Will it be this close? Probably not. But they are both going to be low, rather than one being low and the other not.

You can also compare our sample child with the normative data that we had for typical kids and kids with disorders. As Liz said, we tried this out on numerous kids before the actual norming sample, so we had a comparison to make to be sure that the test we were creating showed kids who were typical as typical, and kids who were disordered as actually having disorders.

Again, we tried to build in secondary analyses, so you can look at the kid’s phonetic inventory, at percentages for at least the major or common phonological patterns, and also at other errors: things like initial consonant deletion, backing, or deaffrication. That is, patterns that we don’t often see in typically developing kids.

And again, vowel errors. Vowel errors are not really discriminating in Spanish, but they absolutely are for English. So if you have a kid with a whopping phonological disorder in their Spanish, their percent of vowels correct in Spanish is going to be really high. It’s not a diagnostic marker there, whereas it is for English.

Phonology Subtest Highlights

What’s most pleasing at the end of this long and winding road is the diagnostic accuracy. According to Elena Plante’s designation, sensitivity and specificity of 80% and above are acceptable, and 90% is good. In both English and Spanish, the test’s sensitivity is above 83%, and its specificity is above 90%.

If the kid has a disorder, it is likely the test will show that they have a disorder. If they are typically developing, the test will come out saying that, yes, the kid is typically developing.
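For readers who want the arithmetic behind those two terms, here is a minimal Python sketch. The counts are invented for illustration and are not the norming data; the `rate` helper simply applies the 80% and 90% thresholds mentioned above.

```python
def sensitivity(true_pos, false_neg):
    """Proportion of children with a disorder whom the test flags as disordered."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg, false_pos):
    """Proportion of typically developing children whom the test classifies as typical."""
    return true_neg / (true_neg + false_pos)


def rate(value):
    """Label a proportion using the thresholds cited in the talk."""
    if value >= 0.90:
        return "good"
    if value >= 0.80:
        return "acceptable"
    return "below acceptable"


# Invented counts: 20 children with a disorder, 50 typically developing.
sens = sensitivity(true_pos=17, false_neg=3)   # 17 of 20 correctly flagged
spec = specificity(true_neg=46, false_pos=4)   # 46 of 50 correctly cleared

print(f"sensitivity = {sens:.2f} ({rate(sens)})")  # sensitivity = 0.85 (acceptable)
print(f"specificity = {spec:.2f} ({rate(spec)})")  # specificity = 0.92 (good)
```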

What we think we’ve created at the end of this quite, quite, quite long and absolutely up and down, winding road, is a reliable, valid assessment that accounts for dialect, gives you the opportunity for primary and secondary analyses, and is both effective and efficient.

Brian A. Goldstein
La Salle University

Co-Presenters: Elizabeth D. Peña, University of Texas at Austin; Aquiles Iglesias, Temple University; Vera F. Gutierrez-Clellen, San Diego State University; Brian A. Goldstein, La Salle University; and Lisa M. Bedore, University of Texas at Austin.
Disclosure: All of the above-listed authors/co-presenters benefit financially from royalty payments from the Bilingual English-Spanish Assessment (BESA).
Copyrighted Material. Reproduced by the American Speech-Language-Hearing Association in the Clinical Research Education Library with permission from the author or presenter.

