Bilingual Morphosyntax Assessment Design

Sixth in the series on Development of a Bilingual Test for Spanish-English Children.

Vera Gutierrez-Clellen

DOI: 10.1044/cred-pvd-c13011

The following is a transcript of the presentation video, edited for clarity. Presentation slides are available for download via the PDF button in the toolbar.

Originally presented at the ASHA Convention (November 2013) as part of the session Development of a Bilingual Test for Spanish-English Children: A Long and Winding Road. Videos in this series are:

  1. Challenges in Assessing Bilingual Populations

    (Elizabeth D. Peña, University of Texas at Austin)

  2. Steps in Test Development

    (Elizabeth D. Peña, University of Texas at Austin)

  3. Bilingual Phonology Assessment Design

    (Brian A. Goldstein, La Salle University)

  4. Bilingual Pragmatics Assessment Design

    (Aquiles Iglesias, Temple University)

  5. Bilingual Semantics Assessment Design

    (Lisa M. Bedore, University of Texas at Austin)

  6. Bilingual Morphosyntax Assessment Design

    (Vera F. Gutierrez-Clellen, San Diego State University)

We started with spontaneous language samples and with the literature: the available reviews of the literature in Spanish, which were pretty much based on monolingual children and their Spanish-language development. We also started with the knowledge that we had then about cross-linguistic differences in how language impairment manifests in grammar.

With all this initial work that we did, we started with the assumption that if children who are learning Spanish and English (dual-language learners with different levels of development in each language) have a language impairment, they would show grammatical difficulties in both languages. But the difficulties would be represented in a different way in each language.

Grammatical Measures

We are going to go over some morphosyntactic characteristics of the grammatical problems that these children have in each language.

Based on the literature and the initial work that we did, we focused on developing a morphosyntax measure for Spanish that would focus on articles, clitic pronouns, complex verbs such as subjunctives, and complex syntax, which is assessed through a sentence repetition task included in the morphosyntax test.

The Spanish measure is based on linguistically appropriate targets. We are not translating. We are not looking in Spanish at what you would look at in English, even if the two languages have the same morphemes. This will become clear when we look at English.

In English we didn’t focus on articles, for example. We focused on morphology, and we also looked at possessive nouns, plurals, and passives.

Basically, we developed two tests for the price of one, looking at very different grammatical features in each language.

As was discussed before, our intent was to develop a measure that would have sufficient sensitivity and specificity across different groups of speakers with different dialects of each language, and with different levels of proficiency in each language.

One of the things we had to do was look at what grammatical differences we should expect in speakers of different Spanish dialects. Based on the review of the literature, we decided to provide alternative scoring for specific morphemes that could be vulnerable for speakers of specific dialects. We looked at features of Puerto Rican Spanish and other Caribbean Spanish dialects, as well as Mexican-American Spanish. Then we also tried to look at what we could do if a child learning English as a second language, or being raised in an English-speaking community, speaks a non-standard variety of English, with features such as those of African-American English or Korean English, that may differ from the English of other bilingual speakers.

One important issue is that you may find grammatical differences in children that may be related to their limited use of their home language. You may find that children will experience loss or attrition of their home language when the home or the school context doesn’t promote maintenance of the home language.

It’s almost like a moving target. We are trying to find the right level of exposure and use to be able to say, “This child is not experiencing attrition, there may be something else going on.” On the other hand, children who are only tested in the second language, who are learning English as a second language, may show errors that are not related to language impairment, but that are related to limited English-language development and proficiency.

Test Development

We started with a large pool of items. We initially had 112 items for Spanish and 127 items for English. We didn’t want to give up any of them. That was a struggle. But with some pilot studies that we did initially, we cut the morphosyntax test down to 73 items for Spanish and 63 items for English. We ran our first discriminant analysis study based on that set of items.

Spanish Items

I want to give you an example of how we looked at it. We used a cloze task to assess Spanish articles: “Los niños tienen unos carros. ¿Y aquí qué tienen los niños? Tienen…” We were targeting the article in front of the noun. In English: “Here the children have some cars. And here, what do these children have? They have…” “A car.” (It doesn’t work very well in English.)

So we are looking at articles in Spanish because children who have language impairment have a hard time producing them with correct gender agreement and number agreement. And many children omit them altogether.

We also focused on the use of clitics. Here, “Juan is going to paint the table. And here, what is Juan doing with the table? Juan…” The target is “is painting it,” so we are looking for the direct object clitic in this case: “La pinta.” The clitic has to agree with the noun in gender and number, and it has to have the correct case.

Clitic pronouns are complex morphemes in Spanish. That’s why children with specific language impairment (SLI) have a hard time producing them correctly. Either they omit them or they don’t use correct agreement.

Then, using this cloze task, we looked at the subjunctive. Here, “La mamá quiere que pongan la mesa.” The mom wants them to set the table. And here, what does she want? Mom wants them to… “coman” or “tomen.” Here you have the targeted subjunctive verb, which is obligatory in this sentence completion task.

In the scoring we focused on not penalizing dialectal differences, such as “leísmo” (le in place of lo for “him”), which is common in some varieties of Mexican Spanish, or plural omissions in articles and clitic pronouns, which may be found in Caribbean Spanish dialects.

Spanish Test Analysis

In the first study that we did, we had 160 kids. We had 80 with language impairments, 80 with typical language. We sampled from Texas, Georgia, Pennsylvania, and California. They were randomly assigned to exploratory or confirmatory groups to evaluate the classification accuracy of the Spanish morphosyntax set.


We looked at the level of classification when the child was considered Spanish-only proficient, with or without language impairment. That means the child had limited or minimal English proficiency. These children were not monolinguals; they just had minimal English.


And we looked at the classification accuracy of the measure with Spanish-dominant bilinguals. So we had children who were Spanish-dominant and either had a language impairment or had typical language development.


Here are the results of that first study, where we looked at whether the Spanish measure had a different effect on children who spoke different varieties of Spanish. And we compared Caribbean Spanish scores with the Mexican Spanish scores, and there were really no differences in terms of the performance on the Spanish morphosyntax across these dialects.


For the Spanish morphosyntax, in this initial study based on the 73 items (this was the exploratory stage of the study), we had high sensitivity and specificity for the early age group, from 4 to 5. It was much higher for the five year olds, and then it went down for the six year olds. That was what we did for Spanish with those cut-off scores, taking into account the dialectal differences of these two Spanish varieties.
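As a rough illustration of what sensitivity and specificity mean for a cut-off score, the sketch below flags a child when the score falls below the cut-off and then compares the flags with the known diagnosis. The function name, the direction of the cut-off, and the toy scores are assumptions made for illustration only; they are not the study's data or its cut-off values.

```python
def classification_accuracy(scores, has_impairment, cutoff):
    """Return (sensitivity, specificity) when scores below `cutoff` are flagged.

    scores: test scores, one per child (hypothetical values).
    has_impairment: True/False diagnosis for each child, in the same order.
    """
    true_pos = false_neg = true_neg = false_pos = 0
    for score, impaired in zip(scores, has_impairment):
        flagged = score < cutoff  # assumption: a low score suggests impairment
        if impaired and flagged:
            true_pos += 1
        elif impaired:
            false_neg += 1
        elif flagged:
            false_pos += 1
        else:
            true_neg += 1
    if true_pos + false_neg == 0 or true_neg + false_pos == 0:
        raise ValueError("need at least one impaired and one typical child")
    # Sensitivity: share of impaired children the test correctly flags.
    sensitivity = true_pos / (true_pos + false_neg)
    # Specificity: share of typical children the test correctly clears.
    specificity = true_neg / (true_neg + false_pos)
    return sensitivity, specificity


# Toy data: three children with impairment, three with typical language.
toy_scores = [40, 55, 72, 60, 81, 90]
toy_diagnosis = [True, True, True, False, False, False]
sens, spec = classification_accuracy(toy_scores, toy_diagnosis, cutoff=65)
print(f"sensitivity={sens:.2f}, specificity={spec:.2f}")
```

Moving the cut-off up or down trades one value against the other, which is why a cut-off is usually chosen per age group.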


When we looked at the Spanish-dominant versus the Spanish-only proficient children, we didn’t find any differences, but remember these were Spanish-dominant kids. If they had been English-dominant, it would not have worked. The measures of dominance were based, for this particular study, on the proportion of grammatical utterances in each language. So if they had higher grammaticality in Spanish, these kids were considered Spanish dominant.
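The dominance measure described here can be sketched as a comparison of two proportions. This is only a minimal sketch: the function names, the utterance counts, and the handling of ties are assumptions, since the presentation does not say how equal proportions were treated.

```python
def proportion_grammatical(grammatical_utterances, total_utterances):
    """Proportion of utterances in a language sample that are grammatical."""
    if total_utterances <= 0:
        raise ValueError("the language sample must contain utterances")
    return grammatical_utterances / total_utterances


def dominant_language(spanish_sample, english_sample):
    """Each sample is a (grammatical, total) pair of utterance counts."""
    spanish = proportion_grammatical(*spanish_sample)
    english = proportion_grammatical(*english_sample)
    if spanish > english:
        return "Spanish"
    if english > spanish:
        return "English"
    return "balanced"  # assumption: tie handling is not described in the talk


# Hypothetical child: 45 of 50 Spanish utterances grammatical, 20 of 40 in English.
print(dominant_language((45, 50), (20, 40)))
```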


We didn’t find any differences across the bilinguals if they were Spanish-dominant, compared to the Spanish-only proficient, on the Spanish morphosyntax.

English Items

For the English morphosyntax, we were at 63 items. These were the types of structures that we looked at. This is one of the examples: “Today he is going to take a bath. And yesterday, he did that too. What did he do yesterday? Yesterday he…” The target is “took a bath,” past tense.


We had plurals, we had third person singular.

English Test Analysis

And these are the results of the study, looking at the classification accuracy of these items of the English morphosyntax test. We had 111 children tested in California, Texas, and Philadelphia. We had 59 with typical language, 52 with language impairments. English proficient and English dominant.


So this was the first round of analysis, where we tried to look at the classification accuracy of the English morphosyntax across two different groups of English speakers: the English-only and the English-dominant children from the Southwest (Texas and California), and then those from the Northeast.


You can see that sensitivity and specificity were good across the groups, except for specificity in the Northeast group. There, sensitivity was good, but specificity was not.


We wanted to look into this further by looking at the specific items in this particular group of children, and in particular at who was in this sample.


Before we get into that, one of the things we did in this initial work was to get information, through the parent and teacher questionnaires, about whether the child may have been using a non-standard variety of English. From that, we were able to see whether there was a sample of children who were typical but may have had lower performance due to use of a Caribbean English dialect. We tried some alternative scoring and modifications. But when you do that, you lose sensitivity, because the same features that identify language impairment in monolingual English speakers, or in English-dominant bilinguals from the Southwest, were the ones that would have penalized some speakers of non-standard English to some extent. Only some of those speakers, because there is a lot of variability. So it wasn’t very clear that we could get the job done if we took those features out altogether.

Further Item Analysis and Selection

So, in the next step, we looked at the performance of children using a larger sample, and conducted item analysis to reduce the number of items even further. Then we looked to see whether that final set provided classification accuracy as good as when we had the 73 items in Spanish and 63 items in English, and whether it also worked across the different dialects of Spanish and English.


So in this sample we had 492 children who took the Spanish morphosyntax. And 393 took the English morphosyntax. And 128 took both languages.


We had here the number of kids with typical language, with impairments, across the three age groups, and across the two languages.


And we started looking at item discrimination to see how the different items worked across the age groups that we had.


Our age range was from 4 years to 6 years 11 months. So, for every six-month band, we were looking at how the items worked for that particular age: 4 years to 4 years 6 months, 4 years 6 months to 4 years 11 months, and so on. We would see what percentage of children with typical language development passed the item, compared to the percentage of children with language impairment who passed the item. In item discrimination we subtract those percentages, and that’s how we get the index of discrimination that Liz was talking about before, per item. It is an extremely laborious process, because we had to go through all the items, and we had tens and tens of items and hundreds of children.
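The subtraction described above can be sketched in a few lines. The function name and the pass/fail data are hypothetical, and the direction of the subtraction (typical minus impaired) is an assumption that matches the retention rule, where higher positive values are better.

```python
def discrimination_index(typical_passed, impaired_passed):
    """Index of discrimination for one item within one age subgroup.

    typical_passed / impaired_passed: lists of True/False, one entry per child,
    saying whether that child passed the item.
    """
    if not typical_passed or not impaired_passed:
        raise ValueError("both groups need at least one child")
    # In Python, True counts as 1, so sum() gives the number who passed.
    pass_rate_typical = sum(typical_passed) / len(typical_passed)
    pass_rate_impaired = sum(impaired_passed) / len(impaired_passed)
    return pass_rate_typical - pass_rate_impaired


# Hypothetical subgroup: 3 of 4 typical children pass, 1 of 4 impaired children pass.
index = discrimination_index([True, True, True, False], [True, False, False, False])
print(index)
```

An index near zero means the item does not separate the groups at that age, and a negative index means children with impairment passed it more often than typical children.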


The principle that we used for retaining or dropping items for the final set was that an item was retained if we found a discriminant value greater than 0.25 in a minimum of three age subgroups. We had two subgroups per age, so out of six subgroups we had to find this value in at least three, with no negative values in the other subgroups and an average of at least 0.2 across all the subgroups. It was a very deliberate procedure to decide which items were going to be retained, because they had to have at least this discriminant value.
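The retention rule just described translates into a short check. This is a sketch under the thresholds stated in the talk (greater than 0.25 in at least three subgroups, no negative values, mean of at least 0.2); the function name and the example values are hypothetical.

```python
def retain_item(indices, high=0.25, min_high_subgroups=3, min_mean=0.2):
    """Decide whether to keep an item, given its discrimination index
    in each age subgroup (six subgroups in the study described)."""
    if not indices:
        raise ValueError("need at least one subgroup index")
    # Condition 1: strong discrimination in enough subgroups.
    high_count = sum(1 for value in indices if value > high)
    # Condition 2: the item never favors the impaired group.
    no_negative = all(value >= 0 for value in indices)
    # Condition 3: acceptable discrimination on average.
    mean_index = sum(indices) / len(indices)
    return high_count >= min_high_subgroups and no_negative and mean_index >= min_mean


# Hypothetical items, six age subgroups each.
print(retain_item([0.30, 0.28, 0.26, 0.10, 0.15, 0.20]))   # meets all three conditions
print(retain_item([0.40, 0.35, 0.30, -0.05, 0.20, 0.20]))  # one negative subgroup
```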


We selected, then, the best three to four items for each target. So we ended up with a total of 52 items for Spanish, 57 for English. We still have a good number of items for the different types of morphosyntax targets that we had.

Maximizing Classification Accuracy

I’m going to show you here how we were able to maximize classification accuracy. When people try to develop an assessment for bilinguals, typically they compare bilinguals in each language using a monolingual standard or monolingual referent.


These are all children with different levels of proficiency in each language, so what we did was look at the best language score. So, for morphosyntax, which was the better score, comparing their English score and their Spanish score? Then we looked at what level of sensitivity and specificity the best language score gave us. That’s how we found that, using the best language, we had a very high level of sensitivity across the three age groups, and also of specificity across the three age groups.
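Taking the best language score amounts to keeping the higher of the two scores. The sketch below assumes that the Spanish and English scores are already on a comparable scale (for example, standardized scores) and that a child tested in only one language simply keeps that score; both points are assumptions, as are the names and values.

```python
from typing import Optional


def best_language_score(spanish: Optional[float], english: Optional[float]) -> float:
    """Return the higher of a child's Spanish and English scores.

    None means the child was not tested in that language.
    """
    available = [score for score in (spanish, english) if score is not None]
    if not available:
        raise ValueError("the child needs a score in at least one language")
    return max(available)


# Hypothetical children.
print(best_language_score(spanish=82, english=95))    # tested in both languages
print(best_language_score(spanish=88, english=None))  # tested in Spanish only
```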


Lisa and Liz talked about the differences you may find in performance from one language domain to the other, within and across languages. So what we did was use a composite score that combined the best score in morphosyntax and the best score in semantics, regardless of language: for example, English morphosyntax plus Spanish semantics, and so on.
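A composite of this kind can be sketched as follows. The talk does not say how the two best scores are combined, so averaging them is an assumption here, as are the function names, the dictionary layout, and the scores.

```python
def best_score(scores_by_language):
    """Pick the higher score from a dict such as {"spanish": 85, "english": 95}."""
    return max(scores_by_language.values())


def composite_score(morphosyntax, semantics):
    """Combine the best morphosyntax score and the best semantics score,
    regardless of which language each one comes from."""
    # Assumption: the two best scores are averaged; the talk only says "combined".
    return (best_score(morphosyntax) + best_score(semantics)) / 2


# Hypothetical child: stronger morphosyntax in English, stronger semantics in Spanish.
child_morphosyntax = {"spanish": 85, "english": 95}
child_semantics = {"spanish": 100, "english": 80}
print(composite_score(child_morphosyntax, child_semantics))
```

Here the composite draws on English morphosyntax and Spanish semantics, the kind of mixed pairing mentioned above.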


Here are the levels of sensitivity and specificity when we use the best language of morphosyntax and the best language of semantics combined in this composite score. As you can see, we have very high sensitivity and specificity across the three age groups.


No test gives us everything we want, and no test is perfect. In terms of the potential for bias in specific items, and maybe for speakers of specific varieties of a language, we did an item bias analysis in which we compared the east coast and the west coast and tried to find whether there were differences for specific items. We identified the number of items that had differential item functioning across the regions. You can see that it is in the English morphosyntax subtest that we find a higher level of potential differences in our sample. So we went further and looked at the east coast composites, since those were the ones that we had found, in the initial stages of the research, to be more variable.


So, here what you see is that using this composite score (semantics plus morphosyntax) for the east coast sample only, we still have good sensitivity and specificity, though not so good for the five year olds. Some of the potential explanations for this were discussed earlier.

Vera Gutierrez-Clellen
San Diego State University

Originally presented at the ASHA Convention (November 2013) as part of the session Development of a Bilingual Test for Spanish-English Children: A Long and Winding Road. Co-Presenters: Elizabeth D. Peña, University of Texas at Austin; Aquiles Iglesias, Temple University; Vera F. Gutierrez-Clellen, San Diego State University; Brian A. Goldstein, La Salle University; and Lisa M. Bedore, University of Texas at Austin.
Disclosure: All of the above-listed authors/co-presenters benefit financially from royalty payments from the Bilingual English-Spanish Assessment (BESA).
Copyrighted Material. Reproduced by the American Speech-Language-Hearing Association in the Clinical Research Education Library with permission from the author or presenter.

