Noisy Data: Incorporating Variability into Your Analysis

Advice for "thinking about the noise" and the value of adding mixed model analysis to your research toolkit

April 2014
Richard Schwartz
Originally Published in CREd Library
https://doi.org/10.1044/CRED-AI-BTS-001

Noisy data have a high degree of variability and a greater range of scores, or reaction times, or whatever it is you’re measuring. In a lot of the cochlear implant studies we’re doing now, we’re using eye tracking. So for us it would be more variability in how the children look at the pictures they’re shown as they hear words. Noisy data really just have variability in success or in performance, ranging from what might be considered a failure on a given trial to a success.

The other part of the variability, actually, comes in terms of the individual items. It may be that these children who are outliers have particular problems with particular items on the experimental task that we don’t anticipate. We control certain variables. But there’s always the possibility with our statistical analysis to go back and do item analyses. If one child is performing over here, and another child is performing more at the mean or median, you can find out whether the first child is having particular problems with specific kinds of items. You can go back post hoc and really look at the characteristics that you didn’t control that might be affecting the performance for this child.
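As a rough illustration of the post hoc item analysis described above, the Python sketch below compares one child's trial outcomes with the mean of the other children on each item. The data, the item names, the function names, and the 0.5 cutoff are all hypothetical and chosen only for illustration; they are not taken from the studies discussed here.

```python
from statistics import mean

# Hypothetical trial-level data: responses[child][item] is 1 (success) or 0 (failure).
responses = {
    "child_A": {"dog": 1, "cat": 1, "giraffe": 1, "spatula": 1},
    "child_B": {"dog": 1, "cat": 1, "giraffe": 0, "spatula": 1},
    "child_C": {"dog": 1, "cat": 1, "giraffe": 1, "spatula": 0},
    "child_D": {"dog": 1, "cat": 0, "giraffe": 0, "spatula": 0},
}


def item_means(data, exclude=None):
    """Mean success per item across children, optionally leaving one child out."""
    items = next(iter(data.values())).keys()
    return {
        item: mean(scores[item] for child, scores in data.items() if child != exclude)
        for item in items
    }


def problem_items(data, child, gap=0.5):
    """Items on which `child` scores at least `gap` below the other children's mean."""
    others = item_means(data, exclude=child)
    return sorted(item for item, m in others.items() if m - data[child][item] >= gap)


print(problem_items(responses, "child_D"))  # ['cat', 'giraffe', 'spatula']
```

Once the problem items for an outlying child are listed, the next step would be to look at what those items have in common, such as a characteristic that was not controlled in the design.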

How do you look at “noise” in your analysis?

What we do in research with clinical populations is very often create these groups. We do these group comparisons, which actually mask individual differences. Very often when you do these group comparisons, your data are fairly noisy, and often noisier for the clinical group than for the typical control group. In those cases, it’s particularly interesting to think about the noise, and to think about ways to do the group comparisons (which I still believe are important) but also to look at the individual differences among the children in the clinical group, in my case because I do research with children.

We’ve begun this move away from analysis of variance, which has been the typical statistical approach to these data. Analysis of variance requires that you have these two distinct groups. But sometimes what happens in that case is that your group definitions for the typically developing children or for the clinical group are such that you have to exclude children who are in the middle — who are not quite within normal limits, but they are not quite impaired enough to be within the language disorders group, whatever group that is.

What we’ve been trying to do, like other fields in the behavioral sciences, is move toward a different kind of statistical model, which is variously called mixed model analysis, hierarchical logistic regression, or hierarchical linear models. They are all sort of the same thing: a statistical analysis that is derived from regression analyses. What you can do is throw all your children in, typically developing and not, and then take certain scores, whether it is their performance on experimental tasks or their performance on a test battery that you give them, and look at how those scores predict their actual performance on the tasks.

This yields, first of all, an inclusion of more subjects, and it allows you to look at these individual differences. So you begin to see who the outliers are. You can begin to understand what demographic variables condition their outlying performance on the experimental tasks. I think that’s a very important move forward for our field, as opposed to these sort of forced groups which exclude children who are sort of in the middle.
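To make the regression idea concrete, here is a minimal Python sketch that keeps every child in one analysis, uses a continuous test score as the predictor, and flags children whose performance sits far from the fitted line. It is ordinary least squares with a single predictor, not a full mixed model, which would also include random effects (for example, for child and for item) and would normally be fit with a dedicated statistics package. The data, the names, and the cutoff of 2 standard deviations are hypothetical.

```python
from statistics import mean, pstdev

# Hypothetical data: (child id, language test score, mean reaction time in ms).
# Typically developing, clinical, and "in the middle" children are all included.
children = [
    ("c01", 70, 1000), ("c02", 80, 950), ("c03", 90, 900), ("c04", 100, 1050),
    ("c05", 110, 800), ("c06", 120, 750), ("c07", 130, 700), ("c08", 140, 650),
]


def fit_line(xs, ys):
    """Least-squares slope and intercept for predicting ys from xs."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, my - slope * mx


def flag_outliers(rows, threshold=2.0):
    """Ids of children whose residual exceeds `threshold` residual standard deviations."""
    xs = [score for _, score, _ in rows]
    ys = [rt for _, _, rt in rows]
    slope, intercept = fit_line(xs, ys)
    residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
    sd = pstdev(residuals)
    return [row[0] for row, res in zip(rows, residuals) if abs(res) > threshold * sd]


print(flag_outliers(children))  # ['c04']
```

No child is dropped for falling between group definitions, and the flagged child becomes a starting point for asking which demographic variables might explain the outlying performance.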

How are you applying this in your current research program?

I think about this a lot with this current grant that I have, looking at lexical access and language processing more generally in children with cochlear implants. The whole story of cochlear implantation in terms of its main goal in children — which is to give children the opportunity to acquire oral language — yields a wide range of outcomes.

Some children are absolute stars. They may have some underlying deficits, but on the surface their language is as good as any peer they are compared to. Whereas there are other children who really don’t do as well with the same opportunities, with the same device, the same intervention. And we really don’t know, still, what causes that range of outcomes in children with cochlear implants. We know some obvious things like earlier implantation is better, good intervention — good habilitation is better. But there are many unknown factors in this.

I think it’s important for us not only to do the comparisons we’re obligated to do on the grant, which is to compare the children with CIs to their typically hearing peers, but also to really understand if there are subgroups among the children, and if some of the demographic data that we collect might predict what the outcomes are.

What advice do you have for researchers looking to add this to their “toolkit”?

For me it’s a new thing. I’m at a point in my career where I was hoping to just keep doing analyses of variance until retirement. What happened was, I had a doctoral student whose data were particularly well suited to this newer kind of analysis. And we had a lot of help from a faculty member and a doctoral student who were statisticians in our Ed-Psych program.

That exposure made me start reading. I went to a couple of workshops, and I pushed myself — despite my initial reluctance — to add this new tool to my research repertoire and to learn this, because I saw the value in it.

My students, up until the last couple of years, have been trained (by me) to think in terms of analysis of variance designs. The newer students are hearing this kind of approach. The older students are going to have to do what I do, which is to re-tool. There are lots of places where you can go for workshops. Lots of universities have people in the Stats department, or Ed-Psych, or Psychology who do these analyses, and I would suggest that researchers contact them.