Auditory Scene Analysis: An Attention Perspective

Auditory scene analysis facilitates the ability to perceive sound events in the environment, such as when listening to your friend talking in a noisy restaurant. Neural processes must account for the dynamics of such situations. This presentation will out

Elyse Sussman

DOI: 10.1044/cred-pvd-c16002

The following is a transcript of the presentation video, edited for clarity.

Good morning. I first want to thank Karen for inviting me. I am delighted to be here and to have this opportunity to share some of the research from my laboratory with you today. I am told that I first need to disclose that I have nothing to disclose.
Attention plays an important role in how we perceive and understand the environment. So today in my talk I am going to focus on some experiments that demonstrate how we process and organize sound to form coherent events in the environment that are useful for perception.

This process of identifying the events involves an interaction between automatic, stimulus-driven processing and attention-driven processing that facilitates our ability to navigate in noisy environments.

I’m first going to give you a brief outline of what I’m going to go through in my talk today.

First I’m going to define auditory scene analysis. Then I’m going to talk a bit about how we measure brain processes associated with auditory scene analysis. And then I’m going to go into how attention influences scene processing. There are three parts to this: the stimulus-driven or automatic processing; how we represent sound; and then how attention modifies what we perceive.

What is auditory scene analysis?

So the first thing is: What is auditory scene analysis? Auditory scene analysis is a fundamental skill of the auditory system that allows us to perceive and identify events in the environment. It’s that skill, as you heard Barbara talk about, that allows you to select out the voice of your friend in a noisy restaurant and listen to them. Or to focus in on the melody of the flute in an orchestra.
The interesting thing about this: if I were to ask you to come and join me at this cocktail party, you would hear people talking, glasses clinking, the wind blowing in the background. Maybe there’s music playing. The sounds enter your ears as a mixture of all the sounds in the environment, and then the brain disentangles them to provide neural representations that maintain the integrity of the distinct sources. This is the process of stream segregation: segregating the sources out and maintaining these distinct representations. And this is what we’re trying to understand: What does the brain do? What happens automatically? And how does attention modify what we perceive?
I’m going to give you an example, because in my talk today I’m not going to talk about speech at all. I’m going to talk about simple sounds, tone processing.
So I’m going to use an example from music. Composers have known about this remarkable ability of the auditory system for hundreds of years. This piece is one of my favorite examples of auditory stream segregation. It’s by a Spanish composer named Francisco Tárrega, who is actually now most famous for a little phrase in one of his solo compositions.
The Nokia tune, there you go. But in this piece of music we have one timbre, one sound source: the guitar. The sounds, as you can see at the beginning of the score here, are played sequentially across a range of frequencies, and what you’ll experience is multiple sound streams that occur simultaneously but converge harmonically.
Now this particular guitar piece uses a challenging technique called the tremolo. The guitarist plucks a single melodic note with three fingers of one hand (you can see they look sort of like triplets here) while the thumb plays a counter melody in the bass. It’s a very challenging technique for a guitarist. But what you’re going to experience is separate sound streams: two melodic streams. And if you didn’t know the story I’m telling now, and you were to hear it without a visual of a man playing a guitar, you might well think it’s a duet of two guitars. So I’m going to let you listen to this now.
So hopefully you got a flavor for how that goes, with the sustained note and the melody below.
To take this into the laboratory, we play sounds sequentially and ask how you segregate them out and hear them as integrated or segregated streams. Here is an example with tones of different frequencies played sequentially, in a sort of waltzing rhythm: bump bump bump, bump bump bump. What we do is vary the frequency distance between the two sets of sounds, and then we want to know how those sets of sounds are represented in the brain, as either one or two streams, and how people perceive them. When they’re integrated, you hear this sort of waltzing tune. And if they are segregated in frequency, we might hear them as two distinct frequency streams, each with a different rhythm: an isochronous rhythm in the bottom and a little flutter in the top from the second stream.
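As a rough illustration of this kind of stimulus, here is a minimal sketch in Python that interleaves a low and a high tone in repeating three-tone groups, with the frequency separation as a parameter. The specific frequencies, durations, and rhythm are illustrative assumptions, not the values used in the study.

```python
import numpy as np

def tone(freq_hz, dur_s=0.05, sr=44100):
    """Synthesize one pure tone (sine wave)."""
    t = np.arange(int(sr * dur_s)) / sr
    return np.sin(2 * np.pi * freq_hz * t)

def waltz_sequence(low_hz=440.0, semitones=7, triplets=4, sr=44100):
    """Alternate low/high tones in repeating three-tone 'waltz' groups.

    With a small frequency separation the triplets cohere into one
    waltzing stream; with a large separation listeners tend to hear
    two concurrent streams with different rhythms.
    """
    high_hz = low_hz * 2 ** (semitones / 12)  # semitone scale is logarithmic
    group = [low_hz, high_hz, low_hz]         # bump-bump-bump triplet
    seq = [tone(f, sr=sr) for _ in range(triplets) for f in group]
    return np.concatenate(seq)

# Large separation: likely heard as two streams
audio = waltz_sequence(semitones=11)
```

Varying the `semitones` argument is the manipulation described here: the physical sequence structure is identical, only the low/high distance changes.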

Measuring brain processes associated with auditory scene analysis

So before we get into the experiments I want to explain a little bit of how we use electrophysiology to measure what the brain is doing with the sounds.
So event-related brain potentials are really excellent tools for understanding how sound is indexed in memory and how it’s held in memory. And the way that we get the event-related brain potentials is we record the EEG, as Barbara was talking about. We use an electrode cap, and here’s one of our happy subjects. And as we record the EEG, we want to extract the evoked potentials from that continuous EEG recording so we can look at the voltage changes that are time-locked to the specific events that we’re interested in.
And I’m going to give you further detail about how this works, since I was assuming that some of you in the audience might not be so familiar with it. The subject is sitting in a chair in the sound-attenuated booth. They have a cap on with the electrodes. They’re hooked up to the amplifier. They have insert earphones in their ears. We record their EEG, and we play sounds through the insert earphones. Each sound is time-stamped onto the EEG record, so that after the subject goes home we can analyze the brain’s response time-locked to a particular sound we’re interested in.
What we do is segment out the pieces of EEG that occur a little before and a little after each sound event, and average those segments together, because the brain activity is quite noisy: it contains all the spontaneous neural activity, and we want to know the specific response to the events we’re interested in. As we average together these multiple trials, what emerges is a time- and phase-locked ERP component, with changes in the polarity of the waveform. What you’re looking at here is a classic auditory evoked potential, an obligatory response just to the sound onset. The Y-axis displays microvolts, the amplitude of the response, and the X-axis is time. You can see this is a very rapid response from sound onset. Following from the baseline, we have three main components with different polarities: P1, the first positive peak, around 50 milliseconds; N1, the first negative peak, around one hundred milliseconds; and a second positive peak, P2, around one hundred seventy milliseconds. This is the classic obligatory onset response.
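The epoching-and-averaging step just described can be sketched in a few lines; this is a minimal illustration on simulated data (the sampling rate, window, and toy signal are assumptions), not the lab’s actual analysis pipeline.

```python
import numpy as np

def average_erp(eeg, event_samples, sr=500, pre_s=0.1, post_s=0.4):
    """Average event-locked EEG epochs to extract the ERP.

    eeg: 1-D array, one channel of continuous EEG (microvolts)
    event_samples: sample indices where each stimulus onset was time-stamped
    Spontaneous (non-phase-locked) activity averages toward zero across
    trials, leaving the time- and phase-locked evoked response.
    """
    pre, post = int(pre_s * sr), int(post_s * sr)
    epochs = np.stack([eeg[s - pre : s + post] for s in event_samples])
    epochs -= epochs[:, :pre].mean(axis=1, keepdims=True)  # baseline-correct
    return epochs.mean(axis=0)

# Toy demonstration: a fixed 'evoked' wave buried in random noise.
rng = np.random.default_rng(0)
sr = 500
evoked = np.r_[np.zeros(50), np.sin(np.linspace(0, 2 * np.pi, 200))]
events = np.arange(400, 400 * 201, 400)          # 200 stimulus onsets
eeg = rng.normal(0, 5, events[-1] + 1000)        # noisy continuous record
for s in events:
    eeg[s - 50 : s + 200] += evoked              # add the evoked wave
erp = average_erp(eeg, events, sr=sr)            # averaged waveform emerges
```

With 200 trials, noise five times larger than the signal still averages out well enough for the evoked waveform to emerge clearly.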
But when we want to know what the brain is doing with the sounds, we want to get a little more complicated and understand how it processes a multitude of sounds. The obligatory response is exogenous, stimulus-driven; now we want to know what’s happening internally, endogenously.
So we present what we call an auditory oddball. One example of what we might do: we repeat a sound over and over, and in this case we change the frequency of the sound, represented in pink here. We want to know: did the brain detect the difference between the repeating regularity, which we call the standard, and the pink one that changed in frequency, which we call the deviant? So we averaged together the response to all of the standard sounds, the black ones you’re looking at, and we averaged together the response to all of the infrequent, randomly occurring deviant sounds, represented here in pink. And you can see, overlaying the obligatory components (the P1, N1 and P2), there’s a negative displacement that begins around one hundred milliseconds and continues through the P2 response.
That negative displacement relative to the standard is what we call the mismatch negativity. It’s the index that the brain detected that something different happened here. The way we visualize the MMN is we subtract the standard ERP from the deviant ERP; the subtraction eliminates the obligatory components and leaves you with the change-detection response: the mismatch negativity component. And this is a frontal electrode site, where we see the greatest signal-to-noise ratio for the mismatch negativity.
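The deviant-minus-standard subtraction is simple to state in code. Here is a sketch on synthetic waveforms: the shared "obligatory" bump and the deviant-only negativity are made-up shapes for illustration, but the subtraction logic is exactly as described.

```python
import numpy as np

def mismatch_negativity(standard_erp, deviant_erp):
    """Difference wave: deviant ERP minus standard ERP.

    The subtraction cancels the obligatory P1-N1-P2 components common to
    both waveforms, leaving the change-detection response (MMN), a frontal
    negativity peaking roughly 100-250 ms after the deviance.
    """
    return deviant_erp - standard_erp

# Toy illustration: shared obligatory response plus a deviant-only negativity.
t_ms = np.arange(-100, 400)                            # 1 kHz sampling
obligatory = 2.0 * np.exp(-((t_ms - 50) ** 2) / 800)   # shared positive bump
mmn_true = -1.5 * np.exp(-((t_ms - 150) ** 2) / 2000)  # deviant-only negativity
standard = obligatory
deviant = obligatory + mmn_true
diff = mismatch_negativity(standard, deviant)          # obligatory part cancels
peak_ms = t_ms[diff.argmin()]                          # MMN peak latency
```

Note how the shared components vanish in `diff`, so only the deviant-specific negativity and its peak latency remain.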
Now, the reason I’m going into so much detail about the mismatch negativity component is that this is the measure I’m going to use for all of the experiments I’m going to talk about. And I’m going to give you a little bit of information on why it is such a valuable tool for looking at auditory scene analysis.
The first thing about it is that it’s modality-specific: it’s generated within auditory cortices, so we know where we are in the brain. The second thing is that we don’t have to ask the listener what they heard to index sound change detection. So the sounds can be coming in the ears, and we can understand what’s going on in the background while the listener is doing something else in the scene, whether it’s visual or auditory. They can be ignoring the sounds, in other words. They still have to hear the sounds, but they don’t have to be doing a task with them for this change detection response to occur. Or they can actually be listening and pressing a key for the deviants, and we also get the MMN. So we can learn about what’s happening for unattended sounds when attention is directed in many different ways, whether the listener is ignoring the sounds or attending to them.
The last thing, which is really crucial because we want to understand sound organization at a somewhat later level, how sounds are represented in memory, is that the component is strongly context-dependent. Context-dependent means its elicitation is based not simply on the features of the sound, but on the memory of the history of the sounds that have been ongoing. I’m going to give you what I find quite a striking demonstration of what that means. Not a scene analysis situation, just two tones, from work we did early on in my lab.
Here is the random oddball, where we present a tone twenty percent of the time, randomly, within a sequence of another sound. As I just demonstrated, when we do that we get an MMN elicited by the deviant. And what we would say the standard is, is this repeating regularity in time, location, intensity and duration of the sound. The only thing that changed in the case I showed you was the frequency of the sound. We could change any feature and the same thing would occur; we could change intensity or any other feature.
And so we get MMN based on the standard: the standard is the context, and the deviant is this change in frequency. But take those same two tones with the exact same probability, except that now we group the sequence of sounds so that the deviant, the pink frequency, occurs every fifth tone. It occurs with the same probability, but now there’s a sequence of five tones that repeats, repeats, repeats. And now MMN is no longer elicited by the deviants that elicited it before. The reason is that even though the pink tone is infrequent in the block, and is a different frequency than the black squares represented here, what the standard is now, if the brain detects it, is this five-tone repeating regularity. And you don’t get an MMN from a standard; you get an MMN from something that’s detected as being different. So now this frequency of tone is no longer deviant.
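The two sequences being contrasted here can be sketched concretely: in both, the rare tone occurs twenty percent of the time, but only one sequence has a detectable five-tone regularity. This is a minimal sketch of the sequence structure only, with 'S'/'D' standing in for the two tone frequencies.

```python
import random

def random_oddball(n=500, p_deviant=0.2, seed=1):
    """Deviant ('D') occurs randomly with 20% probability among
    standards ('S'): the repeating 'S' is the regularity, so each
    'D' is detected as a change and elicits MMN."""
    rng = random.Random(seed)
    return ["D" if rng.random() < p_deviant else "S" for _ in range(n)]

def patterned_oddball(n=500):
    """Same 20% 'D' probability, but 'D' is every fifth tone: the
    five-tone cycle S S S S D itself becomes the repeating standard,
    so (once the brain detects the pattern) 'D' no longer elicits MMN."""
    return ["D" if (i + 1) % 5 == 0 else "S" for i in range(n)]

rand_seq = random_oddball()
patt_seq = patterned_oddball()
```

Both sequences have identical first-order probabilities; only the sequential structure, and therefore the detectable standard, differs.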
So the thing you should understand about MMN, something I did not know when I was a student doing my PhD and only learned later (though quite early on as I was working on it), is that you don’t elicit MMN simply by having an infrequent frequency or feature in a sequence. There has to be a detected standard that serves as a basis. Most MMN studies used the oddball paradigm, where the standard was just a repeating sound, so everybody thought that was as far as it went. But it’s much more complicated.
So when MMN is elicited you know that a change was detected and you can infer what the standard was. And that’s going to form the basis of all the experiments I’ll talk about. And I’ll talk about the different standards in a little more detail.

What happens automatically?

So the first thing I’m going to speak about is what happens automatically? What happens when you don’t have a task for the sounds? Are they organized or are they simply background noise? How are the sounds sorted and stored in memory when you have no task with the sounds?
What I’m going to show you is that this stream segregation process is very primitive, as Al Bregman speculated in his early work, that it occurs automatically, without any task with the sounds.
Now, to explain this I need to go through a few processes, because I’m going to show you a number of different things that occur in this particular situation we’re looking at, not just stream segregation. So first I’m going to talk about a phenomenon that we discovered earlier on. We’re going to look at the influence of the context, but not in a stream segregation situation; just how the context influences how we process sounds in the environment.
So here what we have, as you can see, is a somewhat modified oddball: every time a deviant occurs, another deviant follows it. And the thing to notice here is that we have a very rapid pace, a stimulus onset asynchrony of one hundred fifty milliseconds, which is within the temporal window of integration. Sound elements that fall within the temporal window of integration tend to get integrated together. You need to note that, because it works differently at longer stimulus rates. I’m going to play this for you, and you can listen and see if you hear the deviants as single or double.
So, every time a deviant occurred, another one followed it. And what we found was that we got one MMN to those two successive events. We interpret this to mean that in this situation, every time a deviant occurred there were two, so this was one deviant event.
Now, another way you might interpret this: maybe the MMN generators are refractory and just can’t put out two MMNs within a hundred fifty milliseconds. Well, we ran another condition where we mixed up what the deviants were. Now, every time a deviant occurred, it wasn’t fully predicted that another deviant would follow it: we had what we’ll call single deviants and double deviants in the block, at the same pace. And now we got two MMNs. The same exact physical input put out two MMNs in the mixed condition, whereas it put out one MMN in the block condition. What this indicates is that the second deviant is giving new information, so it needs to be represented in a different way. And even at this very rapid pace, you can see that the MMN is distinguishing the two events.
One other thing to look at here: when the deviants were integrated together, the peak latency of the MMN was longer than when they were two separate ones. These are separated by a hundred fifty milliseconds; remember that the timing of evoked potentials is very precise, so we can see the timing of these two MMNs, one hundred fifty milliseconds apart. You can see clearly that the context of having the single deviant influenced whether this exact same input was processed as one or two events.
Now we take that phenomenon, an oddball, into a streaming paradigm. We have a set of low sounds at four hundred forty hertz, and a set of higher sounds quite far away in frequency, so they should automatically segregate even if you’re not paying attention to the sounds. But now the pace is seventy-five milliseconds onset to onset, alternating low-high, low-high, and we maintain the one hundred fifty milliseconds between double deviants that we saw before. The only difference is that now a tone actually occurs in the input to the ear in between: there’s a high tone between every pair of low tones. What we wanted to know was this: if the sounds segregate automatically, and you get a representation of the low stream and a representation of the high stream held separately in memory, then the contextual effects can occur within the low stream, and we should get one MMN in the block condition and two MMNs in the mixed condition. The converse would be true if they’re not segregating automatically: with all these tones occurring in the middle, we should get two MMNs in both situations.
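The timing structure here is the crux: successive low tones are 150 ms apart, yet a high tone physically intervenes at 75 ms. A minimal sketch of that sequence structure (the labels, counts, and deviant positions are illustrative assumptions):

```python
def interleaved_streams(n_low=200, deviant_positions=(50, 51)):
    """Alternate low/high tones at 75 ms onset-to-onset, so successive
    low tones are 150 ms apart; deviants occur only in the low stream.

    Returns (time_ms, label) pairs. If the streams segregate
    automatically, the low stream's double deviant is evaluated within
    its own 150 ms spacing, despite the high tone physically
    intervening at the ear.
    """
    seq = []
    for i in range(n_low):
        label = "low-dev" if i in deviant_positions else "low"
        seq.append((i * 150, label))        # low tones: every 150 ms
        seq.append((i * 150 + 75, "high"))  # a high tone between every pair
    return seq

seq = interleaved_streams()
low_times = [t for t, lab in seq if lab.startswith("low")]
```

De-interleaving `seq` by label recovers exactly the within-stream regularity the brain would hold if segregation happens automatically.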
What happened was we found one MMN in the block condition and two MMNs in the mixed condition, which to us indicates that within-stream event formation is based on the already segregated input. So if you think of yourself at the cocktail party we were visiting before: the first thing that happens when you enter the room is that all of the sounds are segregated based on the spectral/temporal characteristics of the input, and then the events within every stream get formed. You’re listening to your friend talking, or listening to a melody, and you’re making the calculations of what events are in that stream based on the already segregated information.
Let’s put that into a schematic model of what would be happening here. Look at all of the different processes happening in a bottom-up fashion: the sound input comes in, it gets parsed based on the spectral/temporal cues, the streams are formed, and events are formed on the already segregated information. So the standards are formed here, then deviant detection occurs, and then we get MMN. A lot is happening before the MMN output, which is already at one hundred fifty milliseconds.

How does attention modify neural representations of sound?

So that was the first thing we talked about. The second is, now we’re going to look at how does attention act on the neural representations? How does it modify what it is that you’re hearing? What does attention do in these complex scenes?
The first thing attention does, we find, is that it can sharpen stream segregation. Now you’re going to recognize that little waltzing pattern I showed you before; this is one of the paradigms we’ve used in the lab. We wanted to know what the brain was doing with the sound in both passive and active listening conditions. And we wanted not just one perceptual measure but two, so that we could get converging evidence about how things are being heard.
So we had two behavioral tasks. I’m calling Experiment 1 global perception: we had these different frequencies, separated at varying distances, and we played them to the subject with different frequency separations. We asked the subject, “Do you hear one stream, or does it sound more like two streams going on simultaneously?” In the second experiment we recorded the EEG while they performed the behavioral task. We took the same set of sounds, but this time we had them actively segregating out the low stream. I’m going to describe how we did this, because understanding why and how MMN would be elicited is important for understanding how this part of the experiment works: the loudness detection task is predicated on the ability to segregate out that low stream from the other intervening sounds.
So here’s how this task works. We have an oddball, and this time it’s an intensity oddball, not a frequency oddball. The sound here is 440 hertz; I’m calling them x-tones, just to have a name for that stream. These x-tones were played at 66 dB SPL, and the probe tone (I call it the probe tone because it isn’t always a deviant) was 78 dB, so we have a twelve dB difference. If this probe tone occurs randomly ten percent of the time amongst the 66 dB tones, it’s a very easy task: the subject just presses the key every time they hear the louder sound. It’s a loudness detection task.
Now what we did was put two tones in between every one of the x-tones, so the pace of sounds is ten hertz, one sound every hundred milliseconds. The intervening sounds have intensities that vary above and below the standard and the deviant. We did that so the listener couldn’t anchor onto the softest or the loudest sound to find the deviant: it had to be segregated out, buried amongst all these other intensities. Now, if the sounds are integrated together and you’re hearing this waltzing rhythm, then you have a flurry of intensities, because intensity changes randomly from tone to tone to tone, and there’s no standard to be detected. But if the sounds are segregated, and this lower stream of x-tones forms its own stream, then you have an intensity standard and an intensity deviant. So if the MMN is elicited, we know that the streams were represented in the brain as segregated, for that standard to emerge in the low stream and the deviant to be detected. We had listeners watching a movie and ignoring the sounds in one condition; in the other condition they paid attention, actively segregating out the sounds, pressing the key when they heard the louder deviant and ignoring the higher sounds.
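The design logic, identical x-tone intensities that form a standard only if the low stream is segregated, while roving intervening intensities prevent anchoring, can be sketched as sequence construction. The filler frequencies and the roving intensity values here are illustrative assumptions; only the 66/78 dB x-tone values and the 10 Hz pace come from the description.

```python
import random

def loudness_task_sequence(n_x=300, p_probe=0.1, seed=2):
    """One tone every 100 ms (10 Hz): each 440 Hz 'x-tone' (66 dB SPL
    standard, 78 dB probe) is followed by two higher-frequency tones
    whose intensities rove above and below both values, so the louder
    probe can only be found if the low x-tone stream is segregated out.
    """
    rng = random.Random(seed)
    rove = [60, 64, 70, 74, 82, 86]        # straddle both 66 and 78 dB
    seq = []
    for _ in range(n_x):
        x_db = 78 if rng.random() < p_probe else 66
        seq.append(("x", 440, x_db))
        for freq in (988, 1480):           # two intervening high tones
            seq.append(("filler", freq, rng.choice(rove)))
    return seq

seq = loudness_task_sequence()
```

Heard as one integrated stream, the intensity series is a random flurry with no standard; restricted to the x-tones alone, it is a clean intensity oddball.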
First I’ll show you the behavioral results, because we had both of those experiments. What you see here is that we use semitones. Even if you don’t play a musical instrument, most of us have seen a piano: a semitone is the distance from a white key to a black key, and it’s on a logarithmic scale. Whether you play from a C to a C-sharp in the low range, or a C to a C-sharp in the high range, it’s the same interval, one semitone, even though the absolute frequency distance between those C/C-sharp pairs is not the same. We used one semitone, the smallest distance on the piano, and then five, seven and eleven semitones; eleven is just under an octave. On the Y-axis, for Experiment 1, where we asked whether they heard one or two streams, we plot the proportion of times they said they heard two streams; and for the loudness detection task, the hit rate.
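The point about the logarithmic semitone scale is easy to make concrete: in equal temperament, n semitones is always the same frequency ratio, so the same interval spans more hertz in a higher register.

```python
def semitones_up(freq_hz, n):
    """Equal-tempered semitone scale: each semitone multiplies frequency
    by 2**(1/12), so n semitones span the same *ratio* anywhere on the
    scale even though the absolute Hz difference grows with register."""
    return freq_hz * 2 ** (n / 12)

low_c, high_c = 261.63, 523.25             # C4 and C5, one octave apart
d_low = semitones_up(low_c, 1) - low_c     # C4 -> C#4: about 15.6 Hz
d_high = semitones_up(high_c, 1) - high_c  # C5 -> C#5: about 31.1 Hz
```

So "one semitone" names a constant perceptual interval, not a constant hertz difference, which is why the separations in this study are stated in semitones.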
What you notice is that as the frequency separation got larger, going along the X-axis, the ability to perform the task increased, and the proportion of times a person said they heard two streams also increased. That was similar in both cases: at one semitone they almost never said they heard two streams, and at eleven semitones they almost always said it sounded like two streams. The other thing here is that these two measures were consistent with each other, so it seems they’re measuring the same thing: when a subject said they heard two streams, they did pretty well on the loudness detection task. Just as a side note, this isn’t true in children with language impairments, but it is in typically developing children. So these two measures are very consistent.
So now let’s look at what happens with passive listening: they’re sitting in the booth, sounds coming in their ears, watching a movie with closed captions to keep their attention on the movie. What we see is that MMN is elicited by the oddball with no intervening tones. We’re showing the mastoid electrode as well, because we want to identify that we’re in auditory cortex: we show the dipole relationship between the frontocentral site and the mastoid, produced by the perpendicularly oriented neurons in auditory cortex. We often show this dipole relationship so we can landmark ourselves within auditory cortex. And you see a nice robust MMN here. At eleven semitones with the interfering tones there’s still a significant MMN elicited, but it’s considerably smaller; at seven semitones as well, with no difference between these two. But by five semitones there was no MMN. So we would say that you need at least a seven-semitone separation, when you’re not attending to the sounds and they’re in the background, for them to be represented robustly enough as two distinct streams in memory. Five semitones is the ambiguous distance, where subjects report half the time that it’s one stream and half the time that it’s two. So it’s not always clear, and it wasn’t robust enough, when they’re watching a movie with no task, to be represented as two streams, so no MMN is elicited. And of course not at one semitone.
Now we have them actively segregate the sounds: we tell them to pull out the low stream, listen for the louder tone, and press the key every time they hear it. Now you see, of course, a robust MMN elicited by the oddball, and again this is why we need the mastoid inversion to show the dipole relationship. Notice the two other important components that are elicited here: the target detection responses called the N2b and P3b. They are not modality-specific; they’re not generated within auditory cortex, and so you don’t have that inversion over the N2b. So we have the sequence of events: the change was detected, the louder tone was detected, and then come the target responses that go along with your reaction time. You see a nice robust response at eleven, seven and even at five semitones.
So when you’re using your attention to pull out what you want to hear, attention can refine the stream segregation process so that you can represent two streams at smaller frequency separations than you could if you were looking around the room with things going on in the background.
So let’s put this into our model: we have the sound coming in, and we know the parsing is happening automatically, because we used very large frequency separations. So where is attention acting? Attention is acting on the stream formation process. We can force it, as I just said, at smaller and smaller frequency separations: we can segregate out the sounds with our attention, which then allows us to form the standards and detect the deviants, and MMN is elicited at five semitones when we’re listening.
What else can attention do? Attention can override processes of the automatic system.
So now I’m going to use a different paradigm, not the simple oddball paradigm. This is the paradigm I used for my dissertation. Instead of a simple oddball, which I didn’t want to have (I’ll explain why in a minute), I used a complicated three-tone sequence: a three-tone rising pitch pattern, let’s call it that. Again it’s a very fast rate, ten hertz, and the sounds enter the ear as high-low, high-low, high-low, with a rising pitch pattern in each stream. Now, what’s interesting about this: if the streams segregate, you have a standard pitch pattern in the high stream and a similar one in the low stream. Before I play this for you, I need to tell you something, because I will challenge you to listen for it. This is somewhat of an auditory illusion, because the input to the ears, as you can see in the bottom line, is high-low, high-low, high-low. When you listen to the sounds, you can’t hear the alternating pattern at this rapid rate. What happens is that you automatically segregate them, and you can listen back and forth to the pitch patterns. You can try to hear the alternation; most people can’t, and it would be interesting to know if somebody felt they could. This is a short sample; I’ll play it twice.
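The illusion rests on the mismatch between the input order at the ear and the de-interleaved streams the brain holds. Here is a minimal sketch of that sequence structure; the specific frequencies are illustrative assumptions, and the "deviant" is simply the same three tones in reversed order, as described.

```python
def pattern_streams(n_patterns=100, reversed_at=()):
    """Interleave a low and a high three-tone rising pitch pattern
    tone by tone (high-low-high-low...) at a rapid rate. A 'deviant'
    is the same three tones in reversed (falling) order: no new
    frequencies or intensities, only a new order, so detecting it
    requires a memory of the pattern itself.
    reversed_at: set of (stream, pattern_index) pairs to reverse.
    """
    low = [330, 370, 415]     # rising pattern, low stream (illustrative Hz)
    high = [660, 740, 830]    # rising pattern, high stream
    seq = []
    for p in range(n_patterns):
        lo = low[::-1] if ("low", p) in reversed_at else low
        hi = high[::-1] if ("high", p) in reversed_at else high
        for l_tone, h_tone in zip(lo, hi):
            seq += [h_tone, l_tone]      # input at the ear alternates
    return seq

seq = pattern_streams(reversed_at={("low", 40)})
high_stream = seq[0::2]   # what each segregated stream holds in memory
low_stream = seq[1::2]
```

The alternating `seq` is what reaches the ear; `low_stream` and `high_stream` are what you actually perceive once segregation has happened, which is why the reversal is detectable only within a stream.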
I’ll play it again. Most of the time, what dominates is that rising pitch pattern in the low stream and the rising pitch pattern in the high stream. You can flip back and forth, listening to the low and the high, and find both of them.
One of the reasons I wanted to use this pattern standard is that, at the time I was working on this experiment, the thinking was that any sound differing in any parameter would elicit an MMN, like a global MMN. So what we did here was reverse the order of the elements: there were no new elements involved, no new frequencies, no new intensities, nothing new. In order to identify the deviants, you had to identify the order of the elements and know that the pattern was occurring in reverse order.
Now, in this experiment (I’m not going to go into that part; it was my first experiment) I want to talk about attention a little. Here we have a deviant that occurs in the low stream, followed by a deviant that occurs in the high stream: every time a deviant occurs in the low stream, a deviant occurs in the high stream. This is a cross-stream double deviant; the double deviant I showed you before was within the same stream. And of course it spans a longer time course, but there is only about a hundred milliseconds between the offset of the first and the onset of the second.
When we did that and the subject was watching a movie, we found an MMN elicited by the first one, the low tone deviant, but no MMN to the high tone deviant. Now, you heard the sounds: they clearly sound segregated, and you would think you could hear them both distinctly. There were a number of possible explanations: they were watching a movie, they had no task with the sounds. It could be that it was detected as one long double deviant. Or it could be that there was so much going on that they detected one deviant and didn’t catch what happened next, because they weren’t paying attention there. But you can clearly hear it if you are listening.
So we had subjects come in for another condition, and this time we said, “We want you to focus in on the high sounds, ignore the low sounds, and press the key every time you hear the reversal of order in the high stream.” And that’s what they did. What you see: the black trace shows the response to the high tone deviant. You see they’re doing the task; there’s the N2b and P3b. They knew the high tone deviant was their task. There’s no N2b/P3b to the low tone deviants, because they were ignoring them, even though the low tones were intervening between every high tone; the sounds alternate low-high, low-high. That’s pretty remarkable. So we now have an MMN and an N2b/P3b to the high tone deviant. And you see that the low tone deviant is still being detected automatically, with the same latency and amplitude as when they were watching the movie and had no task with the sounds.
So what are we saying here? Let’s look at it. We have the sound input and automatic streaming. Before, we showed modulation at the stream formation level. Now we’re showing modulation at the event formation level. When you pay attention to the previously obscured events, attention lets you pull them out, and that allows you to detect the deviations — an MMN was elicited. So what was previously obscured by the automatic processes, when you had no task with the sounds, attention can recover: attention can override these automatic limitations when you need to listen for something going on in the scene.
Lastly — we know that attention can select a subset of information. But in both of the experiments I just explained, whenever you actively segregated out one stream, there was only one other stream left over. If you’re segregating out a low stream, you only have a high stream left. You can see here that I took the paradigm I just showed you, this complicated rising pitch pattern paradigm, and added a third frequency stream. Now the sounds go low, middle, high; low, middle, high; so there are two tones intervening between every high tone. And now we want to know: when you’re attending to one stream — in this case it’s going to be the high stream — what does the brain do with the background sounds? Are they organized, or are they noise? I’m going to let you listen to it now with the streams.
What we had the subject do was attend to the high tones. The high tone stream is an oddball stream, but it has single deviants and double deviants. Again, we didn’t want the subject to just press the key every time another sound popped out, because those occur randomly. We wanted them to pay attention to the high stream and perform a task. So they pressed the key only when they heard the double deviant. They had to listen; it wasn’t that they just pressed the key any time a deviant occurred. It was a very difficult task. And now the unattended background is very complicated — it’s not just simple oddballs, we have patterns back there. If the patterns emerge, then we should get an MMN to the pattern reversal in the low stream and to the pattern reversal in the middle stream. And just to note, these are not double deviants. I separated them out so there was no issue of them interfering with each other.
Okay, so what happened? When all of the sounds were in the background and subjects were watching a movie, MMN was elicited by the double deviant — which would later be their target — that you see here, by the middle tone pattern deviant, and by the low tone pattern deviant. So the sounds were automatically segregated. We made the distance between the streams more than seven semitones so that they would automatically segregate if they could. We had never done this with three streams, so we didn’t really know for sure what would happen, and you can see that the streams segregated automatically while subjects were watching the movie and the sounds were in the background.
Now, when we had them actively do this detection task, you can clearly see there’s the MMN to the first deviant, the MMN to the second — remember there are two intervening tones in there — and there’s the N2b-P3b that was their task. But in the background — this is the background now — you can clearly see there were no MMNs elicited. So attention to one part of the auditory scene, to do the task, modulated what was happening to the unattended information.
That was interesting, but here’s how we’re interpreting what might be going on. Before, I showed you that the sound input comes in; now we have attention modulating what gets parsed out. We’re parsing out a subset of that information, and the rest is the background. The stream you’re attending to forms its own stream, the events can be heard, and we saw that the deviants were detected and we got an MMN.
But what’s happening in the background? There is more than one possible place where the limitation occurred that interfered with the MMN coming out. One possibility is that attention preempted the stream segregation process — that when you’re listening to your friend talking in a noisy room, the rest is background with no structure. The other possibility is that when you’re listening to your friend in the noisy room, there is structure to the other events going on — you know whether it’s a male or female speaker, but you don’t know whether they’re talking about China or Germany. I don’t know why I picked two countries; I could have said cooking or reading. What happens here is we had those three-tone pitch patterns. So the other possibility is that the sounds were grossly segregated by frequency, but because your attentional resources were so focused on doing this very difficult task, even though you knew there were two different streams back there, you didn’t know what was in them — you didn’t know what the events were. Event formation would be blocked.
Now we’ve since done a follow-up study which showed just that. We used entrainment, similar to what Barbara was talking about before. We used different rhythms — we changed the paradigm a little bit so we could look at the rhythms in each stream — and what we found was that stream segregation was going on at that level, but event formation was preempted: finding out what’s going on in the other streams is what was blocked. Again, if we go back to the cocktail party: when you’re listening to your friend talking, you can tell from the pitch of the voice that there’s a male behind you, but you can’t simultaneously know what he’s saying — what’s going on with the events within that stream. And my student Renee Symonds has a poster — just a little plug here — doing some follow-ups. We only used frequency as the cue here, and there are many other ways we can follow up by making the streams more distinctive, to learn what’s happening automatically and how it subserves our ability to process what’s going on in the background.
So let me review: What does attention do? The first thing I showed you was that attention sharpens the organization toward whatever your behavioral goals are. With attention, you can segregate out sounds at a smaller frequency separation than happens automatically when there’s no task — when the system just uses the cues in the input.
The second thing was that with attention we can overcome limitations of the automatic system. If things are obscured — if there’s a lot going on in the background but you want to listen to something in particular — you can focus your attention on that specific thing, and even though it’s obscured or masked by other noises, you can pull it out.
And the third thing is you can select a subset of information, but that can affect processing that’s going on in the background. In other words, we can’t listen to the within-stream events occurring simultaneously even though we can get a gross picture of how they’re organized.

How is sound information stored in memory for use by attentive systems?

So in the last part of the talk, I’m going to talk about a slightly different issue in this scene analysis picture, which is how we resolve ambiguity. What if the information is not clear, or can be perceived in multiple ways? What I showed you before was clear segregation — we knew, because we manipulated it as such. As an example of what I mean by perceptual ambiguity, I ask you: what do you see? You either saw, first off, black shapes, or you saw a word. So the same physical input can give rise to multiple percepts. When you see the word “lift,” the black shapes are in the background. And when you see the black shapes, the white is in the background. So the same physical input can be interpreted in multiple ways — you’re using all that information, but you get different percepts from it.
Now we’re going to take that to the auditory system, because we wanted to know: in the auditory system, is there mutual exclusivity? In other words, when there’s competition for neural representation, does the organization you’re attending to win out in neural memory? Or can you have multiple organizations represented in memory even though only one of them appears in perception?
So now I’m going to walk you through the design, because it’s a little bit complex. It’s not impossible to understand, it’s just complex — and you’re going to get it easily, because it’s just a variation on what you’ve seen before: a slight variation on the loudness detection task experiment that I showed you. Here we have three tones that occur isochronously, an equal distance apart, then a silence, then three tones and a silence. These all vary in intensity randomly, and that would be the high stream. And here, in the low stream, the tones are five semitones away. We’ve repeated this with hundreds and hundreds of subjects: fifty percent of the time they hear a five-semitone difference as one stream, and fifty percent of the time as two streams. So it’s ambiguous — in the same subjects.
So what we did here was create three different patterns. The first way that you can perceive these sounds is as integrated together. We call this pattern one, this pattern two, and this pattern three. The place at which the low tone was inserted in between the three higher tones — after the first, the second, or the third one — created either pattern one, pattern two, or pattern three. So in order to identify pattern one, pattern two, and pattern three, you have to know where the lower tone belongs with respect to those higher tones. That would be the integrated percept.
For the segregated percept, which would be the other way to hear the sounds, we do what I’ve been describing many times before: we segregate out the low tones from the higher tones, listen to the low tones as a separate stream, and ignore the high tones. So you can take the same five-semitone difference and either segregate out the low tones and find that oddball stream, or integrate them together to hear the patterns. And when you do this segregated task, you now have a jittered ISI, because the place at which the low tone occurs within the groups varies — it doesn’t just feel jittered, it actually is jittered. The distance between the low tones is no longer isochronous; the stream now has a jittered rhythm.
Okay, now here’s the key. We wanted to know how these sounds were represented in the brain, and we needed an index of each organization. We couldn’t have the same deviant serving both the segregated and the integrated organizations, or they would be confounded. So we had to create a deviant for the integrated percept — and we know what the deviant is for the segregated percept, because I showed you that before, but I’m going to show you again. We needed a separate deviant for the integrated and a separate one for the segregated, so that if both representations were there, we would have two MMNs.
For the integrated deviants — now this gets a little complicated — our standard patterns, pattern one and pattern two, were presented forty-five percent of the time each. So together the standard patterns formed ninety percent of the sequence, and the deviant pattern, pattern three, occurred infrequently. So in order for an MMN to be elicited, the listener would have to detect that two of the patterns occurred more frequently than the third pattern. It’s quite a complex thing for the brain to be doing.
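That probability structure can be sketched in a few lines of code. This is only an illustrative toy, not the lab’s stimulus software; the function name, trial count, and labels are my own assumptions.

```python
import random

# Toy sketch of the integrated-deviant design: two standard patterns at
# 45% each (90% together) and a rare third pattern at 10%. All names and
# the trial count are illustrative, not from the actual experiment code.
def make_pattern_sequence(n_trials=1000, seed=0):
    rng = random.Random(seed)
    labels = ["pattern1", "pattern2", "pattern3"]
    weights = [0.45, 0.45, 0.10]
    return rng.choices(labels, weights=weights, k=n_trials)

seq = make_pattern_sequence()
# An MMN here requires the brain to register that pattern3 occurs less
# often than the two standard patterns combined.
```

The point of the sketch is just that "standard" is defined statistically over two pattern types at once, which is what makes the brain’s detection of the rare third pattern nontrivial.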
For the segregated deviants, we did what I demonstrated before: we had the intensities vary above and below, so that only when the sounds were segregated could an MMN be elicited by the intensity deviant, because only then do we have an intensity standard and an intensity deviant.
So we presented this in a task-switching paradigm. The same exact sets of sounds — randomized differently across trials, but the same sets of sounds. Every sequence played to the ears had all the different patterns and intensity deviants in it. Nothing differed between conditions except how the subject attended to the sounds and which task they performed. In the task-switching paradigm, we would give them a visual cue telling them which task to do, a set of sounds would come on — I don’t remember exactly how long, ten or thirty seconds — they would do the task, say for thirty seconds, and then the sounds would go off. They’d get another cue: now segregate. Then another cue: now detect pattern one, pattern three, et cetera. And these were randomized.
So the question was — just to go back to the reason we did this — when you’re performing a task with one organization, what happens to the alternative, unattended organization in this situation where the sounds can be heard in multiple ways? We hypothesized that attention would resolve the ambiguity toward the organization used to perform the task. That would predict that performing the task would preclude the neural representation of the alternative organization, and therefore we should see an MMN only for the task you’re doing. If you’re doing the pattern task, there should be MMNs only to pattern deviants and none to intensity deviants; if you’re doing the intensity task, MMNs only to intensity deviants and none to pattern deviants.
But that’s not what happened. And this was really fascinating — it is fascinating still. It makes a lot of sense, and I’m going to tell you why in a minute. We also had some control conditions to make sure it wasn’t anything paradigmatic, because it was the first time we’d used this paradigm.
What you see here are our scalp distribution maps. Blue is negative and red is positive. We’re showing electrodes all around the head, and what you can see is the bilateral distribution of the MMN, coming from bilateral auditory cortices generating the MMN response. The blue is where I was showing you the traces before, at the frontocentral electrode, and the red would be below the Sylvian fissure, where we can record it nicely at the mastoid electrode. What you see here is that when subjects were attending to the pattern and doing the pattern task — we were looking at the pattern deviant, and we subtracted the standards, pattern one and pattern two, from the deviant — there was an MMN. And when they did the intensity task, when they segregated out the sounds and were pressing the key and ignoring those upper sounds, this is the pattern MMN response they got. It has the same amplitude, latency, and morphology in both cases.
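The subtraction mentioned here — standards subtracted from deviants — is how a difference waveform is formed in general. A minimal sketch with simulated data (the array shapes, sample counts, and the injected negativity are my assumptions for illustration, not the lab’s actual pipeline):

```python
import numpy as np

def difference_wave(deviant_trials, standard_trials):
    # Average each condition across trials (axis 0 = trials), then
    # subtract the standard ERP from the deviant ERP sample by sample.
    return deviant_trials.mean(axis=0) - standard_trials.mean(axis=0)

# Simulated single-electrode data: 40 trials x 200 samples of noise,
# with an extra negativity added to the deviants in a mid-latency
# window to stand in for an MMN-like response.
rng = np.random.default_rng(1)
standards = rng.normal(0.0, 1.0, (40, 200))
deviants = rng.normal(0.0, 1.0, (40, 200))
deviants[:, 80:120] -= 3.0  # simulated MMN-like negativity
dw = difference_wave(deviants, standards)
# dw dips negative in the 80-120 sample window: the "MMN" of this toy.
```

Averaging across many trials is what makes the small deviance-related negativity visible above the trial-to-trial noise, which is why these experiments need so many repetitions.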
And this is what really surprised us, because we thought: when you’re doing the integrated task, you can’t possibly be segregating, and when you’re doing the segregated task, you can’t possibly be integrating. The only difference here is that these sounds are in an ambiguous range — I’m going to talk a little more about what that means. And then, just to show you they were doing the task: red is positive, and this is the P3b showing the target response. When they were doing the intensity task, the effect wasn’t significant — though there is a little bit there; it looks like a little bit of attention went to those patterns. I think the patterns might have been a little bit easier to recognize.
When they did the intensity task — now they’re segregating out the sounds — they attended to the intensity, and this was their MMN. And when they attended to the pattern and were pressing the key for only one designated pattern, here is the intensity MMN: similar amplitude, latency, and morphology. Both representations were there, no matter which task they were doing. And if you look at the P3b, you can see they were only doing the intensity task when that was their job, and when they were attending to the pattern there’s no indication they were attending to the intensity deviants at all.
So, even when one organization dominates in perception, alternative organizations are represented in the neural trace. You don’t lose information, and this is important because auditory information is temporal. You only have the physical input of the sound to use, and once the input is gone — once the words are said — you only have the neural trace of that physical input. If you lose that information, you can’t go back and get anything else; you have to say, “What, what did you say?” and even then it’s not the same exact thing. So if you’re in a noisy situation and you listen to one part of the room and lose everything in the background, how are you ever going to get it back? When things are overlapping and ambiguous and can be perceived in multiple ways, your brain holds multiple representations, which allows great flexibility to switch your attention around the room. So the attended organization doesn’t win out when the information is ambiguous: attention has access to multiple sound organizations simultaneously.
So the conclusions we can draw from everything I’ve talked about are, first, that stream segregation occurs automatically and sets the basis for event formation. The first thing you do when you walk into a room is segregate out the sounds, and then the events get formed within that scene. Second, attention interacts with automatic processing to facilitate your task goals. Even when things are precluded — when they don’t happen automatically — we saw that you can pull them out if you want to listen to them; there is this interaction between what your brain does with the stimulus-driven information and how attention modifies that input. And third, you don’t lose information: when you listen to your friend, you can still hear things that are going on. You do lose some of the discrete information about what’s going on within the other streams, but you don’t lose the global picture. It’s a multilevel analysis that involves these interactions.
So let’s put this all together.
A little Bach for you in the morning. If we’re putting it all together, we can think of an orchestra as an example. You can sit back and listen to the global picture of the orchestra, with the harmonies and the passing melodies trading off between the strings and the brass and reeds. Or you can choose to focus in and listen to the melody of the flute. So you have multiple organizations represented in neural memory, and this facilitates your ability to switch back and forth among the different streams of events to identify what events are going on in the environment. As I said a moment ago, there are limitations — you can’t pull out all the events that are going on simultaneously — but while you’re listening to your friend, you can get a global picture of what’s happening. Or when you’re listening to the flute, you can also have a global picture of what’s happening.
So the main conclusion is that we have both local and global representations, which allow for great flexibility in the auditory system — to switch our attention around the room, and to facilitate our ability to deal with noisy situations where there are competing sound streams.
So thank you. The last thing I want to do is just acknowledge some contributions from the members of my lab. Kelin Brace, Elizabeth Dinces, Wei Wei Lee, Renee Symonds, and Joann Tang. Many of those people are in the room right now. And some of our collaborators who have contributed to the work that I talked about, which is Al Bregman, Kathy Lawson, Alesia Panesse, and Mitch Steinschneider. And of course we always like to thank NIH for allowing us to do this work.

Questions and discussion

Audience Question:

My name’s Shae Morgan, from the University of Utah — a PhD candidate there. So we’re in a complex listening environment, attending to one stream but still processing what’s going on around us to be aware. Do you think the extent to which we’re able to process the background depends on our motivation and preferences for certain things? I don’t know, maybe a mother would be more predisposed to hear a baby than somebody talking about the news. Or do you think an individual person’s life experiences play into the extent to which they — I’m trying to form this right — the extent to which they listen in the background for other information?

That is a great question. There are multiple levels, because that would be a level beyond what’s in the stimulus information. The great psychologist William James — I love this part of his first book, Principles of Psychology — talked about how the intensity of a sound is relative; I’m paraphrasing because I can’t remember the exact wording. If you’re in a situation where you’re waiting for your lover to tap on the window so you’ll come down the ladder, your threshold lowers, so that faint tap on the window sounds like a boom, because you’re waiting with such great anticipation. And that same tap, at the same actual intensity, heard in another environment or in the background somewhere else, would be totally missed. I always took that to mean exactly what you were saying: there’s this other level of motivation — like the mother example you gave — that lowers or raises our threshold depending on what’s happening. If you’re taking the GREs and need to focus on the task, I assume you raise your threshold; if you’re a mother listening for a child, your threshold lowers. I don’t know exactly how that works, because that’s a different level of processing. But it would be interesting — and it is relative. We do find that the way you process sounds is completely relative and has nothing to do with the absolute physical features of the stimulus input. That we do find. So I think that would go along with what you were saying.

Audience Question:

Hi, my name is Molly Brown, from the University of Pittsburgh. I might have missed you saying it earlier, but I wanted to ask anyway: were all of your subjects normal hearing? And have you ever done any of this research on somebody who doesn’t have normal hearing, to see how they do in that situation?

Oh right, I forgot where I was for a second. Yes, sorry — all of our subjects were normal hearing. We do a hearing screening with all of our subjects to check that their hearing status is normal, and if we have any questions, we have Wei Wei Lee in the audience, who does our TMS and anything else we need to do. So yes, they have normal hearing. As a matter of fact, I would like to do this more with children, but we have done this — we haven’t yet published the data — in aging individuals. Elizabeth Dinces, who works in my lab, is a clinical neuro-otologist with a geriatric population, and we have tested older adults above sixty years old — a group of sixty-, a group of seventy-, and a group of eighty-year-olds. In one group they have normal audiograms, or what would be considered normal audiograms for that age, and we also have a group with mild hearing impairment. We find big differences both in automatic processing and in active, attention-based processing, which is part of what we’re going to start writing up. The situation is quite different: the balance between what happens automatically and what happens with attention changes. And we find this also, actually, in children. It’s the automatic system that needs to be refined. People would think, “That happens automatically; you’re born with the architecture to do it.” But no — in children as well, it’s the automatic system that needs to be refined. Then it’s like driving a car: you get the skill and it works, and it helps you do what you need to do, and you can do it even better. And what we find in aging is that the automatic system doesn’t seem to be supporting the attentive system the way it used to, which is going to be the message of our paper when it comes out.

Audience Question:

Hello, my name is David Kessler, and I’m a student at Vanderbilt University. I’m interested in the stream segregation aspect of this, but specifically for speech. My question is: how specific does this segregation get? For example, if you’re listening to a specific voice, are you lumping that one voice together as a single segregated entity? Or, like you were talking about before, could you really focus on different parts — suprasegmentals, for example; you could focus specifically on the intensity of that voice. And would focusing on one specific suprasegmental suppress the other suprasegmentals of that same voice? How does that interplay?

Well, you’re thinking like a scientist now. We haven’t done that level of fine detail. The idea is, as you said at the beginning, that the voice that’s speaking is joined into one stream that then has coherent within-stream characteristics. And in the first experiment I explained to you, I think some of those general auditory mechanisms are involved in speech processing, because you have to integrate and segregate even within a speech stream. But I haven’t gone beyond that — looking at how, within a speech stream, you segregate further — so I wouldn’t be able to answer that, but it does sound like an interesting question to ask. We are in the process of looking at what happens when you have multiple cues versus single cues coming from different streams, and how that affects the coherence of the individual streams. That might bear on your question, because there are multiple features in speech streams, so it may play into how strongly a speech stream coheres, and whether you hear separate items or just differences within a single stream.

Audience Question:

Hi, my name is Samantha Gustafson, and I’m a PhD candidate at Vanderbilt. I want to go back for a second to the automatic processing of the unattended signal. I’m wondering if you’ve looked, within your experiments, at whether that automatic processing reduces over time while subjects are asked to process — and whether that might tell us something about fatigue and our ability to use that automatically processed signal when we’re exhausted. So, in the first trials versus the last trials, would you expect to see the same strength of automatic processing? I’m not sure how long your sessions are — if they’re short, it may not matter — but if you lengthen them, it may be that motivation reduces as fatigue kicks in, and the automatic processing is not as robust.

That is a good question. There are effects of fatigue that I’ve heard about, but we have not really analyzed it that way. We have to average together a lot of trials, and we would have to design the experiment to have enough trials to look at the beginning, the middle, and the end. Usually when we have done that, we found not fatigue issues but learning or training issues — if you’re doing a really difficult task, how you perform at the beginning versus at the end. Informally, over the decades I’ve been doing this, I’ve never seen fatigue effects in the automatic processing measures; I’ve seen them behaviorally. But I can’t swear that I’ve checked it systematically. It’s an interesting question, but I think fatigue comes in more when you’re listening, focusing, and doing the task — that’s where we’ve seen fatigue issues come in. When subjects are watching a movie or doing something less strenuous, we haven’t really noticed a difference based on fatigue.

Audience Question:

I’m Jessie Erikson, from the University of Arizona, and I had a question about one of your methodological decisions. In a lot of the examples we talked about, where you choose to attend to certain sounds or certain parts of the auditory environment, you’re not explicitly told to ignore the other sounds. But in your experiments, I noticed that participants were often or always explicitly told to ignore certain sounds, as opposed to just being told what to focus on or attend to. So I was curious about the model of attention, or the theoretical framework, that motivated this methodological decision.

Now that’s a good question — this kind of question comes up all the time, because people don’t really understand what attention is, and we’re still trying to learn what attention is. First I’ll tell you what we usually tell the subjects: I no longer tell my subjects to ignore the sounds, because there’s no such thing, and I don’t really want them to. What we tell them is: “You’re going to hear sounds played in your ears. Tell us if you don’t hear the sounds.” Because first of all, we want them to hear the sounds. And then we tell them what task to do — either watch a movie or do a task with the sounds. Here’s my thinking about that; I won’t talk too long, because I could talk quite a long time about this. When I first started, people used to say, “You’re ignoring the sounds; it’s pre-attentive processing.” And then everybody would argue: but it’s not pre-attentive processing, they might sneak peeks, this kind of thing. There’s no such thing as pre-attentive processing, because when you wake up, you’re attending to something, and you have awareness of all the things going on in your environment. So this is not pre-attentive. There’s a different way that you focus on the sounds. What I want to look at now is: if you focus on a movie or a visual task, and there are sounds in the background that you have no task with — you don’t care about them, because you’re being paid to watch a movie — what does your brain do with the sounds? Or if I tell you to focus on a set of sounds and listen for something, are you hearing it? What does your brain tell us about what you’re doing with those sounds, and how does that modify what was happening in the background, as we saw before? So I want to redirect your attention in multiple different ways.
So, in that experiment I showed you with the grouping — the random oddball, where every fifth tone formed a group — we had two forms of attention. We had them watch a movie (or I said “ignore”; forgive me — we had them watch a movie). And then we had them listen to the sounds, but listening to the pitches of the sounds and pressing the key when they heard a rare pitch. When they did that, we got an MMN to what would have been the pink one, because the pitches were of interest and so all the pitches were being registered. But then we asked them to take that same set of sounds — they were grouped, let me say, just as they were grouped when we had them listening to the pitches — and we said, “There’s a five-tone repeating pattern; listen to that, and press the key when you hear a pattern change.” They were pressing the key to the same tones as when we had them listening for pitches, but now no MMN was elicited. They were listening to the same set of grouped sounds in two different ways: we got an MMN when they listened to the pitches, and no MMN when they listened to the pattern. So it isn’t just “ignore or attend.” It’s where you’re directing your attention — how you’re using the information coming in your ears — that makes the difference. It’s the same as when the high tone stream got obscured: the automatic system didn’t need to use that information, there was just a lot going on there. But when you needed to use that information, attention could pull out those events. So it was affecting a different level of the system. I like to think of it in terms of the levels of the system and how they interact.

Audience Question:

Hi, Sophie Schwartz from Boston University. I was wondering about your behavioral work looking at the differentiation of two streams based on semitone difference. How much time did you give these adults to be able to say whether the two streams were the same or different? And when you found that five-semitone threshold, did it get lower if, perhaps, you gave them more time?

Okay, so I think I understand your question, because there’s what we call a buildup period before you can hear things as integrated or as segregated. We far surpassed the buildup period. I think they had ten to twenty seconds (I don’t remember the exact number) before they had to say whether it was more like one stream or more like two. And we had them wait until the end of the sequence before they gave their response. So they listened to the sequence of sounds and we said: even if you think you know what it is, listen to the whole thing and give us the response at the end. The other thing that we did was, we didn’t present a five-semitone sequence and get one response and then present another one and get a response. We randomized whether they got a one-, a five-, an eleven-, or a seven-semitone difference, all mixed up, so the five came in after many different things. So they couldn’t (maybe habituate would be a good word here) habituate to segregation or integration and have a hysteresis effect change what they were perceiving; it was all randomized. And no matter how many times we randomized it and how many participants we asked to do it, on average, when that five-semitone one comes up, they say one stream half the time and two streams the other half of the time. And we don’t find a difference with the loudness detection task; it’s true, that one goes on for minutes, and we get very similar, consistent responses across different groups of subjects, where it’s ambiguous at the five-semitone difference. We have an integrated region around one semitone, an ambiguous region around five semitones, and a clearly segregated region at eleven semitones and above, for adults with normal hearing. Not true for kids.

Audience Question:

Question inaudible.

Oh okay. That’s a really great question, and something that I’m quite interested in. I’m not a clinician, and we want to know what the average group response is, so that if we were to pick somebody out of the crowd, we could say this is what we expect the response would be. We see a great deal of individual variation, and part of what we want to do is maintain that variance so that we can know whether something happens in a group, or is specific to a certain individual or group of individuals. And we do see individual variation. For example, I said five semitones is the ambiguous region; well, there are a large number of people for whom three semitones is the ambiguous region. And we can tell that. But when it comes to whether we can tell the difference between auditory processing disorder and attention deficit disorder, there are so many factors that feed into the individual variance in a child with auditory processing disorder or attention deficit disorder. At the moment, I haven’t had the chance to systematically study which measures we could use to sort through that question. It’s something that I would like to do, because we’re now getting a handle on what attention does in kids and what is important in kids, and it’s very different from what it does in adults. But right now, we’re still at the group level. So we can’t quite answer the question yet. But I would like to.

Audience Question:

Hi, I’m Kristi Ward, a PhD student at Northwestern. Your study whose methodology looked at the integration-or-segregation question got me thinking about how much experience plays into this ability to segregate a stream, especially a complex stream. For instance, if you were presented with the same complex stimulus, you would need to experience that stimulus over some given time period in order to learn enough about it to move forward with the segregation component; at least, that’s what I was thinking about in regard to the methodology that you presented. Do you have any comments on that experience component and how it would relate to the cocktail party effect?

Oh, that’s an interesting, complicated question. I’ll pare it down a little bit. On the experience question: we find that people who get experience with our really complex paradigms do better as they get more experience. Actually, Kelin Brace, one of the students in my lab, is just finding some evidence supporting that, and we want to follow up on it. We do find, however, that most people who come into the lab, but not all, can do the very complicated tasks. I remember when I was doing my dissertation I had the pitch patterns, and in the experiment I showed you, in one case they were ignoring, but in another case they were attending. And I remember having a subject; I said, you have to listen to the high stream and press the key every time the three tones reverse. And for me it was crystal clear what that meant. And they weren’t doing the task, and I said, what happened? He said, “Well, I can hear that there are high tones, and I can hear that there are long tones, but I don’t hear that rising pitch pattern.” It was the first time I had encountered that, and then I encountered it a couple more times. There’s a great deal of variation among individuals. And we have found that if you train people on a particular paradigm, they can get better at it. Musicians are better at most things because they’ve been trained to listen in a different way. So there is that element that can come into it. But most people can do what we ask them to do without any training; everyone gets a practice period. So I’m not sure of the extent of the training that you were talking about. But the students working in our labs, after they hear the same sounds over and over and over, they’re really great at it.

Audience Question:

I’m Adam Bosen from Boys Town; I’m a postdoc there. I was curious, and this might be kind of a weird question, but I was thinking: usually the results show that attentional focus on a particular feature sharpens your ability to detect mismatches in that feature. Are there conditions under which the opposite might be true? Where, say, active attention degrades your ability to detect a mismatch by degrading your ability to group the standard into a single stream?

Okay, so this is a good question. I’m first going to clarify one thing: the MMN is an outcome measure that shows the brain detected a change, and it’s the tool we use to understand what’s represented in memory. What we have found, when I say it’s highly context-dependent, is that whichever way you listen to the sounds, whichever way you group them, will influence whether MMN is elicited or not, because it will influence what the standard is, what the event is. So the question that you had was — what was the question that you had? Now I remember. If you change the way the sounds are grouped, then MMN will no longer be elicited in the way that I showed you: if you take what would be a deviant if it occurred infrequently in the block and make it part of the standard, then you no longer get an MMN, because now it’s not a deviant. But it’s still a frequency change. So a frequency change that would elicit MMN if you were ignoring is not elicited when it’s grouped together, because that’s what your task is. You can go back and forth, but what is crucial is what’s represented in memory as a repeated element, repeated elements in the signal. The MMN is only like the dumb output. If there was a change in there, it will elicit MMN. If there’s no change in there, you won’t get an MMN, because there’s no change to be had.

Audience Question:

Dr. Sussman, I was wondering about the end of your conclusion, where you showed the symphony orchestra playing Bach. In terms of variation among subjects, suppose you were to get a group of conductors, musicians, and composers who are able to listen linearly and also vertically, and hear everything holistically. Say, for example, some of these people are capable of training their brains so that they can hear a Brahms six-part violin piece, and also hear Philip Glass, and know everything that’s going on. So apparently, from what you’re saying, they are training their brains to hear the individual portions, but they’re also doing something meta-musically, putting it all together, and are capable of hearing this. So in terms of attention, is the human brain capable of going beyond what we do normally; in terms of, say, helping children with ADHD by training them using musical tones and musical line, focusing just like a composer or a conductor would do?

Well, the last part would be interesting to find out. But I think what I’m saying here is that we usually say attention limits all of our processing, and the evidence that I provided today shows that we still process quite a lot of the information that’s going on in the background, even when we’re attending to something. We get the global picture and the local picture. What does get limited is being able to follow simultaneous within-stream events: two voices, two streams of speech at the same time, two melodies on two different instruments at the same time. But we can listen to the melody and hear the global harmony. We can listen to our friend speak and hear other things, glasses clinking, other sorts of random objects in the background that have different pitches or different elements that aren’t at the level of within-stream characteristics. It would be very interesting to follow up and see whether there’s any way we could capitalize on this for children who have difficulties in noisy scenes, because it isn’t just that we filter out everything that’s in the background. We still monitor simultaneous streams of information in case we want to switch our attention around the room.

Audience Question:

Hello, I’m Nick Stanley from the University of South Alabama. My question is: when you look at these three separate streams and the occurring MMN, is there a limit on the number of streams beyond which the MMN may no longer be obvious and present?

Haha, we’ve asked that question. There may be a limitation to that. It’s really challenging to design — we tried to do five streams. It’s really challenging to design a paradigm that doesn’t confound the deviants across all the streams, and we haven’t found a way to do that. So we don’t know, because with five streams we weren’t able to see five clear MMNs. And of course it wasn’t published, because you don’t publish null results, right? So it’s not clear whether we couldn’t design the paradigm in a way that would pick out the five streams, or whether you can’t represent more than four streams. If we take the seven plus or minus two that Barbara mentioned before, and we apply the Nelson Cowan modification of that theory, the four plus or minus whatever he said: if four chunks of things is the most we can do — it’s not seven individual events; you know, you make the phone number into two chunks of things, and Cowan sort of extended that to say, well, you can do four chunks of things. Maybe five was beyond the limits of what could be done with the automatic system. I don’t know which is the answer. I would like to find out; I just haven’t followed down that pathway. It gets so complicated to try to segregate and make a paradigm where you can have seven different discrete items. But maybe there is a way to do that.

Elyse Sussman
Albert Einstein College of Medicine

Presented at the 26th Annual Research Symposium at the ASHA Convention (November 2016).
The Research Symposium is hosted by the American Speech-Language-Hearing Association, and is supported in part by grant R13DC003383 from the National Institute on Deafness and Other Communication Disorders (NIDCD) of the National Institutes of Health (NIH).
Copyrighted Material. Reproduced by the American Speech-Language-Hearing Association in the Clinical Research Education Library with permission from the author or presenter.
