Brainstem Correlates of Comodulation Masking Release for Speech in Normal Hearing Adults
Abstract
Background and Objectives
Weak signals embedded in a fluctuating masker can be perceived more efficiently than similar signals embedded in an unmodulated masker. This release from masking is known as comodulation masking release (CMR). In this paper, we investigate the neural correlates of CMR in the human auditory brainstem.
Subjects and Methods
A total of 26 normal hearing subjects aged 18-30 years participated in this study. First, the impact of CMR was quantified by a behavioral experiment. After that, the brainstem correlates of CMR were investigated using the auditory brainstem response to complex sounds (cABR) in comodulated (CM) and unmodulated (UM) masking conditions.
Results
The auditory brainstem responses are less susceptible to degradation in response to the speech syllable /da/ in the CM noise masker in comparison with the UM noise masker. In the CM noise masker, frequency-following response (FFR) and fundamental frequency (F0) were correlated with better behavioral CMR. Furthermore, the subcortical response timing of subjects with higher CMR was less affected by the CM noise masker, having higher stimulus-to-noise response correlations over the FFR range.
Conclusions
The results of the present study revealed a significant link between brainstem auditory processes and CMR. The findings show that cABR provides objective information about the neural correlates of CMR for speech stimuli.
Introduction
The detection and identification of signals in a noisy environment is one of the essential tasks of the auditory system. The mechanism whereby the auditory system separates weak signals from masking noise is a key challenge in auditory neuroscience [1]. The detection of weak signals embedded in modulated noise is more efficient than the detection of those embedded in unmodulated (UM) noise in both animals and humans. Furthermore, the detection threshold of the signal can be improved by presenting additional sound energy that is remote in frequency from the signal and has the same envelope modulations across frequency bands [2]. Comodulation masking release (CMR) demonstrates how such coherent modulations can facilitate signal detection in a comodulated masker [3]. CMR is defined as the difference between the threshold of a signal in the comodulated (CM) masker and its threshold in the UM masker of the same bandwidth (UM-CM) [3].
CMR has been investigated for speech signals, and studies show that the speech detection threshold and the speech discrimination threshold can be improved by presenting modulated versus UM maskers in normal hearers [4-6]. However, the neural mechanism that underlies CMR for speech has yet to be fully described.
Neural correlates of CMR have been found in different regions of the auditory system in animals, including the auditory nerve, cochlear nucleus, primary auditory cortex, auditory forebrain, and midbrain [1,7-12]. Compared to the cortex, cochlear nucleus neurons are more sensitive to tones embedded in broadband-modulated noise. These studies indicate the importance of subcortical mechanisms of CMR in animals. In this regard, the inferior colliculus, medial geniculate body, and primary auditory cortex axis (IC-MGB-A1) has been proposed as the pathway for the gradual extraction of signals from noise and the representation of the auditory object in the primary auditory cortex [8]. Neural correlates of CMR in humans have been demonstrated in the cortex using magnetoencephalography (MEG), functional magnetic resonance imaging (fMRI) and electroencephalography (EEG) [13-16].
So far, objective measurements (EEG, MEG, fMRI) have shown neural correlates of CMR at the cortical level in humans. Our literature search did not identify any study of the correlates of CMR at the level of the auditory brainstem in humans. The present study aimed to determine whether there is a neural correlate of CMR in the human auditory brainstem, and whether CMR is related to the neural representation of the fundamental frequency (F0) and other components of speech in the brainstem. Moreover, the study extends knowledge of the subcortical encoding of speech in noise. Additionally, the present study has the potential to show that the neural representation of the F0 and other components of speech in the brainstem can be used as neural indexes of CMR in individuals with poor CMR (for example, hearing-impaired listeners [17], children [18], adults with auditory processing disorder [19] and cochlear implant users [5]). We explore the role of brainstem processes in CMR using the auditory brainstem response to complex sounds (cABR). The cABR is highly reliable [20] and, unlike cortical responses, it is pre-attentive and mimics the speech sounds remarkably well [21]. Thus, two separate experiments were conducted on the same normal hearing subjects. First, the impact of CMR was quantified by a behavioral experiment. After that, the brainstem correlate of the behavioral effect was investigated by cABR. We examined the hypothesis that brainstem encoding of the F0 and other components of speech is diminished to a lesser degree by the CM noise masker than by the UM noise masker.
Subjects and Methods
Subjects
A total of 26 listeners aged 18-30 years [mean (M)=23.3, standard deviation (SD)=2.4] participated in this study. None of the subjects had any history of otological, audiological or neurological problems. All subjects had pure tone thresholds less than 20 dB HL at octave frequencies between 250 Hz and 8,000 Hz. The listeners had normal latencies for click-evoked ABR wave V (<6.8 ms at 80 dB SPL with a rate of 31.4 Hz). All subjects received at least one hour of practice in the behavioral experiment prior to data collection. This study was approved by the Ethics Committee of Iran University of Medical Sciences with the approval code (95/D/27799) and all participants signed informed consent.
Stimulus
The same stimuli were used in the behavioral and electrophysiological experiments. Although previous studies have investigated the impact of CM noise on neural responses using sinusoidal target signals [14,22,23], we chose the speech syllable /da/ as the target signal because speech is more likely to occur under natural listening conditions. All stimuli were processed using MATLAB R2014a (The MathWorks, Inc., Natick, MA, USA). The syllable /da/ was a 40 ms five-formant speech syllable synthesized using a Klatt synthesizer [24]. The syllable /da/ began with a noise burst. The frequency components were as follows: the F0: 103-125 Hz, the first formant (F1): 220-720 Hz, the second formant (F2): 1,700-1,240 Hz, the third formant (F3): 2,580-2,500 Hz, the fourth formant (F4): 3,600 Hz (constant) and the fifth formant (F5): 4,500 Hz (constant).
For the masked test conditions, a speech-shaped noise was added to the signal. Speech-shaped noise was used because it is considered an effective masker for the speech syllable /da/ [25]. The masker was either UM or CM. To create the CM masking noise, the speech-shaped noise was square-wave modulated at a rate of 50 Hz, with 100% amplitude modulation depth and a 50% duty cycle. The noise maskers were equated in RMS power. The stimuli, consisting of the signal and noise masker, had a sampling rate of 44,100 Hz and were converted to analog signals by a 16-bit sound card. The stimuli were attenuated by a programmable attenuator (PA5). In both experiments, all stimuli were delivered monaurally to the right ear, and the left ear was kept in silence.
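The masker construction described above can be illustrated with a minimal Python sketch. The original stimuli were generated in MATLAB; here white Gaussian noise stands in for the speech-shaped noise, and the helper names (`square_wave_gate`, `scale_to_rms`) are hypothetical. Onset and offset ramps are omitted.

```python
import math
import random

def rms(x):
    """Root-mean-square value of a sequence of samples."""
    return math.sqrt(sum(v * v for v in x) / len(x))

def square_wave_gate(n_samples, fs, rate_hz=50.0, duty=0.5):
    """0/1 gate that is 'on' for the first `duty` fraction of each period.

    Multiplying noise by this gate gives 100% modulation depth.
    """
    period = fs / rate_hz  # samples per modulation period (882 at 44.1 kHz)
    return [1.0 if (i % period) < duty * period else 0.0
            for i in range(n_samples)]

def scale_to_rms(x, target_rms):
    """Rescale x so that its RMS equals target_rms."""
    gain = target_rms / rms(x)
    return [gain * v for v in x]

fs = 44100                      # sampling rate reported in the text
n = 4410                        # 100 ms masker at 44.1 kHz
rng = random.Random(0)          # fixed seed so the sketch is repeatable

# Assumption: white noise as a placeholder for the speech-shaped noise.
um_masker = [rng.gauss(0.0, 1.0) for _ in range(n)]

gate = square_wave_gate(n, fs)
cm_masker = [g * v for g, v in zip(gate, um_masker)]

# Equate the two maskers in RMS power.
cm_masker = scale_to_rms(cm_masker, rms(um_masker))
```

Because the gate silences half of the samples, the remaining samples of the CM masker are scaled up so that both maskers carry the same overall RMS.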
Behavioral experiment
The subjects were seated in a double-walled sound-treated test booth. The stimuli were delivered monaurally to the right ear using TDH-39 earphones with MX-41/AR cushions. The noise masker was always presented at a level of 60 dB SPL. The masker duration was 100 ms, including 15 ms onset and offset raised-cosine ramps. A three-alternative, forced-choice procedure with adaptive signal-level adjustment was used to determine the threshold of the speech syllable /da/. The three intervals were separated equally in time by 500 ms of silence. The masker was presented in all three intervals and the signal was randomly added to only one of the three intervals. The signal was centered in the masker. The task for the subject was to indicate by a key press which interval contained the speech syllable /da/. The initial signal level was 70 dB SPL. The signal intensity was adjusted based on a two-down, one-up procedure that tracked the 70.7% correct point on the psychometric function [26]. The initial step size was 4 dB. This step size was halved after every second upper reversal until a minimum step size of 1 dB was reached. The track was continued for another six reversals. The average of these last six reversals was taken as an estimate of the threshold. Three threshold estimates were collected for each masking condition and the average of these estimates was taken as the final estimate of the threshold. The masked thresholds of the signal were obtained for two masking conditions (CM and UM). CMR was calculated as the difference between the threshold in the UM condition and the threshold in the CM condition (UM-CM).
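The adaptive track described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the software used in the study: `run_staircase`, `respond` and `cmr` are hypothetical names, an "upper reversal" is taken to mean a turn from an ascending to a descending run, and the three-interval trial is reduced to a callback that reports whether the response was correct.

```python
def run_staircase(respond, start_level=70.0, initial_step=4.0,
                  min_step=1.0, final_reversals=6, max_trials=1000):
    """Two-down, one-up track; returns the mean level of the final reversals.

    respond(level) must return True for a correct response at `level` (dB SPL).
    """
    level, step = start_level, initial_step
    correct_in_row = 0
    last_direction = 0          # -1 = level decreased, +1 = level increased
    upper_reversals = 0
    final_levels = []
    for _ in range(max_trials):
        if respond(level):
            correct_in_row += 1
            if correct_in_row < 2:
                continue        # need two correct responses before going down
            correct_in_row = 0
            direction = -1
        else:
            correct_in_row = 0
            direction = +1
        if last_direction != 0 and direction != last_direction:
            # A reversal occurred at the current level.
            if step <= min_step:
                final_levels.append(level)
                if len(final_levels) == final_reversals:
                    return sum(final_levels) / len(final_levels)
            elif direction == -1:
                # Upper reversal: halve the step after every second one.
                upper_reversals += 1
                if upper_reversals % 2 == 0:
                    step = max(step / 2.0, min_step)
        last_direction = direction
        level += direction * step
    raise RuntimeError("track did not converge within max_trials")

def cmr(um_threshold, cm_threshold):
    """CMR in dB, defined in the text as UM threshold minus CM threshold."""
    return um_threshold - cm_threshold

# Idealized listener (an assumption): always correct at or above 50 dB SPL.
estimate = run_staircase(lambda level: level >= 50.0)
```

With a real listener the responses are probabilistic, so the track converges near the 70.7% correct point rather than at a hard boundary. In the study, three such estimates were averaged per masking condition before the UM-CM difference was taken.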
Electrophysiological experiment
Brainstem responses were recorded in response to the 40 ms speech syllable /da/ using the Bio-logic Navigator Pro system (Natus Medical Inc., Mundelein, IL, USA). The experiment was conducted in an electrically shielded, sound-treated booth. During the experiment, subjects were seated in an office chair that was individually adjusted for comfort. Subjects watched a silent, subtitled movie. A vertical montage of 3 Ag-AgCl electrodes was used to record brainstem responses [vertex (Cz) as active, ipsilateral earlobe as reference and forehead as ground]. Contact impedance for all electrodes was below 5 kΩ with less than 3 kΩ difference across electrodes. The time window of analysis was 60 ms including a 15 ms pre-stimulus period and 15 ms period after stimulus offset. High-pass and low-pass filters of 100 and 2,000 Hz, respectively, were used. Online artifact rejection was employed at ±23 μV. The speech syllable /da/ was presented with alternating polarity in order to minimize cochlear microphonic and stimulus artifact. Approximately 6,000 artifact-free sweeps were averaged for each condition.
The syllable /da/ was presented at 80 dB SPL with an interstimulus interval of 52 ms. The stimuli, consisting of the signal and noise masker, were presented to the right ear through insert earphones (ER-3A, Etymotic Research, Elk Grove Village, IL, USA). The speech syllable /da/ was presented in three conditions: in quiet, and in speech-shaped noise with and without comodulation (CM and UM) at a signal-to-noise ratio (SNR) of +5 dB. The SNR was selected based on pilot tests. The entire data collection for each subject was performed in a single session.
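As an illustration of what a +5 dB SNR means for the signal and masker waveforms, the following Python sketch computes the gain that brings a noise to a target SNR. It assumes the SNR is defined on RMS amplitudes, which the text does not state explicitly; the function names and placeholder waveforms are hypothetical.

```python
import math

def rms(x):
    """Root-mean-square value of a sequence of samples."""
    return math.sqrt(sum(v * v for v in x) / len(x))

def snr_db(signal, noise):
    """SNR in dB, assuming an RMS-based definition."""
    return 20.0 * math.log10(rms(signal) / rms(noise))

def noise_gain_for_snr(signal, noise, target_snr_db):
    """Gain g such that snr_db(signal, g * noise) equals target_snr_db."""
    return rms(signal) / (rms(noise) * 10.0 ** (target_snr_db / 20.0))

fs = 44100
n = 1764  # 40 ms at 44.1 kHz

# Placeholder waveforms (assumptions): a tone instead of /da/, and a
# deterministic pseudo-noise instead of the speech-shaped masker.
signal = [math.sin(2.0 * math.pi * 114.0 * i / fs) for i in range(n)]
noise = [((i * 7919) % 1000) / 500.0 - 1.0 for i in range(n)]

gain = noise_gain_for_snr(signal, noise, 5.0)
scaled_noise = [gain * v for v in noise]
```

Applying the same gain rule to the CM and UM maskers after RMS equalization keeps the SNR identical across the two masking conditions.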
Statistical analysis
All statistical analyses were performed using the SPSS software v18.0. In the behavioral experiment, a paired samples t-test was conducted to compare the average thresholds of the signal in the two masking conditions (CM and UM). The effect of the noise masker on cABR was examined using repeated measures analysis of variance (rANOVA) and paired samples t-test. A p-value of less than 0.05 was considered statistically significant.
Results
Behavioral experiment
The average masked thresholds of the syllable /da/ are shown in Fig. 1. There was a significant difference between the masked thresholds of the syllable /da/ for the CM condition (M=46.46, SD=2.58) and the UM condition (M=57.00, SD=2.43); [t(25)=14.30, p=0.005]. The average CMR was 10.54 dB (±3.75). The masked signal threshold was lower for the CM condition than for the UM condition, indicating CMR.
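The paired-samples t statistic can be related to the reported summary values with a short Python sketch. It assumes that the ±3.75 attached to the average CMR is the standard deviation of the per-subject UM-CM differences, which the text does not state; the function names are hypothetical, and the original analysis was run in SPSS.

```python
import math
import statistics

def paired_t(x, y):
    """Paired-samples t statistic and degrees of freedom for x versus y."""
    diffs = [a - b for a, b in zip(x, y)]
    n = len(diffs)
    se = statistics.stdev(diffs) / math.sqrt(n)   # standard error of the mean difference
    return statistics.mean(diffs) / se, n - 1

def paired_t_from_summary(mean_diff, sd_diff, n):
    """Same statistic computed from the mean and SD of the differences."""
    return mean_diff / (sd_diff / math.sqrt(n)), n - 1

# Reported values: mean CMR 10.54 dB, +/-3.75 (assumed SD), 26 subjects.
t_value, df = paired_t_from_summary(10.54, 3.75, 26)
```

Under that assumption the statistic comes out at roughly 14.3 with 25 degrees of freedom, which is close to the reported t(25)=14.30.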
Electrophysiological experiment
The cABR consists of a waveform with seven peaks, including the onset (V and A), the onset of voicing (C), frequency-following response (FFR) (D, E, and F) and offset (O) peaks. In quiet, all waves were identifiable for all subjects. Four transient waves (V, A, C, and O) were omitted from the analyses because they were diminished in the two masking conditions. The FFR components (D, E, and F) were reliably present in the two masking conditions. Hence, we restricted the analysis to those. The brainstem responses were characterized by measures of peak latencies and amplitudes in the time domain, magnitude of F0 in the frequency domain and stimulus-to-response correlations over the FFR (13-43 ms).
Fig. 2 shows the grand average speech-evoked brainstem responses from -15 ms to 60 ms in quiet and the two masking conditions (CM and UM) at an SNR of +5 dB. Fig. 3 illustrates a significant effect of the masking conditions on the brainstem responses. The rANOVA showed that the latency and amplitude of the FFR waves for the two masking conditions were significantly different from the quiet condition. In other words, the maskers significantly increased latencies [wave D: F(2,75)=4.27, p=0.001; wave E: F(2,75)=8.05, p=0.001; wave F: F(2,75)=6.85, p=0.001] and decreased amplitudes [wave D: F(2,75)=12.07, p=0.003; wave E: F(2,75)=7.08, p=0.001; wave F: F(2,75)=11.65, p=0.002]. The paired samples t-test showed that the latencies of FFR waves D, E and F were significantly delayed in the UM condition relative to the CM condition [wave D: t(25)=3.45, p=0.001; wave E: t(25)=5.05, p=0.001; wave F: t(25)=6.34, p=0.001]. The peak amplitudes of these waves were also significantly larger in the CM noise masker than in the UM noise masker [wave D: t(25)=7.15, p=0.001; wave E: t(25)=2.46, p=0.001; wave F: t(25)=4.24, p=0.001]. In the frequency domain, the F0 amplitude was significantly higher in the CM condition than in the UM condition [t(25)=3.46, p=0.001]. These results suggest that brainstem representation of the FFR components and F0 plays a role in CMR for speech stimuli.
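One common way to obtain an F0 magnitude in the frequency domain is to average the spectral amplitude of the response over the F0 range of the stimulus (103-125 Hz). The Python sketch below illustrates that idea with a direct discrete Fourier sum; it is not the analysis software used in the study, and the sampling rate, the function names and the synthetic test tones are assumptions.

```python
import cmath
import math

def amplitude_at(x, fs, freq_hz):
    """Amplitude of the sinusoidal component of x at freq_hz (direct DFT sum)."""
    n = len(x)
    acc = sum(v * cmath.exp(-2j * math.pi * freq_hz * i / fs)
              for i, v in enumerate(x))
    return 2.0 * abs(acc) / n

def band_amplitude(x, fs, f_lo=103, f_hi=125):
    """Mean spectral amplitude over f_lo..f_hi Hz in 1 Hz steps."""
    freqs = range(f_lo, f_hi + 1)
    return sum(amplitude_at(x, fs, f) for f in freqs) / len(freqs)

fs = 8000   # hypothetical sampling rate of the averaged response
n = 240     # 30 ms analysis window (the 13-43 ms FFR range) at 8 kHz

# Synthetic check signals, not recorded data.
tone_100 = [0.5 * math.sin(2.0 * math.pi * 100.0 * i / fs) for i in range(n)]
tone_114 = [math.sin(2.0 * math.pi * 114.0 * i / fs) for i in range(n)]

a100 = amplitude_at(tone_100, fs, 100.0)       # recovers the tone amplitude
f0_band = band_amplitude(tone_114, fs)         # energy inside the F0 range
off_band = band_amplitude(tone_114, fs, 300, 322)
```

Evaluating the Fourier sum at arbitrary frequencies avoids the coarse (about 33 Hz) bin spacing that a plain FFT of a 30 ms window would give.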

Fig. 2. Comparison of the grand average speech-evoked brainstem responses in quiet, the comodulated and the unmodulated masking conditions.

Fig. 3. Comparison of grand average frequency-following responses to the speech syllable /da/ in the CM and the UM masking conditions. CM: comodulated, UM: unmodulated.
Neural synchrony in the noise masker was evaluated by stimulus-to-response correlations over the FFR (13-43 ms). Stimulus-to-response correlations were larger in the CM condition than in the UM condition [t(25)=6.79, p=0.001]. These results indicate that differences in stimulus-to-response correlations between these two masking conditions could be attributed to greater response degradation by the noise masker in the UM masking condition relative to the CM condition. These differences are shown in Table 1.
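A stimulus-to-response correlation of this kind is often computed by sliding the response against the stimulus over a range of neural lags and keeping the largest Pearson coefficient. The Python sketch below shows that approach; the lag range, the function names and the synthetic waveforms are assumptions, since the text does not describe the exact computation.

```python
import math
import random

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def stimulus_to_response_r(stimulus, response, lags):
    """Largest Pearson r between the stimulus and the lag-shifted response.

    Returns (best_r, best_lag); lags are in samples.
    """
    best_r, best_lag = -math.inf, None
    for lag in lags:
        segment = response[lag:lag + len(stimulus)]
        if len(segment) < len(stimulus):
            continue            # response too short for this lag
        r = pearson(stimulus, segment)
        if r > best_r:
            best_r, best_lag = r, lag
    return best_r, best_lag

# Synthetic example: the "response" is the stimulus delayed by 8 samples.
rng = random.Random(1)
stimulus = [rng.gauss(0.0, 1.0) for _ in range(200)]
response = [0.0] * 8 + stimulus + [0.0] * 20

best_r, best_lag = stimulus_to_response_r(stimulus, response, range(0, 16))
```

For real recordings, the stimulus would first be resampled to the response sampling rate and both waveforms restricted to the FFR region before the lag search.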
Correlation between electrophysiology and behavior
Pearson’s correlations were used to explore the relationship between the measures of CMR and the brainstem responses in the CM condition. CMR was related to waves D, E, and F latencies [r(26)=-0.76, p=0.001; r(26)=-0.82, p=0.001; r(26)=-0.84, p=0.002, respectively] and amplitudes [r(26)=0.85, p=0.003; r(26)=0.76, p=0.002; r(26)=0.89, p=0.001, respectively], such that higher CMR was indicative of earlier latencies and greater amplitudes of these waves. Higher CMR was associated with larger F0 amplitude [r(26)=0.95, p=0.001]. Correlation analyses showed a relationship between CMR and stimulus-to-noise response correlations over the FFR range [r(26)=0.73, p=0.002], in which higher CMR was associated with reduced impact of the CM masker on the brainstem responses.
Discussion
The results of the present study revealed a significant relation between brainstem auditory processes and CMR. The auditory brainstem responses are less susceptible to degradation in response to the speech syllable /da/ in the CM noise masker in comparison with the UM noise masker. In the CM noise masker, the FFR waves were correlated with higher behavioral CMR. Furthermore, the subcortical response timing of subjects with higher CMR was less affected by the CM noise masker, having higher stimulus-to-noise response correlations over the FFR range. The findings show the importance of brainstem encoding of the FFR components and the F0 of speech for CMR. Therefore, the F0 is an important cue that allows a listener to identify and track a target signal among CM noises. The magnitude and latency of the brainstem responses depend on a high degree of temporally synchronous firing among neurons. The FFR components also depend on a high degree of synchronized firing of neural action potentials [27]. Hence, the results suggest that spectral and temporal processing at the brainstem may be the neural basis of CMR.
According to the outcomes of the present study, subjects take advantage of brief temporal minima in the CM masker to extract speech cues at the level of the auditory brainstem, a strategy known as “listening in the dips” [2]. The masking release is reduced for maskers with a relatively flat spectrum, and a steady-state noise is likely to mask the lower level portions of speech. Other studies show that masking release and grouping mechanisms are interrelated, since the latter are likely to vanish in conditions such as loss of temporal fine-structure information (as seen in cochlear implant users) and reduced spectral resolution [5]. According to our findings, the brainstem encoding of temporal fine-structure information is an important factor in CMR, which might result in stream segregation, auditory grouping, and auditory object formation. Thus, the F0 and temporal cues contribute to auditory object formation, underlying successful CMR.
The data of the present study are in line with the physiological correlates of CMR in animal auditory systems [1,12]. In Pressnitzer et al. [12], responses in accordance with CMR were shown by some of the recorded units in the cochlear nucleus of guinea pigs. The response to the masker is significantly reduced when a CM flanking band is added to the experimental paradigm. In the ventral cochlear nucleus, the effect of the CM masker was determined in the inhibitory side-bands of neurons. It is generally believed that CMR is linked to the wideband inhibition cells in onset-chopper units of the cochlear nucleus, which project to the contralateral cochlear nucleus and inferior colliculus [28,29]. The neural correlates of CMR at higher stages have been explored by some studies [8,22]. According to these studies, while many auditory aspects and the physical features of sound might be processed in the brainstem (e.g., the inferior colliculus), the configuration of these features into a separate perceptual object occurs in the auditory cortex. These studies showed that the gradual extraction of signals from noise and the organization of auditory objects are revealed along the IC-MGB-A1 axis in modulated noise. Although previous studies have not demonstrated a relation between CMR and brainstem auditory processes in humans, we show a neural advantage for CMR at the level of the human auditory brainstem.
The present data show that CMR is related to subcortical auditory processing at the level of the response generator (midbrain) in normal hearing adults. Although the brainstem is traditionally called the “old brain”, current theories are supported by evidence that the brainstem can provide important information about the processing of auditory signals [30]. Given that many natural background sounds fluctuate over a broad bandwidth, CMR can be considered a type of evolutionary adaptation of auditory systems for the detection of sounds in background noise with coherent fluctuations. In fact, the brainstem can be viewed as an active interface between the peripheral and central auditory systems.
The present findings contribute to the understanding of the physiological mechanisms underlying CMR. Such understanding has applications in both research and clinical practice. The cABR to the speech syllable /da/ provides an objective measure of the subcortical auditory functions contributing to CMR. Hence, this objective measurement might have an important role alongside the traditional audiological tests, particularly in individuals with poor speech-in-noise perception, which can be caused by hearing loss, aging, learning disability, auditory processing disorder, attention deficit disorder, dyslexia, specific language impairment, and autism spectrum disorder. An advantage of using cABR in the assessment of CMR in young and old populations is that responses are pre-attentive, objective, and reliable and can be recorded passively. Also, the findings of the present study may have clinical implications for diagnostic and management strategies for children and adults with poor CMR and poor speech-in-noise perception.
In conclusion, the results revealed a significant relation between temporal and spectral processing at the brainstem and CMR. The data of the present study reveal that cABR provides objective information about the neural correlates of CMR. The study shows that the neural representation of the F0 and other components of speech in the brainstem can be used as neural indexes of CMR. This procedure is objective, effective and fast. In the future, this procedure may play a role in the audiological protocol, particularly in patients who report hearing difficulties in modulated noise. Therefore, the speech-evoked brainstem response is an objective measure of the neural mechanisms underlying comodulation masking release for speech and has the potential to improve the assessment and management of masking release difficulties.
Acknowledgements
This study was part of a PhD dissertation project in audiology that was supported by the Rehabilitation Research Center of Iran University of Medical Sciences. We would like to thank all the people who participated in this study.
Notes
Conflicts of interest: The authors have no financial conflicts of interest.