Effects of Spatial Training Paradigms on Auditory Spatial Refinement in Normal-Hearing Listeners: A Comparative Study
Abstract
Background and Objectives
This study compared the effectiveness of two spatial training programs using real and virtual sound sources in refining spatial acuity skills in listeners with normal hearing.
Subjects and Methods
The study was conducted on two groups of 10 participants each; Groups I and II underwent spatial training using real and virtual sound sources, respectively. The study was conducted in three phases: pre-training, training, and post-training. In the pre- and post-training phases, the spatial acuity of the participants was measured with real sound sources using a localization test, and with virtual sound sources using the virtual acoustic space identification (VASI) test. Thresholds of interaural time difference (ITD) and interaural level difference (ILD) were also measured. In the training phase, Group I participants underwent localization training using loudspeakers in free field, while Group II participants underwent virtual acoustic space (VAS) training using virtual sound sources presented over headphones. Both training methods consisted of 5-8 sessions (20 min each) of systematically presented stimuli graded according to duration and back attenuation (for real source training) or number of VAS locations (for virtual source training).
Results
Independent t-tests comparing the spatial learning scores (pre- vs. post-training) on each measure showed differences in performance between the two groups. Group II showed greater learning than Group I on the VASI test, while Group I outperformed Group II on ITD. Both groups improved equally on the localization test and ILD.
Conclusions
Based on the present findings, we recommend the use of VAS training, as it has practical advantages: it is cost effective, requires minimal equipment, and is useful to end users.
Introduction
Sensory plasticity is a lifelong process, and the auditory domain is no exception [1,2]. Studies on auditory spatial plasticity have shown that the mature brain retains a surprising capacity to relearn sound localization when auditory spatial cues are substantially altered, even in adulthood [3-7]. Auditory spatial difficulties and the resulting errors are not only well documented in listeners with hearing disorders but are also common in normal-hearing (NH) listeners. Localization in NH listeners is often underpinned by some inherent uncertainty and operational bias that result in source estimation/localization errors [8]. Reports on inherent bias in NH listeners show a rightward bias of 1° to 2° for sources located at 0° azimuth [9-11].
Spatial errors in NH listeners are often seen as a manifestation of such inherent bias. The most classic spatial errors in NH listeners are front-to-back errors [12]. Front-back confusions are spatial judgment errors that cross the interaural axis [13,14], wherein a source located in front is perceived as located behind, and vice versa. Best, et al. [15] reported a high prevalence of front/back errors in their NH listeners, accounting for 5% of total errors, while Makous and Middlebrooks [3] found 2%-10% front/back confusions in NH listeners on a sound localization task.
Resolving spatial errors in NH listeners facilitates communication, as it helps the listener orient quickly to the source without missing critical information while searching for it. The resolution of auditory space in NH listeners becomes especially important in complex auditory situations, such as in the presence of background or competing noise, a situation known as the cocktail party effect [16]. The inherent bias in location judgements [3,15] can increasingly hinder speech perception in such adverse listening conditions, even when hearing is normal. Slight deviations in spatial information processing in NH listeners can also become more pronounced in the presence of an acoustically similar linguistic masker, a phenomenon called informational masking [17], which refers to higher-order cognitive masking. Thus, resolving the front-back errors that arise from inherent bias in location judgement, even in NH listeners, is important in the challenging listening environments we encounter every day.
Auditory training aimed at resolving spatial errors is documented in the literature [5,18-21]. Auditory training that focuses on components of spatial perception is termed spatial auditory training. The positive impact of spatial auditory training has been realized as an improvement in overall localization performance [3-6]. These studies used binaural cues such as interaural time and level differences (ITD and ILD) to induce changes in spatial hearing. ITD and ILD cues underlie localization in the horizontal plane. On the other hand, front/back localization relies on spectral cues from the pinna [22,23], which are not addressed by manipulating binaural cues.
The use of real and virtual sources in spatial training can overcome these disadvantages of binaural cue manipulation techniques. Spatial training using real sources employs loudspeakers in a free-field environment, wherein all binaural and monaural cues are readily available to the listener. In contrast, virtual source training relies on the simulation of three-dimensional (3D) acoustic space under headphones. The reliability of virtual sources in spatial hearing experiments has been verified, and they have been used in different studies [22,23]. Spatial training paradigms using real [24,25] and virtual sources [26] have resulted in refinement of spatial skills in NH listeners. Although front-back confusions were resolved with both of these training protocols in NH listeners, the effectiveness of one over the other has not been tested. Comparing the training protocols would help establish an effective strategy for clinical settings. The present study addresses the need to establish the ideal training protocol for refining spatial skills in NH listeners. It is an offshoot of a spatial remediation program for individuals with sensorineural hearing impairment [27], although its efficacy is tested here in NH individuals. The specific objective of the study was to compare spatial learning following training with real and virtual sources on different measures of spatial acuity.
Subjects and Methods
An experimental research design with two groups of participants was adopted in the present study. Group I participants underwent spatial training using real sources in free field, while those in Group II underwent spatial training using virtual sources (virtual acoustic space [VAS] training). Each group consisted of 10 participants aged 18-25 years with normal hearing sensitivity (pure tone average of 0.5, 1, 2, and 4 kHz <15 dB HL) and without any otological, speech and language, neurological, or cognitive deficits. The sample size required for each group was statistically estimated based on the study by Nisha and Kumar [24] using G*Power version 3.1.9.4 [28]. The G*Power analysis yielded a sample size of eight per sub-group for an effect size of 0.86 and a corresponding power of 0.97. The sub-group size used in the study (n=10) was thus considered appropriate for measuring spatial training-related changes, if any, in NH listeners. Ethical approval was obtained from the institutional review board (no. SH/CDN/ARF-AUD-4/2018-19), and informed consent was obtained from all participants prior to their inclusion in the study.
Procedure
The study was conducted in three phases: pre-training, training, and post-training. In the pre- and post-training phases, the spatial acuity of the participants was measured using real and virtual sources, and their binaural processing abilities were quantified (ITD and ILD thresholds).
Phase I: pre-training evaluation phase
This phase established the baseline spatial performance of NH listeners on all the spatial hearing measures.
Test of spatial ability using real sources (test of localization)
The test of spatial acuity using real sources was carried out in a localization chamber with an array of 18 loudspeakers (Genelec 8020B bi-amplified monitoring system; Genelec, Iisalmi, Finland) placed 20° apart in azimuth, as shown in Fig. 1.
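White band noise (WBN) of 250 ms duration was presented at 65 dB SPL to the loudspeakers in random order, routed through a Lynx mixer (Lynx Studio Technology Inc., Costa Mesa, CA, USA) using Cubase software (Steinberg Media Technologies GmbH, Hamburg, Germany) (total presentations: 18 loudspeakers × 5 repetitions = 90 stimuli). The participants were asked to verbally report the number of the loudspeaker from which the sound was emitted. Localization accuracy was quantified as the root mean square (rms) error [29], calculated for each participant using the following formula:

$$E_{\mathrm{rms}} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(\theta_{\mathrm{r},i} - \theta_{\mathrm{s},i}\right)^{2}}$$

where $\theta_{\mathrm{s},i}$ is the azimuth of the loudspeaker that presented stimulus $i$, $\theta_{\mathrm{r},i}$ is the azimuth of the loudspeaker reported by the participant, and $n$ is the number of stimuli presented.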
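The rms error computation can be sketched in Python as follows. This is a minimal illustration, not the authors' script: the mapping from reported loudspeaker number to azimuth (speaker 1 at 0°, then 20° steps) is hypothetical, since the numbering in Fig. 1 is not given in the text, and folding each difference into [-180°, 180°) is our assumption so that, for example, a response at 340° to a target at 0° counts as a 20° error rather than 340°.

```python
import math

SPEAKER_SPACING_DEG = 20  # 18 loudspeakers placed 20° apart (from the text)


def speaker_to_azimuth(speaker_number: int) -> float:
    """Hypothetical mapping: speaker 1 at 0°, numbers increasing in 20° steps."""
    return (speaker_number - 1) * SPEAKER_SPACING_DEG


def wrapped_difference(response_deg: float, target_deg: float) -> float:
    """Signed angular difference folded into [-180°, 180°) (our assumption)."""
    return (response_deg - target_deg + 180.0) % 360.0 - 180.0


def rms_error(targets_deg, responses_deg) -> float:
    """Root mean square localization error over n trials, in degrees."""
    if len(targets_deg) != len(responses_deg) or not targets_deg:
        raise ValueError("need equal-length, non-empty target and response lists")
    squared = [wrapped_difference(r, t) ** 2 for t, r in zip(targets_deg, responses_deg)]
    return math.sqrt(sum(squared) / len(squared))
```

For example, `rms_error([0, 20, 40], [340, 20, 40])` gives √(400/3) ≈ 11.5°, because only the first trial is off, by one loudspeaker position.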
Test of spatial ability using virtual sources (test of VASI)
The test of spatial acuity using virtual sources was similar to the one used by Nisha and Kumar [26]. The WBN (250 ms) was convolved with a head-related transfer function (HRTF) to generate five VAS percepts (corresponding to 0° azimuth, 45° and 90° azimuth towards the left, and 45° and 90° azimuth towards the right) using Sound Lab (slab3d) version 6.7.3 software (NASA Ames Research Center, Mountain View, CA, USA, 2012). All stimuli had a constant elevation of 0° and a distance of 1 m. The default HRTF used in slab3d is comparable to the head models provided in the Center for Image Processing and Integrated Computing (CIPIC) database [30] and has been shown to produce reliable lateralization responses [31].
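The core operation, filtering one noise token separately for each ear, can be sketched as below. This is a simplified stand-in for what slab3d does, not its API: the 44.1 kHz sampling rate is assumed, and `toy_hrir` is a hypothetical placeholder (a single delayed, scaled impulse) that carries only ITD and ILD. Measured head-related impulse responses, such as those underlying slab3d or the CIPIC database, also carry the spectral pinna cues that matter for front/back discrimination.

```python
import random

FS = 44100            # assumed sampling rate in Hz (not stated in the text)
DURATION_S = 0.250    # 250 ms noise token, as in the VASI test


def white_noise(n_samples, seed=0):
    """Uniform white noise between -1 and 1, reproducible via the seed."""
    rng = random.Random(seed)
    return [rng.uniform(-1.0, 1.0) for _ in range(n_samples)]


def convolve(signal, kernel):
    """Full linear convolution; output length is len(signal) + len(kernel) - 1."""
    out = [0.0] * (len(signal) + len(kernel) - 1)
    taps = [(j, k) for j, k in enumerate(kernel) if k != 0.0]  # skip zero taps
    for i, s in enumerate(signal):
        for j, k in taps:
            out[i + j] += s * k
    return out


def toy_hrir(delay_samples, gain, length=64):
    """Hypothetical placeholder HRIR: one delayed, scaled impulse (ITD + ILD only)."""
    h = [0.0] * length
    h[delay_samples] = gain
    return h


def render_virtual_source(noise, hrir_left, hrir_right):
    """Left/right headphone signals: the same noise filtered by each ear's HRIR."""
    return convolve(noise, hrir_left), convolve(noise, hrir_right)


noise = white_noise(int(FS * DURATION_S))
# Illustrative source toward the right: the right ear gets the earlier, stronger impulse.
left, right = render_virtual_source(noise, toy_hrir(29, 0.5), toy_hrir(0, 1.0))
```

Here 29 samples (about 0.66 ms at 44.1 kHz) and a gain of 0.5 are illustrative values only; a measured HRIR pair would replace both `toy_hrir` calls.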
The presentation level of all stimuli in the ear of lateralization was calibrated to 65 dB SPL using a sound level meter (SLM; Bruel and Kjaer 2270; Bruel and Kjaer, Naerum, Denmark) attached to a manikin (Knowles Electronics Manikin for Acoustic Research [KEMAR]; G.R.A.S Sound & Vibration, type 45 BA, Holte, Denmark). The VAS stimuli (total presentations: 5 VAS locations × 10 repetitions = 50) were presented in random order using Paradigm software [32] through Sennheiser HD 280 PRO headphones (Sennheiser, Wedemark, Germany). The user interface for data acquisition was also developed using Paradigm, as depicted in Fig. 2.
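Randomized presentation of this kind (and of the 18 loudspeakers × 5 repetitions in the localization test) can be sketched as a shuffled list of repeated locations. How Paradigm actually randomizes trials is not described; a plain shuffle is assumed here, and the location labels are hypothetical.

```python
import random


def randomized_trials(locations, repetitions, seed=None):
    """Every location repeated `repetitions` times, in shuffled order."""
    trials = [loc for loc in locations for _ in range(repetitions)]
    random.Random(seed).shuffle(trials)
    return trials


# Hypothetical labels for the five VASI locations (azimuth, side).
vasi_locations = ["90L", "45L", "0", "45R", "90R"]
vasi_trials = randomized_trials(vasi_locations, repetitions=10, seed=1)       # 50 trials
localization_trials = randomized_trials(range(1, 19), repetitions=5, seed=1)  # 90 trials
```

A blocked or constrained randomization (for example, forbidding immediate repeats) would be an equally plausible design choice; the shuffle only guarantees equal counts per location.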
The participants were asked to attend to the VAS stimuli and respond with a mouse click on the location on the dummy-head display (Fig. 2) corresponding to the perceived location. The virtual acoustic space identification (VASI) accuracy score (an aggregate score combining the scores for all five individual locations) was computed using a Python script running in the background of the Paradigm software. The accuracy scores were unitless. The higher the VASI accuracy score, the better the spatial performance; the maximum obtainable VASI score is 80. The overall VASI accuracy score of each participant was stored in an Excel sheet.
Test of binaural processing-ITD & ILD thresholds
The Psychoacoustic toolbox [33] implemented in MATLAB version R2014a (The MathWorks Inc., Natick, MA, USA) was used to obtain ITD and ILD thresholds. WBN (65 dB SPL) was presented in a three-interval forced-choice (3IFC) paradigm using a three-down one-up transformed up-down staircase procedure [34], as shown in Figs. 3 and 4. Of the three intervals, two contained the standard (centralized) stimulus and one contained the variant (lateralized) stimulus. Participants were asked to identify the variant interval (the interval in which the sound led or was heard louder in the right ear) by pressing the corresponding number on the keyboard. The ITD or ILD of the variant stimulus was varied adaptively according to the participant's responses. Testing was terminated after 10 reversals, and the last four reversals were averaged to obtain the ITD and ILD thresholds.
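The adaptive rule can be sketched as follows. This is an illustrative re-implementation, not the Psychoacoustic toolbox code: the starting level, the fixed step size, recording a reversal at the level where the track changes direction, and the `max_trials` guard are all assumptions. A 3-down 1-up rule converges on roughly the 79.4% correct point of the psychometric function, well above the 33% chance level of a three-interval task.

```python
def three_down_one_up(respond, start, step, n_reversals=10, n_average=4,
                      floor=0.0, max_trials=1000):
    """3-down 1-up staircase.

    respond(level) returns True when the listener picks the variant interval.
    Returns (threshold, reversal_levels); the threshold is the mean of the
    last `n_average` reversal levels.
    """
    level = start
    run_correct = 0   # consecutive correct responses since the last step
    direction = 0     # -1 after a step down, +1 after a step up, 0 before any step
    reversals = []
    for _ in range(max_trials):
        if respond(level):
            run_correct += 1
            if run_correct < 3:
                continue          # three correct in a row before making it harder
            run_correct = 0
            new_direction = -1    # harder: smaller ITD/ILD
        else:
            run_correct = 0
            new_direction = +1    # one error makes it easier
        if direction != 0 and new_direction != direction:
            reversals.append(level)   # track changed direction at this level
            if len(reversals) == n_reversals:
                return sum(reversals[-n_average:]) / n_average, reversals
        direction = new_direction
        level = max(floor, level + new_direction * step)
    raise RuntimeError("staircase did not reach the required number of reversals")


# Hypothetical noiseless listener who always detects ITDs of 300 µs or more.
threshold, reversals = three_down_one_up(lambda itd_us: itd_us >= 300,
                                         start=800, step=100)
```

With this idealized listener the track oscillates between 200 and 300 µs and returns a threshold of 250 µs. Real listeners respond probabilistically, which is why several reversals are averaged.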
Phase II: training phase
In the training phase, Group I participants underwent localization training using real sources (loudspeakers) in free field [24], while Group II participants were trained using virtual sources (VAS training) [26]. Both training methods consisted of 5-8 sessions (20 min each) of systematically presented stimuli. The training protocol was similar to those in earlier published reports [35,36]. In brief, training using real sources involved hierarchically graded stimuli presented through eight loudspeakers placed 45° apart in azimuth at a distance of 1 m. The stimuli (bus horn, speech sound /da/, and telephone ring) were graded in duration (1,000, 800, 500, and 300 ms). Each duration stage was subdivided into four nested levels of back attenuation (8, 4, 2 dB SPL back attenuation, ref: front loudspeaker). Participants progressed from one stage to the next when they achieved an rms error of <10° azimuth, calculated using a custom MATLAB script. The hierarchy of stimulus presentation (duration and attenuation parameters) for Group I participants trained using real sources is shown schematically in Fig. 5.
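The progression rule for real-source training can be sketched as an ordered list of levels with attenuation nested inside duration. This is a hypothetical sketch, not the authors' MATLAB script: the text refers to four nested attenuation levels but names only 8, 4, and 2 dB, so only those three appear here, and ordering attenuation from 8 to 2 dB assumes that a larger front-back level difference is easier.

```python
from itertools import product

DURATIONS_MS = [1000, 800, 500, 300]   # duration stages, easiest first (from the text)
BACK_ATTENUATION_DB = [8, 4, 2]        # back attenuation re: front loudspeaker (named values)
CRITERION_DEG = 10.0                   # advance when rms error < 10° azimuth


def training_hierarchy(durations, attenuations):
    """Ordered (duration, attenuation) levels with attenuation nested inside duration."""
    return list(product(durations, attenuations))


def next_level_index(current, rms_error_deg):
    """Move up one level when the rms criterion is met; otherwise repeat the level."""
    return current + 1 if rms_error_deg < CRITERION_DEG else current


levels = training_hierarchy(DURATIONS_MS, BACK_ATTENUATION_DB)  # 12 levels
```

Training would end once the index reaches `len(levels)`, that is, after the criterion is met at the hardest level.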
As with training using real sources, the VAS training protocol administered to Group II participants involved a graded progression from four to eight virtual sources (located 45° apart). The stimuli used in VAS training were similar to those used in the VASI test, with stimulus presentation and response acquisition controlled by a user interface designed in the Paradigm software, as shown in Fig. 6. The stages were formed on the basis of duration (2,000, 1,000, 500, and 300 ms), with the number of locations (4, 6, and 8 VAS locations) as the nested factor (Fig. 6).
At each level, the stimulus at each location was presented at least 7 times in random order, and the participant's responses were obtained. Each participant started at the easiest stage, S1L1 (2,000 ms; 4 locations), and the percentage of correct judgements was calculated after each stage. The participant moved from one level to the next in the hierarchy on meeting a criterion of ≥70% VASI accuracy at each level.
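The VAS level labels and the ≥70% rule can be sketched in the same way. The sketch treats the accuracy used for progression as simple percent correct over all presentations at a level, as the text describes, and checks the minimum of seven presentations per location; the function names are hypothetical.

```python
VAS_DURATIONS_MS = [2000, 1000, 500, 300]  # stages S1-S4 (from the text)
VAS_LOCATION_COUNTS = [4, 6, 8]            # nested levels L1-L3 (from the text)
MIN_PRESENTATIONS = 7                      # per location at each level
CRITERION_PERCENT = 70.0                   # advance at >= 70% correct


def level_label(stage: int, level: int) -> str:
    """Label such as 'S1L1' (stage = duration, level = number of locations)."""
    return f"S{stage}L{level}"


def percent_correct(targets, responses) -> float:
    hits = sum(t == r for t, r in zip(targets, responses))
    return 100.0 * hits / len(targets)


def passes_level(targets, responses, n_locations: int) -> bool:
    """True if every location was presented often enough and accuracy is >= 70%."""
    locations = set(targets)
    enough = len(locations) == n_locations and all(
        targets.count(loc) >= MIN_PRESENTATIONS for loc in locations
    )
    return enough and percent_correct(targets, responses) >= CRITERION_PERCENT
```

At S1L1 (2,000 ms, 4 locations), for example, 28 presentations with 20 correct (71.4%) would meet the criterion, whereas 19 correct (67.9%) would not.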
Phase III: post-training evaluation phase
Phase III involved re-administration of all the spatial acuity tests used in Phase I. All post-training evaluations were done immediately after training, within a span of 1 day. The difference in spatial acuity scores between the post- and pre-training phases, hereafter termed 'spatial learning', was calculated. The spatial learning score captured the training benefit for each individual separately: each individual's pre-training score served as the baseline against which his/her post-training score was compared, yielding a difference score that reflects spatial learning. This score was calculated for each behavioral measure as an indicator of training benefit in both groups. Using spatial learning scores (instead of raw pre- vs. post-training scores) better accounted for individual variation in baseline spatial acuity thresholds/scores, as each individual served as his/her own control.
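Because lower is better for rms error and for ITD and ILD thresholds but higher is better for the VASI score, a difference score needs a consistent sign. The text does not state which convention was used; the sketch below assumes the common choice of orienting every score so that positive values mean improvement. The measure names are hypothetical.

```python
# Direction of "better" for each measure (hypothetical keys).
LOWER_IS_BETTER = {
    "localization_rms_error": True,   # degrees
    "itd_threshold": True,            # e.g., microseconds
    "ild_threshold": True,            # dB
    "vasi_score": False,              # unitless; higher is better
}


def spatial_learning(pre: float, post: float, measure: str) -> float:
    """Pre/post difference oriented so that positive values mean improvement."""
    if LOWER_IS_BETTER[measure]:
        return pre - post
    return post - pre
```

A hypothetical participant whose rms error fell from 30° to 20° and whose VASI score rose from 50 to 65 would thus score +10 and +15, both read as improvement.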
Statistical analyses
Multivariate analysis of variance (MANOVA) was used to establish the equivalence of the two groups at the start of training, i.e., in the pre-training evaluation. The spatial learning scores of the two groups were compared using independent t-tests for each measure separately. Cohen’s d was calculated as a measure of effect size wherever significant differences were found. All analyses were conducted in SPSS version 26 (IBM Corp., Armonk, NY, USA).
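For two groups of 10, each comparison reduces to a Student's t-test with 18 degrees of freedom. The sketch below computes t with pooled variance and Cohen's d with the pooled standard deviation as the denominator. It is a textbook illustration, not a reproduction of SPSS output: SPSS also reports a Welch (unequal-variances) row, p-values would come from the t distribution (for example via `scipy.stats.ttest_ind`), and other effect-size conventions exist, such as d = 2t/√df, which give somewhat different values in small samples.

```python
import math
from statistics import mean, variance


def independent_t_and_d(group_a, group_b):
    """Student's independent-samples t (pooled variance) and Cohen's d (pooled SD)."""
    n_a, n_b = len(group_a), len(group_b)
    df = n_a + n_b - 2
    # Pooled variance: sample variances weighted by their degrees of freedom.
    pooled_var = ((n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)) / df
    pooled_sd = math.sqrt(pooled_var)
    diff = mean(group_a) - mean(group_b)
    t = diff / (pooled_sd * math.sqrt(1 / n_a + 1 / n_b))
    d = diff / pooled_sd
    return t, df, d
```

For example, `independent_t_and_d([1, 2, 3], [4, 5, 6])` returns df = 4 and d = -3, with t = -3/√(2/3) ≈ -3.67.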
Results
Shapiro-Wilk tests showed that the data collected in all three phases across the two groups were normally distributed (p>0.05). MANOVA with the spatial acuity score on each test as the dependent variables and group as the independent variable showed no main effect of group [localization error: F(1,18)=0.57, p>0.05; VASI: F(1,18)=0.24, p>0.05; ITD: F(1,18)=0.54, p>0.05; ILD: F(1,18)=0.20, p>0.05]. Descriptive statistics (mean and standard deviation) at the pre- and post-training phases for each group are shown in Table 1, which reveals a noticeable improvement in spatial performance on all tests for both trained groups (Group I and Group II).
The results of the independent t-tests on spatial learning scores (the difference between pre- and post-training scores) for the two groups on the four spatial tests are shown in Fig. 7. Group II (who underwent VAS training) showed significantly greater spatial learning [t(18)=2.81, p<0.05, Cohen’s d=1.33] on the virtual acoustic space identification (VASI) test than Group I (who underwent localization training). Conversely, Group I improved significantly more on ITD [t(18)=4.25, p<0.01, Cohen’s d=2.02] than Group II. However, both groups showed similar improvements on the test of spatial acuity using real sources [t(18)=0.21, p>0.05] and on ILD [t(18)=0.02, p>0.05].
Discussion
Despite evidence of spatial errors in NH listeners arising from inherent bias [9-11] and front-back confusions [3,15], very few studies to date have investigated whether spatial judgements in NH listeners can be fine-tuned. The spatial acuity of participants in Group I (trained using real sources) and Group II (trained using virtual sources) was measured at the pre- and post-training phases. MANOVA showed no differences between the two groups before training, indicating group equivalence. The mean spatial acuity scores of participants in Group I and Group II improved in the post-training phase compared to the pre-training phase (Table 1). This was complemented by reduced variability (standard deviation) in the post-training phase for both groups, both suggestive of positive outcomes of spatial training. The improved scores in both spatial paradigms indicate fine-tuning of spatial acuity skills in NH listeners, the consequences of which may manifest in challenging listening situations (as explained in the Introduction). The authors postulate that the effect of spatial training in NH listeners would have been better realized had sound distracters in the form of acoustically and linguistically similar maskers (at spatially co-located and separated locations) been employed, although this aim was not explored in the current study.
For Group I, trained using real sources, improvement in spatial performance was seen not only in the trained task (localization of real sources): Group I participants also showed improved identification of virtual sources and decreased ITD and ILD thresholds (Fig. 7), all indicative of transfer of learnt skills to untrained tasks. Likewise, for Group II, trained using virtual sources, the benefits of training transferred to the three other related tasks (localization, ITD, and ILD). These improvements on untrained measures highlight the generalizability of spatial training. The transfer of skills to untrained tasks suggests procedural learning in auditory spatial tasks (training-induced changes occur not only in the trained task but can also generalize to other, finer tasks). Similar training-induced procedural learning for spatial tasks was demonstrated earlier by Ortiz and Wright [18], who showed positive outcomes of ITD and ILD training on both of these tasks, along with generalization to the gap detection threshold (GDT, a measure of temporal acuity). The transfer of training effects to untrained skills in auditory spatial processing can be explained on the basis of process- and task-specific theories of learning. According to Dahlin, et al. [37], skills transfer to an untrained task if the trained and transfer tasks engage specific overlapping processing components and brain regions. On the basis of this theory, it can be hypothesized that training (using real or virtual sources) activates similar brain areas (the what and where pathways) responsible for auditory spatial processing. The similarity in neural underpinnings of the temporal (ITD), intensity (ILD), and spectral correlates of spatial processing would have facilitated the transfer of skills to untrained tasks.
The results of the independent t-tests showed that Group I and Group II demonstrated similar spatial learning (improvement) on the ILD and localization tasks. In addition, Group II showed significantly greater spatial learning on the VASI test than Group I, reflecting advantages derived through stimulus learning that were not readily observed for Group I. In contrast, the significantly greater improvement in ITD thresholds in Group I may be attributed to two factors. First, localization training using real sources takes place in a free-field environment, where low-frequency information is inherently abundant [38]. Second, humans listen predominantly to low-frequency sounds in day-to-day conversation (speech frequencies range from about 300 Hz to 3 kHz). Thus, spatial training using real sources would have facilitated maximal use of ITD cues, which dominate at low frequencies.
In conclusion, the findings of the study show that the spatial performance of NH listeners can be refined through brief periods of training using either real or virtual sources. The comparable benefits derived from both training methods suggest that both training programs are effective. In light of the present findings, we recommend the use of virtual auditory space training (VAST), as it has more practical advantages. VAST offers more flexibility, is cost effective, and requires only minimal equipment to achieve maximal benefits (comparable to laboratory-based training), and can therefore be advocated in other clinical populations as well. In addition, VAST may prove essential in occupations where spatial precision and orientation play a crucial role, such as vehicle driving, defense services, and acoustic engineering.
Acknowledgements
We would like to thank the Director and the HOD, Audiology (All India Institute of Speech and Hearing, Mysuru, affiliated with the University of Mysuru) for permitting us to carry out the study. The authors would like to thank all the participants of the study for their consent and cooperation during data collection. This work did not receive any funding or grants.
Notes
Conflicts of interest
The authors have no financial conflicts of interest.
Author Contributions
Conceptualization: Kavassery Venkateswaran Nisha, Ajith Uppunda Kumar. Data curation: Kavassery Venkateswaran Nisha. Formal analysis: Kavassery Venkateswaran Nisha. Investigation: Kavassery Venkateswaran Nisha, Ajith Uppunda Kumar. Methodology: Kavassery Venkateswaran Nisha, Ajith Uppunda Kumar. Project administration: Ajith Uppunda Kumar. Resources: Kavassery Venkateswaran Nisha, Ajith Uppunda Kumar. Software: Kavassery Venkateswaran Nisha, Ajith Uppunda Kumar. Supervision: Ajith Uppunda Kumar. Validation: Kavassery Venkateswaran Nisha. Visualization: Kavassery Venkateswaran Nisha. Writing—original draft: Kavassery Venkateswaran Nisha. Writing—review & editing: Ajith Uppunda Kumar. Approval of final manuscript: Kavassery Venkateswaran Nisha, Ajith Uppunda Kumar.