Psychometric Evaluation of a Digitally Recorded Urdu Monosyllabic Word List for Word Recognition Score Testing
Abstract
Background and Objectives
Monosyllabic words are the most common stimuli for speech recognition testing because they assess auditory perception while providing few contextual cues. However, such materials are lacking for the Urdu-speaking Pakistani population. This study aimed to develop and psychometrically evaluate a digitally recorded Urdu monosyllabic word list for Word Recognition Score (WRS) testing.
Subjects and Methods
A total of 135 monosyllabic words were selected from a previous study. These words were digitally recorded in a studio by a native female Urdu speaker. The recordings were psychometrically evaluated with 30 native Urdu speakers with normal hearing. The 100 most readily recognized words were selected and organized into two 50-word lists. Each list was further divided into two 25-word half-lists (four half-lists in total), so that the lists were relatively homogeneous in audibility.
Results
The mean psychometric function slope from 20% to 80% was 4.78%/dB±0.22%/dB for the full lists and 4.81%/dB±0.35%/dB for the half-lists. No statistically significant differences were observed among the full lists or among the half-lists. The mean slope at the 50% point was 6.04%/dB for both the full lists (SD=0.44) and the half-lists (SD=0.40).
Conclusions
Digitally recorded Urdu monosyllabic word lists are valid for assessing speech recognition in native Urdu speakers with normal hearing.
Introduction
The ability to perceive and comprehend speech is essential for normal human communication. Hearing-impaired individuals frequently report difficulty understanding speech in everyday situations [1]. Pure-tone audiometry alone is insufficient for hearing evaluation, especially with respect to real-life listening situations. Therefore, speech audiometry must be part of routine audiological evaluations [2]. Speech audiometry evaluates a patient's communication ability by assessing the function of their speech perception system [3]. However, the accuracy of this examination depends on the patient's familiarity with the speech material. If a patient is tested in a non-native language, unfamiliar words may be perceived as nonsense stimuli, leading to falsely low scores. For the most effective assessment and diagnosis, patients should therefore be tested with speech material in their native language [4].
The Word Recognition Score (WRS) test is the most common and essential component of speech audiometry. The patient listens to a list of phonetically or phonemically balanced monosyllabic words and repeats each word. WRS can be measured at various intensity levels to generate a performance-intensity (PI) function, which describes the effect of presentation level on a patient's speech recognition ability [5]. Several types of stimuli can be used to assess speech recognition, including meaningless syllables, monosyllabic words, phrases, and sentences. Monosyllabic words are the most common speech recognition stimuli because they test auditory perception while providing few contextual cues, which helps ensure test sensitivity [6].
The speech identification literature contains extensive psychometric data, as many studies have investigated the factors that influence speech intelligibility [7,8]. The psychometric function describes the relationship between performance on a perceptual task (for example, the proportion of words correctly recognized) and the intensity of the stimulus. Its two most important parameters are the threshold and the slope. The threshold is the stimulus level needed to produce a specified performance level (such as 50% correct). The slope is the maximum rate at which performance improves as stimulus level increases [9].
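To illustrate how threshold and slope relate for one common psychometric model, the sketch below assumes a two-parameter logistic function, p(i) = 1 / (1 + exp(-(a + b·i))), where i is the presentation level in dB HL. The logistic form and all function names are assumptions made for illustration. They are not necessarily the authors' modified regression equation.

```python
import math


def logistic_pf(level_db: float, a: float, b: float) -> float:
    """Proportion correct at a level, assuming p = 1 / (1 + exp(-(a + b*level)))."""
    return 1.0 / (1.0 + math.exp(-(a + b * level_db)))


def level_for_proportion(p: float, a: float, b: float) -> float:
    """Invert the assumed logistic: level (dB HL) at which proportion correct equals p."""
    return (math.log(p / (1.0 - p)) - a) / b


def threshold_db(a: float, b: float) -> float:
    """50% point of the function (the threshold)."""
    return level_for_proportion(0.5, a, b)


def slope_at_threshold(b: float) -> float:
    """Maximum slope of the logistic, reached at the 50% point, in %/dB."""
    return 100.0 * b / 4.0


def slope_20_80(a: float, b: float) -> float:
    """Average slope between the 20% and 80% points, in %/dB."""
    span_db = level_for_proportion(0.8, a, b) - level_for_proportion(0.2, a, b)
    return (80.0 - 20.0) / span_db
```

Consider hypothetical parameters a = -3.0 and b = 0.2. The threshold is then 15 dB HL and the slope at threshold is 5%/dB. The 20% to 80% slope is about 4.33%/dB. This shows why a logistic function's average 20% to 80% slope is always shallower than its slope at the 50% point.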
Several methodological issues must be considered in speech audiometry, such as word list construction, speech signal intensity, word familiarity, and live-voice versus recorded presentation. Unlike live-voice presentation, recorded word lists standardize procedures and keep the speech signal consistent. Recorded materials are therefore recommended, especially for suprathreshold speech recognition testing [10]. Digital recordings avoid many problems associated with tape recordings, such as increased distortion and unwanted background noise. Recorded materials also improve the consistency of stimulus intensity and speech patterns across clients and clinics. Presentation software can control the randomization and timing of specific words played back from an audio CD or hard disk. This can improve test-retest reliability and shorten test time [11,12].
Standardized word lists for speech audiometry have been developed in several languages. For native English speakers in the United States, numerous monosyllabic word lists have been developed and are widely used, including CID W-22 and NU-6 [10]. Word recognition testing materials have also been developed in many other languages, including Arabic [1], Dutch [13], Persian [14], and Thai [15].
In most languages, monosyllabic words are used to assess speech recognition. However, standardized recorded materials for Urdu speech audiometry are lacking for the Urdu-speaking Pakistani population. This shortcoming prevents audiologists from conducting appropriate and culturally relevant hearing assessments. This study aims to address this gap by evaluating and establishing a standardized recorded list of Urdu monosyllabic words for WRS testing, enabling accurate and culturally appropriate speech audiometry. Ultimately, such materials should improve the quality of audiological care for Urdu speakers, as well as the accuracy of diagnosis and treatment outcomes. In future work, the lists will be tested in clinical settings and integrated into routine examinations in Urdu audiology clinics to ensure their practical use.
The objective of this study was to record and conduct a psychometric evaluation of Urdu monosyllable words for use in WRS testing.
Subjects and Methods
This study was conducted at a tertiary care hospital in Islamabad, Pakistan, from May 1 to December 15, 2024. The research ethics review board approved the study (letter number: AMC-HI-PUB-ERC/24/10). Thirty native Urdu speakers (12 male, 18 female) aged 14 to 44 years (mean=27.16 years) participated. Some participants reported a first language other than Urdu, but all used Urdu regularly in daily conversation. All participants had normal hearing with no history of ear infections or ear surgery. Hearing status was confirmed with pure-tone audiometry and tympanometry: pure-tone thresholds were ≤20 dB HL, and tympanograms were normal. A set of 135 Urdu monosyllabic words was taken from a previous study [16]. In that study, the words had been selected on the basis of familiarity ratings and content validation by expert and non-expert Urdu speakers.
Digital recording
The words were recorded by one of three native Urdu speakers, whom the recording studio arranged as candidates. Two judges evaluated audition recordings of each candidate. The speaker with the most standard accent, clearest pronunciation, and best voice quality (a female speaker) was chosen.
The recording took place in a double-walled, sound-treated room. A microphone (TLM 103 Studio, Neumann) fitted with a 20 cm windscreen was placed 6 cm from the speaker at 0° azimuth. The microphone was connected to an Apogee Ensemble Thunderbolt audio interface (Apogee Electronics), and Logic Pro software (Apple Inc.) was used to edit the recorded words. Recordings were made at a 44.1 kHz sampling rate with 24-bit quantization.
The speaker was instructed to repeat each word at least five times, with a brief interval between repetitions, and to speak naturally with a normal intonation pattern. Any words with mispronunciations or unnatural intonation were re-recorded. To minimize list-position effects, the first and last repetitions of each word were discarded. A native Urdu judge then assessed the production quality of the three medial repetitions and selected the best one for listener evaluation. The level of each selected word was then normalized to the root-mean-square (RMS) level of a 1 kHz reference tone. A 4-second interval was inserted between words, and no carrier phrase was used. The final edited words were stored as 24-bit WAV files.
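The RMS normalization step can be sketched with NumPy as follows. This sketch makes several assumptions not stated in the text: RMS is computed over the whole word waveform, the signals are floating-point arrays with full scale equal to 1.0, and the tone amplitude is a placeholder value. The helper names are hypothetical.

```python
import numpy as np

FS_HZ = 44_100  # sampling rate reported for the recordings


def rms(signal) -> float:
    """Whole-waveform root-mean-square level (linear, full scale = 1.0; an assumption)."""
    x = np.asarray(signal, dtype=np.float64)
    return float(np.sqrt(np.mean(x * x)))


def reference_tone(freq_hz=1000.0, dur_s=1.0, amplitude=0.1, fs=FS_HZ):
    """Hypothetical 1 kHz reference tone; the amplitude is an assumed value."""
    t = np.arange(int(round(dur_s * fs))) / fs
    return amplitude * np.sin(2.0 * np.pi * freq_hz * t)


def match_to_reference(word, reference):
    """Scale a word recording so its RMS level equals that of the reference tone."""
    gain = rms(reference) / rms(word)
    scaled = np.asarray(word, dtype=np.float64) * gain
    if np.max(np.abs(scaled)) > 1.0:
        # A 24-bit WAV file cannot represent samples beyond full scale.
        raise ValueError("normalized word would clip; use a lower reference level")
    return scaled
```

Because every word is scaled to the same RMS as the tone, calibrating the audiometer's external input with the tone also fixes the average level of every word.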
Procedure
After normal hearing was confirmed, Audacity software (version 3.7.1, The Audacity Team) was used to present the target words to listeners in random order. Audacity is a free, open-source, cross-platform program for recording and editing sound [17]. The audio signals were routed from the computer to the external input of a Maico MA 42 audiometer (MAICO Diagnostics). They were then delivered to the participant's test ear via TDH DD45 headphones (MAICO Diagnostics). Testing was conducted monaurally in the ear with the better (lower) audiometric thresholds. If the threshold difference between ears was minimal, the test ear was selected randomly.
Listeners were given breaks during the assessment to reduce fatigue. Testing took place in a double-walled, sound-treated room that met the ANSI S3.1 maximum permissible ambient noise levels for ears-not-covered conditions in one-third octave bands. Before each data collection session, the audiometer's external inputs were calibrated to 0 VU (volume unit) with a 1 kHz tone.
The participants had no prior exposure to the monosyllabic words before the assessment. The 135 words were divided into five lists of 27 words. The lists were presented at nine levels from 0 to 40 dB HL in 5 dB steps. The order of the lists and of the words within them was randomized for each participant, and each word was presented at every intensity level to all participants.
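The following sketch shows one way to generate a per-participant randomized presentation order. It assumes a block structure the text does not specify: every word is heard once at every level, and trials are grouped into (list, level) blocks. The blocks are shuffled, and word order is shuffled within each block. Both the function name and the word labels are hypothetical.

```python
import random

LEVELS_DB_HL = list(range(0, 45, 5))  # 0, 5, ..., 40 dB HL: nine levels


def build_schedule(word_lists, levels, seed):
    """Hypothetical randomized trial order for one participant.

    Assumes every word is presented once at every level, grouped into
    (list, level) blocks shown in random order, with the word order
    shuffled inside each block. Returns a list of (word, level) trials.
    """
    rng = random.Random(seed)  # per-participant seed makes the order reproducible
    blocks = [(li, level) for li in range(len(word_lists)) for level in levels]
    rng.shuffle(blocks)
    trials = []
    for li, level in blocks:
        words = list(word_lists[li])
        rng.shuffle(words)
        trials.extend((word, level) for word in words)
    return trials
```

With 135 placeholder words split into five 27-word lists, this yields 135 × 9 = 1,215 trials per participant. Each (word, level) pair occurs exactly once.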
The participants were instructed to repeat aloud the words they heard, and an Urdu-speaking judge scored each response as correct or incorrect. Before the test began, the participants received the following instructions in Urdu (translated to English): "You will listen to Urdu monosyllabic words at various loudness levels. The words may be challenging for you to hear at lower levels. Please listen attentively and then repeat the word you have heard. Guess the word if you're not sure. If you can't guess, remain silent until the next word. Any questions?"
Results
After the raw data were analyzed, the monosyllabic words were ranked by the number of correct responses across all participants and intensity levels, so that more readily recognized words ranked higher. The 100 most readily recognized words were divided into two lists of 50 words each and then into four half-lists of 25 words each. An S-curve distribution pattern was used to create psychometrically equivalent lists. The randomized 50-word lists are shown in Supplementary Table 1 (in the online-only Data Supplement).
After the two balanced 50-word lists were created, four 25-word half-lists were developed, with each pair of half-lists derived from one full list to ensure similar psychometric audibility. The first word of each full list was randomly allocated to either half-list A or B, and the remaining words were allocated using the S-curve distribution pattern. The Urdu monosyllabic half-lists are presented in Supplementary Table 2 (in the online-only Data Supplement).
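The ranking and allocation steps can be sketched as follows. Here the S-curve distribution pattern is interpreted as a serpentine allocation (A, B, B, A, A, B, B, A, ...) over words ranked by recognition. This interpretation is an assumption, and the function names are hypothetical.

```python
def rank_by_recognition(correct_counts, top_n):
    """Words sorted from most to least often recognized; ties broken alphabetically."""
    ranked = sorted(correct_counts, key=lambda w: (-correct_counts[w], w))
    return ranked[:top_n]


def s_curve_split(ranked_words, n_lists=2, first=0):
    """Deal ranked words to lists in a serpentine order (assumed S-curve pattern).

    `first` is the list that receives the top-ranked word; the study
    allocated the initial word to list A or B at random.
    """
    order = [(first + j) % n_lists for j in range(n_lists)]
    lists = [[] for _ in range(n_lists)]
    period = 2 * n_lists
    for i, word in enumerate(ranked_words):
        r = i % period
        # Walk forward through the lists, then backward, then forward again.
        slot = r if r < n_lists else period - 1 - r
        lists[order[slot]].append(word)
    return lists
```

Because each full list keeps its words in rank order, applying s_curve_split(full_list, first=random.randrange(2)) to each 50-word list splits it into two 25-word half-lists in the same way.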
After constructing the lists, the regression slope and intercept were determined for each of the two full lists and four half-lists. A modified regression equation was used to determine the percentage of correct recognition at each intensity level:
Finally, psychometric functions for each full and half-list were generated (Table 1).
The intensity threshold, slope at threshold, and slope from 20% to 80% reported in Table 1 were determined for each list and half-list using the following equation:
where p is the proportion of correct recognition, a is the intercept, b is the slope, and i is the presentation level in dB HL.
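For orientation, the relations below assume the conventional logistic form used for psychometric functions. This is an assumption, since the authors describe their model only as a modified regression equation. Under that form, the symbols p, a, b, and i defined above relate to the threshold and slope values as follows:

```latex
p(i) = \frac{e^{a + b i}}{1 + e^{a + b i}}, \qquad
i_p = \frac{\ln\!\left(\frac{p}{1 - p}\right) - a}{b}

T_{50} = i_{0.5} = -\frac{a}{b}, \qquad
S_{50} = \frac{100\, b}{4} \ \%/\mathrm{dB}

S_{20\text{-}80} = \frac{80 - 20}{i_{0.8} - i_{0.2}} = \frac{60\, b}{2 \ln 4} \ \%/\mathrm{dB}
```

The last line follows because i_{0.8} - i_{0.2} = (ln 4 - ln(1/4)) / b = 2 ln 4 / b.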
The psychometric function slope for the full lists, between 20% and 80%, ranged from 4.62%/dB to 4.94%/dB. For the half-lists, the slope ranged from 4.52%/dB to 5.32%/dB.
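The regression intercept and slope for each list can be estimated from per-level counts of correct responses. One way to do this, assuming a binomial logistic model, is a maximum-likelihood fit by Newton-Raphson, sketched below in plain Python. The model is an assumption, because the source does not give its modified regression equation, and the function name and data layout are hypothetical.

```python
import math


def fit_logistic(levels_db, n_correct, n_trials, max_iter=100, tol=1e-10):
    """Maximum-likelihood fit of p = 1 / (1 + exp(-(a + b*level))) to grouped counts.

    Uses Newton-Raphson on the binomial log-likelihood (an assumed model).
    Returns the intercept a and slope b (per dB).
    """
    a = b = 0.0
    for _ in range(max_iter):
        g_a = g_b = h_aa = h_ab = h_bb = 0.0
        for x, k, n in zip(levels_db, n_correct, n_trials):
            p = 1.0 / (1.0 + math.exp(-(a + b * x)))
            resid = k - n * p          # contribution to the score (gradient)
            w = n * p * (1.0 - p)      # contribution to the Fisher information
            g_a += resid
            g_b += resid * x
            h_aa += w
            h_ab += w * x
            h_bb += w * x * x
        det = h_aa * h_bb - h_ab * h_ab
        # Newton step: solve the 2 x 2 information system for (da, db).
        da = (h_bb * g_a - h_ab * g_b) / det
        db = (h_aa * g_b - h_ab * g_a) / det
        a += da
        b += db
        if abs(da) < tol and abs(db) < tol:
            return a, b
    raise RuntimeError("logistic fit did not converge")
```

For example, suppose the inputs are hypothetical counts of correct responses out of n_trials at each of the nine levels. Then fit_logistic(levels, correct, trials) returns (a, b), and -a/b gives the 50% threshold in dB HL.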
A two-way chi-square (χ2) analysis, with intensity and list as independent variables, revealed no significant differences among the full lists, χ2(64, n=30)=72.00, p=0.230, or among the half-lists, χ2(64, n=30)=72.00, p=0.230.
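As a generic illustration of the chi-square homogeneity test, the sketch below computes the Pearson statistic and its degrees of freedom for an r × c table of counts. One possible layout uses lists as rows, intensity levels as columns, and correct responses as cell counts. The exact table the authors analysed is not specified, so this layout and the function name are assumptions.

```python
def pearson_chi_square(table):
    """Pearson chi-square test of homogeneity for an r x c table of counts.

    Returns (statistic, degrees_of_freedom); no continuity correction is applied.
    """
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    grand = sum(row_tot)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if all rows share the same column proportions.
            expected = row_tot[i] * col_tot[j] / grand
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof
```

For the table [[10, 20], [20, 10]], the statistic is 20/3 with 1 degree of freedom. A p-value can then be obtained from the chi-square survival function, for example scipy.stats.chi2.sf(stat, dof).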
Although the WRS lists and half-lists did not show significant differences in audibility, small adjustments to the intensity levels were made to equalize the 50% recognition threshold for each list to the midpoint level (15.55 dB HL), based on the average threshold of the full lists and half-lists. These intensity adjustments are presented in Table 1. The psychometric functions of the lists and half-lists after intensity adjustments are shown in Fig. 1, while Fig. 2 illustrates the mean psychometric function before and after the intensity adjustments.
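The threshold-equalization step can be sketched as follows. The sign convention is an assumption and may differ from the one used in Table 1: the adjustment is expressed as the gain (in dB) applied to a list's recording. Raising a recording by x dB lowers its measured threshold by x dB, so the gain equals the list threshold minus the midpoint. The function name and threshold values are hypothetical.

```python
def equalize_thresholds(thresholds_db):
    """Per-list gain changes that move each 50% threshold to the common midpoint.

    thresholds_db maps a list name to its fitted 50% threshold (dB HL).
    The midpoint is the mean threshold across all lists supplied.
    The adjustment is the assumed recording gain: threshold - midpoint.
    """
    midpoint = sum(thresholds_db.values()) / len(thresholds_db)
    adjustments = {name: t - midpoint for name, t in thresholds_db.items()}
    return midpoint, adjustments
```

For example, hypothetical thresholds of 15.0 and 16.1 dB HL give a midpoint of 15.55 dB HL. The easier list is then attenuated by 0.55 dB and the harder list amplified by 0.55 dB.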
Discussion
The aim of this study was to digitally record and psychometrically equate lists and half-lists of Urdu monosyllabic words for WRS testing. The results of the two-way chi-square test indicate that the full lists and half-lists are homogeneous in terms of audibility and the slope of the psychometric function in subjects with normal hearing.
The results of the current study are comparable with those reported for other languages. For the Urdu monosyllabic word lists, the mean psychometric function slope was 6.04%/dB at 50% and 4.78%/dB from 20% to 80%. These values are slightly steeper than previously reported values for English word recognition (WR) materials. Beattie, et al. [18] reported mean 20% to 80% slopes of 4.2%/dB for NU-6 and 4.6%/dB for CID W-22 word lists. Using the Auditec of St. Louis CD recordings, Wilson and Oyler [19] found somewhat different mean slopes for the NU-6 and CID W-22 word lists (4.4%/dB and 4.8%/dB, respectively). Comparison with English is important because English has had standardized word recognition tests for almost 50 years, and researchers have extensively evaluated the validity and reliability of many English tests [10,20]. The difference in slope for the Urdu WR lists may reflect differences in word length and linguistic structure. Urdu possesses distinct linguistic features that influence the perceptual attributes of monosyllabic words. In addition, languages characterized by longer or more phonetically complex syllables typically demonstrate steeper psychometric slopes [7].
Harris, et al. [21] found that Korean monosyllabic word lists had mean slopes of 5.0%/dB for male speakers and 5.1%/dB for female speakers. Harris and colleagues [22] created monosyllabic materials for native Polish speakers; the male and female lists had mean slopes of 5.8%/dB and 5.9%/dB, respectively. Recording quality and speaker characteristics (e.g., male vs. female speakers in the Korean and Polish studies) can affect the slope. Word recognition across languages may also be subject to perceptual differences due to variations in articulation, prosody, and voice quality among native speakers, which may account for minor slope variations [23].
The mean psychometric function slope at 50% for Vietnamese monosyllabic lists and half-lists was 5.1%/dB for male talkers and 5.2%/dB for female talkers. The mean slopes from 20% to 80% for these lists and half-lists were 4.4%/dB for male recordings and 4.5%/dB for female recordings [24]. Linguistic and phonological differences may explain the relatively small slope differences between Urdu and Vietnamese. Vietnamese is a tonal language, so tone affects word meaning. Because both tonal and phonemic elements are involved in word recognition, this tonal structure may increase sensitivity to changes in intensity [25]. Conversely, Urdu relies on consonant and vowel patterns that respond differently to sound intensity [26].
The psychometric function slope of the Urdu monosyllabic WR lists is comparable to that of other languages. This indicates that the Urdu WR materials created in this work capture changes in word recognition performance across intensity levels. The minor variation in slope may be attributable to phonetic characteristics specific to Urdu, such as the syllable structure of its monosyllabic words, which may affect how quickly recognition improves as presentation level increases. These findings enable cross-linguistic comparisons of WR materials and emphasize the need for language-specific WR test development. Because the observed differences in slope were minimal, the lists and half-lists of Urdu words created in this study are appropriate for the Urdu-speaking population.
Limitations and suggestions
The present word lists and half-lists are suitable for Urdu speakers; however, the lexical items were selected for adults and may not be appropriate for children. Further study is needed to develop word recognition materials for Urdu-speaking children. Future research should also evaluate Urdu-speaking listeners with words presented in background noise.
Conclusion
This project developed audiometric materials for word recognition score testing in Urdu, thereby expanding the existing testing resources for the language. The materials are available on a CD titled Urdu Word Recognition Test Material. These materials also have documented psychometric characteristics, establishing a baseline for comparing and assessing future Urdu materials. These efforts are expected to enhance the validity and efficiency of Urdu speech audiometry testing.
Supplementary Materials
The online-only Data Supplement is available with this article at https://doi.org/10.7874/jao.2025.00024.
Supplementary Table 1.
Urdu monosyllabic 50-word lists and their English meanings
Supplementary Table 2.
Urdu monosyllabic half-lists
Notes
Conflicts of Interest
The authors have no financial conflicts of interest.
Author Contributions
Conceptualization: all authors. Data curation: Muhammad Zubair, Waqar Ahmed Awan. Formal analysis: Muhammad Zubair, Waqar Ahmed Awan. Investigation: Muhammad Zubair. Methodology: all authors. Project administration: Muhammad Zubair, Waqar Ahmed Awan. Resources: all authors. Supervision: Satheesh Babu Nataranjan, Waqar Ahmed Awan. Validation: Satheesh Babu Nataranjan, Waqar Ahmed Awan. Visualization: Muhammad Zubair. Writing—original draft: Muhammad Zubair. Writing—review & editing: Satheesh Babu Nataranjan, Waqar Ahmed Awan. Approval of final manuscript: all authors.
Funding Statement
None
Acknowledgments
None