Research Repository

All Outputs (18)

Fast speech intelligibility estimation using a neural network trained via distillation (2020)
Presentation / Conference
Cox, T., Bailey, W., & Tang, Y. (2020, January). Fast speech intelligibility estimation using a neural network trained via distillation. Poster presented at 12th Speech in Noise Workshop, Toulouse, France

Objective measures of speech intelligibility have many uses, including the evaluation of degradation during transmission and the development of processing algorithms. One intrusive approach is to use a method based on the audibility of speech glimpse...

Background adaptation for improved listening experience in broadcasting (2019)
Presentation / Conference
Tang, Y., Cox, T., Fazenda, B., Liu, Q., & Wang, W. (2019, May). Background adaptation for improved listening experience in broadcasting. Presented at 44th International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2019, Brighton, UK

The intelligibility of speech in noise can be improved by modifying the speech. But with object-based audio, there is the possibility of altering the background sound while leaving the speech unaltered. This may prove a less intrusive approach, affor...

Improving intelligibility prediction under informational masking using an auditory saliency model (2018)
Presentation / Conference
Tang, Y., & Cox, T. (2018, September). Improving intelligibility prediction under informational masking using an auditory saliency model. Presented at International Conference on Digital Audio Effects, Aveiro, Portugal

The reduction of speech intelligibility in noise is usually dominated by energetic masking (EM) and informational masking (IM). Most state-of-the-art objective intelligibility measures (OIM) estimate intelligibility by quantifying EM. Few measures m...

Speech-to-screen : spatial separation of dialogue from noise towards improved speech intelligibility for the small screen (2018)
Presentation / Conference
Demonte, P., Tang, Y., Hughes, R., Cox, T., Fazenda, B., & Shirley, B. (2018, May). Speech-to-screen : spatial separation of dialogue from noise towards improved speech intelligibility for the small screen. Presented at 144th International Pro Audio Convention (AES Milan 2018), Milan, Italy

Can externalizing dialogue when in the presence of stereo background noise improve speech intelligibility? This has been investigated for audio over headphones using head-tracking in order to explore potential future developments for small-screen dev...

Automatic speech-to-background ratio selection to maintain speech intelligibility in broadcasts using an objective intelligibility metric (2018)
Journal Article
Tang, Y., Fazenda, B., & Cox, T. (2018). Automatic speech-to-background ratio selection to maintain speech intelligibility in broadcasts using an objective intelligibility metric. Applied Sciences, 8(1), 59. https://doi.org/10.3390/app8010059

While mixing, sound producers and audio professionals empirically set the speech-to-background ratio (SBR) based on rules of thumb and their own perception of sounds. There is no guarantee that the speech content will be intelligible for the general...

A non-intrusive method for estimating binaural speech intelligibility from noise-corrupted signals captured by a pair of microphones (2017)
Journal Article
Tang, Y., Liu, Q., Wang, W., & Cox, T. (2018). A non-intrusive method for estimating binaural speech intelligibility from noise-corrupted signals captured by a pair of microphones. Speech Communication, 96, 116-128. https://doi.org/10.1016/j.specom.2017.12.005

A non-intrusive method is introduced to predict binaural speech intelligibility in noise directly from signals captured using a pair of microphones. The approach combines signal processing techniques in blind source separation and localisation, with...

A study on the relationship between the intelligibility and quality of algorithmically-modified speech for normal hearing listeners (2017)
Journal Article
Tang, Y., Arnold, C., & Cox, T. (2017). A study on the relationship between the intelligibility and quality of algorithmically-modified speech for normal hearing listeners. Journal of Otorhinolaryngology, Hearing and Balance Medicine, 1(1), https://doi.org/10.3390/ohbm1010005

This study investigates the relationship between the intelligibility and quality of modified speech in noise and in quiet. Speech signals were processed by seven algorithms designed to increase speech intelligibility in noise without altering speech...

Learning static spectral weightings for speech intelligibility enhancement in noise (2017)
Journal Article
Tang, Y., & Cooke, M. (2018). Learning static spectral weightings for speech intelligibility enhancement in noise. Computer Speech and Language, 49, 1-16. https://doi.org/10.1016/j.csl.2017.10.003

Near-end speech enhancement works by modifying speech prior to presentation in a noisy environment, typically operating under a constraint of limited or no increase in speech level. One issue is the extent to which near-end enhancement techniques req...

A perceptually-weighted deep neural network for monaural speech enhancement in various background noise conditions (2017)
Presentation / Conference
Liu, Q., Wang, W., Jackson, P. J., & Tang, Y. (2017, August). A perceptually-weighted deep neural network for monaural speech enhancement in various background noise conditions. Presented at EUSIPCO 2017, the 25th European Signal Processing Conference, Kos Island, Greece

Deep neural networks (DNN) have recently been shown to give state-of-the-art performance in monaural speech enhancement. However in the DNN training process, the perceptual difference between different components of the DNN output is not fully ex...

The effect of situation-specific non-speech acoustic cues on the intelligibility of speech in noise (2017)
Presentation / Conference
Ward, L., Shirley, B., Tang, Y., & Davies, W. (2017, August). The effect of situation-specific non-speech acoustic cues on the intelligibility of speech in noise. Presented at INTERSPEECH 2017, 18th Annual Conference of the International Speech Communication Association, Stockholm, Sweden

In everyday life, speech is often accompanied by a situation-specific acoustic cue; a hungry bark as you ask ‘Has anyone fed the dog?’. This paper investigates the effect such cues have on speech intelligibility in noise and evaluates their interactio...

A metric for predicting binaural speech intelligibility in stationary noise and competing speech maskers (2016)
Journal Article
Tang, Y., Cooke, M., Fazenda, B., & Cox, T. (2016). A metric for predicting binaural speech intelligibility in stationary noise and competing speech maskers. The Journal of the Acoustical Society of America (Online), 140(3), 1858-1870. https://doi.org/10.1121/1.4962484

One criterion in the design of binaural sound scenes in audio production is the extent to which the intended speech message is correctly understood. Object-based audio broadcasting systems have permitted sound editors to gain more access to the metad...

Glimpse-based metrics for predicting speech intelligibility in additive noise conditions (2016)
Presentation / Conference
Tang, Y., & Cooke, M. (2016, September). Glimpse-based metrics for predicting speech intelligibility in additive noise conditions. Presented at 17th Annual Conference of the International Speech Communication Association, INTERSPEECH 2016, San Francisco, USA

The glimpsing model of speech perception in noise operates by recognising those speech-dominant spectro-temporal regions, or glimpses, that survive energetic masking; hence, a speech recognition component is an integral part of the model. The current...

Predicting binaural speech intelligibility from signals estimated by a blind source separation algorithm (2016)
Presentation / Conference
Liu, Q., Tang, Y., Jackson, P., & Wang, W. (2016, September). Predicting binaural speech intelligibility from signals estimated by a blind source separation algorithm. Presented at 17th Annual Conference of the International Speech Communication Association, INTERSPEECH 2016, San Francisco, USA

State-of-the-art binaural objective intelligibility measures (OIMs) require individual source signals for making intelligibility predictions, limiting their usability in real-time online operations. This limitation may be addressed by a blind source...

Evaluating a distortion-weighted glimpsing metric for predicting binaural speech intelligibility in rooms (2016)
Journal Article
Tang, Y., Hughes, R., Fazenda, B., & Cox, T. (2016). Evaluating a distortion-weighted glimpsing metric for predicting binaural speech intelligibility in rooms. Speech Communication, 82, 26-37. https://doi.org/10.1016/j.specom.2016.04.003

A distortion-weighted glimpse proportion metric (BiDWGP) for predicting binaural speech intelligibility was evaluated in simulated anechoic and reverberant conditions, with and without a noise masker. The predictive performance of BiDWGP was compare...

A glimpse-based approach for predicting binaural intelligibility with single and multiple maskers in anechoic conditions (2015)
Presentation / Conference
Tang, Y., Cooke, M., Fazenda, B., & Cox, T. (2015, September). A glimpse-based approach for predicting binaural intelligibility with single and multiple maskers in anechoic conditions. Presented at Interspeech 2015, Dresden, Germany

A distortion-weighted glimpsing metric developed for estimating monaural speech intelligibility is extended to predict binaural speech intelligibility in noise. Two aspects of binaural listening, the better ear effect and the binaural advantage, ar...

Evaluating the predictions of objective intelligibility metrics for modified and synthetic speech (2015)
Journal Article
Tang, Y., Cooke, M., & Valentini-Botinhao, C. (2015). Evaluating the predictions of objective intelligibility metrics for modified and synthetic speech. Computer Speech and Language, 35, 73-92. https://doi.org/10.1016/j.csl.2015.06.002

Several modification algorithms that alter natural or synthetic speech with the goal of improving intelligibility in noise have been proposed recently. A key requirement of many modification techniques is the ability to predict intelligibility, both...

A corpus of noise-induced word misperceptions for Spanish (2015)
Journal Article
Tóth, M., García Lecumberri, M., Tang, Y., & Cooke, M. (2015). A corpus of noise-induced word misperceptions for Spanish. The Journal of the Acoustical Society of America (Online), 137(2), EL184-EL189. https://doi.org/10.1121/1.4905877

Word misperceptions are valuable in designing and evaluating detailed computational models of speech perception, especially when a number of listeners agree on the misperceived word. The current paper describes the elicitation of a corpus of Spanish...

From macroscopic to microscopic glimpse-based models of intelligibility prediction
Journal Article
Cooke, M., Tang, Y., & Toth, M. (in press). From macroscopic to microscopic glimpse-based models of intelligibility prediction. The Journal of the Acoustical Society of America (Online), 139(4), 2187-2187. https://doi.org/10.1121/1.4950509

Miller and Licklider's explorations of the intelligibility of temporally interrupted speech, and later studies extending their findings to the spectro-temporal plane, have shown how the twin factors of sparseness and redundancy confer a high degree o...