S3A: Future Spatial Audio for an Immersive Listener Experience at Home

Fast speech intelligibility estimation using a neural network trained via distillation (2020)
Presentation / Conference
Cox, T., Bailey, W., & Tang, Y. (2020, January). Fast speech intelligibility estimation using a neural network trained via distillation. Poster presented at 12th Speech in Noise Workshop, Toulouse, France

Objective measures of speech intelligibility have many uses, including the evaluation of degradation during transmission and the development of processing algorithms. One intrusive approach is to use a method based on the audibility of speech glimpse...

Pupil dilation reveals changes in listening effort due to energetic and informational masking (2019)
Presentation / Conference
Woodcock, J., Fazenda, B., Cox, T., & Davies, W. (2019, September). Pupil dilation reveals changes in listening effort due to energetic and informational masking. Presented at ICA 2019, Aachen, Germany

Pupil dilation has previously been shown to be a useful involuntary marker of listening effort. An inverse relationship between pupil diameter and signal to noise ratio has been shown when speech is energetically masked by noise. The work reported he...

Perceptual audio evaluation of media device orchestration using the multi-stimulus ideal profile method (2018)
Presentation / Conference
Wilson, A., Cox, T., Zacharov, N., & Pike, C. (2018, October). Perceptual audio evaluation of media device orchestration using the multi-stimulus ideal profile method. Presented at Audio Engineering Society 145th Convention, New York, USA

The evaluation of object-based audio reproduction methods in a real-world context remains a challenge as it is difficult to separate the effects of the reproduction system from the effects of the audio mix rendered for that system. This is often comp...

Improving intelligibility prediction under informational masking using an auditory saliency model (2018)
Presentation / Conference
Tang, Y., & Cox, T. (2018, September). Improving intelligibility prediction under informational masking using an auditory saliency model. Presented at International Conference on Digital Audio Effects, Aveiro, Portugal

The reduction of speech intelligibility in noise is usually dominated by energetic masking (EM) and informational masking (IM). Most state-of-the-art objective intelligibility measures (OIM) estimate intelligibility by quantifying EM. Few measures m...

Qualitative evaluation of media device orchestration for immersive spatial audio reproduction (2018)
Journal Article

The challenge of installing and setting up dedicated spatial audio systems can make it difficult to deliver immersive listening experiences to the general public. However, the proliferation of smart mobile devices and the rise of the Internet of Thin...

Automatic speech-to-background ratio selection to maintain speech intelligibility in broadcasts using an objective intelligibility metric (2018)
Journal Article

While mixing, sound producers and audio professionals empirically set the speech-to-background ratio (SBR) based on rules of thumb and their own perception of sounds. There is no guarantee that the speech content will be intelligible for the general...

An audio-visual system for object-based audio: from recording to listening (2018)
Journal Article

Object-based audio is an emerging representation for audio content, where content is represented in a reproduction format-agnostic way and, thus, produced once for consumption on many different kinds of devices. This affords new opportunities for im...

A non-intrusive method for estimating binaural speech intelligibility from noise-corrupted signals captured by a pair of microphones (2017)
Journal Article

A non-intrusive method is introduced to predict binaural speech intelligibility in noise directly from signals captured using a pair of microphones. The approach combines signal processing techniques in blind source separation and localisation, with...

A study on the relationship between the intelligibility and quality of algorithmically-modified speech for normal hearing listeners (2017)
Journal Article
Tang, Y., Arnold, C., & Cox, T. (2017). A study on the relationship between the intelligibility and quality of algorithmically-modified speech for normal hearing listeners. Journal of Otorhinolaryngology, Hearing and Balance Medicine, 1(1), https://doi.org/10.3390/ohbm1010005

This study investigates the relationship between the intelligibility and quality of modified speech in noise and in quiet. Speech signals were processed by seven algorithms designed to increase speech intelligibility in noise without altering speech...

A perceptually-weighted deep neural network for monaural speech enhancement in various background noise conditions (2017)
Presentation / Conference
Liu, Q., Wang, W., Jackson, P. J., & Tang, Y. (2017, August). A perceptually-weighted deep neural network for monaural speech enhancement in various background noise conditions. Presented at EUSIPCO 2017, the 25th European Signal Processing Conference, Kos Island, Greece

Deep neural networks (DNN) have recently been shown to give state-of-the-art performance in monaural speech enhancement. However in the DNN training process, the perceptual difference between different components of the DNN output is not fully ex...
