UK Researchers Make Progress in Speech Recognition, Aiding Speech Impairment Patients


Research conducted by engineers and physicists at the University of Glasgow could revolutionize the way we approach speech recognition technology, offering new possibilities for individuals with speech impairments and advancing voice recognition applications.

Revolutionizing Speech Recognition: Glasgow’s Groundbreaking Research

In the study, the researchers investigated the physical processes that produce speech sounds, using a range of wireless sensing devices to closely examine volunteers' internal and external muscle movements as they spoke.

The University of Glasgow team, pioneers in this field, are generously sharing the extensive data obtained from 400 minutes of analysis with fellow researchers. This collaborative effort aims to fuel the development of innovative technologies centered around speech recognition.

The potential applications of these emerging technologies are vast, particularly in assisting individuals facing speech challenges or voice loss. By employing sensors to interpret lip and facial movements, future devices could synthesize a voice for those in need.

One notable application includes the ability of voice-controlled devices, such as smartphones, to read users’ lips while they speak silently. This breakthrough could usher in an era of silent speech recognition, enhancing the accessibility of technology.

University of Glasgow research can aid speech impairment patients (Image: University of Glasgow)

Moreover, the dataset holds promise for improving speech recognition in noisy environments, enhancing the quality of video and phone calls. It could even contribute to bolstering security measures for sensitive transactions, where unique facial movements act as a personalized identifier, similar to a fingerprint, before unlocking confidential information.

The researchers detail their comprehensive multi-modal analysis of speech formation in a recently published paper in the journal Scientific Data, a publication by Springer Nature. To gather their valuable data, 20 volunteers participated by articulating vowel sounds, single words, and complete sentences, while their facial movements and voices were meticulously recorded and analyzed.

Intricacies of Speech Production: A Multimodal Approach

The researchers utilized two distinct radar technologies, namely impulse radio ultra-wideband (IR-UWB) and frequency-modulated continuous-wave (FMCW), to capture detailed images of volunteers’ facial skin, tongue, and larynx movements as they spoke.
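As background on how an FMCW radar resolves small movements, the sketch below shows the standard relationship between the "beat" frequency of a linear chirp and target range. The bandwidth, chirp duration, and beat frequency here are illustrative assumptions, not parameters from the study.

```python
# Background sketch (parameters are illustrative, not from the study): an FMCW
# radar infers the distance to a reflecting surface, such as facial skin, from
# the "beat" frequency between the transmitted chirp and its returned echo.

C = 3e8          # speed of light, m/s
B = 4e9          # chirp bandwidth, Hz (assumed)
T_CHIRP = 50e-6  # chirp duration, s (assumed)

def range_from_beat(f_beat_hz):
    """Target range for a linear sawtooth chirp: R = c * f_b * T / (2 * B)."""
    return C * f_beat_hz * T_CHIRP / (2 * B)

# With these assumed settings, a 100 kHz beat frequency maps to a range of
# under 20 cm -- the scale at which facial and laryngeal motion is tracked.
print(f"{range_from_beat(100e3) * 100:.1f} cm")
```

Because range is proportional to beat frequency, tiny displacements of the skin shift the beat frequency slightly, which is what makes contactless sensing of articulation movements possible.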

In addition to radar technologies, the team incorporated a laser speckle detection system to scan vibrations on the surface of the volunteers’ skin. This system used a high-speed camera to track fluctuations in the laser speckle pattern reflected from the skin, revealing its vibrations. Simultaneously, a Kinect V2 camera, capable of measuring depth, recorded the deformations of volunteers’ mouths as they articulated various sounds.

Collaborating with researchers from the University of Dundee and University College London, the University of Glasgow researchers synchronized and compiled the dataset. Named RVTALL, the dataset encapsulates radio frequency, visual, text, audio, laser, and lip landmark information.

To ensure the accuracy and reliability of the dataset, the researchers employed signal processing and machine learning techniques for validation. This meticulous approach allowed them to construct a uniquely detailed picture of the physical mechanisms that enable individuals to articulate sounds with precision.
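The paper does not spell out the validation pipeline here, but one common sanity check in multimodal datasets of this kind is verifying that streams from different sensors are time-aligned. The sketch below, a hypothetical illustration using synthetic stand-in signals rather than RVTALL data, correlates a speech amplitude envelope with a motion signal from a second sensor.

```python
import numpy as np

# Hypothetical illustration (not the authors' pipeline): correlate the audio
# amplitude envelope with a motion signal derived from another sensor to
# confirm the two modalities are time-aligned. Both signals are synthetic.

rng = np.random.default_rng(0)
t = np.linspace(0, 2.0, 2000)  # 2 s sampled at 1 kHz

# Synthetic "speech" envelope: bursts of energy separated by pauses.
envelope = np.clip(np.sin(2 * np.pi * 1.5 * t), 0, None)

# Synthetic "radar" motion signal: the same articulation pattern plus noise.
motion = envelope + 0.2 * rng.standard_normal(t.size)

def aligned(a, b, threshold=0.8):
    """Pearson correlation as a crude cross-modal alignment check."""
    r = np.corrcoef(a, b)[0, 1]
    return r, r >= threshold

r, ok = aligned(envelope, motion)
print(f"correlation = {r:.2f}, aligned = {ok}")
```

A misaligned or corrupted recording would show a much weaker correlation, flagging that sample for inspection before it enters the published dataset.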

Professor Qammer Abbasi, of the University of Glasgow’s James Watt School of Engineering, is the paper’s corresponding author. Professor Abbasi has previously led research on speech recognition which used multimodal sensing to read lip movements through masks.
Professor Abbasi said: “This type of multimodal sensing for speech recognition is still a relatively new field of research, and our review of existing public data found that there wasn’t much available to help support future developments. What we set out to do in collecting the RVTALL dataset was create a much more complete set of analyses of the visible and invisible processes which create speech to enable new research breakthroughs, and we’re pleased that we’re now able to share it.”
Professor Muhammad Imran, leader of the University of Glasgow’s Communications, Sensing and Imaging hub, is a co-author of the paper. He said: “Contactless sensing has huge potential for improving speech recognition and creating new applications in communications, healthcare and digital security. We’re keen to explore in our own research group here at the University of Glasgow how we can build on previous breakthroughs in lip-reading using multi-modal sensors and find new uses everywhere from homes to hospitals.”
The team’s paper, titled ‘A comprehensive multimodal dataset for contactless lip reading and acoustic analysis’, is published in Scientific Data. The research was supported by funding from the Engineering and Physical Sciences Research Council and the Royal Society of Edinburgh.


Source(s): University of Glasgow
