Visual Phrase Recognition

As well as analysing lip movements to verify the user’s identity, LipVerify performs visual phrase recognition (VPR) to ensure that the challenge phrase has been entered correctly. This VPR element also has an application as a supporting technology for audio-based automatic speech recognition (ASR) platforms. In a mobile setting, where environmental factors may impact the quality of the voice analysis, the Liopa solution can provide extra validation of the words/digits spoken by the user, resulting in audio-visual speech recognition (AVSR).

Our VPR technology is the result of research carried out by Dr Darryl Stewart (Chief Scientist) and his research team at Queen’s University, Belfast.

As highlighted above, a common weakness of most modern ASR systems is their inability to cope well with signal corruption, and there are many ways in which this may occur. There may be other sound sources (e.g., background noise, other people speaking), wave reflections (e.g., reverberation or echoes), or transmission channel distortions caused by the hardware (usually the microphone) used to capture the speech signal. Thus, one of the main challenges in the ASR domain is how to develop systems that are more robust to the kinds of noise that are typically encountered in real-world situations.

Liopa’s extensive research concluded that integrating video of the speaker’s lips, using certain modelling techniques, provided marked improvements in digit/word detection where the audio signal was corrupted. Testing was performed using large, publicly available audio-visual databases for speaker-independent audio-visual speech recognition. The tests included both clean and corrupted utterances, with corruption added to the video stream, the audio stream, or both, using a variety of types (e.g., MPEG-4 video compression) and levels of noise. The modelling approach used maintained robust recognition performance in all tested conditions, while requiring no prior knowledge about the type or level of noise.
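
To illustrate the general idea of combining an audio stream and a video stream without knowing the noise level in advance, the following is a minimal Python sketch of one generic technique: late fusion with entropy-based stream weighting. It is an assumption chosen for illustration only and is not a description of Liopa's modelling approach, which is not detailed here. The function names (`softmax`, `stream_reliability`, `fuse`) and the example scores are hypothetical.

```python
import math


def softmax(scores):
    """Convert raw per-class scores into a probability distribution."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs):
    """Shannon entropy (in nats) of a probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def stream_reliability(probs):
    """Close to 1.0 for a confident posterior, close to 0.0 for a flat one."""
    max_entropy = math.log(len(probs))
    return max(0.0, 1.0 - entropy(probs) / max_entropy)


def fuse(audio_scores, video_scores):
    """Illustrative late fusion (an assumption, not Liopa's method).

    Each stream is weighted by how confident its own posterior is, so a
    corrupted stream (flat posterior) contributes less. No external
    estimate of the noise type or level is needed.
    Returns (index of the best class, weight given to the audio stream).
    """
    audio_post = softmax(audio_scores)
    video_post = softmax(video_scores)

    r_audio = stream_reliability(audio_post)
    r_video = stream_reliability(video_post)
    total = r_audio + r_video
    w_audio = 0.5 if total == 0.0 else r_audio / total

    floor = 1e-12  # avoids log(0) if a probability underflows
    fused = [
        w_audio * math.log(max(pa, floor))
        + (1.0 - w_audio) * math.log(max(pv, floor))
        for pa, pv in zip(audio_post, video_post)
    ]
    best = max(range(len(fused)), key=fused.__getitem__)
    return best, w_audio


# Hypothetical scores for three candidate digits: the audio stream is
# noisy (almost flat), while the video stream clearly favours index 1.
best, w_audio = fuse([0.1, 0.0, 0.1], [0.0, 5.0, 0.0])
print(best, round(w_audio, 3))
```

In this sketch the weighting is derived from the classifier outputs themselves, so whichever stream is less certain is automatically down-weighted. A production system would typically fuse at the model level and over whole utterances rather than single per-class score vectors.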