Voice-driven apps like Alexa can be augmented by this lip-reading tech


November 20, 2018

Techwatch

Voice applications are all the rage right now, with Alexa, Siri, Cortana and Google Assistant seeing an upsurge of new users.

“Meanwhile, voice-activation is becoming more popular in cars,” says Liopa’s co-founder Liam McQuillan.

For the automotive industry alone, the voice recognition market is projected to be worth US$3.9 billion by 2025.

For decades now, computer scientists have championed voice as the holy grail of the human/computer interface.

But it’s taken a long time to come to fruition, due to the intricacies of languages and the fact that no two people speak exactly alike. The introduction of machine learning techniques has accounted for a huge leap forward in the technology. Higher processing capability through GPUs, more training data, and sophisticated developments at Google DeepMind have all made their impact.

Liam McQuillan and his team have used machine learning to create a unique automated lip reading application, called Liopa.

Today, speech recognition is based on analysis of the speaker’s audio signal. “These audio systems can be very accurate but the problem is that, when there’s background noise, the accuracy and usability degrades rapidly,” Liam says.

Instead, Liopa’s technology analyses a video of the speaker’s lip movements. Using an AI-based core, the software deciphers what the person is saying. The technology is agnostic to audio noise and, when combined with audio speech recognition, will improve accuracy of the overall system.

It works on any device with a standard camera, especially in situations where you can train the camera directly on the speaker’s face. It helps in noisy real-world environments – for example, when using a voice activation system in a car, or a virtual assistant in a restaurant or outdoors.

“We’re commercialising research that’s been done for the past 10 years at QUB on lip reading technology,” says Liam.

Liopa is the product of two academic researchers (Drs Darryl Stewart and Fabian Campbell-West) joining forces with two proven commercial entrepreneurs, Liam and his colleague Richard McConnell.

Liam tells me the scope of Liopa is still being determined.

“We want to develop a product that supports a large vocabulary of the 130,000 words in the English language, along with other languages, and will perform in real time,” Liam says. “That’s a little bit down the line, but there are plenty of lucrative use cases that we’re addressing initially, that require less vocab support.”

He says that in-car voice activation degrades over time because, as the car ages, it emits more engine noise and lets in more road noise. “It’s been shown that the accuracy of in-car voice-activation degrades badly with passengers, whether the radio is on, and the age of vehicle.”

Liam says, “In cars, we’d combine our solution into AVSR – audio/visual speech recognition.”
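The article does not describe how Liopa merges the two signals, so the following is only a minimal sketch of one common approach, weighted late fusion, in which an audio recogniser and a lip-reading recogniser each score candidate words and the scores are blended. The function name `fuse_scores`, the `audio_weight` parameter and the example scores are all hypothetical, chosen purely for illustration.

```python
from typing import Dict


def fuse_scores(audio: Dict[str, float],
                visual: Dict[str, float],
                audio_weight: float) -> str:
    """Blend per-word scores from two recognisers and return the best word.

    Illustrative only: this is generic weighted late fusion, not Liopa's
    method. audio_weight is the trust placed in the audio channel (0 to 1);
    a noisier cabin would call for a lower value.
    """
    if not 0.0 <= audio_weight <= 1.0:
        raise ValueError("audio_weight must be between 0 and 1")

    # Consider every word proposed by either recogniser, in a stable order.
    words = sorted(set(audio) | set(visual))
    fused = {
        w: audio_weight * audio.get(w, 0.0)
        + (1.0 - audio_weight) * visual.get(w, 0.0)
        for w in words
    }
    return max(words, key=lambda w: fused[w])


# Hypothetical scores: road noise makes the audio channel hesitate
# between two similar-sounding commands, while the lips are clearer.
noisy_audio = {"navigate": 0.40, "activate": 0.45}
lip_reading = {"navigate": 0.70, "activate": 0.20}

print(fuse_scores(noisy_audio, lip_reading, audio_weight=0.3))
```

With the audio channel trusted fully (`audio_weight=1.0`) the same inputs would pick the other word, which is the failure mode the visual channel is meant to guard against in a noisy car.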

Another important use case involves checking for someone’s identification.

Liam explains, “There is something called liveness checking for digital identification. It’s when a facial recognition system needs to ascertain that there’s a live person presenting and not a high-res static image. If someone had a picture of you, they could fool the system into ID’ing you.”

Liopa’s technology can make these systems secure to an almost 100% degree of accuracy.

“When the person presents to the screen, we pop up a random sequence of digits and ask that person to mime or speak those digits into the camera. If they’ve said the right digits, you can be pretty sure it’s them,” says Liam.
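The challenge-and-response flow Liam describes can be sketched roughly as follows. This is an assumption-laden illustration, not Liopa’s code: the function names `make_challenge` and `verify_response` are hypothetical, and the lip-reading step itself is left out, represented only by the string of digits it would return.

```python
import secrets


def make_challenge(length: int = 6) -> str:
    """Return a random digit string to display on screen (hypothetical helper)."""
    return "".join(secrets.choice("0123456789") for _ in range(length))


def verify_response(challenge: str, recognised: str) -> bool:
    """Check the digits read from the user's lips against the challenge.

    `recognised` stands in for the output of a lip-reading model, which is
    outside the scope of this sketch. Non-digit characters such as spaces
    are ignored before comparing.
    """
    digits_only = "".join(ch for ch in recognised if ch.isdigit())
    return secrets.compare_digest(challenge, digits_only)


challenge = make_challenge()
print("Please say or mime these digits:", challenge)

# A static photo cannot mouth a sequence it has never seen, so a
# response that matches a fresh random challenge suggests a live person.
print(verify_response("4071", "4 0 7 1"))
print(verify_response("4071", "4072"))
```

Because the digits are freshly generated for every attempt, a recorded video of an earlier session would also fail, as long as each challenge is only accepted once.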

Liopa could help in healthcare, for patients who have trouble speaking due to vocal cord or throat injuries. In the security industry, an important application involves CCTV footage.

Liam says, “For a lot of CCTV usage, it’s illegal to capture the audio of what someone is saying, but even if it’s not illegal, the microphone is too far away to hear their voice. Using HD cameras and our system, you could analyse video footage and ascertain what someone is saying – which could provide key insights into what is actually happening.”

Liopa was incorporated in 2015, and its first commercial trials will kick off in the next few weeks. Liam cannot name the trial partners, but one is a company “several orders of magnitude bigger than us.”

Will the sales model be based on throughput?

“Yes, we’re going for a usage-based licensing model – charged per transaction,” he says.

Liopa has already enjoyed a substantial seed funding round. Liam says, “We’ll be raising a more substantial round by Q2 of next year.”