The current LipREAD demonstrator is implemented with a client-server architecture. The server resides in a cloud data centre; it accepts videos from clients and returns the text of the speech contained in each video, along with a probability score (the likelihood that the returned text matches the words actually spoken). LipREAD exposes this server capability through a REST API, which can be invoked from any type of client with internet connectivity, e.g. a laptop, smart mobile device, or in-car entertainment system.
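The request/response flow above can be sketched as follows. This is a minimal illustration only: the endpoint URL and the JSON field names (`text`, `score`) are assumptions, not the documented LipREAD API.

```python
import json

# Hypothetical endpoint -- the real LipREAD REST API path is not
# specified in this document.
LIPREAD_URL = "https://api.example.com/lipread/v1/recognise"

def parse_response(body: str):
    """Extract the recognised text and its probability score from a
    (hypothetical) LipREAD JSON response body."""
    data = json.loads(body)
    return data["text"], data["score"]

# Against a live server the client would POST the video, e.g. with `requests`:
#   resp = requests.post(LIPREAD_URL, files={"video": open("clip.mp4", "rb")})
#   text, score = parse_response(resp.text)

# Offline illustration with a sample response body:
sample = '{"text": "turn on the radio", "score": 0.87}'
text, score = parse_response(sample)
print(text, score)
```

Any HTTP-capable client can drive this flow, which is what allows the same server to serve laptops, mobile devices, and in-car systems alike.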
For some use cases the deployment environment will have restricted data-connectivity bandwidth, so sending video over the connection to the Liopa cloud server could cause unacceptable latency. In the future, a client SDK will be provided that processes video on the client (feature extraction etc.), leaving only a small amount of data per video to be sent to the LipREAD server for speech recognition.
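The bandwidth saving motivating the client SDK can be illustrated with rough arithmetic. All the concrete numbers here (frame rate, crop size, feature dimension) are assumptions chosen for illustration, not LipREAD's actual parameters.

```python
# Hypothetical payload comparison: raw video upload vs. client-side
# feature extraction. Every constant below is an assumption.

FRAMES = 75        # e.g. a 3-second clip at 25 fps
FEATURE_DIM = 128  # assumed size of a per-frame feature vector

# Raw upload: 100x50 greyscale mouth-region crops, 1 byte per pixel.
raw_bytes = FRAMES * 100 * 50 * 1

# SDK upload: one float32 feature vector per frame, 4 bytes per value.
feature_bytes = FRAMES * FEATURE_DIM * 4

print(raw_bytes, feature_bytes)  # the feature payload is ~10x smaller
```

Even with these modest assumptions the per-video payload shrinks by an order of magnitude, which is the point of moving feature extraction onto the client.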
On-Device Model
For some use cases, e.g. in-car command-and-control recognition, data connectivity to the internet is not always available. An on-device variant of LipREAD will be available to run on, for example, the “head end” of an in-car entertainment and control system. Such a system would need only infrequent internet connectivity, to perform operations such as Universal Model updates.
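One way such an update policy might look is sketched below: recognition runs fully offline, and the device refreshes its Universal Model only when connectivity happens to be available and the local model is stale. The function name, the monthly interval, and the policy itself are assumptions for illustration.

```python
import time

# Hypothetical on-device update check. The interval is an assumption;
# the document only states that connectivity is needed "infrequently".
UPDATE_INTERVAL = 30 * 24 * 3600  # e.g. refresh the model monthly

def needs_model_update(last_update: float, now: float,
                       connectivity: bool) -> bool:
    """Return True only when a network is available AND the local
    Universal Model is older than the update interval."""
    return connectivity and (now - last_update) >= UPDATE_INTERVAL

now = time.time()
print(needs_model_update(now - 40 * 24 * 3600, now, True))   # stale and online
print(needs_model_update(now - 40 * 24 * 3600, now, False))  # offline: keep running
```

Decoupling recognition from the update check is what lets the head end operate indefinitely without a connection, degrading only in model freshness.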