Vlingo: Voice Control

“Computer, tell me…” has been one of the ideal forms of a simple human-machine interface ever since Star Trek. Companies have repeatedly looked for technologies that make operating devices this easy, but so far the technology has not been mature enough for the mass market. Mobile devices in particular, with their limited input options, are predestined for this kind of interaction concept.

Vlingo is a new attempt to create a technology and an application that enable voice-based control of “higher-level” applications such as writing SMS messages. A demo showing how the technology works and which applications Vlingo already provides can be viewed here: Click

Besides the great demo, the technical concept behind Vlingo is also impressive: Hierarchical Language Model Based Speech Recognition (HLMs)

We have replaced constrained grammars and statistical language models with very large vocabulary (millions of words) Hierarchical Language Models (HLMs). These HLMs are based on well-defined statistical models to predict what words users are likely to say and how words are grouped together (for example, “let’s meet at ___” is likely to be followed by something like “1 pm” or the name of a place). While there are no hard constraints, the models are able to take into account what this and other users have spoken in the particular text box in the particular application, and therefore improve with usage. Unlike previous generations of statistical language models, the new HLM technology being developed by vlingo scales to tasks requiring the modeling of millions of possible words (such as open web search, directory assistance, navigation, or other tasks where users are likely to use any of a very large number of words).
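The idea of grouping words into classes, as in the “let’s meet at ___” example from the quote, can be made a bit more concrete with a toy class-based model. This is only a minimal sketch of the general technique and not Vlingo's actual implementation: the word classes, the class name `ClassBigramModel`, and its methods are all assumptions made for illustration. The probability of a word after a phrase is split into "how likely is this class after the phrase" times "how likely is this word within its class".

```python
from collections import Counter, defaultdict

# Hypothetical word classes, invented for this sketch only.
WORD_CLASSES = {
    "1 pm": "<TIME>", "noon": "<TIME>",
    "starbucks": "<PLACE>", "the office": "<PLACE>",
}


class ClassBigramModel:
    """Toy model: P(word | context) = P(class | context) * P(word | class)."""

    def __init__(self):
        self.context_counts = defaultdict(Counter)  # context -> counts of following class/word
        self.member_counts = defaultdict(Counter)   # class -> counts of its member words

    def observe(self, context, filler):
        """Record that `filler` was spoken right after `context`."""
        slot = WORD_CLASSES.get(filler, filler)
        self.context_counts[context][slot] += 1
        if slot != filler:
            self.member_counts[slot][filler] += 1

    def probability(self, context, filler):
        """Estimate how likely `filler` is to follow `context`."""
        slot = WORD_CLASSES.get(filler, filler)
        following = self.context_counts[context]
        total = sum(following.values())
        if total == 0 or following[slot] == 0:
            return 0.0
        p_slot = following[slot] / total
        if slot == filler:
            return p_slot  # plain word without a class
        members = self.member_counts[slot]
        return p_slot * members[filler] / sum(members.values())


model = ClassBigramModel()
model.observe("let's meet at", "1 pm")
model.observe("let's meet at", "starbucks")
model.observe("see you at", "noon")

# "noon" was never heard after "let's meet at", but it shares a class with "1 pm".
print(model.probability("let's meet at", "noon"))
```

Because "noon" belongs to the same class as "1 pm", it receives a non-zero probability after "let's meet at" even though that exact combination was never observed. That is the benefit of the grouping the quote describes.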

In order to achieve high accuracy, vlingo makes use of significant amounts of automatic and continual adaptation. In addition to adapting the HLMs, the system adapts to many user and application attributes such as learning the speech patterns of individuals and groups of users, learning new words, learning which words are more likely to be spoken into a particular application or by a particular user, and learning pronunciations of words based on usage. Adaptation is applied to individual users (for example, the system learns over time that a particular user tends to ask for Mexican food) as well as across users (a first-time user with a Southern accent benefits from other users who have spoken into the system with a southern accent). Unlike other speech recognition technologies that require intensive manual labor to tune recognition inputs, vlingo adaptation is automated and comprehensive, leading to continual improvements for users. The adaptation process can be seen in the figure below.
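To illustrate the two levels of adaptation mentioned in the quote (per user and across users), here is a small sketch that blends a user's own word counts with the counts of all users. It is a deliberately simplified assumption about how such adaptation could look, not a description of Vlingo's system; `AdaptiveWordModel`, `user_weight`, and the helper `_relative` are hypothetical names.

```python
from collections import Counter


def _relative(counts, word):
    """Relative frequency of `word` in `counts`, or 0.0 if nothing was counted."""
    total = sum(counts.values())
    return counts[word] / total if total else 0.0


class AdaptiveWordModel:
    """Toy adaptation: interpolate per-user and cross-user word frequencies."""

    def __init__(self, user_weight=0.5):
        self.user_weight = user_weight  # assumed blend factor, not a Vlingo parameter
        self.global_counts = Counter()  # usage across all users
        self.user_counts = {}           # user -> Counter of that user's words

    def record(self, user, word):
        """Update both the shared and the user-specific statistics."""
        self.global_counts[word] += 1
        self.user_counts.setdefault(user, Counter())[word] += 1

    def probability(self, user, word):
        p_global = _relative(self.global_counts, word)
        counts = self.user_counts.get(user)
        if not counts:
            # A first-time user still benefits from what others have said.
            return p_global
        p_user = _relative(counts, word)
        return self.user_weight * p_user + (1 - self.user_weight) * p_global


model = AdaptiveWordModel()
for _ in range(3):
    model.record("alice", "mexican")
model.record("bob", "italian")

print(model.probability("alice", "mexican"))  # boosted by Alice's own history
print(model.probability("carol", "mexican"))  # new user: cross-user statistics only
```

A fixed blend factor is the simplest possible choice here. A real system would presumably weight the user's own history more heavily as more of that user's data accumulates.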

There is also a handy whitepaper on the technology and the Vlingo applications to take with you: Click.
