CNET has an interesting story on computers that read lips. Intel has released software in its open-source library that does just that, and it could lead to better voice recognition applications.
The Audio Visual Speech Recognition (AVSR) software tracks a speaker’s face and mouth movements. By matching these movements with speech, the application can provide a computer with enough data to respond to voice recognition commands, even when these are given in noisy environments. The AVSR program is part of the OpenCV computer vision library, a collection of open-source applications and tools that help computers interpret visual data.
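To see why a second signal helps in a noisy room, here is a minimal sketch of combining two sources of evidence. It is not Intel's AVSR method. It assumes a hypothetical audio model and a hypothetical lip-reading model that each give a probability per command, treats the two as independent, and multiplies them. The function name, command names and numbers are all made up for illustration.

```python
def fuse(audio_probs, visual_probs):
    """Combine two per-command probability tables by multiplying and renormalizing.

    Assumes the audio and visual scores are independent given the command,
    which is a simplification made for this sketch.
    """
    commands = audio_probs.keys() & visual_probs.keys()
    joint = {c: audio_probs[c] * visual_probs[c] for c in commands}
    total = sum(joint.values())
    if total == 0:
        raise ValueError("no command is supported by both signals")
    return {c: p / total for c, p in joint.items()}


# Illustrative numbers: noise leaves the audio model unsure,
# while the (hypothetical) lip-reading model clearly favors "open".
audio = {"open": 0.40, "close": 0.35, "scroll": 0.25}
visual = {"open": 0.70, "close": 0.10, "scroll": 0.20}

fused = fuse(audio, visual)
best = max(fused, key=fused.get)
print(best, round(fused[best], 3))
```

With these numbers the audio alone barely separates "open" from "close", but the fused result favors "open" much more strongly.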
Computer companies have tried to popularize voice recognition applications for years, but have been stymied by a shortfall in processing power and software. Both have improved. Johns Hopkins, Microsoft and IBM are researching automatic lip-reading by computers.
Microsoft Research (demos and white paper) has developed a prototype application called GWindows, with which a person can scroll through files or move windows through a combination of voice commands and hand gestures. A video camera follows moving objects, such as a hand or pointer, that come within 20 inches of the screen. The application interprets any hand movements as computer commands: Placing a finger over a window and then moving the finger left will move the window left, for example. If a voice command such as “scroll” is given, the computer will combine the finger and voice commands and scroll down. No special gloves are needed.
Microsoft’s prototype application works better than a simple voice recognition system because the gestures improve accuracy. The computer can follow voice commands in a crowded room filled with multiple conversations and lots of interference.
Such visual signal software relies in part on Bayesian mathematics. In Bayesian math, if a computer “sees” a sweeping hand gesture toward the left a number of times, it will consistently interpret that gesture as a command to move a file toward the left.
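The idea can be sketched with a repeated application of Bayes' rule. This is only an illustration, not the model GWindows or Intel's library uses: the three candidate intents, the observation names and the likelihood table are assumptions invented for the example.

```python
def bayes_update(prior, likelihood, observation):
    """Return the posterior over intents after one observation (Bayes' rule)."""
    unnormalized = {h: prior[h] * likelihood[h][observation] for h in prior}
    total = sum(unnormalized.values())
    return {h: p / total for h, p in unnormalized.items()}


# Hypothetical likelihoods: P(observed gesture | user's intent).
LIKELIHOOD = {
    "move_left": {"sweep_left": 0.8, "sweep_right": 0.1, "other": 0.1},
    "move_right": {"sweep_left": 0.1, "sweep_right": 0.8, "other": 0.1},
    "no_command": {"sweep_left": 0.2, "sweep_right": 0.2, "other": 0.6},
}

# Start with no preference among the three intents.
belief = {"move_left": 1 / 3, "move_right": 1 / 3, "no_command": 1 / 3}
history = [belief["move_left"]]

# Each time a leftward sweep is seen, the belief in "move_left" grows.
for _ in range(3):
    belief = bayes_update(belief, LIKELIHOOD, "sweep_left")
    history.append(belief["move_left"])

print([round(p, 3) for p in history])
```

Each leftward sweep raises the probability assigned to "move_left", so after a few sightings the interpretation is effectively settled.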
Intel has other visual applications besides AVSR in the works. One uses cameras to monitor hospital patients for risk of stroke, and another uses a security camera to flag potential criminals when it sees something unusual.
Intel has released a test version of a technical library for building Bayesian networks, said Gary Bradski, a senior researcher in Intel’s Microprocessor Research Labs who helped create the OpenCV library. A final version of the technical library, called the Probability Network Library, will come out by the end of the year, he said.
Daily Wireless has more on text-to-speech translation products, natural language input, internet kiosks, talking books and tracking ship movements.






