The MIT Technology Review updates us on the progress in computer lip reading:
In one project, a team from the University of Oxford’s Department of Computer Science has developed a new artificial-intelligence system called LipNet. As Quartz reported, its system was built on a data set known as GRID, which is made up of well-lit, face-forward clips of people reading three-second sentences….When tested, the system was able to identify 93.4 percent of words correctly. Human lip-reading volunteers asked to perform the same tasks identified just 52.3 percent of words correctly.
….Another team from Oxford’s Department of Engineering Science, which has been working with Google DeepMind, has bitten off a rather more difficult task. Instead of using a neat and consistent data set like GRID, it’s been using a series of 100,000 video clips taken from BBC television. These videos have a much broader range of language, with far more variation in lighting and head positions….The Oxford and DeepMind team managed to create an AI that was able to identify 46.8 percent of all words correctly.
….Differences aside, both experiments show AI vastly outperforming humans at lip-reading.
For some reason, this had never occurred to me as a near-term use of AI—despite Stanley Kubrick's warning half a century ago about the dangers of berserk computers that can lip-read.
But of course, it makes total sense. Reliable, widespread use of computer lip reading is still a little ways off, but it seems likely to start working tolerably well in real-life situations within a few more years. The implications for our future panopticon society are pretty obvious.