IDF tech uses AI to watch and describe objects in video footage
New system could eclipse human monitors’ ability to watch security camera feeds and identify threats, R&D officer says
Shoshanna Solomon was The Times of Israel's Startups and Business reporter
The R&D department of the Israel Defense Forces has developed a way to automatically transcribe video into text, a method that might one day be applied to surveillance footage.
The technology, which is based on artificial intelligence, is able to read and understand video images and translate what it sees into text. It can cross-reference the images it sees with other relevant information to provide a broader perspective on the footage, and can send an emergency alert if it spots anything that requires special attention.
“Soldiers who are monitoring borders or any other activity through the use of video images on screens can identify accurately what is happening on the ground most of the time,” Maj. Seffi Cohen, 33, who heads the operational data and research department of the IDF, told The Times of Israel.
But human monitors observing videos can take in only a limited amount of information at any given time, he said. Software, in contrast, can look at a massive number of images at once and cross-reference any of them with other information that could be relevant.
There is no certainty that the technology, which is still at the prototype stage, will eventually become a full army development project. “It is still too early to know,” Cohen said. “Meanwhile we are refining the technology even more.”
The software combines two types of artificial intelligence: a convolutional neural network and a recurrent neural network.
“We gave the [convolutional neural network] videos and pictures and taught the system to correctly identify objects,” Cohen said. “Then we took these objects and taught the recurrent neural network to read a series of objects and transform what it sees into words. Similar to the brain of a newborn, we fed the blank slate system with millions of video images. Then, like a brain, the software processes the information and produces an output.”
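For readers curious about how the two networks fit together, the data flow Cohen describes can be illustrated with a toy sketch in Python. This is not the IDF's software: the vocabulary, the layer sizes, the function names and the random, untrained weights are all assumptions made for illustration, so the words it prints are meaningless. It shows only the shape of the pipeline, in which a convolutional stage turns each frame into a few numbers and a recurrent stage reads that series and emits words one at a time.

```python
import math
import random

# Hypothetical toy vocabulary; a real system would learn thousands of words.
VOCAB = ["<end>", "person", "dog", "bike", "running", "on", "a"]

def conv2d_valid(image, kernel):
    """Slide a kernel over a 2-D image (no padding), then apply ReLU."""
    kh, kw = len(kernel), len(kernel[0])
    out = []
    for i in range(len(image) - kh + 1):
        row = []
        for j in range(len(image[0]) - kw + 1):
            s = sum(image[i + a][j + b] * kernel[a][b]
                    for a in range(kh) for b in range(kw))
            row.append(max(0.0, s))
        out.append(row)
    return out

def encode_frame(image, kernels):
    """Convolutional stage: one pooled feature per kernel for a single frame."""
    feats = []
    for kernel in kernels:
        cells = [v for row in conv2d_valid(image, kernel) for v in row]
        feats.append(sum(cells) / len(cells))  # global average pooling
    return feats

def matvec(matrix, vector):
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]

def caption_video(frame_feats, w_in, w_rec, w_out, max_len=5):
    """Recurrent stage: read the series of frame features, then emit words."""
    hidden = [0.0] * len(w_rec)
    for feats in frame_feats:  # read the series frame by frame
        pre = [a + b for a, b in zip(matvec(w_in, feats), matvec(w_rec, hidden))]
        hidden = [math.tanh(p) for p in pre]
    words = []
    for _ in range(max_len):  # greedy decoding, one word per step
        scores = matvec(w_out, hidden)
        best = max(range(len(scores)), key=scores.__getitem__)
        if VOCAB[best] == "<end>":
            break
        words.append(VOCAB[best])
        hidden = [math.tanh(p) for p in matvec(w_rec, hidden)]
    return words

def random_matrix(rows, cols, rng):
    return [[rng.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]

# Untrained random weights stand in for what training would normally learn.
rng = random.Random(0)
HIDDEN = 4
kernels = [random_matrix(3, 3, rng) for _ in range(3)]
w_in = random_matrix(HIDDEN, len(kernels), rng)
w_rec = random_matrix(HIDDEN, HIDDEN, rng)
w_out = random_matrix(len(VOCAB), HIDDEN, rng)

# Three random 8x8 grayscale "frames" stand in for video footage.
frames = [[[rng.random() for _ in range(8)] for _ in range(8)] for _ in range(3)]
caption = caption_video([encode_frame(f, kernels) for f in frames],
                        w_in, w_rec, w_out)
print(" ".join(caption))
```

In a trained system, the weights would be fitted on large numbers of captioned images and clips, which is the step Cohen compares to feeding a newborn's brain. Production captioning models also typically feed each chosen word back into the recurrent network before picking the next one, a detail this sketch leaves out for brevity.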
For example, in the case of images of people running with numbers pinned on their shirts, the text output identifies them as taking part in a race or marathon, Cohen said. “We also tried to trick it and gave it an image of a dog on a bike,” he said. “The software identified what it was accurately.”
There are still some flaws that are being ironed out, he said. “The information the technology was fed was based on civilian imagery and language. It was not taught with enough army examples, so sometimes it doesn’t get things right. We are working on that.” The system also can’t distinguish a dog from a wolf yet, for example.
Cohen’s team is made up of some 10 soldiers who come from the IDF’s elite Talpiot program, which trains cadets for key technology posts in various army units. To produce the technology, one soldier worked full-time and another worked part-time for a period of three months, he said.
“We go out into the field and see what the needs are. Then we make a list of possible projects and go ahead with developing technologies. Some projects are successful, others less so. We do high-risk developments,” he said.