Three researchers, Amarjot Singh (University of Cambridge), Devendra Patil (NIT Warangal, India), and SN Omkar (IISc Bangalore), are working on using a drone and artificial intelligence to spot people fighting in a crowd.

Their paper "Eye in the Sky: Real-time Drone Surveillance System (DSS) for Violent Individuals Identification using ScatterNet Hybrid Deep Learning Network" is on arXiv. A video shows how their system works.

DroneDJ summed up their approach, saying that they use an "off-the-shelf consumer [drone], load it with AI and have it monitor a crowded area such as a sports stadium or a protest and look for acts of violence such as punching, kicking, strangling, shooting or stabbing."

Why bother? Aren't standard CCTV cameras adequate? Standard CCTV cameras do not do the best job of monitoring large public areas for violent individuals. Enter drones.

The paper will appear in a workshop at IEEE Computer Vision and Pattern Recognition (CVPR) 2018 this month. The system detects violent individuals in real time by processing the drone images in the cloud.

They addressed five types of violent acts in their paper: punching, kicking, strangling, shooting and stabbing.

Their research introduced what they refer to as "the aerial violent individual dataset used for training the deep network." They said they hope it will encourage other researchers interested in using deep learning for aerial surveillance.

James Vincent in The Verge explained that an algorithm trained using deep learning estimates the poses of humans in the video and matches them to postures the researchers have designated as violent. The video noted that violent people are marked with bounding boxes.

How effective is their system? The level of accuracy goes down when more people enter the scene. James Vincent: "However, the research needs to be taken with a pinch of salt, particularly with regard to its claims of accuracy. Singh and his colleagues report that their system was 94 percent accurate at identifying 'violent' poses, but they note that the more people that appear in frame, the lower this figure. (It fell to 79 percent accuracy when looking at 10 individuals.)"

The illustration shows the skeleton corresponding to the humans in an image. The angles (shown in green for a few limbs) between the various limbs in this structure are used by the SVM to recognize the humans engaged in violent activities. Credit: arXiv:1806.00746 [cs.CV]
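To make the caption concrete, here is a minimal sketch of how limb angles could be computed from a 2D skeleton and collected into a feature vector. It is not the authors' code: the keypoint names, the choice of joints in `JOINT_TRIPLES`, and the sample coordinates are all assumptions made for illustration, and the classifier stage (an SVM in the paper) is left out.

```python
import math

def limb_angle(a, b, c):
    """Return the angle in degrees at joint b between limbs b->a and b->c."""
    v1 = (a[0] - b[0], a[1] - b[1])
    v2 = (c[0] - b[0], c[1] - b[1])
    n1 = math.hypot(v1[0], v1[1])
    n2 = math.hypot(v2[0], v2[1])
    if n1 == 0 or n2 == 0:
        raise ValueError("limb has zero length; keypoints coincide")
    cos_theta = (v1[0] * v2[0] + v1[1] * v2[1]) / (n1 * n2)
    # Clamp to guard against rounding slightly outside [-1, 1].
    cos_theta = max(-1.0, min(1.0, cos_theta))
    return math.degrees(math.acos(cos_theta))

# Hypothetical joint triples (end, joint, end); not taken from the paper.
JOINT_TRIPLES = [
    ("shoulder_r", "elbow_r", "wrist_r"),
    ("shoulder_l", "elbow_l", "wrist_l"),
]

def angle_features(keypoints):
    """Map a dict of named (x, y) keypoints to a list of joint angles."""
    return [
        limb_angle(keypoints[a], keypoints[b], keypoints[c])
        for a, b, c in JOINT_TRIPLES
    ]

# Made-up skeleton: right arm held straight, left arm bent at a right angle.
skeleton = {
    "shoulder_r": (0.0, 0.0), "elbow_r": (1.0, 0.0), "wrist_r": (2.0, 0.0),
    "shoulder_l": (0.0, 1.0), "elbow_l": (1.0, 1.0), "wrist_l": (1.0, 2.0),
}
features = angle_features(skeleton)
```

A feature vector like `features` is the kind of input a classifier could be trained on to separate the postures labeled as violent from the rest.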

Their work reflects a research interest in exploring ways to use machine learning to analyze live video footage. They plan to test it during two upcoming festivals in India, said DroneDJ.

The paper also introduced the Aerial Violent Individual (AVI) Dataset, which can benefit other researchers aiming to use deep learning for aerial surveillance applications.

In the bigger picture, it is obvious by now that the word "surveillance" in and of itself is a loaded term, and one thinks of repressive governments eager to silence protestors by putting them under lock and key for flimsy reasons. On the other hand, societies are coping with vandals, hate groups and kidnappings.

"Anything can be used for good. Anything can be used for bad," said Singh, lead researcher, in The Verge.