In today's insecure world, video surveillance systems play a significant role in keeping both indoor and outdoor spaces secure. Real-time applications can draw on video surveillance components such as behavior recognition, which involves understanding activities and classifying them as normal or suspicious. Suspicious activities put people at risk because of the potential danger they pose. As criminal activity increases, detecting it in urban and suburban areas is necessary to minimize such incidents. In its early days, surveillance was carried out manually by human operators and caused considerable fatigue, since suspicious activities are rare compared to everyday activities. With the advent of intelligent surveillance systems, various automated approaches were introduced. This paper analyzes several cases that could pose a threat to human lives if ignored, namely the detection of gun-related crimes, abandoned luggage, human violence, lock hammering, wallet theft, and ATM tampering in surveillance video frames. The papers reviewed here use neural network models, namely Faster R-CNN and YOLOv3, to detect these activities.
Video surveillance systems are a primary means of detecting crimes such as bag theft, bags abandoned at stations, knife attacks, and gun use, which are on the rise. However, conventional video surveillance requires continuous human attention, which reduces its efficiency. It is impossible to manually monitor all events on CCTV cameras today, and even when an event has already occurred, a manual search through the recorded video wastes a lot of time. Video surveillance can be automated to solve this problem. Automated video surveillance systems look for abnormal events in video footage and raise alarms or other indications when predefined abnormal activities occur. As described in the reviewed papers, the semantic-based approach involves defining suspicious activities, background subtraction, object detection, tracking, and classification of suspicious activities within a single system framework.
II. RELATED WORK
Any suspicious movement can be detected through video surveillance, which acquires and processes the data. Many research studies have been conducted on the detection of anomalies in video data. Most researchers deal with the problem of abandoned bag detection. A framework for detecting abandoned objects in a scene with multiple interacting objects was described by James David Hogg et al. The datasets they use are the standard ones. Using Gaussian Mixture Models (GMM), the object (bag) is detected by the dual background approach. A modified multi hypothesis tracker is used for tracking extended objects. Based on the relationship between bags and people, a situation analysis is conducted. An approach based on logic is then used to assess threats.
According to Fuentes & Velastin, a video surveillance algorithm for detecting events can be based on trajectories. The position, trajectory, and split/merge events can be used to describe any event. Tracking is then done using matching matrices. Kim et al. detect and track multiple moving objects through a single camera. To extract moving regions, they use RGB color background modeling, and moving objects are grouped using blob labeling. In anomaly detection, the foreground image is typically obtained by background subtraction. The background subtraction technique is employed in our system because it does not require any prior training.
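The blob labeling step mentioned above can be illustrated with a short sketch. The source does not say which labeling algorithm Kim et al. use, so the 4-connected flood fill below, along with the function name `label_blobs` and the toy mask, is an assumption made purely for illustration.

```python
from collections import deque

def label_blobs(mask):
    """Label 4-connected foreground regions in a binary mask (rows of 0/1)."""
    rows, cols = len(mask), len(mask[0])
    labels = [[0] * cols for _ in range(rows)]
    next_label = 0
    for r in range(rows):
        for c in range(cols):
            if mask[r][c] and labels[r][c] == 0:
                # Unvisited foreground pixel: start a new blob and flood it.
                next_label += 1
                labels[r][c] = next_label
                queue = deque([(r, c)])
                while queue:
                    y, x = queue.popleft()
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and mask[ny][nx] and labels[ny][nx] == 0):
                            labels[ny][nx] = next_label
                            queue.append((ny, nx))
    return labels, next_label

# Hypothetical foreground mask containing two separate moving regions.
mask = [
    [1, 1, 0, 0, 0],
    [1, 0, 0, 1, 1],
    [0, 0, 0, 1, 1],
]
labels, count = label_blobs(mask)
print(count)
```

Each labeled region can then be treated as one candidate moving object for the tracking stage.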
Most researchers use a machine learning approach to detect objects. Training requires a standard, reliable dataset, which is difficult to obtain, and the machine learning approach becomes less reliable as a result. Our system therefore uses the hierarchical semantic approach. Another area of focus is the early detection and recognition of activities. One research paper presents a method for predicting human activity. Its primary concern is recognizing events early (for instance, a man picking up a gun with his hand). The authors formulate a probabilistic activity prediction problem and introduce new methodologies to solve it. Spatio-temporal features are analyzed using an integral histogram. Because the methodology considers the sequential nature of human activities and handles noisy data, they named it dynamic bag-of-words.
A. Input Data
The input to the system is a video stream. Since the system is intended to detect suspicious activity, its input would be taken from CCTV cameras. For the project demonstration, however, we use standard datasets. The input images are not always of usable quality, so image preprocessing techniques are applied to enhance them.
B. Background Image Acquisition
The background image can be used to correct for illumination effects. A standard background image is taken as the reference for further image processing. The background image is updated dynamically so that any new object entering the scene can be captured.
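The dynamic background update can be sketched as follows. The source does not state the update rule, so the exponential running average, the learning rate `alpha`, and the function name `update_background` are all assumptions chosen for illustration.

```python
def update_background(background, frame, alpha=0.05):
    """Blend the current frame into the background model.

    Assumed rule (not from the source): new = (1 - alpha) * old + alpha * frame.
    A small alpha adapts slowly to lighting changes, so a newly entered
    object still stands out from the background for some time.
    """
    return [
        [(1.0 - alpha) * b + alpha * f for b, f in zip(bg_row, frame_row)]
        for bg_row, frame_row in zip(background, frame)
    ]

# Hypothetical 2x2 grayscale images: one pixel brightens in the new frame.
background = [[100.0, 100.0], [100.0, 100.0]]
frame = [[100.0, 200.0], [100.0, 100.0]]
updated = update_background(background, frame, alpha=0.1)
print(updated[0][1])
```

With `alpha=0.1`, the changed pixel moves one tenth of the way from the old background value toward the new frame value, while unchanged pixels stay as they were.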
C. Image Preprocessing
Image preprocessing techniques are used to improve the image so that unwanted distortions are suppressed and required features are enhanced. The block diagram of the image processing stage is shown in Fig. 1.
Changing light conditions and movement in the reference background introduce noise into the image. We use thresholding to remove this noise. The image then undergoes morphological operations. The morphological opening operation is used to shrink the areas distorted by noise. The opening operation is defined as the opening of an image A by a structuring element B.
It is an erosion followed by a dilation, given as
$$A \circ B = (A \ominus B) \oplus B$$
where $\circ$ denotes the opening operation, $\ominus$ denotes the erosion of image A by structuring element B, and $\oplus$ denotes dilation.
The opening operation can leave holes in the image. These holes are filled by passing the image through the morphological closing operation. The closing operation is defined as the closing of an image A by a structuring element B.
It is a dilation followed by an erosion, given as
$$A \bullet B = (A \oplus B) \ominus B$$
where $\bullet$ denotes the closing operation, $\oplus$ denotes the dilation of image A by structuring element B, and $\ominus$ denotes erosion.
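The opening and closing operations defined above can be sketched on small binary images. The 3x3 square structuring element, the zero padding at the image border, and the helper names are assumptions for illustration; the source does not specify which structuring element is used.

```python
# Assumed structuring element B: a 3x3 square given as (dy, dx) offsets.
# It is symmetric, so reflecting it for dilation makes no difference.
SQUARE_3X3 = [(dy, dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1)]

def _pixel(img, y, x):
    """Return the pixel value, treating everything outside the image as 0."""
    if 0 <= y < len(img) and 0 <= x < len(img[0]):
        return img[y][x]
    return 0

def erode(img, se=SQUARE_3X3):
    """Keep a pixel only if every pixel under the structuring element is set."""
    return [[int(all(_pixel(img, y + dy, x + dx) for dy, dx in se))
             for x in range(len(img[0]))] for y in range(len(img))]

def dilate(img, se=SQUARE_3X3):
    """Set a pixel if any pixel under the structuring element is set."""
    return [[int(any(_pixel(img, y + dy, x + dx) for dy, dx in se))
             for x in range(len(img[0]))] for y in range(len(img))]

def opening(img, se=SQUARE_3X3):
    """Erosion followed by dilation: removes small isolated noise."""
    return dilate(erode(img, se), se)

def closing(img, se=SQUARE_3X3):
    """Dilation followed by erosion: fills small holes."""
    return erode(dilate(img, se), se)

# A 3x3 object plus one isolated noise pixel at row 2, column 5.
noisy = [
    [0, 0, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0, 0, 0],
    [0, 1, 1, 1, 0, 1, 0],
    [0, 1, 1, 1, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0],
]
# A 3x3 object with a one-pixel hole in its center.
holed = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 0, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]
opened = opening(noisy)
closed = closing(holed)
for row in opened:
    print(row)
```

In this toy case, opening removes the isolated noise pixel while keeping the 3x3 object, and closing fills the hole in the center of the second object.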
D. Object Detection
The foreground image is obtained by subtracting the input image from the background image. The required object is then detected from this foreground image.
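A minimal sketch of this detection step is given below. It assumes grayscale frames, an absolute difference, and a fixed threshold of 30 intensity levels; the threshold value and the function names `foreground_mask` and `bounding_box` are hypothetical and do not come from the source.

```python
def foreground_mask(frame, background, threshold=30):
    """Return 1 where |frame - background| exceeds the threshold, else 0."""
    return [
        [int(abs(f - b) > threshold) for f, b in zip(frame_row, bg_row)]
        for frame_row, bg_row in zip(frame, background)
    ]

def bounding_box(mask):
    """Return (top, left, bottom, right) of the foreground pixels, or None."""
    coords = [(y, x) for y, row in enumerate(mask) for x, v in enumerate(row) if v]
    if not coords:
        return None
    ys = [y for y, _ in coords]
    xs = [x for _, x in coords]
    return min(ys), min(xs), max(ys), max(xs)

# Hypothetical 4x5 scene with a uniform background of intensity 50.
background = [[50] * 5 for _ in range(4)]
frame = [row[:] for row in background]
frame[1][2] = frame[1][3] = frame[2][2] = frame[2][3] = 200  # object enters
frame[0][0] = 55  # slight illumination flicker, below the threshold

mask = foreground_mask(frame, background)
box = bounding_box(mask)
print(box)
```

The thresholding suppresses the small flicker, so only the pixels belonging to the new object remain in the mask and contribute to its bounding box.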
E. Object Tracking
The detected object is tracked through the scene so that we can determine whether a new object has entered the scene or an object has left it, for example when a person walks out of the scene. The detected object (a human being or a bag) is tracked using a correlation tracking algorithm.
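The source does not specify which correlation tracking algorithm is used. As one possible reading, the sketch below locates a stored object template in a new frame by exhaustive normalized cross-correlation. The function names, the template, and the frame are assumptions for illustration only.

```python
import math

def ncc(patch, template):
    """Normalized cross-correlation of two equally sized grayscale patches."""
    p = [v for row in patch for v in row]
    t = [v for row in template for v in row]
    mean_p, mean_t = sum(p) / len(p), sum(t) / len(t)
    num = sum((a - mean_p) * (b - mean_t) for a, b in zip(p, t))
    den = math.sqrt(sum((a - mean_p) ** 2 for a in p)
                    * sum((b - mean_t) ** 2 for b in t))
    return num / den if den else 0.0  # flat patches carry no information

def track(frame, template):
    """Return the top-left (row, col) with the highest correlation, and the score."""
    th, tw = len(template), len(template[0])
    best_score, best_pos = -2.0, (0, 0)  # NCC scores always lie in [-1, 1]
    for y in range(len(frame) - th + 1):
        for x in range(len(frame[0]) - tw + 1):
            patch = [row[x:x + tw] for row in frame[y:y + th]]
            score = ncc(patch, template)
            if score > best_score:
                best_score, best_pos = score, (y, x)
    return best_pos, best_score

# Hypothetical object appearance captured when the object was first detected.
template = [[10, 200], [200, 10]]
# New frame in which the object now sits at row 1, column 2.
frame = [
    [0, 0, 0, 0, 0],
    [0, 0, 10, 200, 0],
    [0, 0, 200, 10, 0],
    [0, 0, 0, 0, 0],
]
position, score = track(frame, template)
print(position)
```

A practical tracker would usually restrict the search to a window around the previous position rather than scanning the whole frame, and a low best score can be taken as a hint that the object has left the scene.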
Detecting suspicious activity in video data is a challenging task. The difficulties include the complexity of the scene, the lighting conditions, and the camera angle. In addition, whether an activity is suspicious depends on the scene and place in which it occurs. For example, a bag left in a classroom for more than half an hour is normal, but a bag left at a railway station for half an hour is suspicious. Weapon detection and abandoned bag detection are illustrated in Image 5.1 and Image 5.2.
Another problem is that standard and challenging datasets are not easily available for testing. To test the proposed framework, we used standard public datasets.
In video surveillance models, computation time is a crucial factor. To calculate the computation time per frame, we divide the total time elapsed during inference on the entire video by the number of frames rendered from the dataset.
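The per-frame computation time described above can be measured with a small helper. This is a sketch: the function name `mean_time_per_frame` and the stand-in `infer` callable are hypothetical and take the place of whichever detection model is being timed.

```python
import time

def mean_time_per_frame(frames, infer):
    """Total elapsed inference time divided by the number of frames."""
    start = time.perf_counter()
    count = 0
    for frame in frames:
        infer(frame)  # run the model under test on one frame
        count += 1
    elapsed = time.perf_counter() - start
    return elapsed / count if count else 0.0

# Stand-in workload in place of a real detector (assumption for illustration).
def dummy_infer(frame):
    return sum(range(1000))

seconds_per_frame = mean_time_per_frame(range(100), dummy_infer)
print(f"{seconds_per_frame * 1000:.3f} ms per frame")
```

The reciprocal of this value gives the achievable frame rate, which is the figure usually compared against the camera's frame rate when judging real-time suitability.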
The advantage of the paper using the semantic-based approach is that it overcomes the unavailability of standard datasets and the limited generalizing ability of a classifier; the disadvantage is that the accuracy of object detection is just 57%.
The advantage of the paper using Faster R-CNN is that it can specifically detect gun-based crime and abandoned luggage; the disadvantage is that its abandoned luggage detection does not address issues such as identifying objects under sudden changes of illumination.
The advantage of the paper using the YOLOv3 technique is that it outperforms Faster R-CNN, achieving an accuracy of around 95%; however, because of the small training dataset, there were still some mismatches between the test results and the ground truth.
The proposed techniques analyze surveillance footage with a focus on two specific cases: detecting potential gun-based crime and detecting abandoned luggage. To detect guns in surveillance footage, we present a deep neural network that can identify guns in images. This is particularly important because integrating such models into existing surveillance systems could mitigate the risk to human lives. An algorithm was also developed to detect ATM loitering, cheating in exams, and wall climbing. The method we propose for detecting abandoned baggage is computationally efficient, and our findings indicate that it detected most of the abandoned items effectively while achieving an extremely low false alarm rate. Shortcomings such as a person who stands still for a long time being identified as a left-behind object can be addressed by adding one extra step that validates the results from the stationary object detector. Because the computation time per frame is considerably smaller, this technique can be used in real-time implementations.
U. M. Kamthe and C. G. Patil, "Suspicious Activity Recognition in Video Surveillance System," 2018 Fourth International Conference on Computing Communication Control and Automation (ICCUBEA), 2018, pp. 1-6, doi: 10.1109/ICCUBEA.2018.8697408.
S. Loganathan, G. Kariyawasam and P. Sumathipala, "Suspicious Activity Detection in Surveillance Footage," 2019 International Conference on Electrical and Computing Technologies and Applications (ICECTA), 2019, pp. 1-4, doi: 10.1109/ICECTA48151.2019.8959600.
N. Bordoloi, A. K. Talukdar and K. K. Sarma, "Suspicious Activity Detection from Videos using YOLOv3," 2020 IEEE 17th India Council International Conference (INDICON), 2020, pp. 1-5, doi: 10.1109/INDICON49873.2020.9342230.