Authors: Mubasher Rashid Dand, Er. Sahilpreet Singh
The goal of this project is to build a deep learning-based visual crowd counting system that analyzes images and determines the number of individuals present in them. The system also produces an approximate density map, a graph comparing the predicted count with the actual count, and accuracy figures in terms of Mean Absolute Error (MAE). Researchers have recently turned to computer vision to monitor crowds routinely. This work implements a convolutional neural network (CNN), a deep learning algorithm, to detect crowds and estimate the influx of people, and this aim was achieved. We used Part B of the ShanghaiTech dataset for our research. The CNN model detected the crowd and estimated its density with an absolute error of 21. In 312 we obtained a validation mean absolute error of 21.3, which means that, on average, the model's estimate is off by about 21 persons in excess or deficit.
The human population has been expanding rapidly in recent years, which has indirectly increased the prevalence of crowding. The purpose of a gathering strongly influences crowd characteristics and behavior. The analysis of crowd movements and behaviors has attracted considerable scholarly interest in public service, security, and safety, as well as in computer vision. Because human faces vary in color, pose, expression, location, orientation, and illumination, distinguishing a face in a crowd is difficult. An overcrowded situation causes confusion among large masses of people, ending in pushing, mass panic, stampedes, or crowd crushes, and a loss of control. Heavy downpours led to the deaths of 22 people and injuries to hundreds more in the early evening between Parel and Elphinstone Road in Mumbai in 2017, while 27 people were killed in the state of Andhra Pradesh, India, in 2015, as shown in Fig. 1. 32 people died in a stampede on the banks of the Godavari river in 2015, and 26 others were injured in the stampede that occurred on the occasion of Diwali at Gandhi Maidan in 2014.
To avoid such tragedies, the automatic detection of critical and unexpected situations in large crowds is essential. It supports the implementation of emergency measures as well as appropriate security and safety measures. Crowd detection is one of the most difficult tasks in visual surveillance systems. This technology can be used to detect and count people, to monitor crowd levels, and to raise alerts when a large crowd forms.
The goal of crowd counting is to determine the number of persons present in congested areas. There are a few uses for crowd detection, such as
A crowd forms when a large number of people gather, usually with a common goal. Such a gathering may be noisy, relaxed, or cheerful, and may also begin with strong displays of negativity. The term crowd generally refers to the number of people present in a particular place. A place is said to be crowded if the number of people in it greatly exceeds its capacity. A crowd can therefore result in a broad range of incidents. Excessive overcrowding sometimes leads individuals to lose control and damage their surroundings. Some people take advantage of such gatherings to engage in abusive behavior, such as harassing women. It is therefore crucial to determine the number of people in a gathering in order to ensure everyone's safety. To assess any crowd, it is necessary to estimate its density. If the safe limit is exceeded, warnings can be issued to avoid mishaps, which also helps maintain the infrastructure and management of the area. Counting people helps classify an area as crowded or non-crowded, and a crowded place can then be monitored.
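The idea of issuing a warning once a safe limit is crossed can be illustrated with a minimal sketch. The function name, the `safe_capacity` value, and the `warn_ratio` threshold are assumptions made for illustration; the text does not specify concrete limits.

```python
def crowd_status(estimated_count: float, safe_capacity: int,
                 warn_ratio: float = 0.8) -> str:
    """Classify an area from an estimated head count.

    safe_capacity and warn_ratio are hypothetical parameters; a real
    deployment would take them from the venue's safety plan.
    """
    if safe_capacity <= 0:
        raise ValueError("safe_capacity must be positive")
    occupancy = estimated_count / safe_capacity
    if occupancy > 1.0:
        return "overcrowded"   # safe limit crossed: raise a warning
    if occupancy >= warn_ratio:
        return "crowded"       # approaching the limit: keep monitoring
    return "non-crowded"
```

For example, with an assumed capacity of 100 persons, an estimated count of 120 would be reported as "overcrowded".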
Several subtasks relate to the crowd detection and analysis process, including:
a. Human identification and tracking
b. Object detection and analysis
Of these two, human detection is the more difficult because it is affected by wide variation in appearance due to pose, clothing, illumination, and surroundings. Prior knowledge of these factors can improve the performance of the detection task.
Human detection can be used in a variety of situations, such as:
Crowd control is studied by scholars, theorists, and surveyors, whereas information technology engineers are mostly interested in detection, smart environments, and abstract conditions. The present method of monitoring crowds relies on surveillance systems operated by remote human staff. Since there are often more video feeds than operators watching them, such monitoring is largely ineffective for detection and prevention in real environments.
A. Approaches To Crowd Detection
As illustrated in Fig. 2, the crowd detection system comprises input information, methods, features, and output. There are three approaches to crowd detection: detection-based, regression-based, and density-based.
B. Detection-Based Approaches
The detection-based model attempts to determine the number of people by detecting each individual and their location at the same time. Jones and Snow et al. described a scanning-window pedestrian detector based on spatiotemporal information that uses absolute-difference and Haar-like filters. To capture moving objects, three kinds of filters are used: the Haar filter, the absolute difference filter, and the shifted difference filter. The AdaBoost learning algorithm was used to train eight different pedestrian detectors for eight different motions. By using both motion and appearance information, this technique detects and attempts to classify the moving person. In a top-down segmentation setting, Leibe et al. proposed a method for detecting pedestrians in crowded scenes that uses an algorithm combining local and global cues. Their experiments showed that the system is reliable and that pedestrians can be detected even when there is substantial overlap. Lin et al. proposed a detection approach for crowd estimation based on wavelet templates and vision-based techniques. The Haar Wavelet Transform (HWT) was used to extract features of the head contour, and a support vector machine (SVM) was used to classify a candidate region as containing a head or not. In complex scenes where the head was not visible, this method was limited, and it proved to be a computationally demanding solution for real-time applications. For detecting and tracking people in crowds, Zhao and Nevatia proposed a 3D human shape model. Their solution is based on segmenting the foreground blobs to locate the tops of heads. For the detection and tracking of people, a maximum a posteriori problem was formulated. The Markov chain Monte Carlo (MCMC) method is used to address the occlusion problem by working with the joint likelihood of multiple people.
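The absolute-difference and shifted-difference motion images mentioned above can be sketched as follows. This is an illustration of the general idea only, not the cited authors' implementation; frames are assumed to be small grayscale images stored as nested lists, and all function names are hypothetical.

```python
from typing import Dict, List

Frame = List[List[int]]  # grayscale image as rows of pixel intensities


def shift(frame: Frame, dy: int, dx: int) -> Frame:
    """Shift a frame by (dy, dx); pixels entering from outside are 0."""
    h, w = len(frame), len(frame[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            sy, sx = y - dy, x - dx
            if 0 <= sy < h and 0 <= sx < w:
                out[y][x] = frame[sy][sx]
    return out


def abs_diff(a: Frame, b: Frame) -> Frame:
    """Pixel-wise absolute difference of two equally sized frames."""
    return [[abs(pa - pb) for pa, pb in zip(ra, rb)] for ra, rb in zip(a, b)]


def motion_images(prev: Frame, curr: Frame) -> Dict[str, Frame]:
    """Absolute difference plus four differences against a shifted current frame.

    Keys name the direction in which the current frame is shifted by one pixel.
    """
    return {
        "delta": abs_diff(prev, curr),
        "up": abs_diff(prev, shift(curr, -1, 0)),
        "down": abs_diff(prev, shift(curr, 1, 0)),
        "left": abs_diff(prev, shift(curr, 0, -1)),
        "right": abs_diff(prev, shift(curr, 0, 1)),
    }
```

If an object moves one pixel to the right between frames, shifting the current frame left re-aligns it with the previous frame, so the "left" image is the one with the least energy. Haar-like filters would then be evaluated on these motion images as well as on the appearance image.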
C. Regression-Based Approaches
In the regression-based approach, local image patches are used to learn a feature mapping for counting purposes. Some of the features used to encode low-level information are foreground features, edge features, texture, and gradient features. Local Binary Pattern (LBP), Histogram of Oriented Gradients (HOG), and Gray Level Co-occurrence Matrices (GLCM) are examples of ways to improve results by capturing local and global scene features. After extracting local and global information, different regression algorithms, such as linear regression, ridge regression, and neural networks, are used to learn the mapping for crowd counting. Idrees et al. found that no single feature or detection method is reliable enough to accurately estimate counts at high density, and therefore combined Fourier analysis, head detection, SIFT interest points, and other means of extracting features.
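As a minimal sketch of the regression idea, the block below fits a one-feature ridge regression that maps a low-level feature to a head count. The choice of a single scalar feature (for example, foreground area in pixels) and the names `fit_ridge_1d` and `lam` are assumptions for illustration; practical systems regress from many features at once.

```python
from typing import List, Tuple


def fit_ridge_1d(x: List[float], y: List[float],
                 lam: float = 1.0) -> Tuple[float, float]:
    """Fit y ~ w * x + b with an L2 penalty lam on w (intercept unpenalized).

    x: one feature value per training image (hypothetical feature).
    y: ground-truth head count per training image.
    """
    if len(x) != len(y) or not x:
        raise ValueError("x and y must be non-empty and of equal length")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx + lam == 0:
        raise ValueError("degenerate data: constant feature and lam == 0")
    w = sxy / (sxx + lam)      # closed-form ridge slope on centered data
    b = mean_y - w * mean_x    # intercept recovered from the means
    return w, b


def predict_count(feature: float, w: float, b: float) -> float:
    """Estimate the head count for one image from its feature value."""
    return w * feature + b
```

Setting `lam` to 0 gives ordinary linear regression; larger values shrink the slope, which trades some bias for stability when the training set is small.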
D. Density-Based Approaches
A density-based method attempts to learn a linear mapping between local patch features and the corresponding object density maps. It should be noted, however, that learning a linear mapping is difficult. Pham et al. proposed learning a non-linear mapping between local patch features and density maps, using random forest regression from multiple image patches to vote for the densities of multiple target objects.
II. RESEARCH OBJECTIVES
The proposed work aims to achieve the research objectives listed below:
A. Research Methodology
To solve any problem, a systematic approach must be followed to reach the desired solution. The research methodology followed in this work is presented in Figure 4.
To understand the extent of work done in this field, a comprehensive literature survey was carried out to better understand the concepts related to the problem. A detailed review of the literature on the research problem was done by going through various books, papers, and journals, and a framework or model will be proposed.
The next phase is the implementation of these models on a hardware- or software-based system, followed by experiments to measure the efficiency of the model. The hardware and software required will be identified for effective implementation. The results obtained will then be compared and analyzed for validation.
This chapter starts with a brief introduction to crowd detection. An overview of the research gaps and challenges is presented followed by the problem formulation. The research objectives are also mentioned along with the research methodology to carry out the research work. The diagrammatic flow of the research work is also presented in the chapter. The next chapter (chapter 4) presents the proposed models for human tracking and crowding recognition.
IV. RESULTS AND DISCUSSION
The dataset used for training the model is ShanghaiTech Part B. We implemented our model in Python using the PyTorch Lightning library. We discuss each of our code blocks here, beginning with the necessary imports.
A. Simulation Step
B. Data Visualization
After making the necessary imports, we define a function to display our images. Once we have defined this function, we call it for some of the images in the dataset.
im = cv2.imread('../input/shanghaitech/ShanghaiTech/part_B/train_data/images/IMG_1.jpg', cv2.IMREAD_COLOR)
V. DATA PRE-PROCESSING
Once we have split the data into training and testing sets, we perform data augmentation. Data augmentation routines are found in PyTorch Lightning. The most important augmentation routines that we use are horizontal flip and random brightness/contrast. Once these routines are set up, we define a class to implement them. The next step after data augmentation is to apply Gaussian filters to the head annotations for density estimation. All of this is implemented in the following lines of code.
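A minimal, library-free sketch of two of these steps is given below: mirroring the head annotations to match a horizontally flipped image, and turning head annotations into a density map with Gaussian kernels. It is not the authors' code. The function names, the fixed `sigma`, and the per-head normalization are assumptions for illustration. Brightness and contrast changes alter only pixel values, so they leave the annotations and the density map unchanged.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]  # (x, y) head position in pixels


def hflip_points(points: List[Point], width: int) -> List[Point]:
    """Mirror head annotations to match a horizontally flipped image."""
    return [(width - 1 - x, y) for x, y in points]


def density_map(points: List[Point], height: int, width: int,
                sigma: float = 2.0) -> List[List[float]]:
    """Place one normalized Gaussian per head so the map sums to the count.

    sigma is a hypothetical fixed kernel width; the kernel is truncated at
    3 * sigma and renormalized over the pixels that fall inside the image.
    """
    dmap = [[0.0] * width for _ in range(height)]
    radius = int(3 * sigma)
    for px, py in points:
        cx, cy = int(round(px)), int(round(py))
        cells = []
        for y in range(max(0, cy - radius), min(height, cy + radius + 1)):
            for x in range(max(0, cx - radius), min(width, cx + radius + 1)):
                d2 = (x - cx) ** 2 + (y - cy) ** 2
                cells.append((y, x, math.exp(-d2 / (2 * sigma ** 2))))
        total = sum(weight for _, _, weight in cells)
        for y, x, weight in cells:
            dmap[y][x] += weight / total  # each head contributes exactly 1
    return dmap


def count_from_density(dmap: List[List[float]]) -> float:
    """The estimated head count is the sum over the density map."""
    return sum(sum(row) for row in dmap)
```

Because every head contributes a kernel that sums to one, summing the map recovers the number of annotated heads, which is the quantity a density-estimation network is trained to reproduce.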
A. This work implemented a convolutional neural network (CNN), a deep learning algorithm, to detect crowds and estimate the influx of people, and this aim was achieved.
B. For this study, the dataset was downloaded from GitHub, which hosts a variety of datasets, including the ShanghaiTech dataset.
C. The ShanghaiTech dataset has 1198 images with 330,165 labeled heads. It has two parts: Part A and Part B.
D. Part A consists of 482 images chosen at random from the internet, while Part B consists of 716 images taken from surveillance on the streets of Shanghai.
E. We used Part B of the ShanghaiTech dataset for our research.
F. The result obtained in this technical report is promising: the CNN model detected the crowd and estimated the density with an absolute error of 21.
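The reported error metric, Mean Absolute Error (MAE), is the average absolute gap between predicted and actual counts over the evaluated images. A minimal sketch follows; the sample counts in the usage note are made-up numbers for illustration and are not results from this work.

```python
from typing import Sequence


def mean_absolute_error(predicted: Sequence[float],
                        actual: Sequence[float]) -> float:
    """Average of |predicted count - actual count| over all images."""
    if len(predicted) != len(actual) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)
```

For instance, predictions of 100, 80, and 130 against actual counts of 120, 90, and 100 give errors of 20, 10, and 30, so the MAE is 20. An MAE of about 21 therefore means that the estimate is off by roughly 21 persons per image on average, in either direction.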
S. Haque, M. S. Sadi, M. E. H. Rafi, M. M. Islam, and M. K. Hasan, “Real-Time Crowd Detection to Prevent Stampede,” 2020, vol. 56, pp. 665–678.
D. B. Sam, S. V. Peri, N. S. Mukuntha, A. Kamath, and R. V. Babu, “Locate, Size and Count: Accurately resolving people in dense crowds via detection,” arXiv, vol. 3, pp. 1–12, 2019, DOI: 10.1109/tpami.2020.2974830.
A. Ghotkar, M. D. Chaudhari, and A. S. Ghotkar, “A Study on Crowd Detection and Density Analysis for Safety Control,” Int. J. Comput. Sci. Eng., vol. 6, 2018, DOI: 10.26438/ijcse/v6i4.424428.
Z. Huo, B. Lu, A. Mi, F. Luo, and Y. Qiao, “Learning Multi-level Features to Improve Crowd Counting,” IEEE Access, vol. 8, pp. 211391–211400, 2020, DOI: 10.1109/ACCESS.2020.3039998.
M. B. Shami, S. Maqbool, H. Sajid, Y. Ayaz, and S. C. S. Cheung, “People Counting in Dense Crowd Images Using Sparse Head Detections,” IEEE Trans. Circuits Syst. Video Technol., vol. 29, no. 9, pp. 2627–2636, 2019, DOI: 10.1109/TCSVT.2018.2803115.
M. Liu, Z. Guo, and Z. Wang, “Crowd Counting with Fully Convolutional Neural Network,” pp. 953–957, 2018. https://ieeexplore.ieee.org/document/8451787/citations#citations.
Chen, K.; Loy, C.C.; Gong, S.; Xiang, T. Feature mining for localised crowd counting. In Proceedings of the British Machine Vision Conference; BMVA Press: Surrey, UK, 2012; Volume 1, p. 3.
Idrees, H.; Saleemi, I.; Seibert, C.; Shah, M. Multi-source multi-scale counting in extremely dense crowd images. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Portland, OR, USA, 23 June 2013; pp. 2547–2554.
Vu, T.H.; Osokin, A.; Laptev, I. Context-aware CNNs for person head detection. In Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile, 7 December 2015; pp. 2893–2901.
Lempitsky, V.; Zisserman, A. Learning to count objects in images. In Advances in Neural Information Processing Systems, Proceedings of the Neural Information Processing Systems 2010, Vancouver, BC, Canada, 6 December 2010; Neural Information Processing Systems Foundation, Inc.: San Diego, CA, USA, 2010; pp. 1324–1332.
Loy, C.C.; Chen, K.; Gong, S.; Xiang, T. Crowd counting and profiling: Methodology and evaluation. In Modeling, Simulation and Visual Analysis of Crowds; Springer: Berlin/Heidelberg, Germany, 2013; pp. 347–382.
Teo, C.H.; Vishwanathan, S.; Smola, A.J.; Le, Q.V. Bundle methods for regularized risk minimization. J. Mach. Learn. Res. 2010, 11, 311–365.
Pham, V.Q.; Kozakaya, T.; Yamaguchi, O.; Okada, R. Count forest: Co-voting uncertain number of targets using random forest for crowd density estimation. In Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile, 3–17 December 2015; pp. 3253–3261.
Dollar, P.; Wojek, C.; Schiele, B.; Perona, P. Pedestrian detection: An evaluation of the state of the art. IEEE Trans. Pattern Anal. Mach. Intell. 2012, 34, 743–761.
Li, M.; Zhang, Z.; Huang, K.; Tan, T. Estimating the number of people in crowded scenes by MID based foreground segmentation and head-shoulder detection. In Proceedings of the 19th International Conference on Pattern Recognition (ICPR 2008), Tampa, FL, USA, 8 December 2008; pp. 1–4.
Brox, T.; Bruhn, A.; Papenberg, N.; Weickert, J. High accuracy optical flow estimation based on a theory for warping. In European Conference on Computer Vision; Springer: Berlin/Heidelberg, Germany, 2004; pp. 25–36.
Dalal, N.; Triggs, B. Histograms of oriented gradients for human detection. In Proceedings of the Computer Vision and Pattern Recognition, CVPR 2005, San Diego, CA, USA, 20 June 2005; Volume 1, pp. 886–893.
Wu, B.; Nevatia, R. Detection of multiple, partially occluded humans in a single image by Bayesian combination of edgelet part detectors. In Proceedings of the 10th IEEE International Conference on Computer Vision (ICCV’05), Beijing, China, 17 October 2005; Volume 1, pp. 90–97.
Ali, S.; Shah, M. A Lagrangian particle dynamics approach for crowd flow segmentation and stability analysis. In Proceedings of the 2007 IEEE Conference on Computer Vision and Pattern Recognition, Minneapolis, MN, USA, 22 June 2007; pp. 1–6.
Sabzmeydani, P.; Mori, G. Detecting pedestrians by learning shapelet features. In Proceedings of the Computer Vision and Pattern Recognition (CVPR’07), Minneapolis, MN, USA, 17 June 2007; pp. 1–8.
Chang, C.C.; Lin, C.J. LIBSVM: A library for support vector machines. ACM Trans. Intell. Syst. Technol. (TIST) 2011, 2, 1–27.
Gall, J.; Yao, A.; Razavi, N.; Van Gool, L.; Lempitsky, V. Hough forests for object detection, tracking, and action recognition. IEEE Trans. Pattern Anal. Mach. Intell. 2011, 33, 2188–2202.
Sirmacek, B.; Reinartz, P. Automatic crowd density and motion analysis in airborne image sequences based on a probabilistic framework. In Proceedings of the 2011 IEEE International Conference on Computer Vision Workshops (ICCV Workshops), Barcelona, Spain, 6–11 November 2011; pp. 898–905.
Copyright © 2023 Mubasher Rashid Dand, Er. Sahilpreet Singh. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.