Rainfall Prediction Model Using Machine Learning Techniques

Authors: Govind Wakure, Rahul Lotlikar, Pushant Mangilipelli, Saif Khan

DOI Link: https://doi.org/10.22214/ijraset.2022.42244

Abstract

Predicting rainfall is one of the most difficult aspects of weather forecasting. Accurate and timely rainfall forecasting can be extremely useful in preparing for ongoing building projects, transportation activities, agricultural jobs, aviation operations, and flood situations, among other things. By finding hidden patterns from available elements of past meteorological data, machine learning techniques may accurately predict rainfall. This study adds to the body of knowledge by giving a comprehensive examination and assessment of the most recent Machine Learning algorithms for rainfall prediction. This study looked at publications that were published between 2013 and 2017 and were found in reputable internet search libraries. This study will aid academics in analysing recent rainfall prediction work with a focus on data mining approaches, as well as providing a baseline for future directions and comparisons.

I. INTRODUCTION

Global warming is affecting people all over the world and hastening climate change. As a result, the air and oceans are warming, sea levels are rising, and floods and droughts are becoming more common. Changes in rainfall are among the most damaging consequences of climate change. Rainfall forecasting is a difficult task that most major world authorities now take into consideration. Rainfall is a climatic phenomenon that affects a variety of human activities, including agricultural production, construction, power generation, and tourism, so better rainfall forecasting is needed. It is also a complicated atmospheric phenomenon that has become more difficult to anticipate as a result of climate change, and rainfall series are frequently modelled as a stochastic process because of their random behaviour.

Floods and droughts are becoming increasingly frequent. In June 2013, for example, the Indian state of Uttarakhand suffered its greatest natural calamity, receiving around 400 percent more rainfall than in a normal monsoon. The heavy rains destroyed roads and bridges, trapping 100,000 pilgrims and tourists on their "Char Dham Yatra." The government, large industries, risk management experts, and the scientific community were unable to predict this calamity in advance. Heavy rainfall can also contribute to landslides, a significant geohazard that has caused loss of life and property all over the world.

Over the past decade, scientists and engineers have developed a number of models for making accurate predictions in a variety of fields. Machine learning is a discipline that is commonly used for making predictions and classifying objects. A variety of approaches are available, ranging from K-Nearest Neighbours (KNN) to more complicated methods such as Support Vector Machines (SVM) and Artificial Neural Networks (ANN). ANNs, as an alternative to standard methods for meteorological prediction, are based on self-adaptive processes that learn from examples and capture functional links between data, even if the relationships between the data are unknown or difficult to express.

Deep learning has recently emerged as a viable ANN approach for solving complex problems and coping with large amounts of data. A deep learning model is essentially a stack of trained layers. The layer weights and the learning rate are the two settings that have the greatest impact on the model. Computer vision, image recognition, natural language processing, and bioinformatics have all benefited from deep learning.

Rainfall prediction remains a major topic, attracting the attention of governments, businesses, risk management organisations, and scientists. It is critical because rainfall is the variable with the strongest link to severe natural phenomena such as landslides, flooding, mass movements, and avalanches. These events have had a long-term impact on society. A good rainfall prediction method therefore makes it possible to take preventative and mitigation measures against them.

Rainfall forecasting is useful in avoiding floods, which saves lives and property. It also aids in the management of water resources. Rainfall data from the previous year helps farmers manage their crops better, resulting in increased economic growth for the country. Rainfall prediction is difficult for meteorological scientists because of fluctuations in rainfall timing and quantity. Weather forecasting is the most important of all the services offered by meteorological departments throughout the world. The task is difficult because it requires a large number of skilled personnel, and all forecasts are made without any guarantee.

These forecasts also make it easier to monitor agricultural activity, construction, tourism, transportation, and health, among other things. Providing accurate meteorological predictions to agencies in charge of disaster prevention can aid decision-making in the case of natural disasters. There are a variety of approaches for making these predictions, ranging from simple procedures to more complicated techniques like artificial intelligence (AI) and artificial neural networks (ANNs).

Statistical approaches and the Numerical Weather Prediction (NWP) model are two extensively used methods for rainfall forecasting. Rainfall data is non-linear in nature. The key characteristics of a rainfall time series are frequency, intensity, and amount, and these values can differ from one location to the next as well as from one moment to the next. Every statistical model has its own set of flaws. Combining autoregressive (AR) and moving average (MA) terms gives the ARMA model, a general and useful time series model. However, the ARMA model is only suitable for short-term rainfall prediction on stationary time-series data, and statistical tools of this kind cannot detect nonlinear patterns and irregular trends in time series.
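To make the combination of AR and MA terms concrete, the following is a minimal sketch of an ARMA(1,1) process written with NumPy. The coefficients `phi` and `theta`, the series length, and the function names are illustrative assumptions and do not come from any dataset or model used in this work.

```python
import numpy as np

def simulate_arma11(phi, theta, n, seed=0):
    """Simulate x_t = phi * x_{t-1} + e_t + theta * e_{t-1} with Gaussian noise e_t."""
    rng = np.random.default_rng(seed)
    e = rng.standard_normal(n)
    x = np.zeros(n)  # x[0] is used as a zero starting value
    for t in range(1, n):
        ar_part = phi * x[t - 1]      # autoregressive term: depends on the previous value
        ma_part = theta * e[t - 1]    # moving-average term: depends on the previous shock
        x[t] = ar_part + e[t] + ma_part
    return x, e

def one_step_forecast(phi, theta, x_last, e_last):
    """Forecast the next value; the future shock has zero mean, so it drops out."""
    return phi * x_last + theta * e_last

# Illustrative (assumed) coefficients; |phi| < 1 keeps the process stationary.
series, shocks = simulate_arma11(phi=0.6, theta=0.3, n=200)
forecast = one_step_forecast(0.6, 0.3, series[-1], shocks[-1])
```

The forecast uses only the last observation and the last shock, which illustrates why such a model is limited to short horizons and to stationary series.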

II. RELATED WORKS

Researchers have been enhancing and integrating data mining techniques to improve the accuracy of rainfall forecasts. This section discusses a few selected studies. In one study, the authors compare rainfall prediction using Support Vector Machine (SVM), Artificial Neural Network (ANN), and Adaptive Neuro Fuzzy Inference System (ANFIS) models. They compared the prediction models in four ways: (i) using different lags as modelling inputs; (ii) using only heavy rainfall events as training data; (iii) forecasting performance for 1 hour to 6 hours ahead; (iv) performance on peak values and on all values.

According to the findings, ANN performed better when trained with a dataset of severe rainfall. For all three modelling strategies (ANN, SVM and ANFIS), the previous 2 hours of input data were advised for 1 to 4 hour ahead forecasting. Using varying input lags, ANFIS demonstrated a greater capacity to avoid information noise.

Finally, SVM proved to be more robust for peak values during extreme typhoon events. In Malaysia, researchers compared various data mining algorithms for rainfall prediction, including Random Forest, Support Vector Machine, Naive Bayes, Neural Network, and Decision Tree.

The data for this experiment came from a number of weather stations in Selangor, Malaysia. To deal with the noise and missing values in the dataset, pre-processing was applied before the classification procedure. The results showed that Random Forest performed well, correctly classifying a large number of examples with a modest amount of training data. In another paper, the authors surveyed the different Neural Network architectures that have been used for rainfall prediction over the last 25 years.

The authors point out that most researchers achieved substantial results in rainfall prediction utilising the Propagation Network, and that forecasting approaches such as SVM, MLP, BPN, RBFN, and SOM are more appropriate than other statistical and numerical techniques. There have also been some drawbacks mentioned.

In Thailand, researchers [4] employed an Artificial Neural Network to predict rainfall. For prediction, they used a Back Propagation Neural Network, which achieved adequate accuracy. For future work, it was suggested that a few other features, such as Sea Surface Temperature for the areas surrounding Andhra Pradesh and India's southernmost region, be included in the input data for rainfall prediction.

Researchers used Back Propagation and Radial Basis Function Neural Networks to estimate monthly rainfall. The dataset for prediction was gathered in the Coonoor region of the Nilgiri district (Tamil Nadu). The Mean Square Error was used to assess performance.

According to the findings, Radial Basis Function Neural Networks had higher accuracy and a lower Mean Square Error. The researchers also used these algorithms to forecast future rainfall. In another study, researchers presented a Hybrid Intelligent System that integrates Artificial Neural Networks and Genetic Algorithms. An MLP was used as the data mining engine to create predictions, while the Genetic Algorithm was used to determine the inputs, the connection structure between the inputs and the output layers, and to make the Neural Network training more effective.

III. PROPOSED WORK

A. Methodology

Predicting heavy rainfall is a major challenge for meteorologists because it is so strongly linked to the economy and human life. Rainfall extremes cause natural disasters such as floods and droughts that affect people all over the world every year. For countries like India, where agriculture is a primary source of income, rainfall forecasting accuracy is critical. Statistical strategies for rainfall forecasting are ineffective because of the dynamic character of the atmosphere, and an Artificial Neural Network is a better technique because rainfall data is nonlinear. Researchers' work and comparisons of different methodologies and algorithms for rainfall prediction are presented in a tabular format. The goal of this work is to give non-experts easy access to rainfall prediction methodologies and approaches.

This section describes the general architecture of our proposed model. We use a deep learning architecture to estimate the cumulative rainfall for the next day. Two networks make up the architecture: an autoencoder network and a multilayer perceptron (MLP) network. The autoencoder network is in charge of feature selection; it is a deep learning technique well suited to handling time series features. The MLP network handles the classification and prediction tasks. Each network is described in turn below.

The autoencoder is the first component in our architecture. An autoencoder is an unsupervised network whose goal is to extract non-linear features from its input. More specifically, it is made up of three layers: the input layer, a hidden layer that uses the sigmoid activation function, and the output layer.

Autoencoders are trained differently than standard neural networks in that the output layer tries to be as similar to the input layer as feasible. Because of the sigmoid activation function, the hidden layer produces a non-linear compact representation of the input layer.

The logic behind this treatment is that the data will be more compact (i.e., less prone to overfitting) and that some useful non-linear correlations will be uncovered, potentially improving the explanation of the output variable. The autoencoder we used in our architecture was a denoising autoencoder implemented with Theano, a Python library with GPU support for optimizing and evaluating mathematical expressions. A multilayer perceptron is directly connected to the autoencoder's hidden layer, which is a non-linear compact representation of the original input. Using this new representation as its input, the MLP is in charge of producing the predictions. The MLP has one hidden layer, which uses the sigmoid activation function.
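The training idea described above (corrupt the input, pass it through a sigmoid hidden layer, and reconstruct the clean input) can be sketched as follows. This is a minimal illustration in plain NumPy, not the Theano implementation used in this work. The layer sizes, corruption level, learning rate, linear output layer, squared-error loss, and the random placeholder data are all assumptions made for the example.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_denoising_autoencoder(X, n_hidden=3, corruption=0.2, lr=0.1,
                                epochs=300, seed=0):
    """Train a one-hidden-layer denoising autoencoder with batch gradient descent."""
    rng = np.random.default_rng(seed)
    n_samples, n_features = X.shape
    W_enc = rng.normal(0.0, 0.1, (n_features, n_hidden))
    b_enc = np.zeros(n_hidden)
    W_dec = rng.normal(0.0, 0.1, (n_hidden, n_features))
    b_dec = np.zeros(n_features)
    losses = []
    for _ in range(epochs):
        # Corrupt the input by zeroing a random share of its entries.
        keep = rng.random(X.shape) >= corruption
        X_noisy = X * keep
        # Forward pass: sigmoid hidden layer, linear output layer (assumed).
        hidden = sigmoid(X_noisy @ W_enc + b_enc)
        X_hat = hidden @ W_dec + b_dec
        # The target is the clean input, not the corrupted one.
        error = X_hat - X
        losses.append(float(np.mean(error ** 2)))
        # Backward pass for the objective sum(error**2) / (2 * n_samples).
        grad_out = error / n_samples
        grad_W_dec = hidden.T @ grad_out
        grad_b_dec = grad_out.sum(axis=0)
        grad_pre = (grad_out @ W_dec.T) * hidden * (1.0 - hidden)
        grad_W_enc = X_noisy.T @ grad_pre
        grad_b_enc = grad_pre.sum(axis=0)
        W_dec -= lr * grad_W_dec
        b_dec -= lr * grad_b_dec
        W_enc -= lr * grad_W_enc
        b_enc -= lr * grad_b_enc
    params = {"W_enc": W_enc, "b_enc": b_enc, "W_dec": W_dec, "b_dec": b_dec}
    return params, losses

def encode(X, params):
    """Return the compact non-linear representation (hidden-layer activations)."""
    return sigmoid(X @ params["W_enc"] + params["b_enc"])

# Random placeholder data standing in for scaled meteorological features.
X = np.random.default_rng(1).random((60, 6))
params, losses = train_denoising_autoencoder(X)
codes = encode(X, params)  # shape (60, 3): six inputs compressed to three features
```

In this sketch the hidden-layer activations returned by `encode` play the role of the compact representation that is passed on to the prediction network.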

We propose a machine learning-based solution to overcome the existing system's flaws and improve efficiency and accuracy. To handle large amounts of data, we use Hadoop to store and retrieve data from its distributed file system (HDFS). The user can load data into a Hadoop cluster. The Random Forest algorithm is also used; it is a classifier that builds a number of decision trees on different subsets of a dataset and averages their results to improve predictive accuracy. The ARIMA model is also used for analysing and forecasting time series data. The prediction would be displayed on a website from which the user can select the data to be predicted. Training seeks the weights and bias that result in a low cost, that is, it reduces the cost C(w, b) as a function of the weights and bias. A learning rate reduction technique helps the optimizer approach the global minimum faster.

B. System Architecture

An autoencoder network and a multilayer perceptron network form the foundation of the architecture. The autoencoder network is in charge of feature selection; as noted earlier, the autoencoder is a deep learning technique well suited to handling time series features. The multilayer perceptron network carries out the classification and prediction tasks. Each network is described in more detail below. The autoencoder is the most basic component of our system. An autoencoder is an unsupervised network that attempts to extract non-linear features from data. An autoencoder has three layers: an input layer, a hidden layer that uses the sigmoid activation function, and an output layer.

Autoencoders are trained differently from traditional neural networks in that the output layer tries to match the input layer as closely as possible. Because of the sigmoid activation function, the hidden layer produces a non-linear compact representation of the input layer. The rationale for this treatment is that the data will be more compact (i.e., less prone to overfitting) and that some useful non-linear correlations will be uncovered, which can improve the explanation of the output variable. A denoising autoencoder implemented with Theano, a Python library with GPU support for optimizing and evaluating mathematical expressions, was used in our architecture.

A Multilayer perceptron is directly connected to the autoencoder's hidden layer, which is a non-linear compact representation of the original input. By using the new problem representation as an input, this network is in charge of producing predictions in our problem. The sigmoid activation function is used by the MLP, which has one hidden layer.
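The way the two networks are connected can be sketched as a single forward pass: the encoder's sigmoid hidden layer produces the compact representation, which feeds an MLP with one sigmoid hidden layer. The class name, layer sizes, and the linear output unit for the next-day rainfall estimate are assumptions for illustration. The weights here are random placeholders, whereas in practice the encoder weights would come from a trained autoencoder.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

class StackedRainfallNet:
    """Encoder hidden layer feeding a one-hidden-layer MLP (forward pass only)."""

    def __init__(self, n_inputs, n_code, n_hidden, seed=0):
        rng = np.random.default_rng(seed)
        # Placeholder encoder weights; normally taken from a trained autoencoder.
        self.W_enc = rng.normal(0.0, 0.1, (n_inputs, n_code))
        self.b_enc = np.zeros(n_code)
        # MLP weights: one sigmoid hidden layer, then a single output unit.
        self.W_hid = rng.normal(0.0, 0.1, (n_code, n_hidden))
        self.b_hid = np.zeros(n_hidden)
        self.W_out = rng.normal(0.0, 0.1, (n_hidden, 1))
        self.b_out = np.zeros(1)

    def encode(self, X):
        """Compact non-linear representation of the original input."""
        return sigmoid(X @ self.W_enc + self.b_enc)

    def predict(self, X):
        """Map raw inputs to one rainfall estimate per sample."""
        code = self.encode(X)
        hidden = sigmoid(code @ self.W_hid + self.b_hid)
        return (hidden @ self.W_out + self.b_out).ravel()

# Five samples with six placeholder input features each.
net = StackedRainfallNet(n_inputs=6, n_code=3, n_hidden=4)
X_demo = np.random.default_rng(2).random((5, 6))
estimates = net.predict(X_demo)
```

Keeping `encode` as a separate method mirrors the description above, where the MLP sees only the new representation and never the raw input.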

C. Process Flow

IV. EXPERIMENTAL ANALYSIS

A. Model Implementation

Data Collection and Pre-processing: The rainfall data for the previous three or four years is collected in a comma separated values (CSV) file. The month-by-month aggregate is included in the dataset. There may be empty values, negative values, or errors in the dataset. During pre-processing, the dataset is cleansed. The pre-processing procedures entail the removal of incomplete records. Once the clean dataset has been obtained, it must be prepared for use by the machine learning algorithm.
Random Forest Model Generation: Random forests, also known as random decision forests, are an ensemble learning method for classification, regression, and other tasks. The method works by training a large number of decision trees and then outputting the class that is the mode of the classes (classification) or the mean prediction (regression) of the individual trees. To generate a more precise and reliable prediction, Random Forest creates many decision trees and combines them. Random forest has the advantage of being able to solve both classification and regression problems, which make up the majority of contemporary machine learning problems. We use a dataset to train our system and construct a model for future prediction.
Prediction, Result Presentation: Besides handling both regression and classification problems, Random Forest also reports the relative importance it gives to the input features. Because its default hyperparameters frequently yield a decent prediction result, Random Forest is also regarded as a very useful and easy-to-use method. The number of hyperparameters is not excessive, and they are simple to understand. To forecast rainfall for a specific month, the trained Random Forest model is used. The forecast period is a couple of months. The Python matplotlib module can be used to present the data graphically. The predicted data is plotted in the graph along with the historical data.
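Two of the steps above lend themselves to a short sketch: removing incomplete or invalid records from the CSV data, and combining the outputs of individual trees into one prediction. The column names (`year`, `month`, `rainfall_mm`), the sample rows, and the function names are hypothetical, and the per-tree outputs are given directly rather than produced by trained trees.

```python
import csv
import io
import math
from statistics import mean, mode

def clean_records(csv_text):
    """Keep only rows whose fields are present, numeric, and non-negative."""
    cleaned = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        try:
            year = int(row["year"])
            month = int(row["month"])
            rainfall = float(row["rainfall_mm"])
        except (KeyError, TypeError, ValueError):
            continue  # empty or malformed field: drop the incomplete record
        if not math.isfinite(rainfall) or rainfall < 0:
            continue  # negative or non-finite rainfall is treated as an error
        cleaned.append((year, month, rainfall))
    return cleaned

def aggregate_trees(tree_outputs, task="regression"):
    """Combine per-tree outputs: mean for regression, mode for classification."""
    if task == "regression":
        return mean(tree_outputs)
    return mode(tree_outputs)

# Hypothetical sample: one empty value, one negative value, one non-numeric value.
SAMPLE = """year,month,rainfall_mm
2020,6,312.5
2020,7,
2020,8,-4.0
2021,6,280.0
2021,7,abc
"""
records = clean_records(SAMPLE)
monthly_estimate = aggregate_trees([10.0, 12.0, 14.0])
rain_class = aggregate_trees(["rain", "dry", "rain"], task="classification")
```

In a full pipeline the per-tree outputs would come from decision trees trained on bootstrap samples of the cleaned records; only the cleaning and aggregation steps are shown here.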

Conclusion

Rainfall has a significant impact on agriculture and the economy in India, as well as in the rest of the world. In this research, we offer a method for predicting rainfall based on an analysis of a rainfall dataset produced using fuzzy logic, so that rain in the coming year can be forecast from climate conditions, which is extremely beneficial to farmers for agricultural purposes. Because of climatic variability, the analysis yields only a rain prediction, not an exact result. Climate factors change for a variety of reasons, and we have used a few of them here to show how they can influence rainfall.

Rainfall forecasting is a useful yet difficult endeavour. By extracting and using the hidden knowledge in prior meteorological data, data mining algorithms can predict rainfall. Many scholars have tried over the last decade to improve rainfall prediction accuracy by refining and integrating data mining approaches. Various models and methodologies for effective rainfall prediction are now available, but there is still a need for a comprehensive literature review and systematic mapping study that reflects current difficulties, proposed solutions, and current trends in this domain. By focusing on data mining approaches, this study presented a comprehensive systematic mapping as well as a critical assessment of recent research in the area of rainfall prediction from 2013 to 2017. A list of important research questions was created, and a systematic research approach was then used to select and shortlist the most relevant research articles from renowned digital search libraries. Critical reviews of the shortlisted papers were used to investigate the answers to the identified questions. Over the last decade, research interest in rainfall prediction has increased, as have the open issues. As a result, it was determined that upgrades, optimizations, and integrations of data mining methods are required to investigate and fix these issues.

References

[1] S. Zhang, L. Lu, J. Yu, and H. Zhou, “Short-term water level prediction using different artificial intelligent models,” in 2016 5th International Conference on Agro-Geoinformatics, Agro-Geoinformatics 2016, 2016.
[2] S. Zainudin, D. S. Jasim, and A. A. Bakar, “Comparative Analysis of Data Mining Techniques for Malaysian Rainfall Prediction,” Int. J. Adv. Sci. Eng. Inf. Technol., vol. 6, no. 6, pp. 1148–1153, 2016.
[3] D. Nayak, A. Mahapatra, and P. Mishra, “A Survey on Rainfall Prediction using Artificial Neural Network,” Int. J. Comput. …, vol. 72, no. 16, pp. 32–40, 2013.
[4] B. K. Rani and A. Govardhan, “Rainfall Prediction Using Data Mining Techniques - A Survey,” pp. 23–30, 2013.
[5] N. Tyagi and A. Kumar, “Comparative analysis of backpropagation and RBF neural network on monthly rainfall prediction,” Proc. Int. Conf. Inven. Comput. Technol. ICICT 2016, vol. 1, 2017.
[6] N. Solanki and G. P. B, “A Novel Machine Learning Based Approach for Rainfall Prediction,” Inf. Commun. Technol. Intell. Syst. (ICTIS 2017) - Vol. 1, vol. 83, no. Ictis 2017, 2018.
[7] M. Ahmad, S. Aftab, and I. Ali, “Sentiment Analysis of Tweets using SVM,” Int. J. Comput. Appl., vol. 177, no. 5, pp. 25–29, 2017.
[8] C. S. Thirumalai, “Heuristic Prediction of Rainfall Using Machine Learning Techniques,” no. May, 2017.
[9] N. Mishra, H. K. Soni, S. Sharma, and A. K. Upadhyay, “Development and Analysis of Artificial Neural Network Models for Rainfall Prediction by Using Time-Series Data,” Int. J. Intell. Syst. Appl., vol. 10, no. 1, pp. 16–23, 2018.
[10] H. Vathsala and S. G. Koolagudi, “Prediction model for peninsular Indian summer monsoon rainfall using data mining and statistical approaches,” Comput. Geosci., vol. 98, pp. 55–63, 2017.
[11] R. Venkata Ramana, B. Krishna, S. R. Kumar, and N. G. Pandey, “Monthly Rainfall Prediction Using Wavelet Neural Network Analysis,” Water Resour. Manag., vol. 27, no. 10, pp. 3697–3711, 2013.
[12] M. P. Darji, V. K. Dabhi, and H. B. Prajapati, “Rainfall forecasting using neural network: A survey,” 2015 Int. Conf. Adv. Comput. Eng. Appl., no. March, pp. 706–713, 2015.
[13] P. Brereton, B. A. Kitchenham, D. Budgen, M. Turner, and M. Khalil, “Lessons from applying the systematic literature review process within the software engineering domain,” J. Syst. Softw., vol. 80, no. 4, pp. 571–583, 2007.
[14] B. A. Kitchenham et al., “Preliminary guidelines for empirical research in software engineering,” IEEE Trans. Softw. Eng., vol. 28, no. 8, pp. 721–734, 2002.
[15] B. Kitchenham and S. Charters, “Guidelines for performing Systematic Literature reviews in Software Engineering Version 2.3,” Engineering, vol. 45, no. 4ve, p. 1051, 2007.
[16] K. Petersen, R. Feldt, S. Mujtaba, and M. Mattsson, “Systematic Mapping Studies in Software Engineering,” 12th Int. Conf. Eval. Assess. Softw. Eng., pp. 1–10, 2008.
[17] B. Kitchenham, O. P. Brereton, D. Budgen, M. Turner, J. Bailey, and S. Linkman, “Systematic literature reviews in software engineering – A systematic literature review,” Inf. Softw. Technol., vol. 51, pp. 7–15, 2008.
[18] M. Ahmad, S. Aftab, M. S. Bashir, and N. Hameed, “Sentiment Analysis using SVM: A Systematic Literature Review,” Int. J. Adv. Comput. Sci. Appl., vol. 9, no. 2, pp. 182–188, 2018.
[19] F. Selleri Silva et al., “Using CMMI together with agile software development: A systematic review,” Inf. Softw. Technol., vol. 58, pp. 20–43, 2015.
[20] F. Anwer and S. Aftab, “Latest Customizations of XP: A Systematic Literature Review,” Int. J. Mod. Educ. Comput. Sci., vol. 9, no. 12, pp

Copyright

Copyright © 2022 Govind Wakure, Rahul Lotlikar, Pushant Mangilipelli, Saif Khan. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Paper Id : IJRASET42244

Publish Date : 2022-05-05

ISSN : 2321-9653

Publisher Name : IJRASET
