Access to clean, potable water is essential for human health and environmental sustainability. In this study, we present a machine learning-based water quality analysis system that determines the potability of water from sensor data and predictive modelling. The system uses an Arduino Uno microcontroller, interfaced with pH and turbidity sensors, to collect real-time readings from water samples. The readings are processed and passed to a Random Forest classifier, which evaluates four parameters (pH, turbidity, temperature, and algae formation risk) to classify the water as safe or unsafe for consumption. The model is trained on a dataset of water quality parameters labelled as potable or non-potable, so that it can classify incoming sensor readings. Because Random Forest is an ensemble method that aggregates many decision trees, it tends to give more accurate and more robust predictions than a single tree under varying environmental conditions. Real-time data acquisition and analysis make the system suitable for continuous monitoring of water quality and provide an automated, scalable approach to water safety evaluation. The findings demonstrate the efficacy of machine learning in environmental monitoring and its potential to support proactive measures for safe drinking water. The proposed system can be integrated into industrial water treatment plants, municipal facilities, and remote communities, offering a cost-effective, data-driven approach to water safety management. Future work may incorporate cloud-based monitoring, IoT connectivity, and advanced filtration recommendations.
Introduction
Overview:
Access to clean drinking water is vital, but many regions face contamination challenges that traditional testing methods, which are lab-based, slow, and expensive, cannot address efficiently. This project proposes a real-time, low-cost water quality monitoring system that combines IoT sensors, an Arduino Uno, and machine learning (ML) to evaluate water potability automatically.
Key Objectives:
Enable automated, real-time monitoring of water quality.
Use sensor technology to measure critical parameters like pH, turbidity, temperature, and algae risk.
Employ ML models (Random Forest and XGBoost) for accurate potability prediction.
Provide a web-based interface (via Streamlit) for live data visualization and alerts.
System Components & Workflow:
Hardware Setup:
Arduino Uno collects data from pH and turbidity sensors.
The sensors send analog signals to the Arduino, which transmits the readings over USB or a wireless module (e.g., ESP8266).
Optional LCD display provides on-site feedback.
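The conversion from raw analog readings to usable values can be sketched as follows. This is a minimal illustration, not the authors' firmware or host code. It assumes the Arduino prints one line per sample in the form "ph_raw,turbidity_raw", and the calibration constants PH_SLOPE and PH_OFFSET are hypothetical placeholders that would have to come from calibrating the actual probe against buffer solutions. The 10-bit ADC range and 5 V default reference are standard for the Arduino Uno.

```python
# Assumed line format (not from the source): "ph_raw,turbidity_raw"
ADC_MAX = 1023      # Arduino Uno has a 10-bit ADC (readings 0..1023)
V_REF = 5.0         # default analog reference voltage on the Uno, in volts

PH_SLOPE = -5.70    # hypothetical calibration: pH units per volt
PH_OFFSET = 21.34   # hypothetical calibration: pH at 0 V


def adc_to_volts(raw: int) -> float:
    """Convert a raw ADC count to volts."""
    if not 0 <= raw <= ADC_MAX:
        raise ValueError(f"ADC reading out of range: {raw}")
    return raw * V_REF / ADC_MAX


def parse_sample(line: str) -> dict:
    """Parse one 'ph_raw,turbidity_raw' line into calibrated readings."""
    ph_raw, turb_raw = (int(field) for field in line.strip().split(","))
    ph_volts = adc_to_volts(ph_raw)
    return {
        # Linear calibration: pH = slope * voltage + offset
        "ph": PH_SLOPE * ph_volts + PH_OFFSET,
        # Turbidity is left in volts; mapping to NTU depends on the sensor
        "turbidity_volts": adc_to_volts(turb_raw),
    }
```

On the host side, each line read from the serial port would be passed to parse_sample before preprocessing. Turbidity is kept in volts here because the voltage-to-NTU curve is sensor-specific and is not given in the text.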
Data Processing & Machine Learning:
Data is preprocessed: it is cleaned and normalized, and missing values are handled.
ML models (Random Forest and XGBoost) analyze data to classify water as potable or non-potable.
Performance is evaluated using the F-score and a confidence score as measures of reliability.
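The preprocessing and evaluation steps above can be illustrated with a short sketch. The text does not say which imputation or normalization method was used, so median imputation and min-max scaling are assumptions chosen for simplicity. Likewise, the F-score is taken here to be the common F1 variant (the harmonic mean of precision and recall). All function names are illustrative.

```python
from statistics import median


def impute_median(column):
    """Replace missing values (None) with the median of the observed values."""
    observed = [v for v in column if v is not None]
    fill = median(observed)
    return [fill if v is None else v for v in column]


def min_max_normalize(column):
    """Rescale values linearly to the range [0, 1]."""
    lo, hi = min(column), max(column)
    if hi == lo:                       # constant column: avoid division by zero
        return [0.0 for _ in column]
    return [(v - lo) / (hi - lo) for v in column]


def f1_score(y_true, y_pred, positive=1):
    """F1 = 2 * precision * recall / (precision + recall) for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:                        # no true positives: F1 is defined as 0
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

For example, a pH column with one dropped reading, [6.0, None, 8.0], would be imputed to [6.0, 7.0, 8.0] and then scaled to [0.0, 0.5, 1.0]. In practice the scaling bounds should be computed on the training data only and reused for incoming sensor readings.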
User Interface:
Developed using Streamlit for interactive visualization.
Provides live updates, risk alerts, and sensor readings.
Supports remote monitoring and potential cloud integration.
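One way the classifier output could feed the risk alerts is sketched below. This is an assumption-laden illustration rather than the described implementation: it treats the confidence score as the share of trees in the ensemble that agree with the majority label (a hard-vote simplification of how a Random Forest aggregates its trees), and the min_confidence threshold of 0.7 is a hypothetical value. The returned strings are plain text that a dashboard front end could display.

```python
from collections import Counter


def ensemble_predict(tree_votes):
    """Majority vote over per-tree labels; confidence = share of agreeing trees."""
    if not tree_votes:
        raise ValueError("need at least one vote")
    label, count = Counter(tree_votes).most_common(1)[0]
    return label, count / len(tree_votes)


def alert_message(label, confidence, min_confidence=0.7):
    """Turn a prediction into the text a dashboard could display."""
    if label == "non-potable":
        return f"ALERT: water classified as non-potable (confidence {confidence:.0%})"
    if confidence < min_confidence:    # hypothetical threshold for a retest hint
        return f"WARNING: potable, but low confidence ({confidence:.0%}); retest advised"
    return f"OK: water classified as potable (confidence {confidence:.0%})"
```

A low-confidence "potable" result is flagged separately because, for drinking water, a false "safe" reading is the costlier error.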
Benefits of the System:
Real-time monitoring reduces response time to contamination.
Low-cost and scalable, suitable for underserved and remote areas.
High accuracy through ensemble ML models.
Supports predictive analytics for early contamination detection.
Easily deployable in households, industries, and municipal systems.
Case Study – Sample Testing Results:
Freshwater:
pH ~5.5–7 and low turbidity → Classified as potable.
Lemon Water:
High acidity (low pH) despite low turbidity → Classified as non-potable.
Muddy Water (no measurement reported; expected outcome):
High turbidity expected → Expected to be classified as non-potable.
Future Enhancements:
Integration with cloud storage for long-term trend analysis.
Remote control of filtration systems.
Addition of more sensors for expanded contamination detection.
Implementation of AI-driven filtration recommendations.
Conclusion
The successful implementation of the Water Potability Prediction system highlights the effectiveness of combining real-time sensor data, Arduino-based hardware, and machine learning algorithms for assessing water quality. By integrating pH and turbidity sensors with an Arduino Uno microcontroller, the system captures essential water parameters, which are then processed using a Random Forest classifier. This classification model efficiently determines whether water is potable or non-potable, offering an automated solution for continuous monitoring. The real-time predictions reduce human intervention and error, ensuring that users receive instant feedback about the safety of their drinking water. This approach is particularly beneficial in resource-limited regions, where manual water testing is impractical or infrequent.
While the system successfully classifies water potability, some limitations remain. One key limitation is the restricted number of sensor parameters being monitored. Expanding the system to incorporate additional sensors, such as temperature, dissolved oxygen, and conductivity meters, could provide a more detailed and comprehensive assessment of water quality. Another challenge lies in the accuracy of machine learning predictions, which heavily depend on the quality and diversity of training datasets. If the dataset lacks wide-ranging water samples or environmental variations, the model’s predictions may be less reliable, potentially affecting potability classification.
Looking forward, there is substantial potential for system enhancements through advanced technologies. Cloud integration would allow for remote data access, centralized storage, and long-term analytics, enabling authorities and consumers to track water quality trends over time. Additionally, incorporating edge computing techniques could process sensor data locally, reducing latency in predictions and enabling instant decision-making.
By leveraging AI-powered filtration recommendations, the system could provide adaptive solutions, suggesting appropriate water treatment measures based on contamination levels. Further improvements could also include the development of mobile applications, making real-time potability assessments more accessible. A user-friendly app could enable individuals to receive instant alerts and notifications, informing them about potential contamination risks or required corrective actions. This feature would be particularly useful for households, industries, and municipal water providers, supporting proactive water management with less dependence on manual testing.
The Water Potability Prediction system presents a scalable and cost-effective framework for automated water safety assessment. With continued refinements in sensor integration, AI-based classification, and cloud-enabled monitoring, this approach could substantially improve water quality management and provide a smart, reliable tool for safe drinking water.