Authors: Akshay Dumbre, Dishant Koli, Pritee Vaivude, Shravani Dumbre
In this research, Dissolved Oxygen (DO) levels in the Mississippi River are examined using a Polynomial Regression model driven by temperature data for predictive estimation. Dissolved oxygen is a key indicator of water quality and of the health of aquatic ecosystems, so forecasting it accurately is crucial for effective environmental monitoring and management. By using temperature as the primary predictor, this study seeks to advance our understanding of the relationship between temperature and DO in the Mississippi River. The research draws on an extensive dataset of water temperature and dissolved oxygen measurements collected over an extended duration. A Polynomial Regression model is employed to establish a mathematical link between temperature and DO, providing a predictive tool for estimating DO levels at specific locations along the river. The model’s performance is evaluated using several statistical metrics and validation techniques. The outcomes yield insights into the dynamics of DO in the Mississippi River and emphasize the role of temperature as a primary driver of DO fluctuations. This study introduces a practical and efficient method for monitoring and predicting DO levels, which can be instrumental in the preservation and sustainable management of this vital aquatic ecosystem. It also contributes to the broader field of water quality assessment and has the potential to inform policies and practices aimed at ensuring the environmental well-being of the Mississippi River.
The Mississippi River, one of the most significant river systems in North America, plays a pivotal role in the ecological and environmental landscape of the region. Understanding and monitoring the key factors influencing water quality in this vast river system is essential for effective resource management and environmental stewardship. Among the various parameters that influence water quality, Dissolved Oxygen (DO) is a critical indicator, reflecting the river’s capacity to support aquatic life and overall ecosystem health. The relationship between water temperature and dissolved oxygen levels in rivers has long been recognized as a fundamental aspect of aquatic ecology.
Temperature directly affects the solubility of oxygen in water, with warmer water generally holding less dissolved oxygen. This relationship is particularly important in river systems like the Mississippi, where temperature varies with seasonal and climatic changes. Accurate estimation of dissolved oxygen levels in such dynamic and ecologically significant water bodies is therefore of great importance. In recent years, the development of predictive models for estimating dissolved oxygen levels from temperature data has gained momentum. Polynomial regression, a versatile and flexible modeling technique, has shown promise in capturing the nonlinear relationship between temperature and dissolved oxygen. This research contributes to this area of study by developing a polynomial regression model tailored to the Mississippi River. Using a dataset collected over three years, this study aims to provide a robust tool for estimating dissolved oxygen levels from temperature, offering insights for ecological management and water quality assessment in this river system.
This paper outlines the methodology, data collection, and analysis techniques employed in developing the polynomial regression model. It also discusses the potential implications of the research for environmental conservation, aquatic ecosystem management, and the broader field of water quality assessment. With a focus on both the technical aspects of model development and the practical applications in ecological preservation, this study seeks to enhance our understanding of the intricate interplay between temperature and dissolved oxygen in the Mississippi River.
II. LITERATURE REVIEW
Water quality and the factors influencing it are of paramount concern in managing the health of aquatic ecosystems. The Mississippi River is no exception, as it serves as a vital conduit for both commerce and ecology. Among the many parameters used to evaluate water quality, Dissolved Oxygen (DO) stands out as a critical indicator of the river’s ecological health. The extent to which temperature affects DO levels in river systems has been a subject of scientific inquiry for decades, and various machine learning and neural network methodologies have been employed to model DO content in surface waters, as reported in the studies listed in the references. This literature review explores the background and key findings on predicting DO levels from temperature data, with a focus on the Mississippi River.
Temperature-DO Relationship in Rivers: A Historical Perspective. The relationship between temperature and DO levels in rivers has long been recognized as an essential aspect of aquatic ecology. It is well established that as water temperature rises, the solubility of oxygen decreases, leading to lower DO concentrations. This negative correlation between temperature and DO is fundamental and serves as the foundation for understanding the oxygen dynamics in rivers. Early research examined the role that temperature plays in changing dissolved oxygen levels.
Modeling Approaches for Temperature-DO Relationships. Numerous modeling approaches have been employed to quantify the temperature-DO relationship in river ecosystems. Traditional linear regression models have been used, and while they can capture the basic relationship, they often fall short in capturing the nonlinear nature of this interaction. This limitation has led to the exploration of more sophisticated modeling techniques. Among these, polynomial regression models have shown great promise. These models allow for the incorporation of higher-order terms and are better suited to capturing the nonlinear effects of temperature on DO.
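As a minimal sketch of what fitting higher-order terms looks like in practice, the following uses NumPy's `polyfit` on a handful of temperature and DO pairs. The data values, the choice of degree 2, and all variable names are illustrative assumptions and are not taken from the study's dataset or code.

```python
import numpy as np

# Illustrative values only (NOT from the USGS record): DO falls as temperature rises.
temp_c = np.array([5.0, 10.0, 15.0, 20.0, 25.0, 30.0])   # water temperature, deg C
do_mg_l = np.array([12.8, 11.3, 10.1, 9.1, 8.3, 7.6])    # dissolved oxygen, mg/L

# Fit DO = c2*T^2 + c1*T + c0 by ordinary least squares.
degree = 2  # assumed degree, chosen only for illustration
coeffs = np.polyfit(temp_c, do_mg_l, deg=degree)
model = np.poly1d(coeffs)

# Estimate DO at a temperature that was not in the fitting data.
pred_18 = float(model(18.0))
print(f"coefficients (highest order first): {coeffs}")
print(f"estimated DO at 18 C: {pred_18:.2f} mg/L")
```

A linear model corresponds to `degree = 1`; raising the degree adds the higher-order terms described above.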
Polynomial Regression Models in Water Quality Studies. Polynomial regression models have been applied effectively in various water quality studies, offering flexibility in modeling complex relationships, which is particularly valuable in ecosystems as dynamic as the Mississippi River.
Challenges and Limitations in Temperature-DO Modeling. While polynomial regression models offer advantages, it is important to acknowledge their challenges and limitations. One challenge is the potential for overfitting when using high-order polynomial terms. Careful model selection and validation techniques are crucial to ensure the robustness and reliability of the model. Additionally, variations in local conditions and seasonal patterns in the Mississippi River may introduce complexities that require careful consideration in modeling.
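One common way to guard against overfitting is to compare candidate polynomial degrees on data held out from fitting. The sketch below illustrates that idea on synthetic data; the generating curve, noise level, split sizes, and candidate degrees are all assumptions made for the example and do not describe the procedure actually used in this study.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data for illustration only: a smooth decreasing curve plus noise.
temp = rng.uniform(5.0, 30.0, size=200)
do = 14.0 - 0.4 * temp + 0.005 * temp**2 + rng.normal(0.0, 0.15, size=200)

# Hold out 25% of the points for validation.
idx = rng.permutation(temp.size)
val_idx, train_idx = idx[:50], idx[50:]

def rmse(y_true, y_pred):
    """Root mean squared error between observed and predicted values."""
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

# Fit each candidate degree on the training part, score it on the held-out part.
val_rmse = {}
for degree in range(1, 6):
    coeffs = np.polyfit(temp[train_idx], do[train_idx], deg=degree)
    val_rmse[degree] = rmse(do[val_idx], np.polyval(coeffs, temp[val_idx]))

best_degree = min(val_rmse, key=val_rmse.get)
for degree, err in val_rmse.items():
    print(f"degree {degree}: validation RMSE = {err:.3f}")
print(f"selected degree: {best_degree}")
```

Higher degrees always reduce the training error, so the comparison has to be made on held-out points; k-fold cross-validation is a natural extension of the single split shown here.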
Significance of the Present Study. The development of a polynomial regression model to predict DO levels from temperature data in the Mississippi River can contribute to the ongoing efforts to monitor and manage water quality in this ecologically significant system. This research, based on data covering a three-year period, aims to provide a robust and reliable predictive tool for estimating DO levels, and it complements advanced techniques reported in the literature, such as neural networks combined with fuzzy computing. It contributes to the wider understanding of how temperature and dissolved oxygen interact within river ecosystems and may guide policy and management decisions aimed at protecting the environmental health of the Mississippi River.
A. Data Collection
Meaningful research depends on acquiring and carefully preparing high-quality data. For this investigation, we reviewed the water quality database for the Mississippi River that is accessible through the U.S. Geological Survey (USGS). The USGS is known for its extensive coverage of water quality metrics, and its database provided comprehensive records spanning the years 2020 to 2022. Our research centers on the Mississippi River at Baton Rouge, LA, specifically the monitoring station designated USGS 07374000. The USGS water database is an invaluable asset: it enriches our understanding of water quality, supports its effective management, and plays a pivotal role in the preservation of the nation’s water resources.
It underpins scientific research, informs the development of well-founded policies, and supports responsible water resource management across the United States. The availability of such detailed, high-quality data was instrumental to our analysis, allowing us to examine the connections between temperature and Dissolved Oxygen in the context of water quality analysis and forecasting.
Table 1 shows the first 5 rows of our time series dataset.
Temperature: Water temperature is a critical parameter in the river systems of the United States, playing a central role in shaping the health and sustainability of these vital aquatic ecosystems. The temperature of river water is far from a passive variable; it exerts a profound influence on various facets of riverine environments, with its effects extending far beyond the simple reading on a thermometer. Understanding the importance of water temperature in the USA’s river systems is pivotal for the effective management and preservation of these essential water bodies. One of the most significant consequences of water temperature in river systems is its impact on the levels of dissolved oxygen. This relationship is paramount to the health of aquatic life. As water temperature rises, its capacity to hold oxygen decreases, leading to lower dissolved oxygen concentrations. Dissolved oxygen is a lifeline for fish, invertebrates, and other aquatic organisms, serving as the oxygen source they depend on for respiration. A decrease in dissolved oxygen levels due to elevated water temperature can have detrimental effects on aquatic biodiversity and the overall health of river ecosystems.
Water temperature is a key indicator in the assessment of water quality in river systems. Rapid or sustained increases in temperature can signify pollution sources, such as industrial discharges or urban development, which can lead to reduced dissolved oxygen levels. Monitoring water temperature is indispensable for the early detection of environmental stressors and the implementation of effective management and conservation strategies. Water temperature is undeniably one of the linchpins of the ecological intricacies of river systems in the USA. Its effects on dissolved oxygen, seasonal dynamics, and its role in responding to climate change emphasize the importance of meticulous monitoring and understanding of this variable. To maintain the health and resilience of these crucial ecosystems, water temperature should be recognized as a cornerstone of river system management and conservation.
Figure 2 shows the frequency of days from 2020 to 2022 with respect to the mean temperature ranges (in degrees Celsius).
Dissolved Oxygen: Dissolved oxygen is a critical parameter in the river systems of the United States, holding profound significance for the health and sustainability of these vital aquatic ecosystems.
D. Data Integrity and Preprocessing:
Ensuring the integrity of the data is a paramount concern. Initial checks were carried out to detect missing values, outliers, and irregularities. During the preprocessing phase, records with missing values were excluded from the dataset to maintain the consistency and dependability of our analyses.
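The exclusion of incomplete records can be sketched as follows. This is an illustration under assumptions: the arrays are made-up values, missing readings are assumed to be encoded as NaN, and a record is dropped when either variable is missing. The actual file layout and tooling used for the USGS data are not described in this paper.

```python
import numpy as np

# Made-up daily means; NaN marks a missing reading (an assumed encoding).
temp = np.array([12.1, np.nan, 13.0, 13.4, 14.2])   # deg C
do = np.array([10.4, 10.1, np.nan, 9.8, 9.6])       # mg/L

# Keep a record only when both temperature and DO are present.
keep = ~(np.isnan(temp) | np.isnan(do))
temp_clean, do_clean = temp[keep], do[keep]

print(f"records before: {temp.size}, after exclusion: {temp_clean.size}")
```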
E. Data Analysis
The process of translating raw data into actionable insights necessitates a structured and methodical approach to analysis. Our research employed a series of analytical techniques, with each step building upon the insights gained in the preceding phase.
F. Data Exploration and Analysis
The dataset employed for this research was sourced from the United States Geological Survey (USGS) and primarily focuses on the relationship between mean dissolved oxygen levels and mean temperature values from the Mississippi River.
G. Preliminary Data Exploration
Upon initial review, the dataset consisted of multiple variables, with particular emphasis placed on the average measurements of dissolved oxygen and temperature. The structure of the dataset was carefully examined, revealing a considerable volume of records. Subsequent scrutiny detected instances of missing data, which were systematically excluded from the dataset to ensure the integrity of the analysis. This refinement of the dataset was imperative for conducting a rigorous and dependable statistical assessment.
H. Statistical Analysis
For the statistical analysis, attention was again directed at the mean values for dissolved oxygen and temperature. The dataset contained a substantial number of records, and entries with missing values were removed to preserve its analytical validity. This data curation was fundamental to ensuring that the subsequent analysis would be both precise and credible.
The Correlation Heatmap (Fig. 4) clarified the connections among the different attributes. A significant finding was the strong correlation between the average dissolved oxygen and average temperature, affirming the choice to delve deeper into this particular association.
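The quantity behind each cell of such a heatmap is a pairwise correlation coefficient. As a hedged illustration, the snippet below computes the Pearson coefficient between temperature and DO with NumPy; the values are made up for the example, and the use of the Pearson coefficient is an assumption since the paper does not name the coefficient shown in Fig. 4.

```python
import numpy as np

# Made-up daily means for illustration: temperature in deg C, DO in mg/L.
mean_temp = np.array([5.0, 10.0, 15.0, 20.0, 25.0, 30.0])
mean_do = np.array([12.8, 11.3, 10.1, 9.1, 8.3, 7.6])

# np.corrcoef returns the full correlation matrix; the off-diagonal entry
# is the Pearson coefficient between the two variables.
corr_matrix = np.corrcoef(mean_temp, mean_do)
r = float(corr_matrix[0, 1])
print(f"Pearson r between temperature and DO: {r:.3f}")
```

For these illustrative values the coefficient is strongly negative, consistent with warmer water holding less dissolved oxygen.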
Scatter plots are a powerful tool for illustrating the nuances of the relationship between pairs of numerical variables. The scatter plot of mean dissolved oxygen against mean temperature, in particular, sheds light on the nature of their correlation. By plotting each data point, it highlights patterns or trends, such as the apparent inverse correlation shown in Fig. 5, offering deeper insight into how these variables might influence one another.
I. Visual Data Exploration
Several visualizations were crafted to explore the data further:
The visual clarity provided by these histograms is invaluable. It facilitates a rapid and intuitive understanding of where the majority of data points lie, the tails of the distribution, and any deviations from the expected normal distribution. By revealing outliers and anomalies, histograms guide researchers in making informed decisions about which statistical methods are appropriate for further analysis. They are a foundational step in data exploration, offering a preliminary check on the assumptions underlying parametric statistical tests. Moreover, these visual aids can direct the researcher’s attention to underlying phenomena that warrant more sophisticated modeling techniques, and they can also serve as an aid in communicating findings to a broader audience, making complex data more accessible.
A. Residual Analysis
Residual plots serve as a diagnostic tool to verify the assumptions of a regression model. For the polynomial regression model developed in this study, Figures 10 and 11 display the residual plots for both the training and test datasets. These plots are critical for assessing the model’s accuracy in predicting dissolved oxygen levels based on water temperature. A key feature of an effective model is the random scatter of residuals, which suggests that the model captures the underlying pattern without systematic errors. The uniform distribution of residuals around the horizontal axis in our plots suggests that the variances of the error terms are constant (homoscedasticity) and that the residuals are independent of each other. This indicates that the model accounts well for the nonlinear relationship between temperature and dissolved oxygen without overfitting to the training data or failing to capture the variance in the test data.
Moreover, the absence of distinct patterns in the residual plots gives no indication of omitted variable bias, where the exclusion of a relevant variable would lead to systematic errors in prediction. The plots do not exhibit any clear signs of heteroscedasticity or autocorrelation, further supporting the model’s appropriateness for the data at hand. The homogeneity in variance across the range of predicted values is particularly important when dealing with environmental data, which often contain outliers or high-leverage points due to natural variability. The examination of residual plots also offers insights into the potential for model improvement. While the current model performs well, any outliers or patterns that emerge in residual plots with larger datasets could indicate the need for additional explanatory variables or transformation of the data. Nonetheless, the current residual plots support the polynomial regression model’s ability to generalize to the held-out test data, making it a valuable tool for predicting dissolved oxygen levels in the Mississippi River. This underpins the model’s utility for environmental scientists and policymakers who rely on accurate water quality predictions to make informed decisions for ecosystem management and conservation efforts.
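The residual checks described above can be sketched numerically as well as visually. The example below is an illustration on synthetic data: the generating curve, the 80/20 split, the degree of 2, and the split-by-median spread comparison are all assumptions made for the sketch, not the study's actual procedure or results.

```python
import numpy as np

rng = np.random.default_rng(42)

# Synthetic data for illustration only (not the USGS record).
temp = rng.uniform(5.0, 30.0, size=300)
do = 14.0 - 0.4 * temp + 0.005 * temp**2 + rng.normal(0.0, 0.2, size=300)

# Simple 80/20 split into training and test sets.
idx = rng.permutation(temp.size)
train, test = idx[:240], idx[240:]

coeffs = np.polyfit(temp[train], do[train], deg=2)  # degree 2 is an assumption

# Residual = observed minus predicted.
train_resid = do[train] - np.polyval(coeffs, temp[train])
test_resid = do[test] - np.polyval(coeffs, temp[test])

# Least squares with an intercept makes the training residuals average to zero;
# test residuals near zero suggest no systematic bias on unseen data.
print(f"mean train residual: {train_resid.mean():.2e}")
print(f"mean test residual:  {test_resid.mean():.3f}")

# Crude homoscedasticity check: residual spread in the colder vs. warmer half.
median_t = np.median(temp[train])
spread_cold = train_resid[temp[train] <= median_t].std()
spread_warm = train_resid[temp[train] > median_t].std()
print(f"residual std, cold half: {spread_cold:.3f}; warm half: {spread_warm:.3f}")
```

Plotting `train_resid` and `test_resid` against the predicted values gives the kind of residual plot discussed in this section; similar spreads in the two halves are consistent with constant error variance.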
B. Distribution Plot
The distribution plot for mean dissolved oxygen levels in the Mississippi River dataset is a pivotal tool in exploratory data analysis. This visual representation offers immediate insight into the central tendency, variability, and form of the data distribution. By assessing the plot, researchers can judge the degree of skewness and kurtosis, which measure the asymmetry of the data and the heaviness of the distribution’s tails (often described as peakedness), respectively. These metrics are crucial because they inform the suitability of various data analysis techniques and guide the choice of predictive models.
For example, a distribution with high skewness may suggest the need for data transformation to meet the assumptions of parametric tests and models, which typically assume data normality. Kurtosis, on the other hand, provides information on the data’s tendency to produce outliers, which can have a disproportionate impact on certain statistical analyses.
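Both measures can be computed directly from central moments: skewness as m3 / m2^(3/2) and excess kurtosis as m4 / m2^2 - 3, where mk is the k-th central moment. The sketch below implements these population formulas with NumPy; the helper name and the sample values are hypothetical and serve only to illustrate the calculation.

```python
import numpy as np

def skewness_and_excess_kurtosis(values):
    """Return (skewness, excess kurtosis) from population central moments."""
    x = np.asarray(values, dtype=float)
    centered = x - x.mean()
    m2 = np.mean(centered**2)   # variance
    m3 = np.mean(centered**3)
    m4 = np.mean(centered**4)
    return m3 / m2**1.5, m4 / m2**2 - 3.0

# Made-up DO values in mg/L with a longer right tail (not from the dataset).
do_sample = [6.1, 6.4, 6.8, 7.0, 7.3, 7.9, 8.6, 9.8, 11.5]
skew, kurt = skewness_and_excess_kurtosis(do_sample)
print(f"skewness = {skew:.3f}, excess kurtosis = {kurt:.3f}")
```

A skewness near zero indicates a roughly symmetric distribution, and an excess kurtosis near zero indicates tails similar to those of a normal distribution.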
In the context of this study, the distribution plot of dissolved oxygen levels is more than a preliminary step; it is a foundational aspect that influences the entire modeling process. Inference with polynomial regression models commonly assumes that the residuals, rather than the dissolved oxygen values themselves, are approximately normally distributed, so a roughly normal distribution of the data is encouraging but does not replace a check of the residuals. If significant deviations from normality are present, it may be necessary to reevaluate the model choice or to adopt data transformation techniques.
Moreover, the distribution plot can reveal outliers which, depending on their nature and the goals of the study, might be candidates for further investigation or exclusion from the dataset to avoid skewing the results. Outliers can be particularly informative or problematic in environmental data; they could represent rare but ecologically significant events or data errors.
In the analysis of the Mississippi River data, the distribution plot contributes to a comprehensive understanding of the environmental factors at play. It ensures that the subsequent polynomial regression model is constructed on a solid foundation of data that is well understood and properly conditioned, thereby enhancing the credibility of the model’s predictions and the conclusions drawn from it. This is indispensable for researchers and policymakers who rely on accurate predictions of dissolved oxygen levels for effective water quality management and the conservation of aquatic ecosystems.
In this study, we implemented the machine learning technique of polynomial regression, while recognizing the substantial potential of a wider range of artificial intelligence methods in water quality analysis. Specifically, artificial neural networks, including deep learning models such as Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs), have emerged as strong tools for pattern recognition and predictive modeling with water quality data. CNNs, known for their strength in image analysis and spatial data interpretation, could be valuable when assessing water quality in visually complex scenarios. Their application extends to the analysis of water quality-related images, such as those derived from satellite or drone sources, helping to identify environmental factors, pollution sources, and temporal dynamics within water bodies. In parallel, RNNs excel at processing sequential data, which makes them well suited to time series analysis in water quality monitoring. These networks can capture temporal dependencies and variations in water quality parameters, supporting the detection of long-term trends and the anticipation of future conditions.
Artificial neural networks offer several advantages, chiefly their ability to learn intricate relationships from large and multifaceted datasets. This adaptability lets them accommodate the distinct characteristics of water quality data. By using deep learning models, water quality predictions may benefit from improved precision and generalization. Moreover, the potential for continuous monitoring offered by these models supports the early identification of anomalies, enabling a proactive response that helps preserve water resources and safeguard public health.
The United States Geological Survey (USGS) was an invaluable resource for this research paper. It provided a water quality dataset for the Mississippi River at Baton Rouge, LA (USGS 07374000) covering a span of three years. This extensive dataset formed the cornerstone of our efforts to construct a polynomial regression model aimed at predicting dissolved oxygen levels based on temperature, significantly improving the precision and reliability of our analysis. The accessibility of water data from the USGS was indispensable in advancing our understanding of the dynamics within the Mississippi River and, by extension, the broader field of water quality assessment in river systems throughout the United States.
Our research sheds light on the relationship between temperature and dissolved oxygen levels in the Mississippi River. We analyzed a substantial dataset, providing a detailed account of the river’s water quality dynamics. Through the development and application of a polynomial regression model, we have gained insights into the predictive capabilities of temperature as a key variable in estimating dissolved oxygen levels. Our findings emphasize the significance of temperature as a primary driver of dissolved oxygen fluctuations in the Mississippi River. By employing this polynomial regression model, we have enhanced our understanding of the interactions within this dynamic ecosystem. This research not only contributes to the body of knowledge in water quality assessment but also offers a practical and efficient tool for monitoring and predicting dissolved oxygen levels, essential for the preservation and responsible management of this crucial aquatic environment. Furthermore, the implications of this research extend beyond the Mississippi River. The methodologies employed and the insights gained have the potential to inform broader water quality assessment practices, benefiting environmental management and conservation efforts in river systems across the United States. At a time when sustainable water resource management is of utmost importance, our research offers a valuable contribution towards achieving this goal.
[1] Šiljić Tomić, D. Antanasijević, M. Ristić, A. Perić-Grujić, V. Pocajt, "A linear and non-linear polynomial neural network modeling of dissolved oxygen content in surface water: Inter- and extrapolation performance with inputs’ significance analysis", Elsevier, 2017, pp. 9.
[2] W. Li, H. Fang, G. Qin, X. Tan, Z. Huang, F. Zeng, H. Du, S. Li, "Concentration estimation of dissolved oxygen in Pearl River Basin using input variable selection and machine learning techniques", Elsevier, 2020, pp. 12.
[3] P. Martí, J. Shiri, M. Duran-Ros, G. Arbat, F. Ramírez de Cartagena, J. Puig-Bargués, "Artificial neural networks vs. Gene Expression Programming for estimating outlet dissolved oxygen in micro-irrigation sand filters fed with effluents", Elsevier, 2013, pp. 10.
[4] A. Csábrági, S. Molnár, P. Tanos, J. Kovács, M. Molnár, Szabó, I. G. Hatvani, "Estimation of dissolved oxygen in riverine ecosystems: Comparison of differently optimized neural networks", Ecological Engineering, 2019, pp. 17.
[5] H. Wang, M. Hondzo, C. Xu, V. Poole, A. Spacie, "Dissolved oxygen dynamics of streams draining an urbanized and an agricultural catchment", Elsevier, 2002, pp. 17.
[6] M. Ay, Ö. Kişi, "Modeling of Dissolved Oxygen Concentration Using Different Neural Network Techniques in Foundation Creek, El Paso County, Colorado", American Society of Civil Engineers, 2012, pp. 9.
[7] J. Li, N. S. Xu, W. W. Su, "Online estimation of stirred-tank microalgal photobioreactor cultures based on dissolved oxygen measurement", Elsevier, 2002, pp. 15.
[8] K. Galajit, S. Duangpummet, P. Dangsakul, R. Keinprasit, P. Dillon, J. Intha, K. Rungprateepthaworn, J. Karnjana, "Prediction of Dissolved Oxygen Concentration for Shrimp Farming Using Quadratic Regression and Artificial Neural Network", Not provided, 2006, pp. 6.
[9] B. Keshtegar, S. Heddam, H. L. Krauss, "Modeling daily dissolved oxygen concentration using modified response surface method and artificial neural network: a comparative study", The Natural Computing Applications Forum, 2017-2018, pp. 12.
[10] S. Nacar, B. Mete, A. Bayram, "Estimation of dissolved oxygen concentration using conventional regression analysis, multivariate adaptive regression splines, and TreeNet techniques", KSCE Journal of Civil Engineering, 2017, pp. 9.
[11] B. Keshtegar, S. Heddam, H. Hosseinabadi, "The employment of polynomial chaos expansion approach for modeling dissolved oxygen concentration in river", Environmental Earth Sciences, 2019, pp. 18.
[12] M. Ay, Ö. Kişi, "Estimation of Dissolved Oxygen by using Neural Networks and Neuro Fuzzy Computing Techniques", KSCE Journal of Civil Engineering, 2017, pp. 9.
Copyright © 2023 Akshay Dumbre, Dishant Koli, Pritee Vaivude, Shravani Dumbre. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.