Advancing non-optical water quality monitoring in Lake Tana, Ethiopia: insights from machine learning and remote sensing techniques
Abstract
Water quality is deteriorating in the world's freshwater bodies, and Lake Tana in Ethiopia is becoming unpleasant to biodiversity. The objective of this study is to retrieve non-optical water quality data, specifically total nitrogen (TN) and total phosphorus (TP) concentrations, in Lake Tana using Machine Learning (ML) techniques applied to Landsat 8 OLI imagery. The ML methods employed include Artificial Neural Networks (ANN), Support Vector Regression (SVR), Random Forest Regression (RF), XGBoost Regression (XGB), AdaBoost Regression (AB), and Gradient Boosting Regression (GB). The XGB algorithm provided the best result for TN retrieval, with determination coefficient (R2), mean absolute error (MARE), relative mean square error (RMSE) and Nash Sutcliff (NS) values of 0.80, 0.043, 0.52, and 0.81 mg/L, respectively. The RF algorithm was most effective for TP retrieval, with R2 of 0.73, MARE of 0.076, RMSE of 0.17 mg/L, and NS index of 0.74. These methods accurately predicted TN and TP spatial concentrations, identifying hotspots along river inlets and northeasters. The temporal patterns of TN, TP, and their ratios were also accurately represented by combining in-situ, RS and ML-based models. Our findings suggest that this approach can significantly improve the accuracy of water quality retrieval in large inland lakes and lead to the development of potential water quality digital services.