This work proposes a machine learning–oriented framework for tsunami prediction that seeks to overcome the constraints of conventional numerical forecasting. Four algorithms—Decision Tree Classifier, Decision Tree Regressor, Linear Regression, and Random Forest Classifier—are compared using 22,797 tsunami records obtained from NOAA’s Global Historical Tsunami Database. Among these, the Random Forest Classifier performed best, achieving 95.92% accuracy and precision scores of 99%, 96%, and 95% for Safe, Moderate, and Severe classes, respectively. Unlike physics-based simulations that may require over 30 minutes to produce alerts, the system is capable of near real-time prediction, making it suitable for emergency decision-making. Synthetic data generation techniques are also applied to address the scarcity of historical tsunami events, resulting in a more robust early warning framework that can be integrated with international monitoring systems for disaster risk reduction.
1. Introduction
Tsunamis are highly destructive, often giving coastal areas only minutes to respond.
Traditional numerical forecasting methods are too slow (often taking over 30 minutes), limiting evacuation and emergency response.
Machine Learning (ML) offers a faster, data-driven alternative by identifying patterns in seismic data for real-time risk assessments.
2. Research Aim
The study compares four ML models to identify the most effective algorithm for real-time tsunami prediction:
Decision Tree Classifier
Decision Tree Regressor
Linear Regression
Random Forest Classifier
3. Literature Insights
Past studies generally report that Random Forest (RF) and related ML models perform well in accuracy and interpretability:
Mulia et al. [1] achieved 95.92% accuracy using RF.
Lu [2] used RF Regression to predict tsunami water heights with improved MSE.
Fauzi & Mizutani [3] showed ML models outperform traditional methods in speed and precision.
Sukmana et al. [6] found that RF had better recall, while SVM had higher precision.
Emphasis is placed on ensemble methods like Random Forest for both accuracy and explainability.
4. Methodology
A. Model Selection:
The four models were chosen for their balance of accuracy, efficiency, and interpretability.
B. Data Handling:
Used NOAA’s tsunami database (22,797 records).
Synthetic data augmentation applied to overcome event rarity.
Preprocessing included normalization, feature selection, and outlier detection.
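The data-handling steps above can be illustrated with a minimal sketch. The study does not specify which outlier rule, scaling method, or augmentation technique was used, so everything here is an assumption: an interquartile-range (IQR) filter, z-score standardization, and Gaussian jitter as one simple way to generate synthetic records. The function names and the sample records are hypothetical, and feature selection is not shown.

```python
import random
import statistics


def remove_outliers_iqr(rows, col, k=1.5):
    """Drop rows whose value in column `col` lies outside [Q1 - k*IQR, Q3 + k*IQR]."""
    values = [r[col] for r in rows]
    q1, _, q3 = statistics.quantiles(values, n=4)
    lo, hi = q1 - k * (q3 - q1), q3 + k * (q3 - q1)
    return [r for r in rows if lo <= r[col] <= hi]


def standardize(rows):
    """Rescale every column to zero mean and unit (population) standard deviation."""
    cols = list(zip(*rows))
    means = [statistics.fmean(c) for c in cols]
    stds = [statistics.pstdev(c) or 1.0 for c in cols]  # guard constant columns
    return [[(v - m) / s for v, m, s in zip(r, means, stds)] for r in rows]


def augment_with_jitter(rows, n_new, scale=0.05, seed=0):
    """Create `n_new` synthetic rows by adding small Gaussian noise to sampled real rows."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(rows)
        synthetic.append([v + rng.gauss(0.0, scale) for v in base])
    return synthetic


# Hypothetical records: [earthquake magnitude, focal depth in km].
# The last depth value is deliberately implausible so the filter has something to remove.
records = [[6.1, 20.0], [6.8, 35.0], [7.2, 15.0], [7.9, 30.0],
           [6.5, 25.0], [7.0, 40.0], [8.1, 10.0], [6.3, 900.0]]

clean = remove_outliers_iqr(records, col=1)        # outlier detection on depth
scaled = standardize(clean)                        # normalization
augmented = scaled + augment_with_jitter(scaled, n_new=5)  # synthetic augmentation
print(len(records), len(clean), len(augmented))
```

In a real pipeline, synthetic records should be generated from the training split only, so that the test set contains only observed events.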
C. Algorithm Overview:
Decision Tree Classifier: Tuned via cross-validation for categorical risk prediction.
Decision Tree Regressor: Used MSE for splitting; pruned to avoid overfitting.
Linear Regression: Used forward selection and regularization; best for simple, interpretable trends.
Random Forest Classifier: Combined 100 trees with bootstrap aggregation for high accuracy and robustness.
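To make "100 trees with bootstrap aggregation" concrete, the sketch below bags shallow decision trees and predicts by majority vote. It is a simplified stand-in, not the study's implementation: a full Random Forest additionally samples a random subset of features at each split [10], and a library implementation such as scikit-learn's RandomForestClassifier [13] would normally be used instead. The toy data, the magnitude cut-offs used to label it, and all function names are hypothetical.

```python
import random
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def fit_tree(X, y, depth=0, max_depth=2):
    """Grow a small classification tree; leaves are labels, nodes are tuples."""
    majority = Counter(y).most_common(1)[0][0]
    if depth == max_depth or len(set(y)) == 1:
        return majority
    best = None
    for f in range(len(X[0])):
        # Dropping the largest value guarantees both sides of the split are non-empty.
        for t in sorted({row[f] for row in X})[:-1]:
            left = [i for i, row in enumerate(X) if row[f] <= t]
            right = [i for i, row in enumerate(X) if row[f] > t]
            score = (len(left) * gini([y[i] for i in left])
                     + len(right) * gini([y[i] for i in right])) / len(y)
            if best is None or score < best[0]:
                best = (score, f, t, left, right)
    if best is None:  # every row identical: nothing to split on
        return majority
    _, f, t, left, right = best
    return (f, t,
            fit_tree([X[i] for i in left], [y[i] for i in left], depth + 1, max_depth),
            fit_tree([X[i] for i in right], [y[i] for i in right], depth + 1, max_depth))


def predict_tree(node, row):
    """Walk from the root to a leaf and return the leaf's label."""
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if row[f] <= t else right
    return node


def fit_bagged(X, y, n_trees=100, seed=0):
    """Train each tree on a bootstrap sample (drawn with replacement)."""
    rng = random.Random(seed)
    n = len(X)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]
        forest.append(fit_tree([X[i] for i in idx], [y[i] for i in idx]))
    return forest


def predict_bagged(forest, row):
    """Aggregate the trees' predictions by majority vote."""
    return Counter(predict_tree(t, row) for t in forest).most_common(1)[0][0]


# Hypothetical toy data: [magnitude, focal depth in km], labelled by an
# assumed magnitude rule purely for illustration.
rng = random.Random(42)
X = [[round(5.0 + 0.1 * i, 1), rng.uniform(5.0, 100.0)] for i in range(41)]
y = ["Safe" if m < 6.5 else "Moderate" if m <= 7.5 else "Severe" for m, _ in X]

forest = fit_bagged(X, y, n_trees=100)
for row in ([5.5, 30.0], [7.0, 30.0], [8.5, 30.0]):
    print(row, predict_bagged(forest, row))
```

Because each tree sees a different resample of the data, individual trees disagree near class boundaries, and the vote averages out that variance; this is the robustness property the overview attributes to the ensemble.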
5. Key Findings
Random Forest Classifier (RF Cls): Best Overall
Accuracy: 95.92% overall, with per-class precision of 99% (Safe), 96% (Moderate), and 95% (Severe).
Robustness: High across various data subsets and regions.
Feature Handling: Effective with mixed data types (temporal, seismic, geographic).
Computational Cost: Higher than others, but acceptable for real-time use.
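For readers unfamiliar with how the accuracy and per-class precision figures above are defined, the following sketch computes both from a list of true and predicted labels. The eight labels are hypothetical and chosen only to make the arithmetic easy to follow; they are not drawn from the study's data, and the function names are illustrative.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of records whose predicted class equals the true class."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def per_class_precision(y_true, y_pred):
    """For each predicted class: correct predictions / all predictions of that class."""
    predicted = Counter(y_pred)
    correct = Counter(p for t, p in zip(y_true, y_pred) if t == p)
    return {label: correct[label] / predicted[label] for label in predicted}


# Hypothetical labels for eight records (not taken from the study's data).
y_true = ["Safe", "Safe", "Safe", "Moderate", "Moderate", "Severe", "Severe", "Severe"]
y_pred = ["Safe", "Safe", "Moderate", "Moderate", "Moderate", "Severe", "Severe", "Moderate"]

acc = accuracy(y_true, y_pred)
prec = per_class_precision(y_true, y_pred)
print(f"accuracy = {acc:.2%}")
for label in ("Safe", "Moderate", "Severe"):
    print(f"precision[{label}] = {prec[label]:.2%}")
```

Precision for a class answers "when the model issues this alert level, how often is it right?", which is why it is reported separately for Safe, Moderate, and Severe rather than as a single number.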
6. Conclusion
This study establishes the Random Forest Classifier as the most reliable algorithm for tsunami prediction when evaluated against Decision Tree Classifier, Decision Tree Regressor, and Linear Regression. It delivered an overall accuracy of 95.92%, with consistently high precision across Safe, Moderate, and Severe categories, as shown in Fig. 3.
The analysis demonstrates that Random Forest successfully combines the clarity of tree-based methods with the improved performance of ensemble learning. This advantage is emphasized in Fig. 1, while Fig. 2 highlights its robustness across varied tsunami scenarios. By enabling near real-time forecasting, the proposed approach addresses the delays of conventional simulation-based techniques and provides a solution that can be adapted to global monitoring networks. Future directions include expanding the dataset to additional regions, applying deep learning methods for further comparison, and designing hybrid models that integrate multiple approaches. In summary, this framework marks a significant step forward in tsunami prediction research, offering faster, more dependable, and interpretable early warning capabilities that can directly support disaster preparedness and save lives.
References
[1] Mulia, I. E., Gusman, A. R., & Satake, K. "Machine learning-based tsunami inundation prediction derived from offshore observations," Nature Communications, vol. 13, pp. 5489, 2022, doi: 10.1038/s41467-022-33253-5.
[2] Lu, S. "Predicting the Characteristics of Tsunamis Using Machine Learning," in International Conference on Data Management, Analytics and Innovation, pp. 128279, 2024.
[3] Fauzi, A., & Mizutani, N. "Machine learning algorithms for real-time tsunami inundation forecasting: A case study in Nankai Trough," Pure and Applied Geophysics, vol. 177, pp. 1437-1450, 2020, doi: 10.1007/s00024-019-02364-4.
[4] Ramalingam, N. R. "Machine Learning Approaches for Tsunami Hazard and Risk Assessment," in EGU General Assembly 2025, 2025, doi: 10.5194/egusphere-egu25-19939.
[5] Kolev, T., Ghattas, O., & Dunham, E. M. "Real-time tsunami forecasting using exascale computing and physics-informed machine learning," Nature Computational Science, vol. 2, pp. 145-158, 2025.
[6] Sukmana, H. T., Rahman, A., & Sari, D. P. "Comparative Analysis of SVM and RF Algorithms for Tsunami Prediction in Disaster Management," Journal of Applied Data Sciences, vol. 5, no. 1, pp. 84-99, 2024, doi: 10.47738/jads.v5i1.159.
[7] Goda, K., Mai, P. M., Yasuda, T., & Mori, N. "Sensitivity of tsunami wave profiles and inundation simulations to earthquake slip and fault geometry for the 2011 Tohoku earthquake," Earth, Planets and Space, vol. 66, pp. 105, 2014, doi: 10.1186/1880-5981-66-105.
[8] Okada, Y. "Surface deformation due to shear and tensile faults in a half-space," Bulletin of the Seismological Society of America, vol. 75, pp. 1135-1154, 1985.
[9] Kajiura, K. "The leading wave of a tsunami," Bulletin of the Earthquake Research Institute, vol. 41, pp. 535-571, 1963.
[10] Breiman, L. "Random forests," Machine Learning, vol. 45, pp. 5-32, 2001, doi: 10.1023/A:1010933404324.
[11] Quinlan, J. R. "Induction of decision trees," Machine Learning, vol. 1, pp. 81-106, 1986, doi: 10.1007/BF00116251.
[12] Hastie, T., Tibshirani, R., & Friedman, J. "The Elements of Statistical Learning: Data Mining, Inference, and Prediction," 2nd ed. New York: Springer, 2009.
[13] Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., ... & Duchesnay, E. "Scikit-learn: Machine learning in Python," Journal of Machine Learning Research, vol. 12, pp. 2825-2830, 2011.
[14] McKinney, W. "Data structures for statistical computing in Python," in Proceedings of the 9th Python in Science Conference, pp. 56-61, 2010.
[15] Chen, T., He, T., Benesty, M., Khotilovich, V., Tang, Y., Cho, H., ... & Zhou, T. "Xgboost: extreme gradient boosting," R package version 0.4-2, pp. 1-4, 2015.
[16] NOAA National Centers for Environmental Information "Global Historical Tsunami Database," 2024. [Online]. Available: https://www.ngdc.noaa.gov/hazard/tsu_db.shtml. [Accessed: 2024
[17] Satake, K., Fujii, Y., Harada, T., & Namegaya, Y. "Time and space distribution of coseismic slip of the 2011 Tohoku earthquake as inferred from tsunami waveform data," Bulletin of the Seismological Society of America, vol. 103, pp. 1473-1492, 2013, doi: 10.1785/0120120122.
[18] Ward, S. N. "Landslide tsunami," Journal of Geophysical Research: Solid Earth, vol. 106, pp. 11201-11215, 2001, doi: 10.1029/2000JB900450.
[19] Tanioka, Y., & Satake, K. "Tsunami generation by horizontal displacement of ocean bottom," Geophysical Research Letters, vol. 23, pp. 861-864, 1996, doi: 10.1029/96GL00736.