As commercial space tourism grows, predictive systems are needed to support passenger safety and transportation efficiency. This paper presents a hybrid machine learning pipeline that combines feature engineering, anomaly detection (Isolation Forest), stacking classifiers, SHAP explainability, and NLP with transformer models to predict passenger outcomes (whether a passenger was transported) on the Spaceship Titanic dataset. We also propose an Augmented Reality (AR) simulation tool to enhance passenger experience and operational preparedness. The pipeline reaches 82–84% accuracy and an AUC of about 0.85, provides interpretable predictions, and is intended for practical integration into real-time space tourism applications.
Introduction
This paper presents a hybrid machine learning (ML) pipeline for risk assessment in space tourism using the synthetic "Spaceship Titanic" dataset, which contains both structured and unstructured data simulating a catastrophic event. The pipeline integrates ensemble learning (Random Forest, LightGBM, and SVM base models stacked with a logistic regression meta-learner), natural language processing (NLP) for semantic feature extraction, anomaly detection (Isolation Forest), and SHAP-based explainability to ensure accurate and interpretable predictions.
The model achieved 82–84% accuracy and an AUC of about 0.85, with CryoSleep, TotalSpending, and Age identified as the most important features. Excluding the NLP features reduced performance, which indicates that they contribute useful signal. An AR-based simulation system is proposed for immersive visualization of passenger data, risk predictions, and spaceship layout, in support of safety training, passenger experience, and real-time decision-making.
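For readers less familiar with the two reported metrics, the following sketch computes accuracy and ROC AUC from first principles. The labels and scores are hypothetical toy values chosen for illustration, not results from the paper, and the helper names accuracy and roc_auc are introduced here only for the example. AUC is computed as the fraction of positive/negative pairs in which the positive example receives the higher score, with ties counted as one half.

```python
def accuracy(y_true, y_pred):
    """Fraction of labels predicted correctly."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def roc_auc(y_true, scores):
    """Probability that a random positive outranks a random negative."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = transported, 0 = not transported.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
scores = [0.9, 0.2, 0.6, 0.4, 0.5, 0.1, 0.8, 0.3]

# Threshold the scores at 0.5 to obtain hard predictions for accuracy.
y_pred = [1 if s >= 0.5 else 0 for s in scores]

toy_accuracy = accuracy(y_true, y_pred)
toy_auc = roc_auc(y_true, scores)
print(toy_accuracy, toy_auc)  # prints 0.75 0.9375
```

Accuracy depends on the chosen decision threshold, whereas AUC summarizes ranking quality across all thresholds, which is why the two figures are usually reported together.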
Future work includes integrating real-world data, time-series modeling, real-time AR feedback, ethical AI bias monitoring, and collaborative multi-user AR simulations. Overall, the study demonstrates the effectiveness of combining explainable AI, ensemble modeling, NLP, and AR to improve safety and transparency in commercial space tourism.
Conclusion
In recent years, space tourism has transitioned from science fiction to a tangible commercial reality. With ventures like SpaceX and Blue Origin leading the way, the demand for intelligent, data-driven systems to ensure passenger safety and improve travel experience has grown. This paper proposes a hybrid machine learning (ML) pipeline tailored for space tourism risk assessment using the synthetic "Spaceship Titanic" dataset. The dataset includes both structured and unstructured data, simulating a catastrophic event and the need to predict which passengers were "transported." Our pipeline integrates ML techniques, NLP features, anomaly detection, and SHAP-based interpretability to ensure predictive accuracy and transparency.
References
[1] T. Chen and C. Guestrin, “XGBoost: A scalable tree boosting system,” in Proc. 22nd ACM SIGKDD Int. Conf. Knowl. Discov. Data Min., 2016, pp. 785–794.
[2] S. M. Lundberg and S.-I. Lee, “A unified approach to interpreting model predictions,” Adv. Neural Inf. Process. Syst., vol. 30, pp. 4765–4774, 2017.
[3] A. Vaswani et al., “Attention is all you need,” Adv. Neural Inf. Process. Syst., vol. 30, pp. 5998–6008, 2017.