As organizations increasingly adopt cloud platforms for critical data and service delivery, ensuring system resilience and uninterrupted access during failures has become a key concern. Traditional disaster recovery methods, which often rely on periodic backups and manual intervention, struggle to meet the demands of real-time recovery in dynamic cloud environments. This paper proposes a dynamic disaster recovery framework in which real-time monitoring mechanisms track the health of the primary node and trigger an automatic failover to the backup node when a failure occurs. Experimental results demonstrate that the system maintains data consistency, keeps downtime during failovers under two seconds, and operates autonomously without human intervention.
The use of JSON-based synchronization, lightweight compression, and automated recovery logic makes the framework both efficient and scalable.
1. Introduction
Cloud computing has transformed modern organizations by enabling scalable, cost-efficient, and globally accessible infrastructure. However, manual disaster recovery (DR) methods are slow, error-prone, and unfit for real-time cloud operations.
This paper proposes a dynamic, intelligent disaster recovery system that provides:
Real-time monitoring
Automated failover
Continuous data replication
Transparent logging and alerts
The goal is to offer a scalable, automated, and user-friendly DR solution tailored to real-world cloud needs.
2. Literature Survey
Prior research has explored individual DR components, such as resource scheduling, monitoring, and data synchronization. However, most existing systems lack:
A fully integrated, real-time DR interface
User-friendly tools for monitoring and control
This study bridges that gap by combining existing techniques into a comprehensive, real-time DR framework.
3. Methodology
The system is implemented in Python, with Flask for the web interface and JSON for data exchange, enabling lightweight, cross-platform deployment. It includes five core modules:
A. System Architecture
User Module – Simulates regular cloud usage (uploads, edits)
Primary Node – Active server handling data
Backup Node – Real-time mirror of primary with JSON-based sync
Monitoring Module – Tracks node health and detects failures
Admin Dashboard – Visual interface for control, logs, and alerts
B. Real-Time Monitoring
A monitoring agent pings the primary node continuously
Failures are detected within two seconds
Data is transferred as compressed JSON to minimize latency
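The monitoring behaviour described above can be sketched as a heartbeat detector. This is a minimal illustration under stated assumptions, not the authors' implementation: the class `HeartbeatMonitor`, the TCP-connect probe `tcp_probe`, and the 1.5-second default timeout are hypothetical choices, since the paper does not state which ping mechanism or thresholds it uses.

```python
import socket
import time
from typing import Callable


def tcp_probe(host: str, port: int, timeout: float = 0.5) -> bool:
    """Hypothetical heartbeat: treat a successful TCP connect as 'alive'."""
    with socket.create_connection((host, port), timeout=timeout):
        return True


class HeartbeatMonitor:
    """Declares the primary failed once no ping has succeeded for `timeout` s.

    The default timeout is an illustrative assumption, not a value from the paper.
    """

    def __init__(self, probe: Callable[[], bool], timeout: float = 1.5,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.probe = probe      # returns True when the primary answers
        self.timeout = timeout  # failure-detection budget in seconds
        self.clock = clock      # injectable so the logic can be tested offline
        self.last_ok = clock()  # time of the most recent successful ping

    def check(self) -> bool:
        """Ping once; return True if the primary should be declared failed."""
        now = self.clock()
        try:
            alive = self.probe()
        except OSError:         # refused or timed-out connections count as misses
            alive = False
        if alive:
            self.last_ok = now
            return False
        return now - self.last_ok > self.timeout
```

A polling loop such as `while not monitor.check(): time.sleep(0.25)`, with `monitor = HeartbeatMonitor(lambda: tcp_probe("primary.local", 5000))`, declares a failure roughly one timeout plus one polling interval (plus the probe's own connect timeout) after the primary stops answering; the host name and port are placeholders. Using `time.monotonic` avoids false failovers when the wall clock is adjusted.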
C. Data Synchronization
Synchronization uses versioned JSON objects and checksums
Conflicts are resolved by comparing object versions
Data integrity is maintained in real time
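The synchronization rules above (versioned JSON objects, checksums, and version-based conflict resolution), together with the compressed JSON transfer mentioned in the monitoring subsection, can be illustrated with a small sketch. Because the paper does not specify these details, it assumes SHA-256 as the checksum, `zlib` as the compressor, and a "highest version wins" conflict rule; `encode_update` and `BackupStore` are hypothetical names.

```python
import hashlib
import json
import zlib
from dataclasses import dataclass, field
from typing import Dict, Tuple


def encode_update(key: str, version: int, data: dict) -> bytes:
    """Primary side: serialize one versioned object as checksummed, compressed JSON."""
    body = json.dumps({"key": key, "version": version, "data": data},
                      sort_keys=True, separators=(",", ":"))
    checksum = hashlib.sha256(body.encode("utf-8")).hexdigest()
    envelope = json.dumps({"body": body, "sha256": checksum})
    return zlib.compress(envelope.encode("utf-8"))


@dataclass
class BackupStore:
    """Backup side: keeps only the newest version of each object (assumed rule)."""
    objects: Dict[str, Tuple[int, dict]] = field(default_factory=dict)

    def apply(self, payload: bytes) -> bool:
        """Apply one update; return False if it was corrupted or stale."""
        envelope = json.loads(zlib.decompress(payload).decode("utf-8"))
        body = envelope["body"]
        if hashlib.sha256(body.encode("utf-8")).hexdigest() != envelope["sha256"]:
            return False  # checksum mismatch: reject the corrupted update
        update = json.loads(body)
        current = self.objects.get(update["key"])
        if current is not None and current[0] >= update["version"]:
            return False  # stale or duplicate version: keep the newer copy
        self.objects[update["key"]] = (update["version"], update["data"])
        return True
```

Because stale and corrupted updates are rejected rather than applied, replaying the same update twice is harmless, which makes it safe to resend updates after a transient network error.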
D. Admin Dashboard (Flask UI)
Visual indicators for node status
Manual failover tests and synchronization controls
Threat alerts and recovery logs
Designed for non-technical users
E. Failover Process
Monitor detects primary node failure
Backup node is promoted to active
User requests are redirected automatically
Admins are notified
Once restored, the original primary node rejoins the system as the backup
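The five failover steps can be expressed as a small role-swapping state machine. This sketch assumes a single primary and a single backup; the `Cluster` class, its `notify` callback (standing in for admin alerts), and its `route()` method (standing in for request redirection) are hypothetical. In practice, redirection might instead be handled by a reverse proxy or a DNS update.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Cluster:
    primary: str                           # node currently serving user requests
    backup: Optional[str]                  # standby node, or None while one is down
    notify: Callable[[str], None] = print  # stand-in for admin alerts
    log: List[str] = field(default_factory=list)

    def route(self) -> str:
        """Step 3: requests go to whichever node is currently primary,
        so they follow a promotion automatically."""
        return self.primary

    def fail_over(self) -> str:
        """Promote the backup (step 2) and alert admins (step 4) once the
        monitor has detected a primary failure (step 1). Returns the failed node."""
        if self.backup is None:
            raise RuntimeError("no backup node available for failover")
        failed, self.primary, self.backup = self.primary, self.backup, None
        message = f"primary {failed} failed; {self.primary} promoted to active"
        self.log.append(message)
        self.notify(message)
        return failed

    def rejoin(self, node: str) -> None:
        """Step 5: a repaired node returns as the backup, not as the primary."""
        self.backup = node
        self.log.append(f"{node} restored and rejoined as backup")
```

For example, after `c = Cluster("node-a", "node-b")` and `c.fail_over()`, `c.route()` returns `"node-b"`; calling `c.rejoin("node-a")` then restores a primary-backup pair with the roles swapped.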
F. Testing Setup
Test scenarios include simulated uploads, node crashes, synchronization delays, and failovers
Results were analyzed using system logs and performance benchmarks
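One way to turn test logs into the failover-time figure reported in Section 4 is to pair each injected crash with the moment the backup becomes active. The sketch below assumes a hypothetical log of `(timestamp, event)` tuples with event names `primary_down` and `backup_active`; the paper does not describe its log format, so these names are illustrative only.

```python
from statistics import mean
from typing import Iterable, List, Optional, Tuple


def failover_durations(events: Iterable[Tuple[float, str]]) -> List[float]:
    """Seconds from each 'primary_down' event to the next 'backup_active' event.

    Measuring from the injected crash (not from detection) means each duration
    includes both detection and promotion time.
    """
    durations: List[float] = []
    down_at: Optional[float] = None
    for timestamp, event in sorted(events):  # order events by timestamp
        if event == "primary_down" and down_at is None:
            down_at = timestamp
        elif event == "backup_active" and down_at is not None:
            durations.append(timestamp - down_at)
            down_at = None
    return durations


def mean_failover_time(events: Iterable[Tuple[float, str]]) -> float:
    """Average failover time; raises statistics.StatisticsError if none occurred."""
    return mean(failover_durations(events))
```

Sorting first makes the calculation robust to log lines written out of order by different nodes, provided their clocks are synchronized.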
4. Results and Evaluation
Key Metrics:
Failover Time: average of 1.74 seconds, much faster than traditional DR systems (30 seconds to several minutes)
Data Integrity: 99.7% accuracy, verified using checksums and hash comparison
System Availability: high uptime maintained during a 24-hour test with random faults
RPO (Recovery Point Objective): near-zero data loss; the backup resumes from the latest synchronized state
RTO (Recovery Time Objective): near-instant recovery due to automated failover logic
Resource Efficiency: CPU and memory usage stayed below 65%, suggesting the system is viable for low-resource environments
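The paper does not define how its data-integrity percentage is computed. One plausible reading, sketched here purely as an assumption, is the fraction of primary records whose backup copy produces an identical SHA-256 hash of its canonical JSON form; `record_digest` and `integrity_rate` are hypothetical helpers.

```python
import hashlib
import json
from typing import Dict


def record_digest(record: dict) -> str:
    """SHA-256 of canonical JSON, so key order does not affect the hash."""
    canonical = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def integrity_rate(primary: Dict[str, dict], backup: Dict[str, dict]) -> float:
    """Fraction of primary records whose backup copy hashes identically.

    Records missing from the backup count as mismatches.
    """
    if not primary:
        return 1.0  # nothing to compare: treat an empty store as consistent
    matches = sum(
        1 for key, rec in primary.items()
        if key in backup and record_digest(backup[key]) == record_digest(rec)
    )
    return matches / len(primary)
```

Serializing with `sort_keys=True` ensures that two records with the same content but different key order hash identically, so only genuine content differences count against the integrity rate.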
5. Conclusion
The proposed dynamic disaster recovery framework addresses a critical challenge in modern cloud computing: ensuring continuous service availability and data integrity in the face of unexpected failures. Its layered architecture, comprising user input simulation, primary-backup node orchestration, an intelligent monitoring module, and an interactive dashboard, enables seamless transitions during system disruptions without compromising performance or data consistency.
Future work could integrate blockchain technology to provide immutable recovery logs and audit trails with tamper-proof traceability. Edge-based monitoring and federated learning could also be incorporated to improve response times and enable predictive failure detection without compromising privacy. These enhancements would strengthen the system's adaptability and security, making it even better suited to next-generation intelligent cloud infrastructure.