Authors: Shiva Sharma, Simranjeet Singh, Shyam Sharma, Neerja Negi
Cloud computing is a powerful tool for sophisticated, large-scale computation. It removes the need for expensive hardware, dedicated space, and software maintenance. Cloud computing has also contributed to a massive increase in the volume of data, commonly called big data. Managing such volumes is complex and time-consuming, and it requires an extensive computing infrastructure for effective processing and analysis. Many sectors, including small and large organizations, healthcare, and education, are attempting to harness the potential of big data. In healthcare, for example, big data is used to reduce treatment costs, predict pandemic outbreaks, and prevent infections. This article discusses data processing strategies from system and application perspectives to give an orderly picture of the issues that application developers and database management system (DBMS) designers face when designing and deploying internet-scale applications. Although big data has uses across many industries, it also brings challenges.
Cloud computing has proven to be a practical paradigm for service-oriented computing. This development has changed how computer infrastructure is abstracted and used. The flexibility, pay-as-you-go pricing model, low initial investment, and risk transfer of cloud computing make it a common choice for establishing cost-effective business infrastructure. For several decades, scalable distributed databases have been a long-standing goal of database research. However, as data patterns and applications have evolved, a new form known as key-value storage has emerged and is now used extensively by many businesses. Hadoop, an open-source implementation of MapReduce, is widely used in business and academia and has markedly improved usability and efficiency. The Hadoop Distributed File System (HDFS) has become a useful technology for managing and archiving large, complicated datasets. As a result, it is becoming easier for computers to access and make sense of big data.

Today's world is data-driven. Data is everywhere, owing to the technological advances of recent years. The pace of digitization has accelerated, and the term "digital information societies" has entered common parlance. Whereas just 1% of information created 20 or 30 years ago was digital, now more than 94% of information arrives in digital form from a wide variety of digital sources. Large data sets that exceed the capacity of existing technologies are a hallmark of the "big data" phenomenon, which has been described as reflecting the evolution of human cognition. Fast, heterogeneous data calls for novel forms of processing to support decision-making, insight discovery, and process optimization.

We must be able to store, handle, and share complex data safely in the cloud so that we can analyse it and identify trends. Given the cloud's inherent complexity, we believe that focusing on incremental improvements to cloud security is preferable to presenting comprehensive approaches.
II. BIG DATA
Big data refers to enormous, intricate, and varied datasets that are difficult to handle and process using conventional data processing techniques. Three Vs define it: Volume, Velocity, and Variety. Volume is the enormous quantity of data produced by numerous sources, including social media, sensors, and other digital devices. Velocity is the speed at which data must be handled in order to be usable in real time. Variety covers the many kinds and forms of data, including structured, semi-structured, and unstructured data. The difficulty of handling and studying these sizable datasets has given rise to big data technologies such as Hadoop, Spark, and NoSQL databases. These tools enable businesses to extract insights and make data-driven choices in a variety of industries, including marketing, finance, and healthcare.
A. Big Data and its features
Big data is compiled from several sources and is often described by five properties: volume, value, variety, velocity, and veracity.
With respect to velocity, data may be processed in two main ways: in batches or as a continuous stream. Batch processing operates on blocks of data that have been stored over a period of time. Such batches tend to be quite large, so they take longer to process. For large amounts of data, Hadoop MapReduce is commonly regarded as a well-suited framework. Batch processing works well when handling large volumes of data is more important than obtaining real-time analytics.
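To make the batch model concrete, the following is a minimal single-process sketch of the map, shuffle, and reduce phases that a MapReduce framework would distribute across a cluster. It does not use Hadoop itself; the function names (`map_phase`, `shuffle`, `reduce_phase`, `run_batch`) and the sample records are hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(record: str) -> Iterator[Tuple[str, int]]:
    """Emit a (word, 1) pair for every word in one stored record."""
    for word in record.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group all emitted values by key, as the framework does between phases."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Combine the values collected for one key into a single count."""
    return key, sum(values)


def run_batch(records: List[str]) -> Dict[str, int]:
    """Run the three phases over a batch that was stored before processing."""
    mapped = (pair for record in records for pair in map_phase(record))
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    # Hypothetical batch of previously stored text records.
    stored_batch = ["big data in the cloud", "cloud data at scale"]
    print(run_batch(stored_batch))
```

In a real deployment the map and reduce functions run on many machines in parallel and the shuffle moves data over the network; the sketch keeps only the data flow.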
Stream processing, by contrast, is essential for real-time data processing and analysis. With stream processing, new information can be examined as it arrives. Feeding this data into analytics tools quickly allows findings to be produced quickly. The ability to spot anomalies that point to fraud in real time makes this approach promising in a number of contexts. Furthermore, online firms would profit from real-time processing, since it would enable them to keep detailed records of consumer transactions and provide real-time product recommendations.
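As an illustration of examining records as they arrive, the sketch below flags transaction amounts that deviate strongly from a sliding window of recent values. The detection rule (a simple z-score), the function name `detect_anomalies`, the window size, and the threshold are all assumptions made for this example; real fraud detection systems use far richer models.

```python
from collections import deque
from statistics import mean, pstdev
from typing import Deque, Iterable, Iterator, Tuple


def detect_anomalies(
    stream: Iterable[float], window: int = 5, threshold: float = 3.0
) -> Iterator[Tuple[int, float]]:
    """Yield (position, amount) for amounts far from the recent average.

    Each amount is checked once, on arrival, against the previous `window`
    amounts only, so memory use stays constant however long the stream runs.
    """
    recent: Deque[float] = deque(maxlen=window)
    for index, amount in enumerate(stream):
        if len(recent) == window:
            mu = mean(recent)
            sigma = pstdev(recent)
            # Skip the check when the window has no spread at all.
            if sigma > 0 and abs(amount - mu) > threshold * sigma:
                yield index, amount
        # The new amount joins the window even if it was flagged.
        recent.append(amount)


if __name__ == "__main__":
    # Hypothetical stream of transaction amounts.
    for position, value in detect_anomalies([10, 11, 9, 10, 12, 500, 10, 11]):
        print(f"suspicious amount {value} at position {position}")
```

Because the function is a generator, it produces each alert as soon as the offending record is read rather than after the whole input has been collected.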
III. CLOUD MANAGEMENT FOR MASSIVE DATA SETS
The Cloud Computing ecosystem is built on the use and provision of services. Service-oriented systems can be grouped in several ways. One of the most common criteria is the level of abstraction offered to the system's user. Typically, three distinct tiers are distinguished in this manner: Infrastructure as a Service (IaaS), Platform as a Service (PaaS), and Software as a Service (SaaS). Cloud Computing provides scalable resource utilisation, low administration effort, flexible pricing models, and mobility for software users. Under these conditions, the Cloud Computing paradigm is advantageous for large projects, such as those involving Big Data and business intelligence (BI).
Given the nature of data management, a suitable organisation can be built on a four-layer architecture that includes the following elements:
A file system for storing Big Data, that is, a large number of large files. This layer is implemented at the IaaS level, since it defines the basic architecture on which the subsequent layers rest.
A DBMS for organising and accessing data efficiently. It sits between IaaS and PaaS, since it shares properties with both: developers use it to access the data, but its implementation is tied to the hardware. A PaaS serves as an interface, offering its capabilities on the upper side and the implementation for a specific IaaS on the lower side. This enables applications to be deployed on several IaaS offerings without being rewritten.
A tool for distributing the computing workload among the cloud's processors. Clearly connected to PaaS, this layer functions as a "software API" for coding Big Data and BI applications.
A query mechanism for knowledge and information extraction, which users need and which sits between the PaaS and SaaS levels.
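The four layers above can be sketched as a toy, in-memory stack in which each layer depends only on the one beneath it. Everything here is hypothetical: the class names (`FileStore`, `RecordStore`, `ParallelExecutor`, `QueryEngine`), the JSON record format, and the use of a thread pool in place of a cluster are assumptions for illustration, not a description of any particular cloud product.

```python
import json
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List


class FileStore:
    """Layer 1 (IaaS level): stores raw files as bytes, keyed by path."""

    def __init__(self) -> None:
        self._files: Dict[str, bytes] = {}

    def write(self, path: str, data: bytes) -> None:
        self._files[path] = data

    def read(self, path: str) -> bytes:
        return self._files[path]

    def list_paths(self) -> List[str]:
        return sorted(self._files)


class RecordStore:
    """Layer 2 (between IaaS and PaaS): organises files as JSON records."""

    def __init__(self, files: FileStore) -> None:
        self._files = files

    def put(self, key: str, record: dict) -> None:
        self._files.write(f"records/{key}.json", json.dumps(record).encode())

    def scan(self) -> List[dict]:
        paths = [p for p in self._files.list_paths() if p.startswith("records/")]
        return [json.loads(self._files.read(p)) for p in paths]


class ParallelExecutor:
    """Layer 3 (PaaS level): spreads a function over records with a worker pool."""

    def __init__(self, workers: int = 4) -> None:
        self._workers = workers

    def map(self, fn: Callable[[dict], object], records: List[dict]) -> List[object]:
        with ThreadPoolExecutor(max_workers=self._workers) as pool:
            return list(pool.map(fn, records))


class QueryEngine:
    """Layer 4 (between PaaS and SaaS): answers user queries over the records."""

    def __init__(self, store: RecordStore, executor: ParallelExecutor) -> None:
        self._store = store
        self._executor = executor

    def select(self, predicate: Callable[[dict], bool]) -> List[dict]:
        records = self._store.scan()
        flags = self._executor.map(predicate, records)
        return [record for record, keep in zip(records, flags) if keep]


if __name__ == "__main__":
    files = FileStore()
    store = RecordStore(files)
    store.put("a", {"region": "north", "amount": 120})
    store.put("b", {"region": "south", "amount": 80})
    engine = QueryEngine(store, ParallelExecutor(workers=2))
    print(engine.select(lambda r: r["amount"] > 100))
```

Because `QueryEngine` talks only to `RecordStore` and `ParallelExecutor`, the file layer could be swapped for a different backend without changing query code, which mirrors the portability argument made for the PaaS interface.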
Computing services such as hosts, storage, databases, infrastructure, applications, and analytics are delivered over the Internet to provide scalability, rapid innovation, and cost savings. Cloud computing has transformed the abstraction and use of computer infrastructure, and the scope of cloud concepts has expanded to include anything that may be treated as a service. The many advantages of cloud computing, including flexibility, pay-as-you-go or pay-per-use models, and low initial investment, have made it a feasible and desirable option for storing, administering, and analysing large amounts of data. Because big data is increasingly crucial for many enterprises and disciplines, Amazon, Google, and Microsoft each provide their own cost-effective big data platforms. These technologies scale to organisations of all sizes. This has led to the popularity of Analytics as a Service (AaaS) as a quicker and more effective way to integrate, transform, and visualise various kinds of data.
IV. BIG DATA ANALYTICS CYCLE
According to experts, processing massive data for analytics differs from processing regular transactional data. In conventional setups, data is explored first, and a model design and database structure are then created. The big data analytics cycle begins by collecting information from several sources, including files, systems, sensors, and the Internet. This data is stored in a "landing zone", a medium capable of handling the volume, variety, and velocity of the data; typically, this is a distributed file system. Once stored, the data undergoes a series of transformations designed to preserve efficiency and scalability. It is then fed into specific analytic tasks, operational reporting, databases, or raw data extracts.
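The cycle described above (collect, land, transform, deliver) can be sketched as a small pipeline. The source names, the record fields `region` and `amount`, and the use of an in-memory dictionary as the landing zone are hypothetical stand-ins; in practice the landing zone would be a distributed file system.

```python
import csv
import io
import json
from typing import Dict, List


def collect() -> Dict[str, str]:
    """Gather raw payloads from heterogeneous sources (hypothetical samples)."""
    return {
        "sales.csv": "region,amount\nnorth,120\nsouth,80\n",
        "sensors.json": '[{"region": "north", "amount": 30}]',
    }


def land(raw_sources: Dict[str, str]) -> Dict[str, str]:
    """Store payloads unchanged; a dict stands in for the landing zone."""
    return dict(raw_sources)


def transform(landing_zone: Dict[str, str]) -> List[dict]:
    """Convert each raw payload into records with one common shape."""
    rows: List[dict] = []
    for name, payload in landing_zone.items():
        if name.endswith(".csv"):
            parsed = list(csv.DictReader(io.StringIO(payload)))
        elif name.endswith(".json"):
            parsed = json.loads(payload)
        else:
            continue  # Unknown formats stay in the landing zone untouched.
        for item in parsed:
            rows.append({"region": item["region"], "amount": float(item["amount"])})
    return rows


def report(rows: List[dict]) -> Dict[str, float]:
    """Deliver one analytic output: the total amount per region."""
    totals: Dict[str, float] = {}
    for row in rows:
        totals[row["region"]] = totals.get(row["region"], 0.0) + row["amount"]
    return totals


if __name__ == "__main__":
    print(report(transform(land(collect()))))
```

Keeping the raw payloads in the landing zone and deriving the common record shape afterwards reflects the point made above: the structure is imposed after the data is stored, not before.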
A. Advantages of Big Data Analytics
For companies seeking to harness the power of data to drive business outcomes, big data analytics has become a crucial instrument. The following are some benefits of big data analytics:
Improved decision-making: big data analytics gives businesses insight into consumer behavior, market patterns, and other important data points. By studying large datasets, organizations can find patterns and trends that would be difficult to find through manual analysis.
V. BIG DATA MANAGEMENT
Present technology cannot meet the demands of big data, because storage capacity is growing substantially more slowly than the volume of data. Consequently, a fundamental redesign of the information framework is essential, and a hierarchical storage architecture must be developed for this purpose. Existing algorithms do not manage heterogeneous data effectively, so new, highly efficient algorithms for heterogeneous data are also needed.
A. Security in Big Data is Essential
Many businesses use big data, yet they may lack adequate security resources. A security threat to big data can therefore turn into an even more significant problem. Companies use this technology to store petabyte-scale data about the firm, its business, and its customers, which has a significant impact on how information must be classified. To safeguard the data, we must encrypt it, log access to it, or use honeypot tactics. The difficulty of identifying threats and malicious intruders must itself be addressed through big data analysis techniques.
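Of the safeguards mentioned above, logging is the simplest to illustrate. The sketch below keeps a hash-chained access log, in which each entry stores the SHA-256 digest of the previous one so that later tampering can be detected. This is one possible technique chosen for illustration, not a method taken from the source; the function names `append_entry` and `verify` and the sample events are hypothetical.

```python
import hashlib
import json
from typing import List

GENESIS = "0" * 64  # Placeholder "previous hash" for the first entry.


def _digest(event: str, prev_hash: str) -> str:
    """Hash an entry's content together with the hash of its predecessor."""
    body = json.dumps({"event": event, "prev": prev_hash}, sort_keys=True)
    return hashlib.sha256(body.encode()).hexdigest()


def append_entry(log: List[dict], event: str) -> None:
    """Append an event, linking it to the hash of the last entry."""
    prev_hash = log[-1]["hash"] if log else GENESIS
    log.append({"event": event, "prev": prev_hash, "hash": _digest(event, prev_hash)})


def verify(log: List[dict]) -> bool:
    """Return True only if every entry still matches its recorded hash chain."""
    prev_hash = GENESIS
    for entry in log:
        if entry["prev"] != prev_hash:
            return False
        if entry["hash"] != _digest(entry["event"], entry["prev"]):
            return False
        prev_hash = entry["hash"]
    return True


if __name__ == "__main__":
    audit_log: List[dict] = []
    append_entry(audit_log, "user=alice action=read table=customers")
    append_entry(audit_log, "user=bob action=export table=customers")
    print(verify(audit_log))
    audit_log[0]["event"] = "user=alice action=none"  # Simulated tampering.
    print(verify(audit_log))
```

A chain of this kind detects edits to past entries but does not by itself prevent an attacker from rewriting the whole log, so it would normally be combined with encryption and access control.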
B. Big Data Analysis and Computation
Speed is the most crucial factor when searching large datasets. However, a search may be time-consuming because it must examine every related entry in the database. While big data is becoming more complex, the indexes built over it target only particular data types. The conventional serial approach is inefficient for data sets of this size.
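The contrast between a serial search and an indexed lookup can be sketched as follows. The serial version inspects every record on each query, whereas the index is built once and then answers equality queries on one field without rescanning. The function names and the sample records are hypothetical, and the index shown here supports only exact matches on a single field.

```python
from collections import defaultdict
from typing import Dict, Hashable, List


def serial_search(records: List[dict], field: str, value: Hashable) -> List[dict]:
    """Check every record in turn; cost grows with the size of the dataset."""
    return [record for record in records if record.get(field) == value]


def build_index(records: List[dict], field: str) -> Dict[Hashable, List[int]]:
    """Map each distinct field value to the positions of matching records."""
    index: Dict[Hashable, List[int]] = defaultdict(list)
    for position, record in enumerate(records):
        index[record.get(field)].append(position)
    return index


def indexed_search(
    records: List[dict], index: Dict[Hashable, List[int]], value: Hashable
) -> List[dict]:
    """Jump straight to matching positions using the prebuilt index."""
    return [records[position] for position in index.get(value, [])]


if __name__ == "__main__":
    # Hypothetical dataset of customer records.
    data = [
        {"id": 1, "city": "Delhi"},
        {"id": 2, "city": "Mumbai"},
        {"id": 3, "city": "Delhi"},
    ]
    city_index = build_index(data, "city")
    print(serial_search(data, "city", "Delhi"))
    print(indexed_search(data, city_index, "Delhi"))
```

Both functions return the same records; the difference is that the index pays its scanning cost once, up front, which is worthwhile only when the same field is queried repeatedly.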
VI. RISK AND CHALLENGES
Big data and cloud computing have many advantages, but combining them also carries its share of risks and challenges.
VII. CONCLUSION
Big Data is not a new concept, but it has recently come to the forefront because vast quantities of data are now produced daily from many sources. Our investigation found that big data is expanding rapidly, bringing both advantages and concerns. Cloud computing is well suited to storing, processing, and analysing Big Data. The capacity to store vast volumes of data in a variety of formats and to analyse it at very high rates will yield insights that can assist companies and educational institutions in their rapid development. This article provided an overview of Big Data and Cloud Computing, including their basic concepts and terminology, as well as the evolution of data management into cloud computing. It also examined the advantages and disadvantages of combining big data with cloud computing. Storage and processing power are the significant benefits of integrating big data with the cloud: the cloud has access to a vast pool of resources and a variety of infrastructures that can accommodate this integration in a suitable manner. The environment can be set up and managed with minimal effort to provide a workspace for all big data requirements.
Copyright © 2023 Shiva Sharma, Simranjeet Singh, Shyam Sharma, Neerja Negi. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.