Data Mining to Discovery of Knowledge

Authors: Aman Gupta, Ashish Jadhav

DOI Link: https://doi.org/10.22214/ijraset.2022.43643

Abstract

Introduction

I. INTRODUCTION

Data mining and discovery of knowledge have been attracting a significant amount of research, industry, and media attention of late. What is all the excitement about? This article provides an overview of this emerging field, clarifying how data mining and knowledge discovery in databases are related both to each other and to related fields, such as machine learning, statistics, and databases. The article mentions particular real-world applications, specific data-mining techniques, challenges involved in real-world applications of knowledge discovery, and current and future research directions in the field.

At an abstract level, the DISCOVERY OF KNOWLEDGE field is concerned with the development of methods and techniques for making sense of data. The basic problem addressed by the DISCOVERY OF KNOWLEDGE process is one of mapping low-level data (which are typically too voluminous to understand and digest easily) into other forms that might be more compact (for example, a short report), more abstract (for example, a descriptive approximation or model of the process that generated the data), or more useful (for example, a predictive model for estimating the value of future cases). At the core of the process is the application of specific data-mining methods for pattern discovery and extraction.

By discussing the historical context of DISCOVERY OF KNOWLEDGE and data mining and their intersection with other related fields. A brief summary of recent DISCOVERY OF KNOWLEDGE real-world applications is provided. Definitions of DISCOVERY OF KNOWLEDGE and data mining are provided, and the general multistep DISCOVERY OF KNOWLEDGE process is outlined. This multistep process has the application of data-mining algorithms as one particular step in the process. The data-mining step is discussed in more detail in the context of specific data-mining algorithms and their application. Real-world practical application issues are also outlined. Finally, the article enumerates challenges for future research and development and in particular discusses potential opportunities for AI technology in DISCOVERY OF KNOWLEDGE systems.

II. WHY DO WE NEED DISCOVERY OF KNOWLEDGE?

The traditional method of turning data into knowledge relies on manual analysis and interpretation. For example, in the health-care industry, it is common for specialists to periodically analyze current trends and changes in health-care data, say, on a quarterly basis. The specialists then provide a report detailing the analysis to the sponsoring health-care organization; this report becomes the basis for future decision making and planning for health-care management. In a totally different type of application, planetary geologists sift through remotely sensed images of planets and asteroids, carefully locating and cataloguing such geologic objects of interest as impact craters. Be it science, marketing, finance, health care, retail, or any other field, the classical approach to data analysis relies fundamentally on one or more analysts becoming intimately familiar with the data and serving as an interface between the data and the users and products.

The need to scale up human analysis capabilities to handling the large number of bytes that we can collect is both economic and scientific. Businesses use data to gain competitive advantage, increase efficiency, and provide more valuable services to customers. Data we capture about our environment are the basic evidence we use to build theories and models of the universe we live in. Because computers have enabled humans to gather more data than we can digest, it is only natural to turn to computational techniques to help us unearth meaningful patternsand structures from the massive volumes of data. Hence, DISCOVERY OF KNOWLEDGE is an attempt to address a problem that the digital information era made a fact of life for all of us: data overload.

III. DATA MINING AND DISCOVERY TO KNOWLEDGE

Historically, the notion of finding useful patterns in data has been given a variety of names, including data mining, knowledge extraction, information discovery, information harvesting, data archaeology, and data pattern processing. The term data mining has mostly been used by statisticians, data analysts, and the management information systems (MIS) communities. It has also gained popularity in the database field. The phrase knowledge discovery in databases was coined at the first DISCOVERY OF KNOWLEDGE workshop in 1989 (Piatetsky-Shapiro 1991) to emphasize that knowledge is the end product of a data-driven discovery. It has been popularized in the AI and machine-learning fields.

In our view, DISCOVERY OF KNOWLEDGE refers to the overall process of discovering useful knowledge from data, and data mining refers to a particular step in this process. Data mining is the application of specific algorithms for extracting patterns from data. The distinction between the DISCOVERY OF KNOWLEDGE process and the data-mining step (within the process) is a central point of this article. The additional steps in the DISCOVERY OF KNOWLEDGE process, such as data preparation, data selection, data cleaning, incorporation of appropriate prior knowledge, and proper interpretation of the results of mining, are essential to ensure that useful knowledge is derived from the data. Blind application of data-mining methods (rightly criticized as data dredging in the statistical literature) can be a dangerous activity, easily leading to the discovery of meaningless and invalid patterns. The Interdisciplinary Nature of DISCOVERY OF KNOWLEDGE DISCOVERY OF KNOWLEDGE has evolved, and continues to evolve, from the intersection of research fields such as machine learning, pattern recognition, databases, statistics, AI, knowledge acquisition for expert systems, data visualization, and high-performance computing. The unifying goal is extracting high-level knowledge from Rlow-level data in the context of large data sets.

IV. THE DISCOVERY OF KNOWLEDGE PROCESS

The DISCOVERY OF KNOWLEDGE process is interactive and iterative,involving numerous steps with many decisions made by the user. Brachman and Anand (1996) give a practical view of the DISCOVERY OF KNOWLEDGE process, emphasizing the interactive nature of the process. Here, we broadly outline some of its basic steps: First is developing an understanding of the application domain and the relevant prior knowledge and identifying the goal of the DISCOVERY OF KNOWLEDGE process from the customer’s viewpoint. Second is creating a target data set: selecting a data set, or focusing on a subset of variables or data samples, on which discovery is to be performed. Third is data cleaning and preprocessing. Basic operations include removing noise if appropriate, collecting the necessary information to model or account for noise, deciding on strategies for handling missing data fields, and accounting for time-sequence information and known changes. Fourth is data reduction and projection: finding useful features to represent the data depending on the goal of the task. With dimensionality reduction or transformation methods, the effective number of variables under consideration can be reduced, or invariant representations for the data can be found.

V. APPLICATIONS

The investigation of distinguishing local area structures in multi-facet networks is encountering a bloom somewhat recently. Important explores cover different angles among our everyday existence, for example, breaking down persuasive clients in numerous social stages (Al-Garadi et al. 2018), tracking down association of proteins in a natural framework (Gosak et al. 2018) and overseeing metropolitan transportation framework with different traffic habits (Liu et al. 2019), and so on. The accompanying subsections summed up uses of local area location by means of a multi-facet network system.

VI. MAIN CONTRIBUTIONS

There have been various endeavors to address local area identification issue in multi-facet networks through different methodologies, e.g., distinguishing networks in worldly organizations by seclusion expansion (Bazzi et al. 2016), where the creators underscore the contrast between "invalid organizations" and "invalid models" in particularity expansion and examine the impact of interlayer edges on the multi-facet seclusion boost issue. De Bacco et al. (2017) propose a generative model for multi-facet organizations, which can be utilized to total layers into bunches or to pack a dataset by recognizing particularly pertinent or repetitive layers. The proposed model is fit for integrating local area discovery and connection expectation for multi-facet organizations, and trial results on both engineered and genuine world datasets shows its plausibility. Breaking down multi-facet networks is vital on the grounds that many intriguing examples can't be gotten by investigating single-layer organizations. That is our inspiration for summing up these methodologies. The commitments of this work are:

A. We construct a scientific classification of local area discovery strategies in view of different procedures utilized.

B. We give a nitty gritty review of works that go under various classifications.

C. The assessment measures for local area results are ordered and summed up.

D. The uses of local area discovery in multi-facet networks are presented, as well as fascinating headings for future works.

Supposedly, this is the most recent work that gives a complete overview on different local area discovery strategies in multi-facet organizations.

Conclusion

Here are numerous Data Mining techniques from which one can decided for mining the arising clinical information bases and vaults. In this part, we have assessed most famous ones, furthermore, gave a few pointers where they have been applied. Notwithstanding the potential and promising approaches, the utility of Data Mining techniques to dissect clinical informational indexes is as yet meager, particularly when contrasted with traditional measurable methodologies. It is making progress, be that as it may, in the areas where information is went with information bases, and where information storehouses putting away heterogenous information from various sources took ground.

References

[1] Bazzan, A. L., Engel, P. M., Schroeder, L. F., and da Silva, S. C. (2002). Automated annotation of keywords for proteins related to mycoplasmataceae using machine learning techniques. Bioinformatics, 18:35S–43S. [2] Frank, E., Holmes, G., Kirkby, R., and Hall, M. (2002). Racing committees for large datasets. In Proceedings of the International Conference on Discovery Science, pages 153–164. Springer-Verlag. [3] Frank, E., Paynter, G. W., Witten, I. H., Gutwin, C., and Nevill-Manning, C. G. (1999). Domain-specifific keyphrase extraction. In Proceedings of the 16th International Joint Conference on Artifificial Intelligence, pages 668–673. Morgan Kaufmann. [4] Holmes, G., Cunningham, S. J., Rue, B. D., and Bollen, F. (1998). Predicting apple bruising using machine learning. Acta Hort, 476:289–296. [5] Holmes, G. and Hall, M. (2002). A development environment for predictive modelling in foods. International Journal of Food Microbiology, 73:351–362. [6] Holmes, G., Kirkby, R., and Pfahringer, B. (2003). Mining data streams using option trees. Technical Report 08/03, Department of Computer Science, University of Waikato. [7] Kusabs, N., Bollen, F., Trigg, L., Holmes, G., and Inglis, S. (1998). Objective measurement of mushroom quality. In Proc New Zealand Institute of Agricultural Science and the New Zealand Society for Horticultural Science Annual Convention, page 51. [8] Li, J., Liu, H., Downing, J. R., Yeoh, A. E.-J., and Wong, L. (2003). Simple rules underlying gene expression profifiles of more than six subtypes of acute lymphoblastic leukemia (all) patients. Bioinformatics, 19:71–78. [9] McQueen, R., Holmes, G., and Hunt, L. (1998). User satisfaction with machine learning as a data analysis method in agricultural research. New Zealand Journal of Agricultural Research, 41(4):577–584.

Copyright

Copyright © 2022 Aman Gupta, Ashish Jadhav. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Download Paper

Paper Id : IJRASET43643

Publish Date : 2022-05-31

ISSN : 2321-9653

Publisher Name : IJRASET

DOI Link : Click Here