Data Mining in Science and Technology
This article reviews data mining, from its historical development to its applications in various fields of science and technology.
Summary: In science and technology, huge amounts of data are collected and stored in computers so that useful information can be extracted later. Sometimes it is not known at the time of collection what data will later be requested, so the database is not designed to distill any particular information and is, to a large extent, unstructured. The science of extracting information from large collections of data is referred to as “Data Mining”, sometimes called “Knowledge Discovery”. This paper reviews data mining from its historical development to its applications in various fields of science and technology.
Introduction: Scientists refer to the 21st century as the age of data. Advances in science and technology have enabled us to collect large amounts of data in the form of signals, images, text, spatial data and other complex data [22]. These data sets arise in diverse fields such as financial markets, meteorology, medical imaging, remote sensing, physics, chemistry, materials science, astronomy and bioinformatics. They can be obtained from simulations, experiments or observations. Data mining is the process of uncovering patterns, associations, anomalies and significant features in such data, much of which is unstructured.
It is a multidisciplinary field that borrows and enhances ideas from different domains, including image processing, signal processing, machine learning, optimization, high performance computing, information retrieval and computer vision. It holds the promise of helping the scientific and technological community analyze massive, complex data sets, so that researchers can gain fundamental insights and then make sound decisions and discoveries.
2. Developmental History of Data Mining: Data mining emerged about 60 years ago from the joint work of mathematicians, statisticians, logicians and computer scientists to create artificial intelligence and machine learning.
The term Data Mining came into use during the 1960s, when artificial intelligence practitioners and statisticians developed new algorithms such as regression analysis, maximum likelihood estimation and neural networks. In the same decade, the field of information retrieval contributed clustering techniques and similarity measures. At the time these techniques were applied to text documents, but they would later be used for mining data in databases and other large distributed data sets. By the end of the 1960s, information retrieval and database systems were developing in parallel.
In 1971, Gerard Salton published his work on the SMART information retrieval system. It represented a new approach to information retrieval, using the algebra-based vector space model (VSM), which proved to be a very important part of the data mining toolkit.
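As a rough illustration of the vector space idea, the following minimal Python sketch represents each document as a vector of term counts and ranks documents by the cosine of the angle between the query vector and each document vector. It is not a reconstruction of SMART: the whitespace tokenizer, the raw term-frequency weighting, the sample documents and the function names (`term_vector`, `cosine_similarity`, `rank`) are all assumptions made for this example.

```python
import math
from collections import Counter


def term_vector(text):
    """Represent a text as a sparse vector of raw term counts (assumed weighting)."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse term vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank(query, documents):
    """Return (score, document) pairs, most similar to the query first."""
    q = term_vector(query)
    scored = [(cosine_similarity(q, term_vector(d)), d) for d in documents]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


# Hypothetical toy collection, used only to exercise the functions.
docs = [
    "data mining finds patterns in large data sets",
    "remote sensing images from satellites",
    "clustering groups similar text documents",
]
ranked = rank("mining patterns in data", docs)
for score, doc in ranked:
    print(f"{score:.3f}  {doc}")
```

Documents that share no terms with the query score zero, so only the first document is ranked above the others here. Practical systems usually replace raw counts with a weighting such as TF-IDF.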
From the 1970s to the 1990s, the confluence of disciplines (artificial intelligence, information retrieval, statistics and database systems) and the availability of fast microcomputers opened up new possibilities for retrieving and analyzing data. In 1997, the journal “Data Mining and Knowledge Discovery” was launched, reflecting how advances in data collection and distribution had created a need for computational methods and techniques to aid in data analysis.
In the early 1990s, the huge volume of available data, much of it held in very large databases, made new techniques for handling such quantities of information essential. During this decade data mining changed from being an interesting new technology to being part of standard business practice.
In 2001, William S. Cleveland published “Data Science: An Action Plan for Expanding the Technical Areas of the Field of Statistics”, a plan “to enlarge the major areas of technical work of the field of statistics”. In 2003, the Journal of Data Science was launched with the statement: “By ‘Data Science’ we mean almost everything that has something to do with data: collecting, analyzing, modeling, etc.”
Nowadays data mining is applied in many industries and sectors, such as retail, medicine, telecommunications, banking, finance, pharmaceuticals and marketing.
3. What are Data Mining and Knowledge Discovery?
With the enormous amount of data stored in files, databases, and other repositories, it is increasingly important, if not necessary, to develop powerful means for analysis and perhaps interpretation of such data and for the extraction of interesting knowledge that could help in decision-making.
Data Mining, also known as Knowledge Discovery in Databases (KDD), refers to the nontrivial extraction of implicit, previously unknown and potentially useful information from data in databases. While data mining and KDD are frequently treated as synonyms, data mining is actually one part of the knowledge discovery process.
Witten and Frank define it as follows: “Data Mining refers to the process of finding the interesting patterns in the data that are not explicitly part of the data”. The interesting patterns can tell us something new and can be used to make predictions. The process of data mining is composed of several steps, including selecting the data to analyze, preparing the data, applying the data mining algorithms, and then interpreting and evaluating the results.
The Knowledge Discovery in Databases process comprises a few steps leading from raw data collections to some form of new knowledge. The iterative process consists of the following steps:
The first four steps are different forms of data preprocessing, in which the data are prepared for mining. The data mining step may interact with the user. The interesting patterns are presented to the user and may be stored as new knowledge in the knowledge base. Data mining is only one step in the entire process, although an essential one, because it uncovers hidden patterns for evaluation.
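The shape of such a process can be sketched in a few lines of Python. Since the individual steps are not named in the text, the sketch assumes the commonly cited seven-step formulation (data cleaning, integration, selection, transformation, data mining, pattern evaluation, knowledge presentation). The weather-style records, the function names and the choice of co-occurring attribute pairs as the "pattern" are all hypothetical and serve only as illustration.

```python
from collections import Counter
from itertools import combinations


def clean(records):
    """Data cleaning: drop records that contain missing values."""
    return [r for r in records if all(v is not None for v in r.values())]


def integrate(*sources):
    """Data integration: combine several sources into one collection."""
    return [r for source in sources for r in source]


def select(records, fields):
    """Data selection: keep only the attributes relevant to the task."""
    return [{f: r[f] for f in fields} for r in records]


def transform(records):
    """Data transformation: encode each record as a set of 'field=value' items."""
    return [frozenset(f"{k}={v}" for k, v in r.items()) for r in records]


def mine(itemsets):
    """Data mining: count how often each pair of items occurs together."""
    counts = Counter()
    for items in itemsets:
        counts.update(combinations(sorted(items), 2))
    return counts


def evaluate(counts, n_records, min_support):
    """Pattern evaluation: keep pairs whose support reaches the threshold."""
    return {pair: c / n_records for pair, c in counts.items()
            if c / n_records >= min_support}


def present(patterns):
    """Knowledge presentation: render the surviving patterns as readable text."""
    return [f"{a} & {b} (support {s:.2f})"
            for (a, b), s in sorted(patterns.items())]


# Hypothetical observations from two sources; one record has a missing value.
source_a = [
    {"sky": "clear", "wind": "low", "rain": "no", "station": "A"},
    {"sky": "cloudy", "wind": "high", "rain": "yes", "station": "A"},
    {"sky": None, "wind": "low", "rain": "no", "station": "A"},
]
source_b = [
    {"sky": "cloudy", "wind": "high", "rain": "yes", "station": "B"},
    {"sky": "clear", "wind": "high", "rain": "no", "station": "B"},
]

prepared = transform(select(integrate(clean(source_a), clean(source_b)),
                            ["sky", "wind", "rain"]))
patterns = evaluate(mine(prepared), len(prepared), min_support=0.5)
for line in present(patterns):
    print(line)
```

In practice the process is iterative: if the evaluated patterns are uninteresting, the selection, the transformation or the support threshold is revised and the later steps are run again.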
4. Data Mining Methods
There are different types of data mining methods such as summarization, classification, clustering, regression, dependency modeling and change and deviation detection.
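To make three of these methods concrete, the short Python sketch below shows summarization (mean and standard deviation), regression (a least-squares straight line) and deviation detection (flagging values far from the mean). The function names, the two-standard-deviation threshold and the sample numbers are assumptions chosen for illustration; classification, clustering and dependency modeling are not covered here.

```python
import statistics


def summarize(values):
    """Summarization: describe a data set by its mean and standard deviation."""
    return {"mean": statistics.mean(values),
            "stdev": statistics.pstdev(values)}


def fit_line(xs, ys):
    """Regression: least-squares fit of y = slope * x + intercept."""
    mean_x, mean_y = statistics.mean(xs), statistics.mean(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx  # assumes the x values are not all identical
    return slope, mean_y - slope * mean_x


def deviations(values, threshold=2.0):
    """Deviation detection: values more than `threshold` stdevs from the mean."""
    summary = summarize(values)
    if summary["stdev"] == 0:
        return []
    return [v for v in values
            if abs(v - summary["mean"]) / summary["stdev"] > threshold]


# Hypothetical measurements, used only to exercise the functions.
slope, intercept = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
readings = [10, 11, 9, 10, 12, 10, 11, 9, 10, 50]
outliers = deviations(readings)
print(slope, intercept, summarize(readings), outliers)
```

The mean-and-standard-deviation rule is only one simple choice for deviation detection. It is sensitive to the very outliers it is meant to find, so median-based rules are often preferred on noisy data.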
5. Application of Data Mining in Science and Technology
The field of data mining has been growing rapidly because of its broad applicability, its achievements and its contribution to scientific progress and understanding. Data mining applications have been developed in various domains, such as astronomy, remote sensing, medical science, security and surveillance, computer simulation, information retrieval and chemistry.
5.1 Astronomy: Astronomy has a long history of acquiring, systematizing and interpreting large quantities of data. From the earliest sky atlases through the first major photographic sky surveys of the 20th century, this tradition continues today at an ever increasing rate. Astronomers have classically focused on clustering and classification problems as standard practice in their research.
5.2 Remote Sensing: In recent years, with the development of remote sensing and data storage techniques, a great number of images are generated every day. Remote sensing images, whether obtained from satellites or from aerial photography, are a very rich source of data for analysis.
Remote sensing systems play an important role in monitoring the earth. Applications include global climate change detection through the identification of deforestation and global warming; yield prediction in agriculture; land use mapping for urban growth; resource exploration for minerals and natural gas; and military surveillance and reconnaissance for tactical assessment. Further applications are the identification of man-made structures (such as buildings, roads, bridges and airports), meteorology (analyzing and predicting typhoons from satellite images that capture the typhoon's cloud patterns), and the analysis of high-resolution satellite imagery from different sensors.
5.3 Medical science: In recent years, data mining has been widely used in areas of medical science such as biomedicine, DNA analysis, genetics and medicine. In genetics, an important goal is to understand the relationship between variation in human DNA sequences and susceptibility to disease. Data mining is a very important tool for improving the diagnosis, prevention and treatment of diseases.
Image processing also plays an important role in biomedical data mining, for example with electrocardiogram (ECG), electroencephalogram (EEG), magnetic resonance imaging (MRI) and functional magnetic resonance imaging (fMRI) data. Image processing also helps to present complex gene structures as graphs, trees and chains. Such visual representations support a better understanding of complex gene structures for knowledge discovery and data exploration.
5.4 Security and Surveillance: Another broad and emerging area of research in data mining is security and surveillance. The field of privacy-preserving data mining (PPDM) has been around for seven years. Applications in this area are diverse and include biometrics (fingerprint, iris, face, signature and voice recognition); automated target recognition in aerial and satellite imagery; video surveillance; and network intrusion detection.
5.5 Computer Simulation: Computer simulations often generate large data sets whose sheer size and complexity make them difficult to analyze. Data mining plays an important role in the analysis of simulation data sets in many ways, such as the detection of coherent structures, dimension reduction, code validation and the understanding of simulations.
5.6 Information Retrieval: Information retrieval research combines techniques from machine learning and other theoretical models with extensive experimentation to develop more accurate, faster and more advanced retrieval and search techniques. Research topics include retrieval models, new features, optimization and learning, and the measurement of effectiveness.
5.7 Chemistry: The analysis of data sets is one of the most important tasks in investigating the properties of chemical compounds. Especially in drug design, methods are used to characterize complete sets of chemical compounds instead of describing individual molecules. Data mining, i.e. the exploration of large amounts of data in search of consistent patterns, correlations and other systematic relationships, can be a helpful tool for evaluating "hidden" information in a set of molecules.
Data Mining Service - Chemistry (DMSC) is a project for the development of a centralized service for the exploration of chemical data sets. With this service it will be possible to analyze chemical data sets for molecular patterns and systematic relationships using the following methods:
DMSC opens a new way of chemical information processing using the newest WWW techniques to visualize complex trends, patterns and relationships in chemical datasets in a most effective way.
6. Conclusion: In this paper we briefly reviewed the historical development of data mining and its various applications in science and technology. Although only a few areas are named here, they are ones that are commonly overlooked. This paper offers a researcher's perspective on the applications of data mining in science and technology.