Data Mining with R

Author: Luis Torgo
Publisher: CRC Press
ISBN: 1482234890
Release Date: 2016-12-02
Genre:

Data Mining with R: Learning with Case Studies, Second Editionuses practical examples to illustrate the power of R and data mining. Providing an extensive update to the best-selling first edition, this new edition is divided into two parts. The first part will feature introductory material, including a new chapter that provides an introduction to data mining, to complement the already existing introduction to R. The second part includes case studies, and the new edition strongly revises the R code of the case studies making it more up-to-date with recent packages that have emerged in R. The book does not assume any prior knowledge about R. Readers who are new to R and data mining should be able to follow the case studies, and they are designed to be self-contained so the reader can start anywhere in the document. The book is accompanied by a set of freely available R source files that can be obtained at the book's web site. These files include all the code used in the case studies, and they facilitate the "do-it-yourself" approach followed in the book. Designed for users of data analysis tools, as well as researchers and developers, the book should be useful for anyone interested in entering the "world" of R and data mining. About the Author Luís Torgo is an associate professor in the Department of Computer Science at the University of Porto in Portugal. Heãeeteaches Data Mining in R in the NYU Stern School of Business' MS in Business Analytics program. An active researcher in machine learning and data mining for more than 20 years, Dr. Torgo is also a researcher in the Laboratory of Artificial Intelligence and Data Analysis (LIAAD) of INESC Porto LA.

Data Mining with R

Author: Luis Torgo
Publisher: Chapman and Hall/CRC
ISBN: 1439810184
Release Date: 2010-11-09
Genre: Business & Economics

The versatile capabilities and large set of add-on packages make R an excellent alternative to many existing and often expensive data mining tools. Exploring this area from the perspective of a practitioner, Data Mining with R: Learning with Case Studies uses practical examples to illustrate the power of R and data mining. Assuming no prior knowledge of R or data mining/statistical techniques, the book covers a diverse set of problems that pose different challenges in terms of size, type of data, goals of analysis, and analytical tools. To present the main data mining processes and techniques, the author takes a hands-on approach that utilizes a series of detailed, real-world case studies: Predicting algae blooms Predicting stock market returns Detecting fraudulent transactions Classifying microarray samples With these case studies, the author supplies all necessary steps, code, and data. Web Resource A supporting website mirrors the do-it-yourself approach of the text. It offers a collection of freely available R source files that encompass all the code used in the case studies. The site also provides the data sets from the case studies as well as an R package of several functions.

Data Mining with R

Author: Luis Torgo
Publisher: CRC Press
ISBN: 9781315399096
Release Date: 2016-11-30
Genre: Business & Economics

Data Mining with R: Learning with Case Studies, Second Edition uses practical examples to illustrate the power of R and data mining. Providing an extensive update to the best-selling first edition, this new edition is divided into two parts. The first part will feature introductory material, including a new chapter that provides an introduction to data mining, to complement the already existing introduction to R. The second part includes case studies, and the new edition strongly revises the R code of the case studies making it more up-to-date with recent packages that have emerged in R. The book does not assume any prior knowledge about R. Readers who are new to R and data mining should be able to follow the case studies, and they are designed to be self-contained so the reader can start anywhere in the document. The book is accompanied by a set of freely available R source files that can be obtained at the book’s web site. These files include all the code used in the case studies, and they facilitate the "do-it-yourself" approach followed in the book. Designed for users of data analysis tools, as well as researchers and developers, the book should be useful for anyone interested in entering the "world" of R and data mining. About the Author Luís Torgo is an associate professor in the Department of Computer Science at the University of Porto in Portugal. He teaches Data Mining in R in the NYU Stern School of Business’ MS in Business Analytics program. An active researcher in machine learning and data mining for more than 20 years, Dr. Torgo is also a researcher in the Laboratory of Artificial Intelligence and Data Analysis (LIAAD) of INESC Porto LA.

Handbook of Educational Data Mining

Author: Cristobal Romero
Publisher: CRC Press
ISBN: 1439804583
Release Date: 2010-10-25
Genre: Business & Economics

Handbook of Educational Data Mining (EDM) provides a thorough overview of the current state of knowledge in this area. The first part of the book includes nine surveys and tutorials on the principal data mining techniques that have been applied in education. The second part presents a set of 25 case studies that give a rich overview of the problems that EDM has addressed. Researchers at the Forefront of the Field Discuss Essential Topics and the Latest Advances With contributions by well-known researchers from a variety of fields, the book reflects the multidisciplinary nature of the EDM community. It brings the educational and data mining communities together, helping education experts understand what types of questions EDM can address and helping data miners understand what types of questions are important to educational design and educational decision making. Encouraging readers to integrate EDM into their research and practice, this timely handbook offers a broad, accessible treatment of essential EDM techniques and applications. It provides an excellent first step for newcomers to the EDM community and for active researchers to keep abreast of recent developments in the field.

Data Mining

Author: Richard J. Roiger
Publisher: CRC Press
ISBN: 9781498763981
Release Date: 2017-01-06
Genre: Business & Economics

Data Mining: A Tutorial-Based Primer, Second Edition provides a comprehensive introduction to data mining with a focus on model building and testing, as well as on interpreting and validating results. The text guides students to understand how data mining can be employed to solve real problems and recognize whether a data mining solution is a feasible alternative for a specific problem. Fundamental data mining strategies, techniques, and evaluation methods are presented and implemented with the help of two well-known software tools. Several new topics have been added to the second edition including an introduction to Big Data and data analytics, ROC curves, Pareto lift charts, methods for handling large-sized, streaming and imbalanced data, support vector machines, and extended coverage of textual data mining. The second edition contains tutorials for attribute selection, dealing with imbalanced data, outlier analysis, time series analysis, mining textual data, and more. The text provides in-depth coverage of RapidMiner Studio and Weka’s Explorer interface. Both software tools are used for stepping students through the tutorials depicting the knowledge discovery process. This allows the reader maximum flexibility for their hands-on data mining experience.

Privacy Aware Knowledge Discovery

Author: Francesco Bonchi
Publisher: CRC Press
ISBN: 9781439803660
Release Date: 2010-12-02
Genre: Computers

Covering research at the frontier of this field, Privacy-Aware Knowledge Discovery: Novel Applications and New Techniques presents state-of-the-art privacy-preserving data mining techniques for application domains, such as medicine and social networks, that face the increasing heterogeneity and complexity of new forms of data. Renowned authorities from prominent organizations not only cover well-established results—they also explore complex domains where privacy issues are generally clear and well defined, but the solutions are still preliminary and in continuous development. Divided into seven parts, the book provides in-depth coverage of the most novel reference scenarios for privacy-preserving techniques. The first part gives general techniques that can be applied to various applications discussed in the rest of the book. The second section focuses on the sanitization of network traces and privacy in data stream mining. After the third part on privacy in spatio-temporal data mining and mobility data analysis, the book examines time series analysis in the fourth section, explaining how a perturbation method and a segment-based method can tackle privacy issues of time series data. The fifth section on biomedical data addresses genomic data as well as the problem of privacy-aware information sharing of health data. In the sixth section on web applications, the book deals with query log mining and web recommender systems. The final part on social networks analyzes privacy issues related to the management of social network data under different perspectives. While several new results have recently occurred in the privacy, database, and data mining research communities, a uniform presentation of up-to-date techniques and applications is lacking. Filling this void, Privacy-Aware Knowledge Discovery presents novel algorithms, patterns, and models, along with a significant collection of open problems for future investigation.

Computational Business Analytics

Author: Subrata Das
Publisher: CRC Press
ISBN: 9781439890707
Release Date: 2013-12-14
Genre: Business & Economics

Learn How to Properly Use the Latest Analytics Approaches in Your Organization Computational Business Analytics presents tools and techniques for descriptive, predictive, and prescriptive analytics applicable across multiple domains. Through many examples and challenging case studies from a variety of fields, practitioners easily see the connections to their own problems and can then formulate their own solution strategies. The book first covers core descriptive and inferential statistics for analytics. The author then enhances numerical statistical techniques with symbolic artificial intelligence (AI) and machine learning (ML) techniques for richer predictive and prescriptive analytics. With a special emphasis on methods that handle time and textual data, the text: Enriches principal component and factor analyses with subspace methods, such as latent semantic analyses Combines regression analyses with probabilistic graphical modeling, such as Bayesian networks Extends autoregression and survival analysis techniques with the Kalman filter, hidden Markov models, and dynamic Bayesian networks Embeds decision trees within influence diagrams Augments nearest-neighbor and k-means clustering techniques with support vector machines and neural networks These approaches are not replacements of traditional statistics-based analytics; rather, in most cases, a generalized technique can be reduced to the underlying traditional base technique under very restrictive conditions. The book shows how these enriched techniques offer efficient solutions in areas, including customer segmentation, churn prediction, credit risk assessment, fraud detection, and advertising campaigns.

The Essentials of Data Science Knowledge Discovery Using R

Author: Graham J. Williams
Publisher: CRC Press
ISBN: 9781351647496
Release Date: 2017-07-28
Genre: Business & Economics

The Essentials of Data Science: Knowledge Discovery Using R presents the concepts of data science through a hands-on approach using free and open source software. It systematically drives an accessible journey through data analysis and machine learning to discover and share knowledge from data. Building on over thirty years’ experience in teaching and practising data science, the author encourages a programming-by-example approach to ensure students and practitioners attune to the practise of data science while building their data skills. Proven frameworks are provided as reusable templates. Real world case studies then provide insight for the data scientist to swiftly adapt the templates to new tasks and datasets. The book begins by introducing data science. It then reviews R’s capabilities for analysing data by writing computer programs. These programs are developed and explained step by step. From analysing and visualising data, the framework moves on to tried and tested machine learning techniques for predictive modelling and knowledge discovery. Literate programming and a consistent style are a focus throughout the book.

Data Clustering in C

Author: Guojun Gan
Publisher: CRC Press
ISBN: 9781439862247
Release Date: 2011-03-28
Genre: Business & Economics

Data clustering is a highly interdisciplinary field, the goal of which is to divide a set of objects into homogeneous groups such that objects in the same group are similar and objects in different groups are quite distinct. Thousands of theoretical papers and a number of books on data clustering have been published over the past 50 years. However, few books exist to teach people how to implement data clustering algorithms. This book was written for anyone who wants to implement or improve their data clustering algorithms. Using object-oriented design and programming techniques, Data Clustering in C++ exploits the commonalities of all data clustering algorithms to create a flexible set of reusable classes that simplifies the implementation of any data clustering algorithm. Readers can follow the development of the base data clustering classes and several popular data clustering algorithms. Additional topics such as data pre-processing, data visualization, cluster visualization, and cluster interpretation are briefly covered. This book is divided into three parts-- Data Clustering and C++ Preliminaries: A review of basic concepts of data clustering, the unified modeling language, object-oriented programming in C++, and design patterns A C++ Data Clustering Framework: The development of data clustering base classes Data Clustering Algorithms: The implementation of several popular data clustering algorithms A key to learning a clustering algorithm is to implement and experiment the clustering algorithm. Complete listings of classes, examples, unit test cases, and GNU configuration files are included in the appendices of this book as well as in the CD-ROM of the book. The only requirements to compile the code are a modern C++ compiler and the Boost C++ libraries.

Advances in Machine Learning and Data Mining for Astronomy

Author: Michael J. Way
Publisher: CRC Press
ISBN: 9781439841730
Release Date: 2012-03-29
Genre: Computers

Advances in Machine Learning and Data Mining for Astronomy documents numerous successful collaborations among computer scientists, statisticians, and astronomers who illustrate the application of state-of-the-art machine learning and data mining techniques in astronomy. Due to the massive amount and complexity of data in most scientific disciplines, the material discussed in this text transcends traditional boundaries between various areas in the sciences and computer science. The book’s introductory part provides context to issues in the astronomical sciences that are also important to health, social, and physical sciences, particularly probabilistic and statistical aspects of classification and cluster analysis. The next part describes a number of astrophysics case studies that leverage a range of machine learning and data mining technologies. In the last part, developers of algorithms and practitioners of machine learning and data mining show how these tools and techniques are used in astronomical applications. With contributions from leading astronomers and computer scientists, this book is a practical guide to many of the most important developments in machine learning, data mining, and statistics. It explores how these advances can solve current and future problems in astronomy and looks at how they could lead to the creation of entirely new algorithms within the data mining community.

Spectral Feature Selection for Data Mining

Author: Zheng Alan Zhao
Publisher: CRC Press
ISBN: 9781439862100
Release Date: 2011-12-14
Genre: Business & Economics

Spectral Feature Selection for Data Mining introduces a novel feature selection technique that establishes a general platform for studying existing feature selection algorithms and developing new algorithms for emerging problems in real-world applications. This technique represents a unified framework for supervised, unsupervised, and semisupervised feature selection. The book explores the latest research achievements, sheds light on new research directions, and stimulates readers to make the next creative breakthroughs. It presents the intrinsic ideas behind spectral feature selection, its theoretical foundations, its connections to other algorithms, and its use in handling both large-scale data sets and small sample problems. The authors also cover feature selection and feature extraction, including basic concepts, popular existing algorithms, and applications. A timely introduction to spectral feature selection, this book illustrates the potential of this powerful dimensionality reduction technique in high-dimensional data processing. Readers learn how to use spectral feature selection to solve challenging problems in real-life applications and discover how general feature selection and extraction are connected to spectral feature selection.

Computational Methods of Feature Selection

Author: Huan Liu
Publisher: CRC Press
ISBN: 1584888792
Release Date: 2007-10-29
Genre: Computers

Due to increasing demands for dimensionality reduction, research on feature selection has deeply and widely expanded into many fields, including computational statistics, pattern recognition, machine learning, data mining, and knowledge discovery. Highlighting current research issues, Computational Methods of Feature Selection introduces the basic concepts and principles, state-of-the-art algorithms, and novel applications of this tool. The book begins by exploring unsupervised, randomized, and causal feature selection. It then reports on some recent results of empowering feature selection, including active feature selection, decision-border estimate, the use of ensembles with independent probes, and incremental feature selection. This is followed by discussions of weighting and local methods, such as the ReliefF family, k-means clustering, local feature relevance, and a new interpretation of Relief. The book subsequently covers text classification, a new feature selection score, and both constraint-guided and aggressive feature selection. The final section examines applications of feature selection in bioinformatics, including feature construction as well as redundancy-, ensemble-, and penalty-based feature selection. Through a clear, concise, and coherent presentation of topics, this volume systematically covers the key concepts, underlying principles, and inventive applications of feature selection, illustrating how this powerful tool can efficiently harness massive, high-dimensional data and turn it into valuable, reliable information.

Music Data Mining

Author: Tao Li
Publisher: CRC Press
ISBN: 9781439835524
Release Date: 2011-07-12
Genre: Business & Economics

The research area of music information retrieval has gradually evolved to address the challenges of effectively accessing and interacting large collections of music and associated data, such as styles, artists, lyrics, and reviews. Bringing together an interdisciplinary array of top researchers, Music Data Mining presents a variety of approaches to successfully employ data mining techniques for the purpose of music processing. The book first covers music data mining tasks and algorithms and audio feature extraction, providing a framework for subsequent chapters. With a focus on data classification, it then describes a computational approach inspired by human auditory perception and examines instrument recognition, the effects of music on moods and emotions, and the connections between power laws and music aesthetics. Given the importance of social aspects in understanding music, the text addresses the use of the Web and peer-to-peer networks for both music data mining and evaluating music mining tasks and algorithms. It also discusses indexing with tags and explains how data can be collected using online human computation games. The final chapters offer a balanced exploration of hit song science as well as a look at symbolic musicology and data mining. The multifaceted nature of music information often requires algorithms and systems using sophisticated signal processing and machine learning techniques to better extract useful information. An excellent introduction to the field, this volume presents state-of-the-art techniques in music data mining and information retrieval to create novel ways of interacting with large music collections.

Mining Software Specifications

Author: David Lo
Publisher: CRC Press
ISBN: 9781439806272
Release Date: 2011-05-24
Genre: Computers

An emerging topic in software engineering and data mining, specification mining tackles software maintenance and reliability issues that cost economies billions of dollars each year. The first unified reference on the subject, Mining Software Specifications: Methodologies and Applications describes recent approaches for mining specifications of software systems. Experts in the field illustrate how to apply state-of-the-art data mining and machine learning techniques to address software engineering concerns. In the first set of chapters, the book introduces a number of studies on mining finite state machines that employ techniques, such as grammar inference, partial order mining, source code model checking, abstract interpretation, and more. The remaining chapters present research on mining temporal rules/patterns, covering techniques that include path-aware static program analyses, lightweight rule/pattern mining, statistical analysis, and other interesting approaches. Throughout the book, the authors discuss how to employ dynamic analysis, static analysis, and combinations of both to mine software specifications. According to the US National Institute of Standards and Technology in 2002, software bugs have cost the US economy 59.5 billion dollars a year. This volume shows how specification mining can help find bugs and improve program understanding, thereby reducing unnecessary financial losses. The book encourages the industry adoption of specification mining techniques and the assimilation of these techniques in standard integrated development environments (IDEs).

Data Clustering

Author: Charu C. Aggarwal
Publisher: CRC Press
ISBN: 9781498785778
Release Date: 2016-04-08
Genre: Business & Economics

Research on the problem of clustering tends to be fragmented across the pattern recognition, database, data mining, and machine learning communities. Addressing this problem in a unified way, Data Clustering: Algorithms and Applications provides complete coverage of the entire area of clustering, from basic methods to more refined and complex data clustering approaches. It pays special attention to recent issues in graphs, social networks, and other domains. The book focuses on three primary aspects of data clustering: Methods, describing key techniques commonly used for clustering, such as feature selection, agglomerative clustering, partitional clustering, density-based clustering, probabilistic clustering, grid-based clustering, spectral clustering, and nonnegative matrix factorization Domains, covering methods used for different domains of data, such as categorical data, text data, multimedia data, graph data, biological data, stream data, uncertain data, time series clustering, high-dimensional clustering, and big data Variations and Insights, discussing important variations of the clustering process, such as semisupervised clustering, interactive clustering, multiview clustering, cluster ensembles, and cluster validation In this book, top researchers from around the world explore the characteristics of clustering problems in a variety of application areas. They also explain how to glean detailed insight from the clustering process—including how to verify the quality of the underlying clusters—through supervision, human intervention, or the automated generation of alternative clusters.