Prediction Science - Approach

ToC

1.  Introduction

2.  Business Drivers

3.  Market Trend

4.  Popular Models/Algorithms

5.  Planning The Strategy


1.  Introduction

Data science as a field has many dimensions and applications. In science, we understand the features, behavior patterns and meaningful insights of a subject by formulating reusable, established formulas. In a similar way, we can understand behavior patterns and meaningful insights in data through engineering and statistical methods. Hence data science can also be viewed as data + science: the science of data.


2.  Business Drivers

Data science has been in use across industries for decades, and many popular tools are in place:

§ Alteryx, which consists of a Designer module for designing analytics applications, a Server component for scaling across the organization and an Analytics Gallery for sharing applications with external partners.

§ IBM, which provides SPSS Modeler, a tool targeted to users with little or no analytical background. IBM also has SPSS Statistics, which is geared toward more sophisticated analysts.

§ KNIME, an open source product commercialized by software vendor KNIME.com that includes an analytics platform and a number of commercial extensions for big data, cluster operations and collaboration.

§ Microsoft Revolution Analytics, which spans two products -- Revolution R Open, a free download that's an enhanced version of the R programming language, and Revolution R Enterprise, which supports the use of R in clustered environments (like Hadoop).

§ Oracle Advanced Analytics, which includes Oracle Data Miner, Oracle R Advanced Analytics for Hadoop and Oracle Big Data Discovery, as well as connectors and interfaces for SQL and R.

§ RapidMiner, which provides a Studio component for design, a Server component, a Hadoop connector called Radoop and a component for stream processing.

§ SAP Predictive Analytics, which comprises two versions, Automated Analytics (for business users without a formal background) and Expert Analytics (targeted to professional data analysts and data scientists).

§ SAS Enterprise Miner, which is intended to help users quickly develop descriptive and predictive models, including components for predictive modeling and in-database scoring.

§ The Teradata Aster Discovery Platform, which is a framework offered by Teradata with its Aster database, Discovery Portfolio with built-in analytics functions, a graph processing engine, MapReduce and a version of R.


3.  Market Trend

The recent availability of low-cost technology, such as the Hadoop ecosystem, cloud computing, big data platforms and open source tools, has led to large-scale adoption across every industry, from small companies to large enterprises.

A few popular tools (all open source except MATLAB, which is a commercial product):

·        R

·        Python

·        Scala

·        MATLAB

·        Julia

They offer functionality on par with statistical packages such as SPSS, SAS and Stata, including:

·        K-means Clustering

·        Association Rule Mining

·        Linear Regression

·        Logistic Regression

·        Naïve Bayes Classifiers

·        Decision Tree

·        Time Series Analysis

·        Text Analytics

·        Big Data Processing

·        Visual Workflows
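As an illustration of the first algorithm in the list, here is a minimal sketch of K-means clustering (Lloyd's algorithm) in plain Python. It assumes points are numeric tuples and that centroids are initialised by randomly sampling k input points; the function name `kmeans`, its parameters and the toy data are hypothetical and not taken from any of the tools named above.

```python
import math
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def kmeans(points: Sequence[Point], k: int, max_iter: int = 100,
           seed: int = 0) -> Tuple[List[Point], List[int]]:
    """Cluster points into k groups with Lloyd's algorithm (hypothetical helper)."""
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    rng = random.Random(seed)
    # Assumption: initialise centroids by sampling k distinct input points.
    centroids: List[Point] = [tuple(p) for p in rng.sample(list(points), k)]
    labels: List[int] = []
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid (Euclidean distance).
        new_labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                      for p in points]
        if new_labels == labels:
            break  # assignments are stable, so the centroids are stable too
        labels = new_labels
        # Update step: move each centroid to the mean of its member points.
        for c in range(k):
            members = [p for p, lbl in zip(points, labels) if lbl == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return centroids, labels


# Hypothetical toy data: two well-separated groups of 2-D points.
data = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (8.5, 9.0), (9.0, 8.0)]
centroids, labels = kmeans(data, k=2)
print(labels)
print(centroids)
```

The cluster numbering depends on the random initialisation, so only the grouping (first three points together, last three together) should be relied on, not the specific label values.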
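Linear regression from the same list can be sketched just as briefly. The following assumes a single input feature and fits y = slope * x + intercept by ordinary least squares using the closed-form solution; `fit_line`, `predict` and the sample data are hypothetical names chosen for illustration.

```python
from typing import Sequence, Tuple


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least squares for y = slope * x + intercept (one feature)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Closed form: slope = cov(x, y) / var(x); the 1/n factors cancel out.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("xs must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    # The fitted line always passes through the point (mean_x, mean_y).
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(slope: float, intercept: float, x: float) -> float:
    """Evaluate the fitted line at x."""
    return slope * x + intercept


slope, intercept = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
print(slope, intercept)  # → 2.0 1.0
print(predict(slope, intercept, 5))  # → 11.0
```

With several input features, the same least-squares idea is usually solved in matrix form by a numerical library rather than by hand.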

4.  Popular Models/Algorithms



5.  Planning The Strategy

Below are a few steps toward a strategy:

·        Defining the business goal -> the most important criterion

·        Sponsorship -> Buy-in from key stakeholders

·        Collaboration among data engineers and data scientists is very crucial for project success.

·        Build a data lake or repository of valid, meaningful and useful historical data gathered from different source systems -> capacity planning must be carefully considered.

·        Data Quality -> Cleanse data to meet the required quality norms.

·        Feature extraction -> Derive meaningful insights from existing data for the primary variables.

·        Prediction Models: Use feature engineering to create derived features, then build models on them.

·        Testing the models: Use k-fold cross-validation or a 70:30 train/test split for thorough testing.

·        Establish the Model Results: The accuracy of the results validates the model's effectiveness.

·        Fine-tuning: Continually improve the model with new data.
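The testing step above (k-fold cross-validation or a 70:30 split) can be sketched in plain Python. This is a minimal illustration under stated assumptions: rows are identified by their index, shuffling uses a fixed seed, and the model is a hypothetical baseline that always predicts the mean of its training targets. The names `holdout_split`, `k_fold_splits`, `cross_validate`, `fit_mean` and `mean_absolute_error` are hypothetical helpers, not a library API.

```python
import random
from typing import Any, Callable, List, Sequence, Tuple

Split = Tuple[List[int], List[int]]  # (train indices, test indices)


def holdout_split(n: int, test_fraction: float = 0.3, seed: int = 0) -> Split:
    """Shuffle indices 0..n-1 and hold out test_fraction of them (70:30 by default)."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    n_test = round(n * test_fraction)
    return indices[n_test:], indices[:n_test]


def k_fold_splits(n: int, k: int = 5, seed: int = 0) -> List[Split]:
    """Partition shuffled indices into k folds; each fold serves as the test set once."""
    if not 2 <= k <= n:
        raise ValueError("k must be between 2 and n")
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    return [
        ([idx for j, fold in enumerate(folds) if j != i for idx in fold], folds[i])
        for i in range(k)
    ]


def cross_validate(xs: Sequence, ys: Sequence,
                   fit: Callable[[list, list], Any],
                   score: Callable[[Any, list, list], float],
                   k: int = 5, seed: int = 0) -> float:
    """Train on k-1 folds, score on the held-out fold, and average the k scores."""
    scores = []
    for train, test in k_fold_splits(len(xs), k, seed):
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        scores.append(score(model, [xs[i] for i in test], [ys[i] for i in test]))
    return sum(scores) / len(scores)


# Hypothetical baseline model: always predict the mean of the training targets.
def fit_mean(xs: list, ys: list) -> float:
    return sum(ys) / len(ys)


def mean_absolute_error(model: float, xs: list, ys: list) -> float:
    return sum(abs(y - model) for y in ys) / len(ys)


xs = list(range(20))
ys = [2.0 * x + 1.0 for x in xs]
train, test = holdout_split(len(xs))
print(len(train), len(test))  # → 14 6
print(cross_validate(xs, ys, fit_mean, mean_absolute_error, k=5))
```

Cross-validation averages over k different held-out folds, so it gives a more stable estimate than a single 70:30 split at the cost of training k models. In practice, tested implementations such as scikit-learn's `sklearn.model_selection.KFold` and `train_test_split` would normally be used instead of hand-written helpers.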
