Data Analytics

Introduction to Data Analytics

Data analytics is the science of analyzing raw data to draw conclusions from it. It refers to the techniques used to analyze data in order to improve the productivity and profit of a business. Data is extracted from different sources and cleaned so that patterns in it can be analyzed. Many data analytics techniques and processes have been automated into algorithms that turn raw data into a form suitable for human consumption.

Types of Data Analytics

The data analytics process is broadly categorized into three types based on the purpose of analyzing the data:

  • Descriptive Analytics
  • Predictive Analytics
  • Prescriptive Analytics

1. Descriptive Analytics

Descriptive Analytics focuses on summarizing past data to derive inferences.

The measures most commonly used to characterize a historical data distribution quantitatively include (see the sketch after this list):

  • Measures of Central Tendency: Mean, Median, Quartiles, Mode
  • Measures of variability or spread: Range, Inter-Quartile Range, Percentiles
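As a minimal sketch of these measures, assuming a small invented dataset and the pandas/NumPy libraries mentioned later in this article:

```python
import pandas as pd

# Illustrative sample of daily sales figures (hypothetical data)
sales = pd.Series([120, 135, 150, 150, 160, 175, 190, 210, 480])

# Measures of central tendency
print("Mean:   ", sales.mean())
print("Median: ", sales.median())
print("Mode:   ", sales.mode().tolist())

# Measures of variability or spread
q1, q3 = sales.quantile(0.25), sales.quantile(0.75)
print("Range:  ", sales.max() - sales.min())
print("IQR:    ", q3 - q1)
print("90th percentile:", sales.quantile(0.90))
```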

In recent times, the difficulties and limitations involved in collecting, storing, and comprehending massive heaps of data are overcome with the statistical inference process. Generalized inferences about population statistics are deduced using sampling methods together with the central limit theorem. For example, a leading news broadcaster gathers the votes cast by randomly chosen voters at the exit of a polling station on election day to derive statistical inferences about the preferences of the entire population.
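The exit-poll example can be made concrete with a short simulation. The population preference and sample size below are invented for illustration; the point is that a modest random sample, combined with the normal approximation given by the central limit theorem, yields a usable estimate of a population statistic:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical electorate: 1 = votes for candidate A, 0 = otherwise
population = rng.random(1_000_000) < 0.52   # assumed true support: 52%

# Exit poll: sample 1,500 voters at random
sample = rng.choice(population, size=1500, replace=False)
p_hat = sample.mean()

# 95% confidence interval via the normal approximation (CLT)
se = (p_hat * (1 - p_hat) / len(sample)) ** 0.5
print(f"Estimated support: {p_hat:.3f} +/- {1.96 * se:.3f}")
```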

2. Predictive Analytics

Predictive Analytics exploits patterns in historical data to estimate future outcomes, identify trends, uncover potential risks and opportunities, or forecast process behavior. Because prediction use-cases are probabilistic in nature, these approaches employ probabilistic models to measure the likelihood of all possible outcomes. For example, a chatbot in the customer service portal of a financial firm proactively learns a customer's intent or need based on their past activity in its web domain. With the predicted context, the chatbot converses interactively with the customer to deliver the right services quickly and achieve better customer satisfaction.
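As a minimal sketch of this probabilistic flavour, the logistic regression below outputs the likelihood of each possible outcome rather than a single hard answer. The synthetic session features, the label, and the scikit-learn usage are all illustrative assumptions, not the chatbot implementation described above:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)

# Hypothetical features of past customer sessions: [pages_visited, minutes_on_site]
X = rng.normal(loc=[5, 3], scale=[2, 1], size=(200, 2))
# Hypothetical label: 1 if the customer later requested a loan product
y = (X[:, 0] + 2 * X[:, 1] + rng.normal(size=200) > 11).astype(int)

model = LogisticRegression().fit(X, y)

# Probability of each possible outcome for a new session, not just a class label
new_session = [[7, 4]]
print(model.predict_proba(new_session))   # e.g. [[P(no), P(yes)]]
```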

3. Prescriptive Analytics

Prescriptive Analytics uses the knowledge discovered during both descriptive and predictive analysis to recommend a context-aware course of action. Advanced statistical techniques and computationally intensive optimization methods are applied to understand the distribution of the estimated predictions.

In precise terms, the impact and benefit of each outcome estimated during predictive analytics are evaluated to make heuristic, time-sensitive decisions for a given set of conditions. For example, a stock market consultancy firm performs a SWOT (Strengths, Weaknesses, Opportunities, and Threats) analysis on the predicted prices of the stocks in an investor's portfolio and recommends the best buy-sell options to its clients.
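The optimization side of this can be sketched as a toy linear program: given predicted returns for three stocks (invented figures), choose portfolio weights that maximize expected return under simple constraints. scipy.optimize.linprog is assumed here purely for illustration; a real consultancy would use far richer models:

```python
from scipy.optimize import linprog

# Predicted returns per dollar for three stocks (hypothetical predictions)
predicted_returns = [0.08, 0.12, 0.05]

# linprog minimizes, so negate the returns to maximize expected return
c = [-r for r in predicted_returns]

# Constraints: weights sum to 1.0, and at most 50% in any single stock
A_eq, b_eq = [[1, 1, 1]], [1.0]
bounds = [(0, 0.5)] * 3

result = linprog(c, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
print("Recommended allocation:", result.x)   # e.g. [0.5, 0.5, 0.0]
```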

Process Flow in Data Analytics

The process of data analytics has several stages of data processing, as given below:

1. Data Extraction

Data ingestion from multiple data sources of various types, including web pages, databases, and legacy applications, results in input datasets of different formats.

The data formats fed into the data analytics flow can be broadly classified as:

  • Structured data has a clear definition of data types along with associated field lengths or field delimiters. This type of data can be easily queried, like the content stored in a Relational Database (RDBMS).
  • Semi-structured data lacks a precise layout definition, but its data elements can be identified, separated, and grouped based on a standard schema or other metadata rules. An XML file employs tagging to hold data, whereas a JavaScript Object Notation (JSON) file holds data in name-value pairs. NoSQL (Not only SQL) databases like MongoDB and Couchbase are also used to store semi-structured data.
  • Unstructured data includes social media conversations, images, audio clips, etc. Traditional data parsing methods fail to understand this data, so it is typically stored in data lakes.

Data parsing for structured and semi-structured data is implemented in various ETL tools like Ab Initio, Informatica, and DataStage, and in open-source alternatives like Talend.
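To make the format distinction concrete, the sketch below parses the same hypothetical record from CSV (structured), JSON, and XML (semi-structured) using standard Python libraries; the field names are made up for illustration:

```python
import io
import json
import xml.etree.ElementTree as ET

import pandas as pd

# Structured: fixed columns with delimiters, directly queryable
csv_data = "id,name,balance\n101,Asha,2500.50"
df = pd.read_csv(io.StringIO(csv_data))
print(df.loc[0, "name"], df.loc[0, "balance"])

# Semi-structured: JSON name-value pairs
json_data = '{"id": 101, "name": "Asha", "balance": 2500.50}'
record = json.loads(json_data)
print(record["name"], record["balance"])

# Semi-structured: XML tagging
xml_data = "<account><id>101</id><name>Asha</name><balance>2500.50</balance></account>"
root = ET.fromstring(xml_data)
print(root.find("name").text, root.find("balance").text)
```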

2. Data Cleaning and Transformation

Cleaning of the parsed data is done to ensure data consistency and the availability of relevant data for the later stages of the process flow.

The major cleansing operations in data analytics (sketched in code after this list) are:

  • Detection and elimination of outliers in the data volumes.
  • Removal of duplicates in the dataset.
  • Handling of missing entries in data records, based on an understanding of the functionality or use-case.
  • Validation of permissible field values in data records; for example, “31-February” cannot be a valid value in any date field.
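A minimal pandas sketch of these four operations, assuming a small invented set of order records:

```python
import pandas as pd

# Hypothetical raw records: a duplicate, a missing amount, a bad date, an outlier
raw = pd.DataFrame({
    "order_id": [1, 2, 2, 3, 4, 5, 6],
    "amount":   [250.0, 300.0, 300.0, None, 99999.0, 275.0, 310.0],
    "date":     ["2023-01-05", "2023-02-10", "2023-02-10", "2023-02-31",
                 "2023-03-01", "2023-03-04", "2023-03-09"],
})

clean = raw.drop_duplicates()                                        # remove duplicates
clean["amount"] = clean["amount"].fillna(clean["amount"].median())   # fill missing entries

# Validate permissible date values: "2023-02-31" becomes NaT and is dropped
clean["date"] = pd.to_datetime(clean["date"], errors="coerce")
clean = clean.dropna(subset=["date"])

# Detect and eliminate outliers outside 1.5 * IQR
q1, q3 = clean["amount"].quantile([0.25, 0.75])
iqr = q3 - q1
clean = clean[clean["amount"].between(q1 - 1.5 * iqr, q3 + 1.5 * iqr)]
print(clean)
```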

The cleansed data is then transformed into a format suitable for analysis.

Data transformations include (see the sketch after this list):

  • Filtering out unwanted data records.
  • Joining data fetched from different sources.
  • Aggregation or grouping of data.
  • Data typecasting.
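Continuing the pandas sketch with invented data, the four transformations above might look like:

```python
import pandas as pd

# Hypothetical cleansed inputs from two different sources
orders = pd.DataFrame({
    "customer_id": [1, 1, 2, 3],
    "amount": ["250", "310", "120", "90"],   # stored as strings upstream
    "region": ["North", "North", "South", "South"],
})
customers = pd.DataFrame({"customer_id": [1, 2, 3],
                          "name": ["Asha", "Bilal", "Chen"]})

orders["amount"] = orders["amount"].astype(float)    # data typecasting
orders = orders[orders["amount"] > 100]              # filter unwanted records
merged = orders.merge(customers, on="customer_id")   # join data across sources
summary = merged.groupby("region")["amount"].sum()   # aggregation / grouping
print(summary)
```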

3. KPI/Insight Derivation

Data mining and deep learning methods are used to evaluate Key Performance Indicators (KPIs) or derive valuable insights from the cleaned and transformed data. Based on the objective of the analytics, the data analysis is performed using various pattern recognition techniques like k-means clustering, SVM classification, and Bayesian classifiers, and machine learning models like Markov models and Gaussian Mixture Models (GMMs).
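As one hedged example of such a technique, the sketch below runs k-means clustering on synthetic customer data with scikit-learn; the two segments and their features are invented for illustration:

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(2)

# Two hypothetical customer segments: [avg_spend, visits_per_month]
segment_a = rng.normal([200, 2], [20, 0.5], size=(50, 2))
segment_b = rng.normal([600, 8], [50, 1.0], size=(50, 2))
X = np.vstack([segment_a, segment_b])

# Fit k-means with two clusters and inspect the learned centres
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print("Cluster centres:\n", kmeans.cluster_centers_)
print("Segment of a new customer:", kmeans.predict([[550, 7]]))
```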

Probabilistic models learn their optimal parameters in the training phase; in the validation phase, the model is tested using k-fold cross-validation to avoid over-fitting and under-fitting errors. The most commonly used programming languages for data analysis are R and Python. Both have a rich set of open-source libraries (SciPy, NumPy, Pandas) for performing complex data analysis.
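A minimal sketch of k-fold cross-validation, assuming scikit-learn and a synthetic dataset standing in for the cleaned, transformed records:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

# Synthetic dataset standing in for cleaned, transformed records
X, y = make_classification(n_samples=300, n_features=10, random_state=0)

# 5-fold cross-validation: each fold is held out once for testing
scores = cross_val_score(SVC(), X, y, cv=5)
print("Fold accuracies:", scores.round(3))
print("Mean accuracy:  ", scores.mean().round(3))
```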

4. Data Visualization

Data visualization is the process of presenting the uncovered patterns and derived conclusions clearly and effectively using graphs, plots, dashboards, and graphics; a minimal plotting sketch follows the list below.

  • Data reporting tools like QlikView, Tableau, etc., display KPI and other derived metrics at various levels of granularity.
  • Reporting tools enable end-users to create customized reports with pivot, drill-down options using user-friendly drag and drop interfaces.
  • Interactive data visualization libraries like D3.js (Data-Driven Documents), HTML5-Anycharts, etc., are used to enhance the ability to explore the analyzed data.
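The tools above are GUI- or JavaScript-based; as a language-level stand-in, the matplotlib sketch below plots a hypothetical KPI from the previous stage. The figures are invented for illustration:

```python
import matplotlib.pyplot as plt

# Hypothetical KPI derived in the previous stage
regions = ["North", "South", "East", "West"]
revenue = [560, 120, 340, 410]

# Simple bar chart of the KPI at region-level granularity
fig, ax = plt.subplots()
ax.bar(regions, revenue)
ax.set_title("Revenue by Region (illustrative KPI)")
ax.set_xlabel("Region")
ax.set_ylabel("Revenue (in thousands)")
plt.show()
```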

CA ROHIT UPADHYAYA