ABCDEFGs of Data Science & Statistical Terminology

Executive Summary

Data science and statistics are becoming increasingly relevant in different fields. ‘ABCDEFGs of Data Science & Statistical Terminology’ covers the evolution and relevance of data science, data management, predictive analytics, statistics, logic and data analysis fallacies. Howard Diesel discusses metadata, conditional probability, bias, analysing bias and forecasting, and concepts in forecasting and backcasting.

The webinar explores the use of Quizlet for learning terminology, providing an overview of business terminology, hypothesis testing, testing and evaluation methods, and learning data science terminology. Overall, understanding these concepts and terminologies is important to make informed decisions based on data analysis.


Introduction to Data Science and Statistics

Howard Diesel mentions that he is nervous about presenting an Excel spreadsheet containing business glossary terms. He shares that his presentation, which is intended to help explain statistical terms, was inspired by a man who studied the dictionary daily to improve his English communication skills.


The Evolution and Relevance of Data Science in Different Fields

Data science and AI techniques have roots in statistics and have benefitted from advancements in computation. Howard notes that data science is defined as the intersection of statistics, computer science, and domain expertise, and its relevance is increasing in various fields such as finance, banking, biology, psychology, and others. He also reflects on the difference between data science and traditional statistics and the continued relevance of statistics in modern computing and data analysis.


Figure 1 The Evolution and Relevance of Data Science in Different Fields
Figure 2 The Evolution and Relevance of Data Science in Different Fields continued

Data Management and Predictive Analytics

Howard discusses various resources for data management, predictive analytics, and statistical algorithms. PowerBI provides clickable terms for definitions, while the Bing search engine offers examples and explanations. Co-pilot, an AI tool, simplifies complex concepts. Integrating Bing with PowerBI facilitates easy access to definitions and further details.

Bart is mentioned as another resource for data analysis. Howard runs searches for terms such as forecasts, analytics, statistics, and binomial distribution. The appeal of using both resources may stem from the ease of integration with PowerBI.


Figure 3 PowerBI Model


Figure 4 Predictive Analytics


Figure 5 PowerBI Search

Statistics and Terminology in Interactive Glossary

The interactive PowerBI presentation on the business glossary included examples of terms like A/B testing and Bing. Howard discusses the base rate fallacy and questions its relevance to statistical significance. He highlights the importance of careful data selection and walks through the logic of affirming the antecedent and affirming the consequent, showing how inference can be prone to errors. Finally, one example related to students was presented.

Figure 6 Statistical Glossary

Fallacies in Logic and Data Analysis

Understanding the logic used in data analysis is crucial to avoiding fallacies. One such fallacy is affirming the consequent, where an invalid deduction is made: from "if x, then y" and "y is true", one concludes that x must be true. In data science and statistics, it is also necessary to use the information that already exists so that answers are representative. This means conditioning on the available information and using it specifically to answer the question at hand.
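The invalidity of affirming the consequent can be checked mechanically with a truth table. The following is a minimal sketch in Python, not something shown in the webinar; the helper names `implies` and `is_valid` are illustrative assumptions. It contrasts the fallacy with modus ponens (affirming the antecedent), which is a valid form.

```python
from itertools import product


def implies(p: bool, q: bool) -> bool:
    """Material implication: 'if p, then q'."""
    return (not p) or q


def is_valid(premises, conclusion) -> bool:
    """An argument form is valid if the conclusion is true in every
    truth assignment where all of the premises are true."""
    return all(
        conclusion(x, y)
        for x, y in product([True, False], repeat=2)
        if all(premise(x, y) for premise in premises)
    )


# Affirming the consequent: (if x then y), y, therefore x
affirming_consequent = is_valid(
    premises=[implies, lambda x, y: y],
    conclusion=lambda x, y: x,
)

# Modus ponens (affirming the antecedent): (if x then y), x, therefore y
modus_ponens = is_valid(
    premises=[implies, lambda x, y: x],
    conclusion=lambda x, y: y,
)

print(affirming_consequent)  # False: x can be false while y is true
print(modus_ponens)          # True
```

The counterexample is the assignment where x is false and y is true: both premises of the fallacy hold, yet the conclusion does not.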


Figure 7 Fallacies in Logic and Data Analysis

Metadata, Conditional Probability, Bias

Understanding metadata is crucial when comprehending the relationship between different pieces of information, as it can impact how data is labelled and interpreted by users. In addition, grasping conditional probability is important as it involves the likelihood of an event occurring, given that another event has already occurred.
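Conditional probability is defined as P(A | B) = P(A and B) / P(B). As a hedged illustration of why conditioning matters, the sketch below works through a base-rate style calculation. All numbers (1% prevalence, 90% detection rate, 5% false positive rate) are hypothetical and were not taken from the webinar.

```python
from fractions import Fraction


def conditional(p_a_and_b: Fraction, p_b: Fraction) -> Fraction:
    """P(A | B) = P(A and B) / P(B)."""
    return p_a_and_b / p_b


# Hypothetical inputs, chosen only for illustration.
p_condition = Fraction(1, 100)               # base rate of the condition
p_pos_given_condition = Fraction(90, 100)    # test flags a true case
p_pos_given_no_condition = Fraction(5, 100)  # test flags a non-case

# P(positive and condition)
p_pos_and_condition = p_pos_given_condition * p_condition

# P(positive), by the law of total probability
p_pos = p_pos_and_condition + p_pos_given_no_condition * (1 - p_condition)

# P(condition | positive)
p_condition_given_pos = conditional(p_pos_and_condition, p_pos)

print(p_condition_given_pos)                   # 2/13
print(round(float(p_condition_given_pos), 3))  # 0.154
```

Even with a fairly reliable test, the low base rate means that only about 15% of positive results correspond to a true case under these assumed numbers.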

It's important to be aware of biases in statistics and data science, which can result in consistently inaccurate predictions or measurements. Therefore, it's crucial to examine forecasting models for systematic biases to ensure precise projections in data management.


Figure 8 PowerBI Glossary

Analysing Bias and Forecasting in Projection Tech

It's important to run automatic checks in analytical contexts to identify systematic bias or optimism in frameworks and human projections. Competing models should be run side by side to determine which ones are more accurate over time.

Failure to do so can lead to overestimations, as seen in economic growth forecasting. It's important to track and monitor biases over time and automate this process. Capturing and tracking metadata is also essential for recording bias in GDP forecasting and nowcasting.

Concepts in Forecasting and Backcasting

In predictive analytics, it is important to test for bias in forecasting by comparing forecasted values to actual historical values using standard statistical tests and visual inspections. Presenting plots is good practice for this purpose. Backcasting in economics or finance involves rerunning models backwards using the data available at a specific time to account for data revision and parameterise models correctly.
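The bias check described above can be sketched as a comparison of forecast errors against zero. The example below is a minimal sketch using only the Python standard library: it computes the mean forecast error and a one-sample t-statistic. The function name `forecast_bias` and the data values are hypothetical.

```python
import math
import statistics


def forecast_bias(forecasts, actuals):
    """Return the mean forecast error and its one-sample t-statistic.

    A positive mean error means the forecasts ran above the actuals
    (optimism); a t-statistic far from zero suggests the bias is systematic.
    """
    errors = [f - a for f, a in zip(forecasts, actuals)]
    n = len(errors)
    mean_error = statistics.mean(errors)
    standard_error = statistics.stdev(errors) / math.sqrt(n)
    return mean_error, mean_error / standard_error


# Hypothetical growth forecasts and the values later observed (percent).
forecasts = [2.5, 3.0, 2.8, 3.2, 2.9, 3.1]
actuals = [2.1, 2.7, 2.9, 2.8, 2.5, 2.6]

mean_error, t_stat = forecast_bias(forecasts, actuals)

# Two-sided 5% critical value of the t-distribution with 5 degrees of freedom.
T_CRITICAL_DF5 = 2.571

print(f"mean error: {mean_error:.3f}")
print(f"t-statistic: {t_stat:.2f}")
print("systematic bias detected:", abs(t_stat) > T_CRITICAL_DF5)
```

In practice, a plot of the errors over time would accompany this number, since a t-test alone can hide a bias that changes sign partway through the sample.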

In machine learning for economics, building models on final vintage data can give a misleading picture of forecast performance and lead to incorrect model parameterisation, because the revised figures were not available at the time the forecast would have been made. Finally, ecological correlation can be misleading, as it refers to the correlation between averages of groups of individuals and does not necessarily reflect the association at the individual level.
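To make the ecological correlation point concrete, the sketch below builds a small hypothetical dataset of three groups and compares the correlation of the group averages with the correlation inside each group. The data and the `pearson` helper are illustrative assumptions.

```python
import math
import statistics


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x, mean_y = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(ss_x * ss_y)


# Hypothetical (x, y) observations for individuals in three groups.
groups = {
    "A": [(1, 3), (2, 2), (3, 1)],
    "B": [(4, 6), (5, 5), (6, 4)],
    "C": [(7, 9), (8, 8), (9, 7)],
}

# Correlation between the group averages (the ecological correlation).
avg_x = [statistics.mean(x for x, _ in pts) for pts in groups.values()]
avg_y = [statistics.mean(y for _, y in pts) for pts in groups.values()]
ecological_r = pearson(avg_x, avg_y)

# Correlation among individuals inside each group.
within_r = {
    name: pearson([x for x, _ in pts], [y for _, y in pts])
    for name, pts in groups.items()
}

print(ecological_r)  # 1.0
print(within_r)      # {'A': -1.0, 'B': -1.0, 'C': -1.0}
```

The group averages are perfectly positively correlated, yet within every group the relationship between x and y is perfectly negative. Drawing conclusions about individuals from the averages alone would get the direction wrong.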


Figure 9 Ante Example

Using Quizlet for Learning Terminology

The program Quizlet is recommended for learning data management and science terminology. This platform allows users to create study sets with various terms, including metadata management and data architecture.

Well-defined terminology is emphasised as crucial in the field of data management. Quizlet includes examples of different types of abstraction, such as horizontal and vertical abstraction, used in data modelling. It also offers an interactive game to help participants learn and understand terms, such as A/B testing, in a fun and engaging way.


Figure 10 Quizlet Question


Figure 11 Quizlet Answer

Overview of Business Terminology and Hypothesis Testing

Howard covers various topics related to business, logic, and technology. He discusses the importance of understanding business terminology and the use of flashcards for reviewing concepts. Howard also discusses the logical fallacy of affirming the consequent.

In addition, he explains the application of machine learning in operations, the concept of algorithms, and the use of null hypotheses in hypothesis testing. The null hypothesis is usually the default claim that there is no effect or no difference, while the alternative hypothesis represents the opposite possibility.


Figure 12 Quizlet Questions and Answers


Figure 13 Quizlet Questions and Answers continued


Figure 14 Quizlet Questions and Answers continued


Figure 15 Quizlet Questions and Answers continued

Methods of Testing and Evaluation

Hypothesis tests assess whether a result is statistically significant. Selection bias is the systematic tendency to choose certain examples over others. Quantitative variables are represented by numeric values, and the rejection region is the range of test statistic values for which you would reject the null hypothesis.

Affirming the consequent is a logical fallacy where a statement assumes the cause of something based on its effect. The treatment group is the group that receives the treatment being tested. Quizlet's test mode allows for true or false answers and can be set up to assess understanding of terms and definitions such as these. These methods help with understanding terminology and testing in various contexts.
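Several of these terms (null hypothesis, rejection region, treatment group, A/B testing) come together in a two-proportion z-test. The following is a minimal sketch using the Python standard library. The conversion counts are hypothetical, and the function name `two_proportion_z_test` is an assumption for illustration rather than a method presented in the webinar.

```python
import math
from statistics import NormalDist


def two_proportion_z_test(conv_control, n_control,
                          conv_treatment, n_treatment, alpha=0.05):
    """Two-sided z-test of H0: both groups share the same conversion rate."""
    p_control = conv_control / n_control
    p_treatment = conv_treatment / n_treatment
    # Under H0 the two groups are pooled to estimate the common rate.
    pooled = (conv_control + conv_treatment) / (n_control + n_treatment)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_control + 1 / n_treatment))
    z = (p_treatment - p_control) / std_err
    # Rejection region: |z| greater than the critical value.
    critical = NormalDist().inv_cdf(1 - alpha / 2)
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, critical, p_value, abs(z) > critical


# Hypothetical A/B test: 200 of 2000 convert in the control group,
# 260 of 2000 convert in the treatment group.
z, critical, p_value, reject_null = two_proportion_z_test(200, 2000, 260, 2000)

print(f"z = {z:.2f}, critical value = {critical:.2f}, p-value = {p_value:.4f}")
print("reject null hypothesis:", reject_null)
```

With these assumed counts, the test statistic falls inside the rejection region, so the null hypothesis of equal conversion rates would be rejected at the 5% level. Statistical significance says nothing about whether the sample was selected without bias, which has to be checked separately.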


Figure 16 Quizlet Questions and Answers continued

Data Science and Terminology

Howard discusses various terms related to data architecture and data management and mentions a dictionary from 2009 that contains all the terms relevant to data science. A participant expresses interest in learning more about data science and mentions having seen data science flashcards in the past.

Howard talks about the effectiveness of flashcards for memorising information and highlights the importance of finding interactive ways to review business glossaries with data stewards.

Learning Data Science Terminology

It is crucial to understand data science terminology clearly to avoid misunderstandings and communicate effectively. Creating an ontology or graph of relationships between terms can help in navigating them. Understanding antonyms, synonyms, and mononyms is essential to seeing how terms are related.
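One lightweight way to represent such relationships is a labelled graph in which terms are nodes and each edge carries a relation type. The sketch below is an illustrative assumption, not the structure used in the webinar's glossary; the `TermGraph` class and the example term pairs are hypothetical.

```python
from collections import defaultdict


class TermGraph:
    """A tiny glossary graph: terms are nodes, edges carry a relation label."""

    def __init__(self):
        self._edges = defaultdict(set)

    def relate(self, term, relation, other):
        """Record a symmetric relation (e.g. synonym, antonym) between two terms."""
        self._edges[term].add((relation, other))
        self._edges[other].add((relation, term))

    def related(self, term, relation=None):
        """Return terms linked to `term`, optionally filtered by relation type."""
        return sorted(
            other
            for rel, other in self._edges.get(term, ())
            if relation is None or rel == relation
        )


glossary = TermGraph()
glossary.relate("A/B testing", "synonym", "split testing")
glossary.relate("null hypothesis", "antonym", "alternative hypothesis")
glossary.relate("treatment group", "antonym", "control group")

print(glossary.related("null hypothesis", "antonym"))  # ['alternative hypothesis']
print(glossary.related("split testing"))               # ['A/B testing']
```

A fuller ontology would also need directed relations, such as broader and narrower terms, which this symmetric sketch leaves out.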


Join DAMA Southern Africa and Howard Diesel for our monthly Big Data and Data Science webinars on the third Thursday of every month.

Please comment below if you wish to receive the recording.

Register for our webinars here:

https://www.meetup.com/dama-sa-data-management-meetup/events/


#damasa #datascience #cdmp #damacertification #damasouthafrica #cdmpspecialist
