ABCDEFGs of Data Science & Statistical Terminology
DAMA Southern Africa
Data Literacy: From Executives to Data Citizens to Data Management Professionals; We all need to improve our DM KSCs
Executive Summary
Data science and statistics are becoming increasingly relevant in different fields. ‘ABCDEFGs of Data Science & Statistical Terminology’ covers the evolution and relevance of data science, data management, predictive analytics, statistics, logic and data analysis fallacies. Howard Diesel discusses metadata, conditional probability, bias, analysing bias and forecasting, and concepts in forecasting and backcasting.
The webinar explores the use of Quizlet for learning terminology, providing an overview of business terminology, hypothesis testing, testing and evaluation methods, and learning data science terminology. Overall, understanding these concepts and terminologies is important to make informed decisions based on data analysis.
Introduction to Data Science and Statistics
Howard Diesel mentions that he is nervous about presenting an Excel spreadsheet containing business glossary terms. He shares that his presentation was inspired by a man who studied the dictionary daily to improve his English communication skills. The aim of the presentation is to help explain statistical terms.
The Evolution and Relevance of Data Science in Different Fields
Data science and AI techniques have roots in statistics and have benefitted from advancements in computation. Howard notes that data science is defined as the intersection of statistics, computer science, and domain expertise, and its relevance is increasing in various fields such as finance, banking, biology, psychology, and others. He also reflects on the difference between data science and traditional statistics and the continued relevance of statistics in modern computing and data analysis.
Data Management and Predictive Analytics
Howard discusses various resources for data management, predictive analytics, and statistical algorithms. Power BI provides clickable terms for definitions, while the Bing search engine offers examples and explanations. Copilot, an AI tool, simplifies complex concepts. Integrating Bing with Power BI gives easy access to definitions and further details.
Bard is mentioned as another resource for data analysis. Howard runs searches to look up terms such as forecasts, analytics, statistics, and binomial distribution. The appeal of using both resources may stem from how easily they integrate with Power BI.
Statistics and Terminology in Interactive Glossary
The interactive Power BI presentation of the business glossary included examples of terms such as A/B testing and Bing. Howard discusses the base rate fallacy and questions its relevance to statistical significance. He highlights the importance of careful data selection and explains that inference built on antecedent-and-consequent (‘if x, then y’) logic can be prone to errors. Finally, one example related to students was presented.
Fallacies in Logic and Data Analysis
Understanding the logic used in data analysis is crucial for avoiding fallacies. One such fallacy is affirming the consequent: from ‘if x, then y’ and ‘y is true’, it wrongly concludes that x must be true. Data science and statistics should also make use of information that already exists so that answers are representative. It is important to condition on the available information in data analysis and to use it specifically to answer questions accurately.
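The fallacy can be checked mechanically with a truth table. The following is a minimal Python sketch, not something shown in the webinar; the function names are illustrative. It treats an argument form as valid only if it holds for every combination of x and y, and contrasts affirming the consequent with the valid form, modus ponens.

```python
from itertools import product


def implies(p: bool, q: bool) -> bool:
    """Material implication: 'if p then q' is false only when p is true and q is false."""
    return (not p) or q


def modus_ponens(x: bool, y: bool) -> bool:
    """From 'if x then y' and x, conclude y."""
    return implies(implies(x, y) and x, y)


def affirming_consequent(x: bool, y: bool) -> bool:
    """From 'if x then y' and y, conclude x."""
    return implies(implies(x, y) and y, x)


def is_valid(argument) -> bool:
    """An argument form is valid if it holds for every truth assignment."""
    return all(argument(x, y) for x, y in product([True, False], repeat=2))


# Assignments where both premises hold but the conclusion fails.
counterexamples = [
    (x, y)
    for x, y in product([True, False], repeat=2)
    if not affirming_consequent(x, y)
]

print(is_valid(modus_ponens))          # True
print(is_valid(affirming_consequent))  # False
print(counterexamples)                 # [(False, True)]
```

The single counterexample is the case where y is true for some reason other than x, which is exactly the situation the fallacy overlooks.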
Metadata, Conditional Probability, Bias
Understanding metadata is crucial when comprehending the relationship between different pieces of information, as it can impact how data is labelled and interpreted by users. In addition, grasping conditional probability is important as it involves the likelihood of an event occurring, given that another event has already occurred.
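Conditional probability can be illustrated with a short Python sketch. All figures below are hypothetical and chosen only for illustration; they are not from the webinar. The example also shows why ignoring the base rate is misleading: even with a fairly accurate test, a positive result for a rare condition is usually a false alarm.

```python
def conditional_probability(p_a_and_b: float, p_b: float) -> float:
    """P(A | B) = P(A and B) / P(B), defined only when P(B) > 0."""
    if p_b <= 0:
        raise ValueError("P(B) must be positive")
    return p_a_and_b / p_b


# Hypothetical figures, for illustration only.
base_rate = 0.01            # P(condition)
sensitivity = 0.90          # P(positive | condition)
false_positive_rate = 0.05  # P(positive | no condition)

# P(positive and condition), then P(positive) by the law of total probability.
p_positive_and_condition = base_rate * sensitivity
p_positive = p_positive_and_condition + (1 - base_rate) * false_positive_rate

p_condition_given_positive = conditional_probability(
    p_positive_and_condition, p_positive
)
print(round(p_condition_given_positive, 3))  # 0.154
```

Under these assumed numbers, only about 15% of positive results correspond to the condition, far below the 90% sensitivity that an unconditioned reading might suggest.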
It's important to be aware of biases in statistics and data science, which can result in consistently inaccurate predictions or measurements. Therefore, it's crucial to examine forecasting models for systematic biases to ensure precise projections in data management.
Analysing Bias and Forecasting in Projection Tech
It's important to run automatic checks in analytical contexts to identify systematic bias or optimism in frameworks and human projections. Different models should be used to compete and determine which ones are more accurate over time.
Failure to do so can lead to overestimations, as seen in economic growth forecasting. It's important to track and monitor biases over time and automate this process. Capturing and tracking metadata is also essential for recording bias in GDP forecasting and nowcasting.
Concepts in Forecasting and Backcasting
In predictive analytics, it is important to test for bias in forecasting by comparing forecasted values to actual historical values using standard statistical tests and visual inspections. Presenting plots is good practice for this purpose. Backcasting in economics or finance involves rerunning models backwards using the data available at a specific time to account for data revision and parameterise models correctly.
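One common way to test for forecast bias is a one-sample t-test on the forecast errors, with the null hypothesis that the mean error is zero. The sketch below is an assumption about how such a check could look, not the method used in the webinar; the data are invented, and the test assumes roughly independent errors, which multi-step forecasts often violate.

```python
import math
import statistics


def forecast_bias_t_stat(forecasts, actuals):
    """Return (mean error, t statistic) for H0: the mean forecast error is zero."""
    errors = [f - a for f, a in zip(forecasts, actuals)]
    mean_error = statistics.mean(errors)
    standard_error = statistics.stdev(errors) / math.sqrt(len(errors))
    return mean_error, mean_error / standard_error


# Hypothetical growth forecasts and outcomes (per cent), for illustration only.
forecasts = [2.5, 2.8, 3.0, 2.6, 2.9, 3.1, 2.7, 2.8]
actuals = [2.1, 2.6, 2.4, 2.5, 2.3, 2.8, 2.4, 2.5]

mean_error, t_stat = forecast_bias_t_stat(forecasts, actuals)

# Two-sided 5% critical value of Student's t with n - 1 = 7 degrees of freedom.
T_CRITICAL = 2.365
biased = abs(t_stat) > T_CRITICAL

print(f"mean error = {mean_error:.2f}, t = {t_stat:.2f}, biased = {biased}")
```

With these invented numbers the mean error is positive (about 0.35 percentage points) and the t statistic exceeds the critical value, which would indicate systematic over-forecasting. A plot of errors over time remains a useful companion to the test.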
In machine learning applied to economics, building models on final-vintage (fully revised) data can give a misleading picture of forecast performance and lead to incorrect model parameterisation, because the revised figures were not available when the forecast would have been made. Finally, ecological correlation can be misleading, as it refers to the correlation between averages of groups of individuals and does not necessarily reflect the association at the level of the individuals.
Using Quizlet for Learning Terminology
The program Quizlet is recommended for learning data management and data science terminology. This platform allows users to create study sets with various terms, including metadata management and data architecture.
Well-defined terminology is emphasised as crucial in the field of data management. Quizlet includes examples of different types of abstraction, such as horizontal and vertical abstraction, used in data modelling. It also offers an interactive game to help participants learn and understand terms, such as A/B testing, in a fun and engaging way.
Overview of Business Terminology and Hypothesis Testing
Howard covers various topics related to business, logic, and technology. He discusses the importance of understanding business terminology and the use of flashcards for reviewing concepts. Howard also discusses the logical fallacy of affirming the consequent.
In addition, he explains the application of machine learning in operations, the concept of algorithms, and the use of null hypotheses in hypothesis testing. The null hypothesis is usually the default position, typically that there is no effect or no difference, while the alternative hypothesis represents the opposite possibility.
Methods of Testing and Evaluation
Hypothesis tests assess whether a result is statistically significant, that is, unlikely to have arisen by chance if the null hypothesis were true. Selection bias is the systematic tendency to choose certain examples over others. Quantitative variables are represented by numeric values, and the rejection region is the set of test-statistic values for which you would reject the null hypothesis.
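The rejection region can be made concrete with a two-proportion z-test, a standard choice for A/B tests. The following Python sketch is illustrative only: the counts are hypothetical, the 5% significance level is an assumption, and the function name is invented for this example.

```python
import math
from statistics import NormalDist


def two_proportion_z(success_control, n_control, success_treatment, n_treatment):
    """z statistic for H0: both groups share the same underlying conversion rate."""
    p_control = success_control / n_control
    p_treatment = success_treatment / n_treatment
    pooled = (success_control + success_treatment) / (n_control + n_treatment)
    standard_error = math.sqrt(
        pooled * (1 - pooled) * (1 / n_control + 1 / n_treatment)
    )
    return (p_treatment - p_control) / standard_error


ALPHA = 0.05
# Two-sided rejection region: |z| greater than this critical value (about 1.96).
z_critical = NormalDist().inv_cdf(1 - ALPHA / 2)

# Hypothetical A/B test counts, for illustration only.
z = two_proportion_z(
    success_control=200, n_control=2000,
    success_treatment=260, n_treatment=2000,
)
reject_null = abs(z) > z_critical

print(f"z = {z:.2f}, critical value = {z_critical:.2f}, reject H0 = {reject_null}")
```

Here the null hypothesis says the treatment makes no difference. With these assumed counts the z statistic falls inside the rejection region, so the null hypothesis would be rejected at the 5% level. That indicates statistical significance only; it says nothing about whether the difference is large enough to matter in practice.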
Affirming the consequent is a logical fallacy in which a cause is inferred from its effect: from ‘if x, then y’ and ‘y’, it concludes x. The test method supports true-or-false questions and can be set up to assess understanding of terms and definitions. The treatment group is the group that receives the treatment being tested. These methods help participants understand terminology and testing in various contexts.
Data Science and Terminology
Howard discusses various terms related to data architecture and data management and mentions a dictionary from 2009 that contains all the terms relevant to data science. A participant expresses interest in learning more about data science and mentions having seen data science flashcards in the past.
Howard talks about the effectiveness of flashcards for memorising information and highlights the importance of finding interactive ways to review business glossaries with data stewards.
Learning Data Science Terminology
It is crucial to clearly understand data science terminology to avoid misunderstandings and communicate effectively. Creating an ontology or graph relationship between terms can help navigate and comprehend their relationships. Understanding antonyms, synonyms, and mononyms is essential to comprehend how terms are related.
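A graph of relationships between terms can be sketched as a small labelled-edge structure. The Python example below is a minimal illustration, not a tool shown in the webinar; the class name, the relation labels, and the sample term pairs are all assumptions chosen for the example.

```python
from collections import defaultdict


class TermGraph:
    """A tiny labelled graph: terms are nodes, relations are labelled edges."""

    def __init__(self):
        # Maps (term, relation) to the set of related terms.
        self._edges = defaultdict(set)

    def relate(self, term, relation, other, symmetric=True):
        """Record that `term` stands in `relation` to `other`."""
        self._edges[(term, relation)].add(other)
        if symmetric:
            self._edges[(other, relation)].add(term)

    def related(self, term, relation):
        """Return the terms linked to `term` by `relation`, sorted alphabetically."""
        return sorted(self._edges.get((term, relation), set()))


# Illustrative entries only; a real glossary would come from the data stewards.
glossary = TermGraph()
glossary.relate("A/B testing", "synonym", "split testing")
glossary.relate("treatment group", "antonym", "control group")
glossary.relate("null hypothesis", "antonym", "alternative hypothesis")
glossary.relate("hypothesis testing", "narrower", "A/B testing", symmetric=False)

print(glossary.related("control group", "antonym"))     # ['treatment group']
print(glossary.related("A/B testing", "synonym"))       # ['split testing']
print(glossary.related("A/B testing", "narrower"))      # []
```

Synonym and antonym links are stored in both directions, while hierarchical links such as "narrower" are one-way, which is why the last lookup returns nothing.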
Join DAMA Southern Africa and Howard Diesel for our monthly Big Data and Data Science webinars on the third Thursday of every month.
Please comment below if you wish to receive the recording.
Register for our webinars here:
#damasa #datascience #cdmp #damacertification #damasouthafrica #cdmpspecialist
System Architect at Hatch
(1 month ago) Please share the recording with me. Thank you.
Thank you to everyone who joined this discussion. Please comment below if you'd like to receive the recording, and we'll gladly share it with you. Drew Kennedy; Thetshelesani Ravhura; Allen Machary; John Kumwenda; Auxilia Maomela; Anurag Kanumuri; Kayle Maclou; Marvin Vollenhoven; Venkat Rao Bhamidipathi; Nélia Costa da Silva; Loryn Sorour; Theresia Alibalio; Shannon Kruger; Sadaf Qureshi; Nurse Mgidi; Mayela Torrealba; Diana Joseph; Marc Nolte, CDMP, CDP; Rizwana C.; Kimberly Nkiwane; Ndilenga (Tuyakula) M.; Zaheer Dhoodhat; Hamdi BAANANNOU - TOGAF® Certified, CDMP®, PRINCE2®; John O'Gorman; Mishumo Dzhivhuho (MBA); Arpit Jain; Keith T. Mutambirwa; Siviwe Bikitsha; Hangwelani Mamuthubi; Olorato Morerinyane; Deshnee Boodhram; Thetshelesani Ravhura; Mopholosi Monyollo; Suresh Dontha; Dr. Anil Pise; Ipeleng Peta; Benny Chabalala; Hamdi BAANANNOU - TOGAF® Certified, CDMP®, PRINCE2®; Rirhandzu Ingrid Diale; Alfred Vinyo Owusu-Duku; Lesego Siti; Mmamalema Molepo; Louisa Kekana; Lungelo Mvuyana; Tiego T.; Sphiwe Masoka; Paul Grobler; Kimberly Nkiwane; Purity Molala; Asanda Simelane; Oratile Peu; Molefi Radebe; Hesham Khalil, CDMP; Daan Steenkamp; Janita Botha
Chief Data Officer @ Modelware Systems | CDMP Master | Data Management Advisor
(1 month ago) Was an incredible session with the DAMA SA community. Looking forward to the next Big Data & Data Science webinar.