Terms In Data Science (A-Z)

A:

• Accuracy: Correct predictions divided by total predictions (see the sketch after this list).

• AUC (Area Under the Curve): Area under the ROC curve; summarizes classifier performance across all thresholds.

• ARIMA: AutoRegressive Integrated Moving Average, a classical time series forecasting method.

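A minimal NumPy sketch of the accuracy formula; the label arrays are made-up example data:

```python
import numpy as np

y_true = np.array([1, 0, 1, 1, 0, 1])  # actual labels (made-up example)
y_pred = np.array([1, 0, 0, 1, 0, 1])  # model predictions (made-up example)

# Accuracy = correct predictions / total predictions
accuracy = np.mean(y_true == y_pred)
print(accuracy)  # 5 of 6 correct -> 0.8333...
```
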
B:

• Bias: Systematic difference between a model's average prediction and the true value.

• Bayes' Theorem: Updates the probability of an event based on prior knowledge: P(A|B) = P(B|A) P(A) / P(B) (worked example after this list).

• Binomial Distribution: Models the number of successes in a fixed number of independent trials.

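A worked Bayes' theorem example in plain Python; the prevalence and test rates below are invented for illustration:

```python
# Hypothetical diagnostic test: P(disease) = 0.01,
# sensitivity P(+|disease) = 0.95, false-positive rate P(+|healthy) = 0.05.
p_d, p_pos_given_d, p_pos_given_h = 0.01, 0.95, 0.05

# Law of total probability: overall chance of a positive test
p_pos = p_pos_given_d * p_d + p_pos_given_h * (1 - p_d)

# Bayes' theorem: P(disease | positive) = P(+|disease) P(disease) / P(+)
p_d_given_pos = p_pos_given_d * p_d / p_pos
print(round(p_d_given_pos, 3))  # 0.161: still unlikely despite a positive test
```
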
C:

• Clustering: Grouping data points based on similarity.

• Confusion Matrix: Table of true vs. predicted labels used to evaluate classification performance.

• Cross-validation: Assesses model performance by repeatedly training and testing on different data subsets (see the sketch after this list).

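A minimal 5-fold cross-validation sketch, assuming scikit-learn is available; the iris dataset is just a convenient built-in:

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)
# Train on 4 folds, test on the held-out fold, repeat 5 times
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=5)
print(scores.mean(), scores.std())
```
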
D:

• Decision Trees: Tree-like model that splits data on feature thresholds for classification and regression (sketched below).

• Dimensionality Reduction: Reducing the number of dataset features while preserving the important information.

• Discriminative Models: Learn the boundaries between classes rather than how the data was generated.

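A small decision tree sketch with scikit-learn (assumed available):

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
# Capping the depth keeps the tree interpretable and curbs overfitting
tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)
print(tree.predict(X[:5]))
```
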
E:

• Ensemble Learning: Combines multiple models for better performance than any single model.

• EDA (Exploratory Data Analysis): Analyzing and visualizing data to surface patterns before modeling.

• Entropy: Measure of randomness in information, H = -Σ p_i log2(p_i) (see the sketch after this list).

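A sketch of Shannon entropy with NumPy; the label lists are toy examples:

```python
import numpy as np

def entropy(labels):
    """Shannon entropy in bits: H = -sum(p_i * log2(p_i))."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

print(entropy([0, 0, 1, 1]))  # 1.0 bit: a 50/50 split is maximally uncertain
print(entropy([0, 1, 1, 1]))  # ~0.811 bits: a lopsided split is more predictable
```
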
F:

• Feature Engineering: Creating new features to improve model performance.

• F-score: Harmonic mean of precision and recall; balances the two in binary classification (worked example after this list).

• Feature Extraction: Automatically deriving meaningful features from raw data.

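A worked F1 example; the confusion-matrix counts are invented for illustration:

```python
# F1 = 2 * precision * recall / (precision + recall)
tp, fp, fn = 40, 10, 20  # made-up counts

precision = tp / (tp + fp)  # 0.8
recall = tp / (tp + fn)     # 0.667
f1 = 2 * precision * recall / (precision + recall)
print(round(f1, 3))         # 0.727
```
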
G:

• Gradient Descent: Optimization algorithm that minimizes a function by iteratively adjusting parameters in the direction of steepest descent (sketched below).

• Gaussian Distribution: Normal distribution with a bell-shaped curve.

• Gradient Boosting: Sequentially builds weak learners, each correcting the errors of the ones before it.

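A minimal gradient descent sketch on a one-dimensional quadratic; the starting point and learning rate are arbitrary choices:

```python
# Minimize f(x) = (x - 3)^2, whose gradient is 2 * (x - 3)
x = 0.0   # arbitrary starting point
lr = 0.1  # learning rate

for _ in range(100):
    grad = 2 * (x - 3)
    x -= lr * grad  # step against the gradient

print(round(x, 4))  # converges toward the minimum at x = 3
```
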
H:

• Hypothesis: Testable statement evaluated through statistical inference.

• Hierarchical Clustering: Organizes data into a tree-like structure (dendrogram) of nested clusters.

• Heteroscedasticity: Unequal variance of the errors across the range of a regression model.

I:

• Information Gain: Reduction in entropy from splitting on a feature; used to rank splits in decision trees (see the sketch after this list).

• Independent Variable: Variable manipulated to observe its effect on the dependent variable.

• Imbalance: Unequal distribution of classes in a dataset.

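A sketch of information gain, reusing the entropy helper from the E section (redefined here so the block runs standalone); the split below is a toy example:

```python
import numpy as np

def entropy(labels):
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def information_gain(parent, left, right):
    """Parent entropy minus the size-weighted entropy of the child nodes."""
    n = len(parent)
    children = len(left) / n * entropy(left) + len(right) / n * entropy(right)
    return entropy(parent) - children

# A split that perfectly separates the classes earns the maximum gain (1 bit)
print(information_gain([0, 0, 1, 1], [0, 0], [1, 1]))
```
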
J:

• Jupyter: Interactive computing environment widely used for data analysis.

• Joint Probability: Probability of two or more events occurring together.

• Jaccard Index: Measures similarity between two sets as intersection size over union size (sketched below).

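A minimal Jaccard index sketch in plain Python:

```python
def jaccard(a, b):
    """Jaccard index: |A intersect B| / |A union B|."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b)

print(jaccard({1, 2, 3}, {2, 3, 4}))  # 2 shared of 4 total -> 0.5
```
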
K:

• Kernel Density Estimation: Estimates the probability density of a continuous variable from samples.

• KS Test (Kolmogorov-Smirnov): Compares two probability distributions.

• K-Means Clustering: Partitions data into K clusters by assigning each point to its nearest centroid (see the sketch after this list).

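A K-Means sketch with scikit-learn (assumed available); the two Gaussian blobs are synthetic data:

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (50, 2)),   # blob near (0, 0)
               rng.normal(5, 1, (50, 2))])  # blob near (5, 5)

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(km.cluster_centers_)  # one centroid per blob
```
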
L:

• Likelihood: Probability of the observed data under a given set of model parameters.

• Linear Regression: Models a linear relationship between a dependent variable and one or more independent variables (sketched below).

• L1/L2 Regularization: Prevents overfitting by penalizing coefficient size (absolute values for L1, squares for L2).

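A linear regression sketch on synthetic data generated as y = 2x + 1 plus noise; scikit-learn is assumed available:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, (100, 1))
y = 2 * X.ravel() + 1 + rng.normal(0, 0.5, 100)

model = LinearRegression().fit(X, y)
print(model.coef_, model.intercept_)  # recovers roughly [2.0] and 1.0
```
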
M:

• Maximum Likelihood Estimation: Estimates model parameters by maximizing the likelihood of the observed data (see the sketch after this list).

• Multicollinearity: High correlation between independent variables in a regression.

• Mutual Information: Amount of information shared between two variables.

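A sketch of maximum likelihood estimation for the mean of Gaussian data with known spread, assuming SciPy is available; it recovers the sample mean, as theory predicts:

```python
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.stats import norm

rng = np.random.default_rng(0)
data = rng.normal(4.0, 1.0, 500)  # synthetic data, true mean 4.0

# Minimize the negative log-likelihood over the mean (sigma fixed at 1.0)
neg_log_lik = lambda mu: -norm.logpdf(data, loc=mu, scale=1.0).sum()
mle = minimize_scalar(neg_log_lik).x
print(mle, data.mean())  # both close to 4.0
```
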
N:

• Naive Bayes: Probabilistic classifier that assumes features are independent given the class.

• Normalization: Rescales data into a fixed range, typically [0, 1]; not to be confused with standardisation (see S and the sketch after this list).

• Null Hypothesis: The assumption of no significant difference or effect in statistical testing.

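A side-by-side sketch of min-max normalization and standardisation with NumPy; the values are a toy example:

```python
import numpy as np

x = np.array([3.0, 7.0, 10.0, 15.0])  # toy values

# Min-max normalization: rescale into [0, 1]
normalized = (x - x.min()) / (x.max() - x.min())

# Standardisation (see S): rescale to mean 0, std-dev 1
standardized = (x - x.mean()) / x.std()

print(normalized)    # [0.    0.333 0.583 1.   ]
print(standardized)
```
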
O:

• Overfitting: Model performs well on training data but poorly on new data.

• Outliers: Data points significantly different from the rest.

• One-hot Encoding: Converts categorical variables into binary indicator vectors (sketched below).

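A one-hot encoding sketch with pandas (assumed available):

```python
import pandas as pd

df = pd.DataFrame({"color": ["red", "green", "blue", "green"]})
# Each category becomes its own 0/1 indicator column
print(pd.get_dummies(df, columns=["color"], dtype=int))
```
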
P:

• PCA (Principal Component Analysis): Reduces dimensionality by projecting data onto the orthogonal components that capture the most variance (see the sketch after this list).

• Precision: Proportion of positive predictions that are actually positive, TP / (TP + FP).

• p-value: Probability of a result at least as extreme as the one observed, assuming the null hypothesis is true.

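A PCA sketch with scikit-learn (assumed available), projecting the 4-feature iris data onto two components:

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

X, _ = load_iris(return_X_y=True)
pca = PCA(n_components=2)             # keep the top two components
X_2d = pca.fit_transform(X)
print(X_2d.shape)                     # (150, 2)
print(pca.explained_variance_ratio_)  # variance captured per component
```
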
Q:

• QQ-plot: Graphically compares two distributions by plotting their quantiles against each other.

• QR Decomposition: Factorizes a matrix into an orthogonal matrix Q and an upper triangular matrix R (sketched below).

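A QR decomposition sketch with NumPy; the matrix is a toy example:

```python
import numpy as np

A = np.array([[2.0, 1.0], [1.0, 3.0], [0.0, 1.0]])
Q, R = np.linalg.qr(A)
print(np.allclose(Q @ R, A))            # True: A = QR
print(np.allclose(Q.T @ Q, np.eye(2)))  # True: Q's columns are orthonormal
```
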
R:

• Random Forest: Ensemble method that combines many decision trees trained on random subsets of the data and features.

• Recall: Proportion of actual positives correctly identified, TP / (TP + FN).

• ROC Curve: Plots the true positive rate against the false positive rate at different classification thresholds.

S:

• SVM (Support Vector Machine): Algorithm that finds a maximum-margin boundary for classification and regression.

• Standardisation: Scales data to mean 0 and std-dev 1 (z-scores); contrast with normalization under N.

• Sampling: Selecting a subset of data points from a larger dataset.

T:

• t-SNE: Non-linear technique for visualizing high-dimensional data in two or three dimensions.

• t-distribution: Used in hypothesis testing with small sample sizes.

• Type I/II Error: False positive / false negative in hypothesis testing.

U:

• Underfitting: Model too simple to capture the patterns in the data.

• UMAP: Non-linear technique for visualizing high-dimensional data, similar in spirit to t-SNE.

• Uniform Distribution: All outcomes are equally likely.

V:

• Variance: Spread of data points around the mean.

• Validation Curve: Shows model performance across a range of hyperparameter values.

• Vanishing Gradient: Gradients shrink toward zero during deep network training, stalling learning in early layers.

W:

• Word Embedding: Represents words as dense vectors in NLP.

• Word Cloud: Visualizes word frequency through font size.

• Weights: Parameters learned during model training.

X:

• XGBoost: Optimized gradient boosting library.

• XLNet: Transformer-based language model for NLP.

Y:

• YOLO (You Only Look Once): Real-time object detection system.

• Yellowbrick: Python library for machine learning visualization.

Z:

• Z-score: Standardized value showing how many standard deviations a data point lies from the mean.

• Z-test: Compares a sample mean to a population mean when the population variance is known.

• Zero-shot learning: Model recognizes new classes without having seen examples of them.

Thanks for Sharing! Sachin M
