Machine Learning Framework

Sharing this with people who are just starting their machine learning journey.

1)     Define the Problem and Success Measures

a)      Determine whether the problem is supervised or unsupervised, and whether it calls for regression or classification

b)     Determine the level of accuracy (or other metric threshold) the model must reach to be deemed successful

2)     Data Collection

a)      More data generally yields a more accurate model, though returns diminish and data quality matters as much as quantity

b)     Look beyond the dataset for other relevant domain information/data

3)     Data Preparation

a)      Wrangle the data and prepare it for training (e.g. creating lag features for time series, handling multivariate inputs)

b)     Clean the data where needed (remove duplicates, correct errors, handle missing values, normalize, enforce stationarity, convert data types, remove bias, etc.)

c)      Randomize the data (except for time series), which erases the effects of the particular order in which it was collected and/or prepared

d)     Visualize the data to help detect relevant relationships between variables, class imbalances, or bias, or to perform other exploratory analysis

e)     Use information from visualization as well as other domain knowledge to generate features

f)       Split into training and evaluation sets (70/30, 80/20, etc.)
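The preparation steps above (shuffle, normalize, split) can be sketched in a few lines of plain Python; the dataset here is made up purely for illustration:

```python
import random

# Hypothetical toy dataset of (feature, label) pairs
data = [(x, 2 * x + 1) for x in range(100)]

# Randomize the order (skip this step for time series!)
random.seed(42)
random.shuffle(data)

# Min-max normalize the feature column to [0, 1]
xs = [x for x, _ in data]
lo, hi = min(xs), max(xs)
data = [((x - lo) / (hi - lo), y) for x, y in data]

# 80/20 train/evaluation split
cut = int(0.8 * len(data))
train_set, eval_set = data[:cut], data[cut:]
print(len(train_set), len(eval_set))  # 80 20
```

In a real project a library such as pandas or scikit-learn would typically handle these steps.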

4)     Choose a Model

a)      Select the right algorithm for the specific task; different algorithms perform better at different tasks (e.g. CNNs for vision systems, LSTMs/Transformers for natural language, LSTM/Prophet/ARIMA for time series, K-Means for clustering, XGBoost/LightGBM/CatBoost for tabular data, etc.)

b)     If unsure, run AutoML with an ensemble (H2O.ai, AutoGluon, etc.)

c)      Also weigh runtime: some algorithms train and predict much faster than others
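One way to read "choose a model" is as an empirical comparison: fit each candidate on the same training data and keep the one with the lower validation error. A minimal sketch with two made-up candidates (a mean baseline and a least-squares line on invented data):

```python
# Hypothetical data following y = 3x + 2 exactly
train = [(x, 3 * x + 2) for x in range(20)]
val = [(x, 3 * x + 2) for x in range(20, 25)]

def mean_baseline(train):
    # Predicts the training mean regardless of input
    mean_y = sum(y for _, y in train) / len(train)
    return lambda x: mean_y

def least_squares_line(train):
    # Closed-form simple linear regression
    n = len(train)
    sx = sum(x for x, _ in train)
    sy = sum(y for _, y in train)
    sxx = sum(x * x for x, _ in train)
    sxy = sum(x * y for x, y in train)
    m = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    b = (sy - m * sx) / n
    return lambda x: m * x + b

def val_mse(model):
    return sum((model(x) - y) ** 2 for x, y in val) / len(val)

candidates = {
    "mean baseline": mean_baseline(train),
    "linear fit": least_squares_line(train),
}
best_name = min(candidates, key=lambda name: val_mse(candidates[name]))
print(best_name)  # linear fit
```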

5)     Train the Model

a)      Training assigns weights/importance to features. Linear regression example: the algorithm needs to learn values for the slope m (or weight W) and intercept b in y = mx + b, where x is the input and y is the output

b)     The more iterations or training steps, the more accurate the model becomes, but accuracy eventually saturates; balance time and computational cost against accuracy
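To make the linear regression example concrete, here is a toy gradient-descent loop that learns m and b from synthetic data generated with m = 2, b = 1 (the data, learning rate, and step count are all illustrative choices):

```python
# Synthetic data on the line y = 2x + 1
data = [(i / 10, 2 * (i / 10) + 1) for i in range(50)]

m, b = 0.0, 0.0   # start with arbitrary weights
lr = 0.05         # learning rate (a hyperparameter)
n = len(data)

for step in range(2000):
    grad_m = grad_b = 0.0
    for x, y in data:
        err = (m * x + b) - y          # prediction error
        grad_m += 2 * err * x / n      # dMSE/dm
        grad_b += 2 * err / n          # dMSE/db
    m -= lr * grad_m                   # step the weights downhill
    b -= lr * grad_b

print(round(m, 2), round(b, 2))  # 2.0 1.0
```

More steps sharpen the fit only up to a point; after convergence, extra iterations just burn compute.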

6)     Evaluate the Model

a)      Use a metric or combination of metrics to measure the objective performance of the model (RMSE, MAE, MAPE, etc.)

b)     Test the model against previously unseen data to further tune the model.

c)      Compare different train/eval splits (80/20, 70/30, etc.) depending on the domain, data availability, dataset particulars, etc.

d)     Prevent overfitting; the model should be able to generalize to unseen data
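The metrics mentioned above are straightforward to compute by hand; the actual/predicted values below are invented for illustration:

```python
import math

# Hypothetical actuals vs. model predictions
actual = [10.0, 20.0, 30.0, 40.0]
predicted = [12.0, 18.0, 33.0, 38.0]

n = len(actual)
errors = [p - a for a, p in zip(actual, predicted)]

rmse = math.sqrt(sum(e ** 2 for e in errors) / n)                   # root mean squared error
mae = sum(abs(e) for e in errors) / n                               # mean absolute error
mape = 100 * sum(abs(e) / a for e, a in zip(errors, actual)) / n    # mean absolute percentage error

print(round(rmse, 3), round(mae, 3), round(mape, 3))  # 2.291 2.25 11.25
```

RMSE punishes large errors more heavily than MAE, while MAPE expresses error relative to the actual value, which is useful when scales vary.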

7)     Parameter Tuning

a)      Tune the algorithm's hyperparameters, especially for neural networks

b)     Manually tune hyperparameters for improved performance, or use automated tools (AutoKeras, Google’s AutoML) for neural networks

c)      Common hyperparameters include the number of training steps (epochs), the learning rate, initialization (seed) values and distribution, the number of nodes, etc.
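Hyperparameter tuning can be as simple as a grid search: train once per combination and keep the settings with the lowest validation error. This sketch tunes the learning rate and epoch count of a toy gradient-descent linear model (all values are illustrative):

```python
def train(data, lr, epochs):
    # Gradient descent for y = m*x + b
    m = b = 0.0
    n = len(data)
    for _ in range(epochs):
        gm = gb = 0.0
        for x, y in data:
            err = (m * x + b) - y
            gm += 2 * err * x / n
            gb += 2 * err / n
        m -= lr * gm
        b -= lr * gb
    return m, b

def mse(data, m, b):
    return sum(((m * x + b) - y) ** 2 for x, y in data) / len(data)

# Synthetic data on y = 2x + 1, split by position
train_set = [(i / 10, 2 * (i / 10) + 1) for i in range(40)]
val_set = [(i / 10, 2 * (i / 10) + 1) for i in range(40, 50)]

best = None
for lr in (0.001, 0.01, 0.05):
    for epochs in (100, 1000):
        m, b = train(train_set, lr, epochs)
        score = mse(val_set, m, b)
        if best is None or score < best[0]:
            best = (score, lr, epochs)

print(best[1], best[2])  # the (lr, epochs) pair with the lowest validation error
```

Libraries such as scikit-learn (GridSearchCV) or Optuna automate exactly this loop at scale.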

8)     Make Predictions

a)      Use further (test-set) data which have, until this point, been withheld from the model (and for which labels are known) to test it; this gives a better approximation of how the model will perform in the real world
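As a tiny illustration of this final check, here is a hypothetical threshold classifier scored on held-out examples whose true labels are known:

```python
# Held-out test data: (score, true label) pairs, invented for illustration
test_set = [(0.2, 0), (0.45, 1), (0.6, 1), (0.9, 1), (0.3, 0)]

def classify(x, threshold=0.5):
    # Predict class 1 when the score clears the threshold
    return 1 if x >= threshold else 0

predictions = [classify(x) for x, _ in test_set]
correct = sum(p == label for p, (_, label) in zip(predictions, test_set))
accuracy = correct / len(test_set)
print(accuracy)  # 0.8 (the 0.45 example is misclassified)
```

Because the test set was never used for training or tuning, this accuracy is the most honest estimate of real-world performance.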
