Problem Solving vs Parameter Tuning
When building machine learning models, it is very tempting to spend a lot of time optimizing the models by adjusting/fine-tuning hyper-parameters, the knobs and switches of each algorithm. Hyper-parameter tuning is certainly an important step, as it sometimes yields a few percentage points of improvement (better accuracy, better ROC/AUC, etc.) over the default settings, as in the sketch below.
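As a concrete illustration, here is a minimal sketch of hyper-parameter tuning, assuming scikit-learn, a synthetic dataset, and an illustrative parameter grid rather than any particular project's settings:

```python
# A minimal sketch of hyper-parameter tuning, assuming scikit-learn and a
# synthetic dataset; the grid values are illustrative only.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

X, y = make_classification(n_samples=1000, n_features=20, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Search a small grid of knobs (learning rate, tree depth, number of trees).
param_grid = {
    "learning_rate": [0.01, 0.1],
    "max_depth": [2, 3, 5],
    "n_estimators": [100, 300],
}
search = GridSearchCV(
    GradientBoostingClassifier(random_state=42),
    param_grid,
    scoring="roc_auc",
    cv=5,
)
search.fit(X_train, y_train)

# The tuned model typically beats the defaults by only a few points of AUC.
print("Best params:", search.best_params_)
print("Best CV ROC/AUC:", round(search.best_score_, 3))
```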
But an inordinate amount of focus is given to this tuning task and to applying multiple modeling techniques (KNNs, random forests, gradient boosting, deep neural nets), each with its own set of knobs (gamma, learning rate, max depth, epochs, batch size, optimizer, etc.) to adjust. So much time is spent wringing out the last bit of performance that not enough time (often no time at all) is spent revisiting the problem definition and the data collection. This is unfortunately most true when model performance is mediocre: instead of revisiting the problem we are trying to solve, we tend to 'improve' the (bad) performance by a few percentage points.
There appears to be a false belief that by searching a densely populated grid of hyper-parameter values, the 'best' model can be achieved. This is far from the truth. The real best model comes from properly defining the problem question, capturing the data elements that inherently affect the outcome, and then creating/engineering new, information-laden features (e.g., a variable like BMI that combines height and weight, or replacing postal code with median home value, or replacing state with unemployment rate), as sketched below. When the performance metrics are less than ideal, we should revisit the problem definition and the data collection, which are hard tasks. Instead, through inertia (or escapism), a lot of time is spent tuning/optimizing the model.
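Here is a minimal sketch of that kind of feature creation; the DataFrame, column names, and home-value figures are hypothetical placeholders, not real data:

```python
# A minimal sketch of feature engineering, assuming a hypothetical pandas
# DataFrame with height, weight, and postal code columns, plus a hypothetical
# lookup table of median home values; all names and numbers are illustrative.
import pandas as pd

patients = pd.DataFrame({
    "height_cm": [170, 182, 158],
    "weight_kg": [72, 95, 51],
    "postal_code": ["10001", "94103", "60601"],
})
median_home_value = pd.DataFrame({
    "postal_code": ["10001", "94103", "60601"],
    "median_home_value": [1_250_000, 1_400_000, 380_000],
})

# Combine height and weight into a single information-laden feature (BMI).
patients["bmi"] = patients["weight_kg"] / (patients["height_cm"] / 100) ** 2

# Replace the high-cardinality postal code with a numeric proxy the model
# can actually learn from (median home value for that postal code).
patients = (
    patients.merge(median_home_value, on="postal_code")
    .drop(columns="postal_code")
)

print(patients)
```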
When in doubt, it always helps to remember the following ordering (where ">" means "more important than"):
Problem definition > data identification > feature creation > modeling technique > hyper-parameter tuning
Finally, when current model performance is mediocre, we must recall Tukey's words: "The combination of some data and an aching desire for an answer does not ensure that a reasonable answer can be extracted from a given body of data".
So, focus on getting better data and more info-rich predictors. Good luck!