Machine Learning - Hyperparameter Tuning
Gaurav Pahuja
Senior Data Scientist | DatSci 2019 Finalist | Python/Plotly-Dash | R/R-Shiny | Oracle SQL/BI | SQL | Machine Learning | Deep Learning | Techfitlab
Randomized Search for Classification and Regression Models
What is Hyperparameter Tuning?
In machine learning, hyperparameter optimization or tuning is the problem of choosing a set of optimal hyperparameters for a learning algorithm. A hyperparameter is a parameter whose value is used to control the learning process. By contrast, the values of other parameters are learned. — Wikipedia
In other words, hyperparameters are points of choice or configuration that allow a machine learning model to be customised for a specific task or dataset.
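For example, here is a minimal sketch using scikit-learn (a toy dataset, not the data used later in this post): the tree depth we choose up front is a hyperparameter, while the split rules the tree learns during fitting are its parameters.

# A hyperparameter (max_depth) is fixed before training;
# the split rules are learned from the data during fit().
from sklearn.datasets import make_regression
from sklearn.tree import DecisionTreeRegressor

X, y = make_regression(n_samples=200, n_features=5, random_state=0)

tree = DecisionTreeRegressor(max_depth=3)   # hyperparameter, set by us
tree.fit(X, y)

print(tree.get_depth())                     # bounded by the max_depth hyperparameter
print(tree.feature_importances_)            # derived from the learned tree structure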
What is Randomized Search?
Randomized Search is a method in which random combinations of hyperparameters are selected and used to train a model. In contrast to Grid Search, not all parameter values are tried out, but rather a fixed number of parameter settings is sampled from the specified distributions.
Some of the key benefits of Randomized Search are that it is trivial to implement, it is great for discovering hyperparameter combinations that you would not have guessed intuitively, and random experiments are more efficient because not all hyperparameters are equally important to tune. [1]
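To make the contrast with Grid Search concrete, here is a minimal sketch on a toy dataset (not the data used later in this post): Grid Search evaluates every combination in the grid, while Randomized Search samples a fixed number of candidates (n_iter) from lists or scipy distributions.

from scipy.stats import randint, uniform
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV

X, y = make_classification(n_samples=1000, random_state=0)
model = RandomForestClassifier(random_state=0)

# Grid Search: evaluates all 3 x 3 = 9 combinations
grid = GridSearchCV(model, {'max_depth': [3, 5, 7],
                            'n_estimators': [50, 100, 200]}, cv=3)

# Randomized Search: samples 9 candidates from much larger ranges/distributions
rand = RandomizedSearchCV(model,
                          {'max_depth': randint(2, 20),
                           'n_estimators': randint(10, 500),
                           'max_features': uniform(0.1, 0.9)},
                          n_iter=9, cv=3, random_state=0)

grid.fit(X, y)
rand.fit(X, y)
print(grid.best_params_)
print(rand.best_params_)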
Implementing Randomized Search for Regression?
In this example, we will collect our own dataset for weather and electricity demand as shown in the series of videos below:
Next, let’s use random search to find a good model configuration for the demand dataset.
# preview the weather and electricity demand dataset
data.head()

# plot the train / validation / test portions of the demand series
import matplotlib.pyplot as plt
plt.figure(figsize=(50, 4))
plt.plot(train_df.index, train_df['demand'], label='Train')
plt.plot(val_df.index, val_df['demand'], label='Val')
plt.plot(test_df.index, test_df['demand'], label='Test')
plt.legend()
plt.show()
In this section, we will explore hyperparameter optimisation of the XGBoost regression model on the demand dataset.
import xgboost as xgb

# split the dataset into train (80%), validation (10%) and test (10%) sets
n = len(data)
train_df = data[0:int(n*0.8)]
val_df = data[int(n*0.8):int(n*0.9)]
test_df = data[int(n*0.9):]

# the last column is the target (demand), the rest are features
X_train, y_train = train_df.iloc[:, :-1], train_df.iloc[:, -1]
X_val, y_val = val_df.iloc[:, :-1], val_df.iloc[:, -1]
X_test, y_test = test_df.iloc[:, :-1], test_df.iloc[:, -1]
First, we will define the model that will be optimised and use default values for the hyperparameters that will not be optimised.
xgb_reg = xgb.XGBRegressor(objective='reg:squarederror', n_estimators=500)
We will fit one baseline model and calculate its R-squared value on the test set, to see whether we can improve on it with RandomizedSearchCV.
# fit the baseline model and report R-squared on the test set
xgb_reg.fit(X_train, y_train)
y_pred = xgb_reg.predict(X_test)
xgb_reg.score(X_test, y_test)
Output:
0.734
Next, we can define the search procedure with all of these elements.
Finally, we can perform the optimisation and report the results.
from sklearn.model_selection import RandomizedSearchCV
import time

# baseline model to be tuned; the legacy arguments silent and num_boost_round
# trigger the "might not be used" warning shown in the output below
xgb_reg_2 = xgb.XGBRegressor(objective='reg:squarederror',
                             nthread=4,
                             silent=0)

# define search space
params = {
    'num_boost_round': [10, 25, 5, 15],
    'eta': [0.05, 0.001, 0.1, 0.3],
    'max_depth': [3, 6, 4, 5],
    'subsample': [0.9, 1.0, 0.8],
    'colsample_bytree': [0.9, 1.0, 0.8],
    'alpha': [0.1, 0.3, 0.0]
}

# define search
random_search = RandomizedSearchCV(xgb_reg_2, params, n_jobs=-1, cv=5,
                                   n_iter=500, verbose=1, scoring='r2')
start = time.time()

# execute search
random_search.fit(X_train, y_train, verbose=True)
best_parameters = random_search.best_params_

# print results
print('RandomizedSearchCV Results: ')
print(random_search.best_score_)
print('Best Parameters: ')
for param_name in sorted(best_parameters.keys()):
    print("%s: %r" % (param_name, best_parameters[param_name]))

end = time.time()
print('time elapsed: ' + str(end - start))
print(' ')
print('Best Estimator: ')
print(random_search.best_estimator_)

y_pred = random_search.predict(X_test)
Output:
Running the example may take some time depending on the size of your dataset and the list of parameters. You may see some warnings during the optimization for invalid configuration combinations. These can be safely ignored.
Fitting 5 folds for each of 500 candidates, totalling 2500 fits
[Parallel(n_jobs=-1)]: Using backend LokyBackend with 4 concurrent workers.
[Parallel(n_jobs=-1)]: Done   42 tasks      | elapsed:  1.2min
[Parallel(n_jobs=-1)]: Done  192 tasks      | elapsed:  3.9min
[Parallel(n_jobs=-1)]: Done  442 tasks      | elapsed:  8.4min
[Parallel(n_jobs=-1)]: Done  792 tasks      | elapsed: 15.3min
[Parallel(n_jobs=-1)]: Done 1242 tasks      | elapsed: 23.1min
[Parallel(n_jobs=-1)]: Done 1792 tasks      | elapsed: 31.9min
[Parallel(n_jobs=-1)]: Done 2442 tasks      | elapsed: 43.4min
[Parallel(n_jobs=-1)]: Done 2500 out of 2500 | elapsed: 44.6min finished
[17:15:13] WARNING: /Users/travis/build/dmlc/xgboost/src/learner.cc:480:
Parameters: { num_boost_round, silent } might not be used.
This may not be accurate due to some parameters are only used in language bindings but passed down to XGBoost core. Or some parameters are not used but slip through this verification. Please open an issue if you find the above cases.
[0] validation_0-rmse:3028.58520 validation_1-rmse:3115.59595
Multiple eval metrics have been passed: 'validation_1-rmse' will be used for early stopping.
Will train until validation_1-rmse hasn't improved in 10 rounds.
[1] validation_0-rmse:2878.61621 validation_1-rmse:2970.77710
[2] validation_0-rmse:2736.20850 validation_1-rmse:2833.18750
[3] validation_0-rmse:2601.21094 validation_1-rmse:2701.06250
[4] validation_0-rmse:2473.86548 validation_1-rmse:2578.27075
[5] validation_0-rmse:2352.15552 validation_1-rmse:2459.19556
[6] validation_0-rmse:2236.24487 validation_1-rmse:2348.32202
[7] validation_0-rmse:2127.36231 validation_1-rmse:2243.54468
[8] validation_0-rmse:2023.19324 validation_1-rmse:2141.18164
[9] validation_0-rmse:1923.92053 validation_1-rmse:2046.05396
[10] validation_0-rmse:1829.69202 validation_1-rmse:1955.49414
[11] validation_0-rmse:1740.27380 validation_1-rmse:1869.02966
[12] validation_0-rmse:1656.18420 validation_1-rmse:1787.90308
[13] validation_0-rmse:1575.64587 validation_1-rmse:1710.54053
[14] validation_0-rmse:1499.17456 validation_1-rmse:1637.45349
[15] validation_0-rmse:1426.67993 validation_1-rmse:1568.17102
[16] validation_0-rmse:1358.07056 validation_1-rmse:1500.74902
[17] validation_0-rmse:1293.55676 validation_1-rmse:1438.45398
[18] validation_0-rmse:1232.22253 validation_1-rmse:1378.79785
[19] validation_0-rmse:1174.12036 validation_1-rmse:1322.90442
[20] validation_0-rmse:1119.11609 validation_1-rmse:1270.13025
[21] validation_0-rmse:1066.15186 validation_1-rmse:1219.09558
[22] validation_0-rmse:1016.04895 validation_1-rmse:1170.56677
[23] validation_0-rmse:968.62353 validation_1-rmse:1124.20093
[24] validation_0-rmse:923.65564 validation_1-rmse:1080.49011
[25] validation_0-rmse:881.09045 validation_1-rmse:1038.90637
[26] validation_0-rmse:840.81384 validation_1-rmse:999.42071
[27] validation_0-rmse:802.68713 validation_1-rmse:962.28687
[28] validation_0-rmse:766.65265 validation_1-rmse:927.13886
[29] validation_0-rmse:732.61865 validation_1-rmse:893.88788
[30] validation_0-rmse:700.86975 validation_1-rmse:862.87537
[31] validation_0-rmse:670.53931 validation_1-rmse:832.40411
[32] validation_0-rmse:641.86975 validation_1-rmse:803.79620
[33] validation_0-rmse:614.82153 validation_1-rmse:776.66516
[34] validation_0-rmse:589.69751 validation_1-rmse:751.33124
[35] validation_0-rmse:565.57971 validation_1-rmse:727.15832
[36] validation_0-rmse:543.32098 validation_1-rmse:704.69067
[37] validation_0-rmse:522.30914 validation_1-rmse:683.33594
[38] validation_0-rmse:502.57916 validation_1-rmse:663.11688
[39] validation_0-rmse:484.17969 validation_1-rmse:644.16101
[40] validation_0-rmse:466.39215 validation_1-rmse:625.65485
[41] validation_0-rmse:449.68274 validation_1-rmse:608.07361
[42] validation_0-rmse:434.08716 validation_1-rmse:591.30786
[43] validation_0-rmse:419.44092 validation_1-rmse:575.45856
[44] validation_0-rmse:406.17426 validation_1-rmse:560.37653
[45] validation_0-rmse:393.43451 validation_1-rmse:546.23151
[46] validation_0-rmse:381.78534 validation_1-rmse:533.12537
[47] validation_0-rmse:370.61115 validation_1-rmse:520.50757
[48] validation_0-rmse:360.20517 validation_1-rmse:508.57068
[49] validation_0-rmse:350.63019 validation_1-rmse:497.26331
[50] validation_0-rmse:341.87067 validation_1-rmse:487.02097
[51] validation_0-rmse:333.80243 validation_1-rmse:477.15784
[52] validation_0-rmse:326.03281 validation_1-rmse:467.54929
[53] validation_0-rmse:319.03561 validation_1-rmse:458.90945
[54] validation_0-rmse:312.48703 validation_1-rmse:450.43494
[55] validation_0-rmse:306.46024 validation_1-rmse:442.28781
[56] validation_0-rmse:300.82993 validation_1-rmse:434.65640
[57] validation_0-rmse:295.65082 validation_1-rmse:427.46564
[58] validation_0-rmse:290.87881 validation_1-rmse:420.90317
[59] validation_0-rmse:286.56857 validation_1-rmse:414.80145
[60] validation_0-rmse:282.53354 validation_1-rmse:408.78177
[61] validation_0-rmse:278.82739 validation_1-rmse:403.02801
[62] validation_0-rmse:275.38083 validation_1-rmse:397.55566
[63] validation_0-rmse:272.26483 validation_1-rmse:392.38892
[64] validation_0-rmse:269.36115 validation_1-rmse:387.80649
[65] validation_0-rmse:266.69910 validation_1-rmse:383.25079
[66] validation_0-rmse:264.33737 validation_1-rmse:378.94147
[67] validation_0-rmse:262.08743 validation_1-rmse:374.86331
[68] validation_0-rmse:259.97595 validation_1-rmse:370.89734
[69] validation_0-rmse:258.09634 validation_1-rmse:367.23883
[70] validation_0-rmse:256.35492 validation_1-rmse:363.91281
[71] validation_0-rmse:254.73637 validation_1-rmse:360.63870
[72] validation_0-rmse:253.34224 validation_1-rmse:357.59689
[73] validation_0-rmse:252.02702 validation_1-rmse:354.60568
[74] validation_0-rmse:250.82719 validation_1-rmse:351.79309
[75] validation_0-rmse:249.67825 validation_1-rmse:349.23715
[76] validation_0-rmse:248.65965 validation_1-rmse:346.77612
[77] validation_0-rmse:247.72450 validation_1-rmse:344.40747
[78] validation_0-rmse:246.80267 validation_1-rmse:342.15888
[79] validation_0-rmse:245.99811 validation_1-rmse:340.12512
[80] validation_0-rmse:245.22859 validation_1-rmse:338.13821
[81] validation_0-rmse:244.41821 validation_1-rmse:336.35168
[82] validation_0-rmse:243.80837 validation_1-rmse:334.60898
[83] validation_0-rmse:243.19672 validation_1-rmse:332.99353
[84] validation_0-rmse:242.57985 validation_1-rmse:331.44428
[85] validation_0-rmse:242.04291 validation_1-rmse:330.03220
[86] validation_0-rmse:241.52477 validation_1-rmse:328.73193
[87] validation_0-rmse:240.97461 validation_1-rmse:327.55893
[88] validation_0-rmse:240.53438 validation_1-rmse:326.28009
[89] validation_0-rmse:240.10960 validation_1-rmse:325.12067
[90] validation_0-rmse:239.70949 validation_1-rmse:323.97183
[91] validation_0-rmse:239.36101 validation_1-rmse:322.87314
[92] validation_0-rmse:238.99913 validation_1-rmse:321.74439
[93] validation_0-rmse:238.48848 validation_1-rmse:320.85998
[94] validation_0-rmse:238.20342 validation_1-rmse:320.03012
[95] validation_0-rmse:237.89264 validation_1-rmse:319.19754
[96] validation_0-rmse:237.66429 validation_1-rmse:318.30530
[97] validation_0-rmse:237.32375 validation_1-rmse:317.85239
[98] validation_0-rmse:237.03314 validation_1-rmse:317.06726
[99] validation_0-rmse:236.69791 validation_1-rmse:316.29480
RandomizedSearchCV Results:
0.7942837766673408
Best Parameters:
alpha: 0.3
colsample_bytree: 0.9
eta: 0.05
max_depth: 5
num_boost_round: 15
subsample: 0.8
time elapsed: 2676.828140974045
Best Estimator:
XGBRegressor(alpha=0.3, base_score=0.5, booster='gbtree', colsample_bylevel=1,
             colsample_bynode=1, colsample_bytree=0.9, eta=0.05, gamma=0,
             gpu_id=-1, importance_type='gain', interaction_constraints='',
             learning_rate=0.0500000007, max_delta_step=0, max_depth=5,
             min_child_weight=1, missing=nan, monotone_constraints='()',
             n_estimators=100, n_jobs=4, nthread=4, num_boost_round=15,
             num_parallel_tree=1, random_state=0, reg_alpha=0.300000012,
             reg_lambda=1, scale_pos_weight=1, silent=0, subsample=0.8,
             tree_method='exact', validate_parameters=1, ...)
At the end of the run, the best cross-validation score and the hyperparameter configuration that achieved it are reported. As you can see, we were able to improve the R-squared value from 0.73 to 0.79 by tuning the hyperparameters.
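Note that 0.79 is the best cross-validation score on the training data; for a like-for-like comparison with the baseline (which was scored on the held-out test set), the refitted best estimator can also be scored on X_test. A minimal sketch, assuming the variables from the snippets above are still in scope:

from sklearn.metrics import r2_score

best_model = random_search.best_estimator_      # refit on the full training set
y_pred = best_model.predict(X_test)
print('Tuned model R-squared on the test set: %.3f' % r2_score(y_test, y_pred))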
Implementing Randomized Search for Classification?
In this example, we will create our own dataset through the sklearn.datasets package.
Package: sklearn.datasets.make_classification
# create dataset
import pandas as pd
import seaborn as sns
import sklearn.utils
from sklearn.datasets import make_classification

X, y = make_classification(
    n_classes=2, class_sep=0.5, weights=[0.6, 0.4],
    n_informative=3, n_redundant=1, flip_y=0.3,
    n_features=20, n_clusters_per_class=3,
    n_samples=50000, random_state=11
)
model_dataset = pd.DataFrame(X)
model_dataset['Class'] = y

# shuffle the rows so the classes are mixed
model_dataset = sklearn.utils.shuffle(model_dataset)

# plot the class balance
plt.figure(figsize=(15, 8))
plt.suptitle('Target Flag')
sns.barplot(x='Class', y='Customers', hue='Class',
            data=model_dataset.groupby(['Class']).size().reset_index(name='Customers'))
plt.show()
In this section, we will explore hyperparameter optimisation of the XGBoost classification model on the dataset we created in the previous step.
Next, we will split the dataset.
# split dataset
n = len(model_dataset)
train_df = model_dataset[0:int(n*0.8)]
test_df = model_dataset[int(n*0.8):]
X_train, y_train = train_df.iloc[:,:-1], train_df.iloc[:,-1]
X_test, y_test = test_df.iloc[:,:-1], test_df.iloc[:,-1]
First, we will define the model that will be optimised and use default values for the hyperparameters that will not be optimised.
# baseline
import time

xgb_base = xgb.XGBClassifier(n_estimators=100)

# time the training step
training_start = time.perf_counter()
xgb_base.fit(X_train, y_train)
training_end = time.perf_counter()

# time the prediction step
prediction_start = time.perf_counter()
y_base = xgb_base.predict(X_test)
prediction_end = time.perf_counter()

# accuracy as the percentage of correctly classified test samples
acc_xgb = (y_base == y_test).sum().astype(float) / len(y_base) * 100
xgb_train_time = training_end - training_start
xgb_prediction_time = prediction_end - prediction_start

print("XGBoost's prediction accuracy is: %3.2f" % (acc_xgb))
print("Time consumed for training: %4.3f seconds" % (xgb_train_time))
print("Time consumed for prediction: %6.5f seconds" % (xgb_prediction_time))
Output:
XGBoost's prediction accuracy is: 72.87
Time consumed for training: 3.137 seconds
Time consumed for prediction: 0.01386 seconds
Next, we will split the dataset including the validation set for evaluation.
# split dataset including validation set for evaluation
n = len(model_dataset)
train_df = model_dataset[0:int(n*0.8)]
val_df = model_dataset[int(n*0.8):int(n*0.9)]
test_df = model_dataset[int(n*0.9):]
X_train, y_train = train_df.iloc[:,:-1], train_df.iloc[:,-1]
X_val, y_val = val_df.iloc[:,:-1], val_df.iloc[:,-1]
X_test, y_test = test_df.iloc[:,:-1], test_df.iloc[:,-1]
Next, we can define the search procedure with all of these elements.
Finally, we can perform the optimisation and report the results.
# classifier to be tuned
xgb_clf = xgb.XGBClassifier()

# define search space
params = {
    'num_boost_round': [5, 10, 15, 25],
    'eta': [0.05, 0.001, 0.1, 0.3],
    'max_depth': [3, 6, 5, 8],
    'subsample': [0.9, 1, 0.8],
    'colsample_bytree': [0.9, 1, 0.8],
    'alpha': [0.1, 0.3, 0]
}

# define search (the default scoring for a classifier is accuracy)
random_search = RandomizedSearchCV(xgb_clf, params, n_jobs=-1, cv=5, n_iter=500, verbose=1)

# execute search; the validation set is passed to XGBoost for early stopping
start = time.time()
random_search.fit(X_train,
                  y_train,
                  eval_set=[(X_train, y_train), (X_val, y_val)],
                  early_stopping_rounds=10,
                  verbose=True)
best_parameters = random_search.best_params_

# print results
print('RandomizedSearchCV Results: ')
print(random_search.best_score_)
print('Best Parameters: ')
for param_name in sorted(best_parameters.keys()):
    print("%s: %r" % (param_name, best_parameters[param_name]))

end = time.time()
print('time elapsed: ' + str(end - start))
print(' ')
print('Best Estimator: ')
print(random_search.best_estimator_)

y_pred = random_search.predict(X_test)
Output:
Fitting 5 folds for each of 500 candidates, totalling 2500 fits
[20:36:17] WARNING: /Users/travis/build/dmlc/xgboost/src/learner.cc:480:
Parameters: { num_boost_round } might not be used.
This may not be accurate due to some parameters are only used in language bindings but passed down to XGBoost core. Or some parameters are not used but slip through this verification. Please open an issue if you find the above cases.
[0] validation_0-error:0.27945 validation_1-error:0.28940
Multiple eval metrics have been passed: 'validation_1-error' will be used for early stopping.
Will train until validation_1-error hasn't improved in 10 rounds.
[1] validation_0-error:0.27148 validation_1-error:0.27780
[2] validation_0-error:0.26435 validation_1-error:0.27160
[3] validation_0-error:0.26600 validation_1-error:0.27140
[4] validation_0-error:0.26253 validation_1-error:0.26840
[5] validation_0-error:0.25917 validation_1-error:0.26360
[6] validation_0-error:0.25912 validation_1-error:0.26520
[7] validation_0-error:0.25548 validation_1-error:0.26200
[8] validation_0-error:0.25520 validation_1-error:0.26220
[9] validation_0-error:0.25370 validation_1-error:0.26120
[10] validation_0-error:0.25357 validation_1-error:0.26040
[11] validation_0-error:0.25285 validation_1-error:0.25860
[12] validation_0-error:0.25117 validation_1-error:0.25900
[13] validation_0-error:0.24840 validation_1-error:0.26000
[14] validation_0-error:0.24800 validation_1-error:0.25920
[15] validation_0-error:0.24802 validation_1-error:0.25780
[16] validation_0-error:0.24740 validation_1-error:0.25720
[17] validation_0-error:0.24830 validation_1-error:0.25600
[18] validation_0-error:0.24775 validation_1-error:0.25760
[19] validation_0-error:0.24755 validation_1-error:0.25800
[20] validation_0-error:0.24570 validation_1-error:0.25740
[21] validation_0-error:0.24695 validation_1-error:0.25860
[22] validation_0-error:0.24588 validation_1-error:0.25780
[23] validation_0-error:0.24463 validation_1-error:0.25600
[24] validation_0-error:0.24407 validation_1-error:0.25580
[25] validation_0-error:0.24405 validation_1-error:0.25580
[26] validation_0-error:0.24352 validation_1-error:0.25640
[27] validation_0-error:0.24275 validation_1-error:0.25640
[28] validation_0-error:0.24205 validation_1-error:0.25500
[29] validation_0-error:0.24158 validation_1-error:0.25480
[30] validation_0-error:0.24130 validation_1-error:0.25500
[31] validation_0-error:0.24083 validation_1-error:0.25360
[32] validation_0-error:0.24060 validation_1-error:0.25260
[33] validation_0-error:0.23988 validation_1-error:0.25220
[34] validation_0-error:0.23885 validation_1-error:0.25300
[35] validation_0-error:0.23805 validation_1-error:0.25300
[36] validation_0-error:0.23770 validation_1-error:0.25240
[37] validation_0-error:0.23733 validation_1-error:0.25120
[38] validation_0-error:0.23670 validation_1-error:0.25060
[39] validation_0-error:0.23622 validation_1-error:0.25140
[40] validation_0-error:0.23522 validation_1-error:0.24980
[41] validation_0-error:0.23380 validation_1-error:0.24940
[42] validation_0-error:0.23187 validation_1-error:0.25000
[43] validation_0-error:0.23117 validation_1-error:0.24920
[44] validation_0-error:0.23075 validation_1-error:0.24760
[45] validation_0-error:0.23008 validation_1-error:0.24740
[46] validation_0-error:0.22930 validation_1-error:0.24880
[47] validation_0-error:0.22785 validation_1-error:0.24800
[48] validation_0-error:0.22678 validation_1-error:0.24800
[49] validation_0-error:0.22623 validation_1-error:0.24860
[50] validation_0-error:0.22535 validation_1-error:0.24920
[51] validation_0-error:0.22498 validation_1-error:0.24840
[52] validation_0-error:0.22228 validation_1-error:0.24900
[53] validation_0-error:0.22193 validation_1-error:0.24820
[54] validation_0-error:0.22168 validation_1-error:0.24860
[55] validation_0-error:0.22132 validation_1-error:0.24940
Stopping. Best iteration:
[45] validation_0-error:0.23008 validation_1-error:0.24740
RandomizedSearchCV Results:
0.744675
Best Parameters:
alpha: 0
colsample_bytree: 0.9
eta: 0.05
max_depth: 8
num_boost_round: 5
subsample: 0.9
time elapsed: 2141.411530017853
Best Estimator:
XGBClassifier(alpha=0, base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=0.9, eta=0.05, gamma=0,
              gpu_id=-1, importance_type='gain', interaction_constraints='',
              learning_rate=0.0500000007, max_delta_step=0, max_depth=8,
              min_child_weight=1, missing=nan, monotone_constraints='()',
              n_estimators=100, n_jobs=0, num_boost_round=5,
              num_parallel_tree=1, random_state=0, reg_alpha=0, reg_lambda=1,
              scale_pos_weight=1, subsample=0.9, tree_method='exact',
              validate_parameters=1, verbosity=None)
At the end of the run, the best cross-validation score and the hyperparameter configuration that achieved it are reported. As you can see, we were able to improve the accuracy from roughly 0.73 to 0.74 by tuning the hyperparameters. This is not a huge improvement, but it is still an improvement; keep in mind that this is only a synthetic dataset created for this post, so the absolute performance figures are not particularly meaningful.
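Again, 0.74 is the best cross-validation score; to compare against the baseline accuracy reported earlier on the test set, the tuned classifier can be scored on X_test as well. A minimal sketch, assuming the variables from the snippets above are still in scope:

from sklearn.metrics import accuracy_score

y_pred = random_search.best_estimator_.predict(X_test)
print("Tuned XGBoost prediction accuracy is: %3.2f" % (accuracy_score(y_test, y_pred) * 100))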
Summary
In this article, we covered what hyperparameter tuning is, what Randomized Search is, and how to implement Randomized Search for both a regression model and a classification model.
Framework: Jupyter Notebook, Language: Python, Libraries: xgboost, sklearn, seaborn, matplotlib and pandas.
Reference
Wait! don’t forget to follow me on LinkedIn ;)
Thanks for your support, it motivates me to create more content.