Regression using Neural Network

Keras, a wrapper API that runs on top of TensorFlow, is very popular and easy to use. Scikit-learn is also a very popular machine learning library. In this post I will show how to use Keras and scikit-learn to build a neural network architecture in Python and develop a regression model with a linear output layer.

Jump to the code and spare yourself the reading time:

https://github.com/GKarmakar/RegressionUsingNN

Define a base model that will be used to build a regression model through KerasRegressor, the scikit-learn wrapper API in Keras.

from keras.models import Sequential
from keras.layers import Dense, BatchNormalization, Dropout

def baseline_model_1057(optimizer='adam'):
    # create model: 1057 inputs -> 1058 -> 529 -> 1 linear output
    model = Sequential()
    model.add(Dense(1058, activation='relu',
                    kernel_regularizer='l2',
                    kernel_initializer='normal',
                    input_shape=(1057,)))
    model.add(BatchNormalization())
    model.add(Dropout(0.5))
    model.add(Dense(529, activation='relu',
                    kernel_regularizer='l2',
                    kernel_initializer='normal'))
    model.add(BatchNormalization())
    model.add(Dropout(0.5))
    model.add(Dense(1, activation='linear',
                    kernel_regularizer='l2',
                    kernel_initializer='normal'))
    # accuracy is not meaningful for regression, so track mean absolute error instead
    model.compile(loss='mse', optimizer=optimizer, metrics=['mae'])
    return model
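To get a feel for the size of this network, the number of weights in each Dense layer can be worked out by hand. The sketch below is plain Python and takes the layer sizes from baseline_model_1057 above; the helper name dense_param_count is made up for illustration and is not part of Keras.

```python
def dense_param_count(n_inputs, n_units):
    """Weights plus biases of one fully connected layer."""
    return n_inputs * n_units + n_units

# Layer sizes taken from baseline_model_1057: 1057 -> 1058 -> 529 -> 1
layer_sizes = [1057, 1058, 529, 1]
counts = [dense_param_count(a, b) for a, b in zip(layer_sizes, layer_sizes[1:])]
total_dense_params = sum(counts)

print(counts)
print(total_dense_params)
```

With well over a million weights in the Dense layers alone (the BatchNormalization layers add a few more), the L2 regularization and the Dropout layers are a sensible guard against overfitting.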

Now we write a method for training the model we created above:

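import numpy as np
from keras.wrappers.scikit_learn import KerasRegressor  # legacy Keras wrapper used in this post
from sklearn.model_selection import KFold, cross_val_score

def train_data_nn(X_train, y_train):
    np.random.seed(42)
    # create model
    estimator = KerasRegressor(build_fn=baseline_model_1057, epochs=100, batch_size=10, verbose=0)
    # shuffle=True is required for random_state to take effect
    kfold = KFold(n_splits=10, shuffle=True, random_state=42)
    results = cross_val_score(estimator, X_train, y_train, cv=kfold)
    print("Standardized: %.2f (%.2f) MSE" % (results.mean(), results.std()))
    # cross_val_score fits clones, so the returned estimator is still unfitted
    return estimator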
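The idea behind the 10-fold split used above can be illustrated without any library. This is a simplified sketch and not scikit-learn's implementation; the function name kfold_indices and the toy sample count are assumptions chosen for the example.

```python
import random

def kfold_indices(n_samples, n_splits, seed=42):
    """Yield (train_idx, val_idx) pairs; every sample is validated exactly once."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # shuffle once, reproducibly
    # the first n_samples % n_splits folds get one extra sample
    fold_sizes = [n_samples // n_splits + (1 if i < n_samples % n_splits else 0)
                  for i in range(n_splits)]
    start = 0
    for size in fold_sizes:
        val_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, val_idx
        start += size

folds = list(kfold_indices(n_samples=25, n_splits=10))
for train_idx, val_idx in folds[:3]:
    print(len(train_idx), len(val_idx))
```

Each fold trains a fresh model on the training indices and scores it on the held-out indices, which is why cross-validating a neural network costs roughly n_splits full training runs.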

Define a function to visualize the loss; we are using MSE loss for regression.

import matplotlib.pyplot as plt

def visualize_learning_curve(history):
    # summarize history for loss
    plt.plot(history.history['loss'])
    plt.plot(history.history['val_loss'])
    plt.title('model loss')
    plt.ylabel('loss')
    plt.xlabel('epoch')
    # val_loss comes from the validation split, not from the test set
    plt.legend(['train', 'validation'], loc='upper left')
    plt.show()

The main function performs the data preprocessing, such as dropping empty or constant columns and standardizing the data, splits the data into train and validation sets, trains the model and predicts on the test set.

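import math
import numpy as np
import pandas as pd
from keras.callbacks import EarlyStopping
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

def train_and_predict(Xtrain, Xtest):
    # work on a copy so the caller's DataFrame is not modified
    X = Xtrain.copy()
    y = X['rank']
    X.drop('rank', inplace=True, axis=1)

    # drop columns that are entirely null or hold a single constant value
    null_cols = X.columns[X.isnull().all()]
    X.drop(null_cols, inplace=True, axis=1)
    nunique = X.apply(pd.Series.nunique)
    null_col_uni = nunique[nunique == 1].index
    X.drop(null_col_uni, inplace=True, axis=1)

    # apply the same column drops to the test set
    Xtest = Xtest.drop(null_cols, axis=1)
    Xtest = Xtest.drop(null_col_uni, axis=1)
    print('Train size:', X.shape, ' Test size:', Xtest.shape)

    seed = 7
    np.random.seed(seed)
    X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.1, random_state=42)

    # fit the scaler on the training split only, then reuse it everywhere
    scaler = StandardScaler().fit(X_train)
    X_train = scaler.transform(X_train)
    X_val = scaler.transform(X_val)
    estimator = train_data_nn(X_train, y_train)
    early_stopping = EarlyStopping(monitor='loss', patience=1, verbose=1)
    history = estimator.fit(X_train, y_train, validation_split=0.1,
                            epochs=100, batch_size=10,
                            callbacks=[early_stopping],
                            verbose=1)
    visualize_learning_curve(history)
    # X_val is already a NumPy array after scaling, so there is no .values here
    rmse = math.sqrt(mean_squared_error(y_val.values, estimator.predict(X_val)))
    print(rmse)
    # the test set must go through the same scaler before predicting
    pred = estimator.predict(scaler.transform(Xtest))
    # keep the test index so the predictions stay aligned with their rows
    test_df = pd.DataFrame({'y_pred': pred}, index=Xtest.index)
    return test_df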
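Two steps in this function are easy to get wrong: the scaler must learn its statistics from the training split only, and the RMSE is simply the square root of the mean squared error. The following is a minimal standard-library sketch of both ideas on a single toy column; the helper names fit_scaler, transform and rmse and the numbers are invented for illustration.

```python
import math
import statistics

def fit_scaler(train_column):
    """Learn mean and standard deviation from the training values only."""
    mean = statistics.fmean(train_column)
    std = statistics.pstdev(train_column)  # population std, as StandardScaler uses
    if std == 0:
        std = 1.0  # constant column: avoid dividing by zero
    return mean, std

def transform(column, mean, std):
    """Standardize a column with previously learned statistics."""
    return [(x - mean) / std for x in column]

def rmse(y_true, y_pred):
    """Root of the mean squared difference between targets and predictions."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

train_col = [2.0, 4.0, 6.0, 8.0]
val_col = [5.0, 10.0]

mean, std = fit_scaler(train_col)
scaled_train = transform(train_col, mean, std)
scaled_val = transform(val_col, mean, std)  # reuses the training statistics

print(scaled_val)
print(rmse([3.0, 5.0], [2.0, 7.0]))
```

Fitting the scaler on the validation or test rows as well would leak information about them into training, which is why the same fitted scaler is reused for every later split.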

Load and preprocess the train and test data files:

import pandas as pd
from sklearn.utils import shuffle

df_train = pd.read_csv("train.csv")
df_test = pd.read_csv("test.csv")
train_num = len(df_train)
# add a placeholder target so both frames share the same columns
df_test.insert(0, 'rank', 0)
dataset = pd.concat(objs=[df_train, df_test], axis=0)
dataset.fillna(0, inplace=True)
# split back before shuffling; shuffling the combined frame first
# would mix test rows into the training set
df_train = shuffle(dataset[:train_num])
df_test = dataset[train_num:].drop('rank', axis=1)
print("Train Data:", df_train.shape)
print("Test Data:", df_test.shape)

Create the predictions and a submission file for a Kaggle-style submission.

test_df = train_and_predict(df_train, df_test)
submission = test_df
submission.sort_index(inplace=True)
# clip predictions into the range [0, 100]
submission.loc[submission['y_pred'] < 0, 'y_pred'] = 0
submission.loc[submission['y_pred'] > 100, 'y_pred'] = 100
submission.to_csv("submission.csv", index=False)
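The two .loc assignments clamp every prediction into the range 0 to 100. As a small illustration of that operation on its own, here is a plain-Python sketch; the function name clip and the sample values are made up for the example.

```python
def clip(values, lower=0.0, upper=100.0):
    """Clamp every value into the closed interval [lower, upper]."""
    return [min(max(v, lower), upper) for v in values]

raw_predictions = [-3.2, 0.0, 41.7, 100.0, 117.5]
clipped = clip(raw_predictions)
print(clipped)
```

In pandas the same effect is available in one call, for example submission['y_pred'].clip(lower=0, upper=100), which may read more clearly than two masked assignments.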

Grid Search Deep Learning Model Parameters

The previous example showed how easy it is to wrap your deep learning model from Keras and use it in functions from the scikit-learn library.

In this example, we go a step further. The function that we specify to the build_fn argument when creating the KerasRegressor wrapper can take arguments. We can use these arguments to further customize the construction of the model. In addition, we know we can provide arguments to the fit() function.

In this example, we use a grid search to evaluate different configurations for our neural network model and report on the combination that provides the best-estimated performance.

The model-building function (referred to here as create_model(), and passed as baseline_model in the code below) is defined to take arguments such as optimizer and init, all of which must have default values. This will allow us to evaluate the effect of using different optimization algorithms and weight initialization schemes for our network.
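Why can one parameter grid hold both model options and training options? Conceptually, the wrapper inspects the build function's signature and hands each parameter either to the build function or to fit(). The sketch below is a simplified, hypothetical illustration of that routing using only the standard library; route_params and the dummy create_model are made-up stand-ins, not the wrapper's real code.

```python
import inspect

def create_model(optimizer='adam', init='normal'):
    """Stand-in for a Keras build function; it only records its settings."""
    return {'optimizer': optimizer, 'init': init}

def route_params(build_fn, params):
    """Split params into those build_fn accepts and the rest (e.g. for fit)."""
    accepted = inspect.signature(build_fn).parameters
    build_args = {k: v for k, v in params.items() if k in accepted}
    fit_args = {k: v for k, v in params.items() if k not in accepted}
    return build_args, fit_args

candidate = {'optimizer': 'rmsprop', 'epochs': 50, 'batch_size': 5}
build_args, fit_args = route_params(create_model, candidate)

model = create_model(**build_args)  # init falls back to its default value
print(model)
print(fit_args)
```

Because any single grid point may omit some arguments, every argument of the build function needs a default value to fall back on.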

After creating our model, we define arrays of values for the parameters we wish to search, specifically:

Optimizers for searching different weight values.

Initializers for preparing the network weights using different schemes.

Epochs for training the model for a different number of exposures to the training dataset.

Batches for varying the number of samples before a weight update.

The options are specified in a dictionary and passed to the configuration of the GridSearchCV scikit-learn class. This class will evaluate a version of our neural network model for each combination of parameters (2 x 3 x 3 x 3 for the combinations of optimizers, initializations, epochs and batches). The code below also searches over dropout rates and weight constraints, which brings the total to 2 x 10 x 3 x 3 x 5 x 3 = 2700 combinations. Each combination is then evaluated using the default cross-validation of GridSearchCV, which for a regressor is plain (not stratified) k-fold: 3 folds in older scikit-learn releases and 5 folds since version 0.22.

That is a lot of models and a lot of computation. This is not a scheme that you want to use lightly because of the time it will take. It may be useful to design small experiments with a smaller subset of your data that will complete in a reasonable time. A full search is reasonable only for a small network and a small dataset (for example, fewer than 1000 instances and 9 attributes); the 1057-input network defined earlier is much larger, so plan for a correspondingly longer run.
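The cost of a grid search is easy to estimate before running it. The sketch below enumerates the grid with itertools.product, using the value lists from the grid search code in this post; the fold count of 3 is an assumption that matches older scikit-learn defaults, and newer releases would use 5.

```python
from itertools import product

# Value lists copied from the grid search code in this post
param_grid = {
    'optimizer': ['rmsprop', 'adam'],
    'dropout_rate': [0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9],
    'init': ['glorot_uniform', 'normal', 'uniform'],
    'epochs': [50, 100, 150],
    'batch_size': [5, 10, 20],
    'weight_constraint': [1, 2, 3, 4, 5],
}

# One dictionary per point of the grid
combinations = [dict(zip(param_grid, values))
                for values in product(*param_grid.values())]

n_folds = 3  # assumption: older scikit-learn default; newer releases use 5
n_fits = len(combinations) * n_folds

print(len(combinations))
print(n_fits)
print(combinations[0])
```

Every one of those fits trains a network from scratch, so trimming even one list (for example, fewer dropout rates) shrinks the run time proportionally.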

Finally, the performance and combination of configurations for the best model are displayed, followed by the performance of all combinations of parameters.

This might take about 5 minutes to complete on your workstation when executed on the CPU (rather than a GPU). Running the example prints the best configuration followed by the score of every combination.

We can see that the grid search discovered that using a uniform initialization scheme, rmsprop optimizer, 150 epochs and a batch size of 5 achieved the best cross-validation score of approximately 75% on this problem.

import numpy as np
from sklearn.model_selection import GridSearchCV

def gridSearch_neural_network(df_train, ytrain):
    # fix random seed for reproducibility
    seed = 7
    np.random.seed(seed)
    X_train, X_val, y_train, y_val = train_test_split(df_train, ytrain, test_size=0.1, random_state=42)

    print("Train Data:", X_train.shape)
    print("Train label:", y_train.shape)
    # evaluate model with standardized dataset
    # baseline_model must accept optimizer, dropout_rate, weight_constraint and init
    # as keyword arguments with default values for the grid below to work
    estimator = KerasRegressor(build_fn=baseline_model, epochs=100, batch_size=5, verbose=0)

    # grid search epochs, batch size and optimizer
    optimizers = ['rmsprop', 'adam']
    dropout_rate = [0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]
    init = ['glorot_uniform', 'normal', 'uniform']
    epochs = [50, 100, 150]
    batches = [5, 10, 20]
    weight_constraint = [1, 2, 3, 4, 5]
    param_grid = dict(optimizer=optimizers,
                      dropout_rate=dropout_rate,
                      epochs=epochs,
                      batch_size=batches,
                      weight_constraint=weight_constraint,
                      init=init)

    grid = GridSearchCV(estimator=estimator, param_grid=param_grid)
    grid_result = grid.fit(X_train.values, y_train.values)
    # summarize results
    print("Best: %f using %s" % (grid_result.best_score_, grid_result.best_params_))
    means = grid_result.cv_results_['mean_test_score']
    stds = grid_result.cv_results_['std_test_score']
    params = grid_result.cv_results_['params']
    for mean, stdev, param in zip(means, stds, params):
        print("%f (%f) with: %r" % (mean, stdev, param))

Summary

In this post, you discovered how you can wrap your Keras deep learning models and use them in the scikit-learn general machine learning library.

You can see that using scikit-learn for standard machine learning operations such as model evaluation and model hyperparameter optimization can save a lot of time over implementing these schemes yourself.

Wrapping your model allows you to leverage powerful tools from scikit-learn and to fit your deep learning models into your general machine learning process.
