Language Model (Keras Implementation)
Keras is a high-level API that runs on top of a TensorFlow, Theano, or CNTK backend. In this article, I will discuss a simple character-level language model.
First, import the required libraries:
import numpy
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import Dropout
from keras.layers import LSTM
from keras.callbacks import ModelCheckpoint
from keras.utils import np_utils
Next, we load the file and collect the set of unique characters along with the total character count. I have chosen a text file with no special characters; if your file contains special characters, you should clean it first for better performance.
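A minimal sketch of this loading step; the filename dinos.txt is an assumption, so point it at your own corpus:

# load the corpus and lowercase it (the filename is an assumption)
raw_text = open("dinos.txt").read().lower()
# collect the sorted set of unique characters
chars = sorted(list(set(raw_text)))
n_chars = len(raw_text)
n_vocab = len(chars)
print(chars)
print("There are %d total characters and %d unique characters in your data." % (n_chars, n_vocab))

Below is the result for my file: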
['\n', 'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z']
There are 19909 total characters and 27 unique characters in your data.
Now we create two dictionaries: one mapping each character to an index, and one mapping each index back to its character. In any deep learning framework, we only have to describe the forward propagation through the network; the framework automatically takes care of backpropagation, which gives us the gradients needed for gradient descent.
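A sketch of the two dictionaries, assuming chars is the sorted character list built above (the names char_to_int and int_to_char are illustrative):

# map each character to an integer index, and each index back to its character
char_to_int = dict((c, i) for i, c in enumerate(chars))
int_to_char = dict((i, c) for i, c in enumerate(chars))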
For this model, we will be using an LSTM network. For more information on LSTMs, you can visit this blog: https://colah.github.io/posts/2015-08-Understanding-LSTMs/
An LSTM uses gates to control the flow of values through the network and to remember values over many time steps, which makes it well suited to models with long-term dependencies. One problem that remains with LSTMs is exploding gradients, which can be fixed by gradient clipping.
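Keras optimizers take clipnorm and clipvalue arguments for exactly this; a minimal sketch, where the threshold of 1.0 is an arbitrary choice (the model below simply uses the default optimizer='adam' without clipping):

from keras.optimizers import Adam
# clip each gradient to a maximum L2 norm of 1.0 before the update step
clipped_adam = Adam(clipnorm=1.0)
# pass this object instead of the string 'adam' to model.compile() to enable clipping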
Coming back to the Keras model: before feeding data into the model, we need to preprocess and transform it. I have chosen a sequence length of 100 (you can choose whatever you want) and then normalized the input. Using our character-to-index dictionary, we convert every sequence into indices; after getting the results, the index-to-character dictionary converts them back.
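One common way to build the training tensors, sketched under the assumption that raw_text, n_chars, n_vocab, and char_to_int come from the earlier steps:

# slide a 100-character window over the text; each window predicts the next character
seq_length = 100
dataX = []
dataY = []
for i in range(0, n_chars - seq_length):
    seq_in = raw_text[i:i + seq_length]
    seq_out = raw_text[i + seq_length]
    dataX.append([char_to_int[ch] for ch in seq_in])
    dataY.append(char_to_int[seq_out])
n_patterns = len(dataX)

# reshape to [samples, time steps, features] and normalize the indices to [0, 1]
X = numpy.reshape(dataX, (n_patterns, seq_length, 1))
X = X / float(n_vocab)
# one-hot encode the output characters
y = np_utils.to_categorical(dataY)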
RNN Model:
model = Sequential()
# single LSTM layer with 256 units over [time steps, features] input
model.add(LSTM(256, input_shape=(X.shape[1], X.shape[2])))
# dropout for regularization
model.add(Dropout(0.2))
# softmax over the vocabulary to predict the next character
model.add(Dense(y.shape[1], activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam')
Here I have chosen the Adam optimizer and categorical cross-entropy as the loss function. There are additional tricks you can use to accelerate the training process as well. It is also suggested to create checkpoints so that you can load the parameters from whichever epoch showed the minimum loss during training.
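A sketch of the checkpointing and training call; the filename pattern matches the checkpoint file in the log below, while batch_size=128 is my assumption:

# save the weights to disk every time the training loss improves
filepath = "weights-improvement-{epoch:02d}-{loss:.4f}.hdf5"
checkpoint = ModelCheckpoint(filepath, monitor='loss', verbose=1, save_best_only=True, mode='min')
model.fit(X, y, epochs=20, batch_size=128, callbacks=[checkpoint])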
My loss after 20 epochs (if you are running the LSTM on your own PC it will take some time; for me it took around 25 minutes):
Epoch 20/20
19712/19809 [============================>.] - ETA: 0s - loss: 2.0527Epoch 00020: loss improved from 2.08093 to 2.05359, saving model to weights-improvement-20-2.0536.hdf5
19809/19809 [==============================] - 127s 6ms/step - loss: 2.0536
To predict with the trained model, we sample one character at a time, feeding each predicted character back in as input. A sketch of this loop is below, followed by the kind of output I got.
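A minimal sketch of the sampling loop, assuming dataX, n_vocab, int_to_char, and the trained model from the earlier steps; the 200-character output length is an arbitrary choice:

import sys

# start from a random seed sequence taken from the training data
start = numpy.random.randint(0, len(dataX) - 1)
pattern = list(dataX[start])
for i in range(200):
    # reshape and normalize the current window, exactly as during training
    x = numpy.reshape(pattern, (1, len(pattern), 1))
    x = x / float(n_vocab)
    # pick the most likely next character
    prediction = model.predict(x, verbose=0)
    index = numpy.argmax(prediction)
    sys.stdout.write(int_to_char[index])
    # slide the window forward by one character
    pattern.append(index)
    pattern = pattern[1:]

Output from my trained model: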
orosaurus orthogoniosaurus orthomerus oryctodromeus oshanosaurus osmakasaurus ostafrikasaurus ostro " oaurus
- To learn more about text generation, check out Andrej Karpathy's blog post (http://karpathy.github.io/2015/05/21/rnn-effectiveness/).
- You can refer to this blog for code and explanation: https://chunml.github.io/ChunML.github.io/project/Creating-Text-Generator-Using-Recurrent-Neural-Network/