Language Model (Keras Implementation)
Keras is a high-level API that runs on top of a TensorFlow, Theano, or CNTK backend. In this article, I will discuss a simple character-level language model.
First, import the required libraries:
import numpy
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import Dropout
from keras.layers import LSTM
from keras.callbacks import ModelCheckpoint
from keras.utils import np_utils
Next, we load the file and collect the set of unique characters along with the total character count. I have chosen a text file with no special characters; if your file contains special characters, you should clean it first for better performance.
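A minimal sketch of this loading step; the filename dinos.txt is an assumption, so point it at your own corpus:

# load the corpus and lowercase it (the filename is an assumption)
raw_text = open("dinos.txt").read().lower()
# collect the sorted set of unique characters
chars = sorted(list(set(raw_text)))
n_chars = len(raw_text)
n_vocab = len(chars)
print(chars)
print("There are %d total characters and %d unique characters in your data." % (n_chars, n_vocab))

Below is the result for my file: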
['\n', 'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z']
There are 19909 total characters and 27 unique characters in your data.
Now we create two dictionaries: one mapping each character to an index, and one mapping each index back to its character. In any deep learning framework, we only have to describe the forward propagation through the network; the framework automatically takes care of backpropagation, which gives us the gradients needed for gradient descent.
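A sketch of the two dictionaries, assuming chars is the sorted character list built above (the names char_to_int and int_to_char are illustrative):

# map each character to an integer index, and each index back to its character
char_to_int = dict((c, i) for i, c in enumerate(chars))
int_to_char = dict((i, c) for i, c in enumerate(chars))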
For this model, we will be using an LSTM network. For more information on LSTMs, you can visit this blog: https://colah.github.io/posts/2015-08-Understanding-LSTMs/
An LSTM uses gates to control the flow of values through the network and to remember values over many time steps, which makes it well suited to models with long-term dependencies. One problem that remains with LSTMs is exploding gradients, which can be fixed by gradient clipping.
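Keras optimizers take clipnorm and clipvalue arguments for exactly this; a minimal sketch, where the threshold of 1.0 is an arbitrary choice (the model below simply uses the default optimizer='adam' without clipping):

from keras.optimizers import Adam
# clip each gradient to a maximum L2 norm of 1.0 before the update step
clipped_adam = Adam(clipnorm=1.0)
# pass this object instead of the string 'adam' to model.compile() to enable clipping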
Coming back to the Keras model: before feeding data into the model, we need to preprocess and transform it. I have chosen a sequence length of 100 (you can choose whatever you want) and then normalized the input. Using our character-to-index dictionary, we convert every sequence into indices; after getting the results, the index-to-character dictionary converts them back.
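One common way to build the training tensors, sketched under the assumption that raw_text, n_chars, n_vocab, and char_to_int come from the earlier steps:

# slide a 100-character window over the text; each window predicts the next character
seq_length = 100
dataX = []
dataY = []
for i in range(0, n_chars - seq_length):
    seq_in = raw_text[i:i + seq_length]
    seq_out = raw_text[i + seq_length]
    dataX.append([char_to_int[ch] for ch in seq_in])
    dataY.append(char_to_int[seq_out])
n_patterns = len(dataX)

# reshape to [samples, time steps, features] and normalize the indices to [0, 1]
X = numpy.reshape(dataX, (n_patterns, seq_length, 1))
X = X / float(n_vocab)
# one-hot encode the output characters
y = np_utils.to_categorical(dataY)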
RNN Model:
model = Sequential()
# single LSTM layer with 256 units over [time steps, features] input
model.add(LSTM(256, input_shape=(X.shape[1], X.shape[2])))
# dropout for regularization
model.add(Dropout(0.2))
# softmax over the vocabulary to predict the next character
model.add(Dense(y.shape[1], activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam')
Here I have chosen the Adam optimizer and categorical cross-entropy as the loss function. There are additional tricks you can use to accelerate the training process as well. It is also suggested to create checkpoints so that you can load the parameters from whichever epoch showed the minimum loss during training.
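A sketch of the checkpointing and training call; the filename pattern matches the checkpoint file in the log below, while batch_size=128 is my assumption:

# save the weights to disk every time the training loss improves
filepath = "weights-improvement-{epoch:02d}-{loss:.4f}.hdf5"
checkpoint = ModelCheckpoint(filepath, monitor='loss', verbose=1, save_best_only=True, mode='min')
model.fit(X, y, epochs=20, batch_size=128, callbacks=[checkpoint])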
My loss after 20 epochs (if you are running the LSTM on your own PC it will take some time; for me it took around 25 minutes):
Epoch 20/20
19712/19809 [============================>.] - ETA: 0s - loss: 2.0527Epoch 00020: loss improved from 2.08093 to 2.05359, saving model to weights-improvement-20-2.0536.hdf5
19809/19809 [==============================] - 127s 6ms/step - loss: 2.0536
To predict with the trained model, we sample one character at a time, feeding each predicted character back in as input. A sketch of this loop is below, followed by the kind of output I got.
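A minimal sketch of the sampling loop, assuming dataX, n_vocab, int_to_char, and the trained model from the earlier steps; the 200-character output length is an arbitrary choice:

import sys

# start from a random seed sequence taken from the training data
start = numpy.random.randint(0, len(dataX) - 1)
pattern = list(dataX[start])
for i in range(200):
    # reshape and normalize the current window, exactly as during training
    x = numpy.reshape(pattern, (1, len(pattern), 1))
    x = x / float(n_vocab)
    # pick the most likely next character
    prediction = model.predict(x, verbose=0)
    index = numpy.argmax(prediction)
    sys.stdout.write(int_to_char[index])
    # slide the window forward by one character
    pattern.append(index)
    pattern = pattern[1:]

Output from my trained model: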
orosaurus orthogoniosaurus orthomerus oryctodromeus oshanosaurus osmakasaurus ostafrikasaurus ostro " oaurus
- To learn more about text generation, check out Andrej Karpathy's blog post (http://karpathy.github.io/2015/05/21/rnn-effectiveness/).
- You can refer to this blog for code and explanation: https://chunml.github.io/ChunML.github.io/project/Creating-Text-Generator-Using-Recurrent-Neural-Network/