Text Generation with OpenAI's GPT-2 in Production
OpenAI has just released the largest version of GPT-2, which has 1.5B parameters, nine months after the initial release of the smallest version, which had 124M parameters. Back in February 2019, OpenAI released a statement saying that GPT-2 is so good at generating text that it would be dangerous to release it; this was the stated reason for delaying the release of the largest model. It is here now, and whether it is dangerous or not, only time will tell. Research is great, but getting models into production systems is what makes a difference in real life.
In this blog I will share tips, tricks, and tools to get OpenAI's GPT-2 into production, and give you a working Docker container image for free which you can deploy to your production environment. Before we dive into the technical implementation, let's look at what text generation is and how OpenAI's GPT-2 model facilitates it.
What is OpenAI?
OpenAI is an AI research company based in San Francisco, US. OpenAI’s mission is to ensure that artificial general intelligence (AGI) benefits all of humanity. By AGI we mean highly autonomous systems that outperform humans at most economically valuable work.
What is the GPT-2 model?
GPT-2 is a large transformer-based language model developed by OpenAI, with 1.5 billion parameters, trained on a dataset of 8 million web pages. It is trained with a simple objective: predict the next word, given all of the previous words within some text. The diversity of the dataset causes this simple goal to contain naturally occurring demonstrations of many tasks across diverse domains. GPT-2 is a direct scale-up of GPT, with more than 10X the parameters, trained on more than 10X the amount of data.
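Concretely, the "predict the next word" objective means that generation at inference time is just an autoregressive loop: encode the prompt, predict one token, append it to the context, and repeat. The sketch below illustrates the idea only; next_token_fn is a hypothetical stand-in for a forward pass through the model plus a sampling step, not actual GPT-2 code:

def generate(next_token_fn, prompt_tokens, length):
    # Autoregressive generation: each new token is conditioned
    # on the prompt plus everything generated so far.
    tokens = list(prompt_tokens)
    for _ in range(length):
        next_token = next_token_fn(tokens)  # hypothetical: model forward pass + sampling
        tokens.append(next_token)
    return tokens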
“GPT-2 achieves state-of-the-art scores on a variety of domain-specific language modeling tasks. Our model is not trained on any of the data specific to any of these tasks and is only evaluated on them as a final test; this is known as the “zero-shot” setting. GPT-2 outperforms models trained on domain-specific data sets (e.g. Wikipedia, news, books) when evaluated on those same data sets.” – OpenAI team.
For more info on the architecture and working of the model, read the research paper published by OpenAI.
On a practical level, it can do various Natural Language Processing tasks, such as:
- Text generation
- Language translation
- Reading comprehension
- Building question-answering systems
- Writing assistance
- Creative writing and art
- Entertainment: creation of games, chatbots, and amusing generations.
Here is an example of how the model generates a story or longer text when given a short sentence or two as a prompt.
To get a quick feel for the model, you can experiment with it here: https://talktotransformer.com/
Now, let's deploy it to production...
We will use Python for the application and deployment. We use a Flask application to serve GPT-2 model inference, hosted on an Apache web server (suitable for production, as it can handle a large number of requests), Docker to containerize it, and a Kubernetes cluster to handle service orchestration and scaling. To get started,
Clone this repository:
git clone https://github.com/emmanuelraj7/opengpt2.git
Key files...
a) download_model.py - Downloads the needed model artifacts.
b) flask_predict_api.py - A Flask-based web application that exposes the GPT-2 model as a service in the form of a REST API; a sketch of what such an endpoint can look like follows.
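To make the shape of the service concrete, here is a minimal sketch of such a Flask API. This is an illustration, not the repository's actual flask_predict_api.py: the endpoint path and the input_text/model_name query parameters match the usage shown later in this post, the Swagger UI is provided by the flasgger package, and generate_text is a stub standing in for the real GPT-2 sampling code:

from flask import Flask, request, jsonify
from flasgger import Swagger

app = Flask(__name__)
Swagger(app)  # serves an interactive Swagger UI at /apidocs/

def generate_text(input_text, model_name):
    # Stub: the real implementation loads the GPT-2 checkpoint for
    # `model_name` and samples a continuation of `input_text`.
    return input_text + " ..."

@app.route('/text-generate', methods=['GET'])
def text_generate():
    """Generate text from a prompt.
    ---
    parameters:
      - name: input_text
        in: query
        type: string
        required: true
      - name: model_name
        in: query
        type: string
        default: 124M
    responses:
      200:
        description: Generated text
    """
    input_text = request.args.get('input_text', '')
    model_name = request.args.get('model_name', '124M')
    return jsonify({'input': input_text, 'output': generate_text(input_text, model_name)})

if __name__ == '__main__':
    app.run(host='0.0.0.0', port=8000)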
To run it locally...
All steps can optionally be done in a virtual environment using tools such as virtualenv or conda. Use Python 3.6 (not 3.7, which TensorFlow 1.12 does not support). Install TensorFlow 1.12 (with GPU support if you have a GPU and want everything to run faster):
pip3 install tensorflow==1.12.0
or
pip3 install tensorflow-gpu==1.12.0
Install the other needed Python packages:
cd flask_demo
pip3 install -r requirements.txt
Download the model data. The commands below download the models with 124M, 355M, and 774M parameters and the 1.5B-parameter (1558M) model:
python3 download_model.py 124M
python3 download_model.py 355M
python3 download_model.py 774M
python3 download_model.py 1558M
Run the inference API:
python flask_predict_api.py
Running it in production...
1. Build a Docker image
FROM continuumio/anaconda3:4.4.0
MAINTAINER Emmanuel Raj, AI Engineer
EXPOSE 8000
RUN apt-get update && apt-get install -y apache2 \
        apache2-dev \
        vim \
    && apt-get clean \
    && apt-get autoremove \
    && rm -rf /var/lib/apt/lists/*
WORKDIR /var/www/flask_predict_api/
COPY ./flask_predict_api.wsgi /var/www/flask_predict_api/flask_predict_api.wsgi
COPY ./flask_demo /var/www/flask_predict_api/
RUN pip install -r requirements.txt
RUN python3 download_model.py 124M
RUN python3 download_model.py 355M
RUN python3 download_model.py 774M
RUN python3 download_model.py 1558M
RUN /opt/conda/bin/mod_wsgi-express install-module
RUN mod_wsgi-express setup-server flask_predict_api.wsgi --port=8000 \
    --user www-data --group www-data \
    --server-root=/etc/mod_wsgi-express-80
CMD /etc/mod_wsgi-express-80/apachectl start -D FOREGROUND
From the above Dockerfile we build a Docker image to deploy. We start from the pre-made container image continuumio/anaconda3:4.4.0 from Docker Hub, expose port 8000 where the Apache server will listen, install Apache, create a working directory for the container, copy all needed files from the repo, and install all needed dependencies from requirements.txt.
Next, we download the needed artifacts (the model files) using download_model.py. We create a Web Server Gateway Interface (WSGI) entry point, facilitated by the file flask_predict_api.wsgi, and use the Apache server to host our web application, since the Flask development server is not fit for production. Then we containerize the Flask application together with the Apache server into a Docker container that serves on port 8000.
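For reference, a mod_wsgi .wsgi file is usually just a few lines that expose the Flask app under the name mod_wsgi expects. Here is a plausible sketch of what flask_predict_api.wsgi could contain (the repository's actual file may differ):

# mod_wsgi looks for a module-level callable named `application`.
import sys
sys.path.insert(0, '/var/www/flask_predict_api')  # directory the Dockerfile copies the app into

from flask_predict_api import app as application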
To build this Docker image:
docker build --tag openai-gpt2 .
Push this Docker image to Docker Hub or your desired Docker registry. From there it is ready to be deployed to a Kubernetes cluster.
To run it locally:
docker run -d -p 8000:8000 openai-gpt2
This application includes a Swagger UI, with which you can use and test the service at: http://localhost:8000/apidocs/
Or simply make a GET request to the 'text-generate' endpoint with the 'input_text' and 'model_name' parameters. E.g.:
http://localhost:8000/text-generate?input_text=where%20is%20finland%3F&model_name=124M
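The same request can be made from Python with the requests library; a quick smoke test, assuming the container is running locally on port 8000:

import requests

resp = requests.get(
    "http://localhost:8000/text-generate",
    params={"input_text": "where is finland?", "model_name": "124M"},
)
print(resp.status_code)
print(resp.json())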
Upload the Docker image to Docker Hub or a registry. Note that for Docker Hub the image must first be tagged with your registry namespace (the username below is a placeholder):
docker tag openai-gpt2:latest <your-username>/openai-gpt2:latest
docker push <your-username>/openai-gpt2:latest
2. Deploy the Docker container image to a Kubernetes cluster
Kubernetes is an open-source container-orchestration system for automating application deployment, scaling, and management. You can deploy the image using a tool of your choice, e.g. kubectl, Helm, or Kompose.
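As one option, here is a minimal sketch of creating such a Deployment programmatically with the official Kubernetes Python client (pip install kubernetes). The image path, replica count, and namespace are assumptions to adapt to your setup; Helm, as used below, achieves the same thing declaratively:

from kubernetes import client, config

config.load_kube_config()  # use your local kubeconfig credentials

container = client.V1Container(
    name="openai-gpt2",
    image="<your-username>/openai-gpt2:latest",  # placeholder: the image pushed above
    ports=[client.V1ContainerPort(container_port=8000)],
)
template = client.V1PodTemplateSpec(
    metadata=client.V1ObjectMeta(labels={"app": "openai-gpt2"}),
    spec=client.V1PodSpec(containers=[container]),
)
spec = client.V1DeploymentSpec(
    replicas=2,  # assumption: start with two replicas and let autoscaling adjust
    selector=client.V1LabelSelector(match_labels={"app": "openai-gpt2"}),
    template=template,
)
deployment = client.V1Deployment(
    api_version="apps/v1",
    kind="Deployment",
    metadata=client.V1ObjectMeta(name="openai-gpt2"),
    spec=spec,
)
client.AppsV1Api().create_namespaced_deployment(namespace="default", body=deployment)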
Here is a dashboard view of my Kubernetes cluster, where I have deployed the service (using Helm); it auto-scales and is ready to handle large numbers of requests at once.
Thanks for reading. It has been my pleasure to share how to bring OpenAI's GPT-2 model to production. I will be happy to help you deploy state-of-the-art AI models to production. Please feel free to reach out to me with any questions.