Text Generation with OpenAI's GPT-2 in Production
OpenAI has just released the largest version of GPT-2, which has 1.5B parameters, nine months after the initial release of the smallest version, which had 124M parameters. Back in February 2019, OpenAI released a statement saying that GPT-2 is so good at generating text that it would be dangerous to release it; this was the stated reason for delaying the release of the largest model. It is here now, and whether it is dangerous or not, only time will tell. Research is great, but getting models into production systems is what makes a difference in real life.
In this blog I will share tips, tricks, and tools to get OpenAI's GPT-2 into production, and give you a working Docker container image for free which you can deploy to your production environment. Before we dive into the technical implementation, let's look at what text generation is and how OpenAI's GPT-2 model facilitates it.
What is OpenAI?
OpenAI is an AI research company based in San Francisco, US. OpenAI’s mission is to ensure that artificial general intelligence (AGI) benefits all of humanity. By AGI we mean highly autonomous systems that outperform humans at most economically valuable work.
What is the GPT-2 model?
GPT-2 is a large transformer-based language model developed by OpenAI, with 1.5 billion parameters, trained on a dataset of 8 million web pages. It is trained with a simple objective: predict the next word, given all of the previous words within some text. The diversity of the dataset causes this simple goal to contain naturally occurring demonstrations of many tasks across diverse domains. GPT-2 is a direct scale-up of GPT, with more than 10X the parameters, trained on more than 10X the amount of data.
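Concretely, the "predict the next word" objective means that generation at inference time is just an autoregressive loop: encode the prompt, predict one token, append it to the context, and repeat. The sketch below illustrates the idea only; next_token_fn is a hypothetical stand-in for a forward pass through the model plus a sampling step, not actual GPT-2 code:

def generate(next_token_fn, prompt_tokens, length):
    # Autoregressive generation: each new token is conditioned
    # on the prompt plus everything generated so far.
    tokens = list(prompt_tokens)
    for _ in range(length):
        next_token = next_token_fn(tokens)  # hypothetical: model forward pass + sampling
        tokens.append(next_token)
    return tokens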
“GPT-2 achieves state-of-the-art scores on a variety of domain-specific language modeling tasks. Our model is not trained on any of the data specific to any of these tasks and is only evaluated on them as a final test; this is known as the “zero-shot” setting. GPT-2 outperforms models trained on domain-specific data sets (e.g. Wikipedia, news, books) when evaluated on those same data sets.” – OpenAI team.
For more info on the architecture and working of the model, read the research paper published by OpenAI.
On a practical level, it can do various Natural Language Processing tasks, such as:
- Text generation
- Language translation
- Reading comprehension
- Building question-answering systems
- Writing assistance
- Creative writing and art
- Entertainment: creation of games, chatbots, and amusing generations.
Here is an example of how the model generates a story or longer text when given a short sentence or two as a prompt.
To get a quick feel for the model, you can experiment with it here: https://talktotransformer.com/
Now, let's deploy it to production...
We will use Python for the application and deployment. We use a Flask application to serve GPT-2 model inference, hosted on an Apache web server (suitable for production, as it can handle a large number of requests), Docker to containerize it, and a Kubernetes cluster to handle service orchestration and scaling. To get started,
Clone this repository:
git clone https://github.com/emmanuelraj7/opengpt2.git
Key files...
a) download_model.py - Downloads the needed model artifacts.
b) flask_predict_api.py - A Flask-based web application that exposes the GPT-2 model as a service in the form of a REST API; a sketch of what such an endpoint can look like follows.
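To make the shape of the service concrete, here is a minimal sketch of such a Flask API. This is an illustration, not the repository's actual flask_predict_api.py: the endpoint path and the input_text/model_name query parameters match the usage shown later in this post, the Swagger UI is provided by the flasgger package, and generate_text is a stub standing in for the real GPT-2 sampling code:

from flask import Flask, request, jsonify
from flasgger import Swagger

app = Flask(__name__)
Swagger(app)  # serves an interactive Swagger UI at /apidocs/

def generate_text(input_text, model_name):
    # Stub: the real implementation loads the GPT-2 checkpoint for
    # `model_name` and samples a continuation of `input_text`.
    return input_text + " ..."

@app.route('/text-generate', methods=['GET'])
def text_generate():
    """Generate text from a prompt.
    ---
    parameters:
      - name: input_text
        in: query
        type: string
        required: true
      - name: model_name
        in: query
        type: string
        default: 124M
    responses:
      200:
        description: Generated text
    """
    input_text = request.args.get('input_text', '')
    model_name = request.args.get('model_name', '124M')
    return jsonify({'input': input_text, 'output': generate_text(input_text, model_name)})

if __name__ == '__main__':
    app.run(host='0.0.0.0', port=8000)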
To run it locally...
All steps can optionally be done in a virtual environment using tools such as virtualenv or conda. Use Python 3.6 (not 3.7, which TensorFlow 1.12 does not support). Install TensorFlow 1.12 (with GPU support if you have a GPU and want everything to run faster):
pip3 install tensorflow==1.12.0
or
pip3 install tensorflow-gpu==1.12.0
Install the other needed Python packages:
cd flask_demo
pip3 install -r requirements.txt
Download the model data. The commands below download the models with 124M, 355M, and 774M parameters and the 1.5B-parameter (1558M) model:
python3 download_model.py 124M
python3 download_model.py 355M
python3 download_model.py 774M
python3 download_model.py 1558M
Run the inference API:
python flask_predict_api.py
Running it in production...
1. Build a Docker image
FROM continuumio/anaconda3:4.4.0
MAINTAINER Emmanuel Raj, AI Engineer
EXPOSE 8000
RUN apt-get update && apt-get install -y apache2 \
        apache2-dev \
        vim \
    && apt-get clean \
    && apt-get autoremove \
    && rm -rf /var/lib/apt/lists/*
WORKDIR /var/www/flask_predict_api/
COPY ./flask_predict_api.wsgi /var/www/flask_predict_api/flask_predict_api.wsgi
COPY ./flask_demo /var/www/flask_predict_api/
RUN pip install -r requirements.txt
RUN python3 download_model.py 124M
RUN python3 download_model.py 355M
RUN python3 download_model.py 774M
RUN python3 download_model.py 1558M
RUN /opt/conda/bin/mod_wsgi-express install-module
RUN mod_wsgi-express setup-server flask_predict_api.wsgi --port=8000 \
    --user www-data --group www-data \
    --server-root=/etc/mod_wsgi-express-80
CMD /etc/mod_wsgi-express-80/apachectl start -D FOREGROUND
From the above Dockerfile we build a Docker image to deploy. We start from the pre-made container image continuumio/anaconda3:4.4.0 from Docker Hub, expose port 8000 where the Apache server will listen, install Apache, create a working directory for the container, copy all needed files from the repo, and install all needed dependencies from requirements.txt.
Next, we download the needed artifacts (the model files) using download_model.py. We create a Web Server Gateway Interface (WSGI) entry point, facilitated by the file flask_predict_api.wsgi, and use the Apache server to host our web application, since the Flask development server is not fit for production. Then we containerize the Flask application together with the Apache server into a Docker container that serves on port 8000.
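For reference, a mod_wsgi .wsgi file is usually just a few lines that expose the Flask app under the name mod_wsgi expects. Here is a plausible sketch of what flask_predict_api.wsgi could contain (the repository's actual file may differ):

# mod_wsgi looks for a module-level callable named `application`.
import sys
sys.path.insert(0, '/var/www/flask_predict_api')  # directory the Dockerfile copies the app into

from flask_predict_api import app as application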
To build this Docker image:
docker build --tag openai-gpt2 .
Push this Docker image to Docker Hub or your desired Docker registry. From there it is ready to be deployed to a Kubernetes cluster.
To run it locally:
docker run -d -p 8000:8000 openai-gpt2
This application includes a Swagger UI, with which you can use and test the service at: http://localhost:8000/apidocs/
Or simply make a GET request to the 'text-generate' endpoint with the 'input_text' and 'model_name' parameters. E.g.:
http://localhost:8000/text-generate?input_text=where%20is%20finland%3F&model_name=124M
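The same request can be made from Python with the requests library; a quick smoke test, assuming the container is running locally on port 8000:

import requests

resp = requests.get(
    "http://localhost:8000/text-generate",
    params={"input_text": "where is finland?", "model_name": "124M"},
)
print(resp.status_code)
print(resp.json())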
Upload the Docker image to Docker Hub or a registry. Note that for Docker Hub the image must first be tagged with your registry namespace (the username below is a placeholder):
docker tag openai-gpt2:latest <your-username>/openai-gpt2:latest
docker push <your-username>/openai-gpt2:latest
2. Deploy the Docker container image to a Kubernetes cluster
Kubernetes is an open-source container-orchestration system for automating application deployment, scaling, and management. You can deploy the image using a tool of your choice, e.g. kubectl, Helm, or Kompose.
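As one option, here is a minimal sketch of creating such a Deployment programmatically with the official Kubernetes Python client (pip install kubernetes). The image path, replica count, and namespace are assumptions to adapt to your setup; Helm, as used below, achieves the same thing declaratively:

from kubernetes import client, config

config.load_kube_config()  # use your local kubeconfig credentials

container = client.V1Container(
    name="openai-gpt2",
    image="<your-username>/openai-gpt2:latest",  # placeholder: the image pushed above
    ports=[client.V1ContainerPort(container_port=8000)],
)
template = client.V1PodTemplateSpec(
    metadata=client.V1ObjectMeta(labels={"app": "openai-gpt2"}),
    spec=client.V1PodSpec(containers=[container]),
)
spec = client.V1DeploymentSpec(
    replicas=2,  # assumption: start with two replicas and let autoscaling adjust
    selector=client.V1LabelSelector(match_labels={"app": "openai-gpt2"}),
    template=template,
)
deployment = client.V1Deployment(
    api_version="apps/v1",
    kind="Deployment",
    metadata=client.V1ObjectMeta(name="openai-gpt2"),
    spec=spec,
)
client.AppsV1Api().create_namespaced_deployment(namespace="default", body=deployment)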
Here is a dashboard view of my Kubernetes cluster, where I have deployed the service (using Helm); it auto-scales and is ready to handle large numbers of requests at once.
Thanks for reading. It has been my pleasure to share how to bring OpenAI's GPT-2 model to production. I will be happy to help you deploy state-of-the-art AI models to production. Please feel free to reach out to me with any questions.