A Small Overview and Demo of Google Flan-T5 Model

This article presents an overview of Flan-T5, a language model developed by Google.

The content of this post:

  1. What is the Flan-T5 model?
  2. Packages for running the Flan-T5 model
  3. A demo of the Google Flan-T5 model


What is the Flan-T5 model?

FLAN-T5 combines two ideas: FLAN (Finetuned LAnguage Net), an instruction-finetuning approach, and T5 (Text-to-Text Transfer Transformer), a language model developed and published by Google in 2020. FLAN-T5 improves on the T5 model by substantially boosting its zero-shot performance. Google has developed and published several other language models, including BERT (2018), PaLM (2022), and LaMDA (2022).

The FLAN-T5 model comes in several variants, based on the number of parameters:

  • FLAN-T5 small (60M)
  • FLAN-T5 base (250M)
  • FLAN-T5 large (780M)
  • FLAN-T5 XL (3B)
  • FLAN-T5 XXL (11B)
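As a rough guide to which variant fits your hardware, a back-of-the-envelope memory estimate (an illustrative calculation, not from the original post) is parameter count times bytes per parameter; in fp32 that is 4 bytes per parameter, weights only:

```python
# Approximate fp32 memory footprint of each FLAN-T5 variant
# (weights only; activations and optimizer state are ignored).
PARAMS = {
    "flan-t5-small": 60e6,
    "flan-t5-base": 250e6,
    "flan-t5-large": 780e6,
    "flan-t5-xl": 3e9,
    "flan-t5-xxl": 11e9,
}

BYTES_PER_PARAM_FP32 = 4

for name, n in PARAMS.items():
    gib = n * BYTES_PER_PARAM_FP32 / 2**30
    print(f"{name}: ~{gib:.1f} GiB")
```

For example, the large variant used in this demo needs roughly 3 GiB for its weights alone, which is why it fits on a free Google Colab GPU; loading in fp16 or bf16 halves these figures.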

Packages for running Flan-T5 model

  • transformers
  • sentencepiece
  • accelerate

The above packages can be installed with pip in the console:

pip install transformers
pip install sentencepiece
pip install accelerate

A demo of the Google Flan-T5 model

Step 1: Import the packages and download the Google Flan-T5 model. (In this example, I use the Google FLAN-T5 large (780M) model.)

from transformers import T5Tokenizer, T5ForConditionalGeneration

tokenizer = T5Tokenizer.from_pretrained("google/flan-t5-large")
model = T5ForConditionalGeneration.from_pretrained("google/flan-t5-large", device_map="auto")        

Step 2: Write a function that passes the query to the model and generates the result. (This function is taken from the post by Koki Noda; see references.)

def inference(input_text):
  # Tokenize the prompt and move the tensors to the GPU.
  input_ids = tokenizer(input_text, return_tensors="pt").input_ids.to("cuda")
  outputs = model.generate(input_ids, max_length=200, bos_token_id=0)
  result = tokenizer.decode(outputs[0], skip_special_tokens=True)
  print(result)

Step 3: Pass the input text to the model and print the results.
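The Colab screenshot is not reproduced here, so as a sketch, these are the kinds of prompts that could be passed to the inference function from Step 2 (hypothetical examples, not the ones from the original image; FLAN-T5 is instruction-tuned, so tasks are phrased as plain natural-language instructions):

```python
# Hypothetical prompts illustrating the instruction style FLAN-T5 expects.
example_prompts = [
    "Translate English to German: How old are you?",
    "Answer the following question: What is the capital of France?",
    "Summarize: The quick brown fox jumped over the lazy dog.",
]

# Each prompt would be passed to the inference() function from Step 2:
# for prompt in example_prompts:
#     inference(prompt)
```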

[Image: Google FLAN-T5 on Google Colab]

As this variant is relatively small (780M parameters), it may return both correct and incorrect answers.

References:

  1. https://medium.com/@koki_noda/try-language-models-with-python-google-ais-flan-t5-ba72318d3be6
  2. https://www.narrativa.com/flan-t5-a-yummy-model-superior-to-gpt-3/

Assaf Toledo

Research Staff Member at IBM


Hi, small correction: FLAN-T5 small is 60M params. See https://arxiv.org/pdf/1910.10683.pdf, page 36, at the bottom. Best
