A Small Overview and Demo of Google Flan-T5 Model
Balayogi G
Interdisciplinary enthusiast | Ph.D. Candidate in Computer Science | UGC NET Certified | Human-Computer Interaction | Accessibility | Usable Security | Artificial Intelligence | Computational Security
This article presents a short overview and demo of Google's FLAN-T5 model.
The content of this post:
What is the FLAN-T5 model?
Packages for running the FLAN-T5 model
Demo of the Google FLAN-T5 model
What is the FLAN-T5 model?
FLAN-T5 is a combination of two things: FLAN (Finetuned LAnguage Net), an instruction fine-tuning approach, and T5, a text-to-text language model developed and published by Google in 2020. FLAN-T5 improves on the original T5 model by fine-tuning it on instruction-formatted tasks, which makes its zero-shot learning more effective. Google has developed and published several other language models, including BERT (2018), PaLM (2022), and LaMDA (2022).
The FLAN-T5 model comes in several variants that differ in the number of parameters: small, base, large, XL, and XXL.
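As a quick reference, these variants map to checkpoint names on the Hugging Face Hub. The identifiers below are the publicly listed ones; it is worth verifying them on the Hub before use.
# FLAN-T5 variants and their Hugging Face Hub checkpoint names
# (listed here for convenience; verify on the Hub before downloading)
FLAN_T5_CHECKPOINTS = {
    "small": "google/flan-t5-small",
    "base": "google/flan-t5-base",
    "large": "google/flan-t5-large",
    "xl": "google/flan-t5-xl",
    "xxl": "google/flan-t5-xxl",
}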
Packages for running the FLAN-T5 model
The following packages can be installed using the pip command in the console:
pip install transformers
pip install sentencepiece
pip install accelerate
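A quick way to confirm the installation worked is a minimal import check, assuming the packages were installed into the active Python environment:
python -c "import transformers, sentencepiece, accelerate; print(transformers.__version__)"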
Demo of Google Flan-T5 model
Step 1: Import the packages and download the Google FLAN-T5 model. (In this example, I used the Google FLAN-T5 large model, which has 780M parameters.)
from transformers import T5Tokenizer, T5ForConditionalGeneration
# Load the tokenizer and model; device_map="auto" lets accelerate place the model on the available device(s)
tokenizer = T5Tokenizer.from_pretrained("google/flan-t5-large")
model = T5ForConditionalGeneration.from_pretrained("google/flan-t5-large", device_map="auto")
Step 2: Write a function that passes the query to the model and generates the result. (This function is adapted from a post by Koki Noda.)
def inference(input_text):
    # Tokenize the input and move it to the GPU (assumes a CUDA device is available)
    input_ids = tokenizer(input_text, return_tensors="pt").input_ids.to("cuda")
    # Generate up to 200 tokens
    outputs = model.generate(input_ids, max_length=200, bos_token_id=0)
    # Decode the generated tokens back to text and print the result
    result = tokenizer.decode(outputs[0], skip_special_tokens=True)
    print(result)
Step 3: Pass the input text to the model and print the results.
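For example, a minimal call might look like this (the prompt below is only an illustration, not the exact query used in the original post):
# Illustrative prompt; any instruction-style query can be substituted
inference("Translate the following sentence to German: The house is wonderful.")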
As this model variant is relatively small (780M parameters), it may answer some queries correctly and others incorrectly.