My experiments with LLaMA 2
Pranav Patil, TOGAF, CSM, PGDIPR
PLM Architect and Director - Iristechsys Software Services
Llama 2 is a large language model (LLM) developed by Meta, trained on over 2 trillion tokens of public data and designed to enable developers and organizations to build generative AI-powered tools.
It was released to the public on July 18, 2023, in a blog post on Meta's AI website.
Since it was released with such fanfare, I decided to experiment with one of the models over the weekend: the Llama-2-13b-hf model.
Though the models are available on AWS SageMaker, I did not attempt to use them there because my objectives were different.
The model and config files were obtained from:
https://huggingface.co/meta-llama/Llama-2-13b-hf/tree/main
One has to request access to the model by accepting the T&Cs, which is a fairly straightforward process.
The objectives of my experiments were the following:
1. To find out whether such a model can be loaded on a general-purpose desktop
2. If the model loads, whether it processes prompts in a reasonable amount of time
3. The influence of input parameters like temperature on the final output
4. The influence of prompt variations on the final output
5. The accuracy of the output.
I experimented with two machines:
1. A desktop: 32 GB RAM, Intel i5 (4 vCPUs), SSD, running Ubuntu
2. An AWS g3.4xlarge cloud instance: 122 GB RAM, 16 vCPUs, 1 GPU, SSD, running Ubuntu
I started the experimentation with the 32 GB RAM machine; however, it chickened out with an out-of-memory error. I realized that the memory required to load the model was far more than 32 GB. I attempted a workaround by increasing the swap space to about 100 GB. Fortunately, I had a 1 TB SSD at my disposal.
But even with the workaround, loading the model was excruciatingly slow, and I finally abandoned the attempt on that machine.
Next was the g3.4xlarge cloud EC2 instance, which was set up in a matter of minutes; the virtual environment and dependent libraries took hardly 10 minutes.
The model downloaded in about 3-5 minutes, and loading it into memory took a little over 10 minutes.
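For reference, the download step can be scripted along these lines. This is a minimal sketch, assuming the huggingface_hub library and an access token for the gated repo; my actual setup may have differed slightly.

```python
# Sketch: pull the Llama-2-13b-hf files into the local Hugging Face cache.
# Assumes `pip install huggingface_hub` and that access to the gated repo
# has already been granted on the Hugging Face website.
from huggingface_hub import snapshot_download

local_dir = snapshot_download(
    repo_id="meta-llama/Llama-2-13b-hf",
    token="hf_xxx",  # placeholder: use your own access token
)
print("Model files cached at:", local_dir)
```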
I wrote a small script that read the prompt and model parameters such as temperature, top_k and max_tokens from a config file.
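A rough sketch of such a script is shown below (not my exact code). It assumes a hypothetical JSON config file named prompt_config.json, the transformers library, and the whole model held in CPU memory.

```python
# Sketch: read the prompt and sampling parameters from a config file,
# load Llama-2-13b-hf, and generate a completion.
import json

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hypothetical config file, e.g.:
# {"prompt": "A poem on rains in India would be like this:",
#  "temperature": 0.8, "top_k": 50, "max_new_tokens": 100}
with open("prompt_config.json") as f:
    cfg = json.load(f)

model_id = "meta-llama/Llama-2-13b-hf"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)  # roughly 50 GB in fp32 on CPU
model.eval()

inputs = tokenizer(cfg["prompt"], return_tensors="pt")
with torch.no_grad():
    output_ids = model.generate(
        **inputs,
        do_sample=True,
        temperature=cfg.get("temperature", 1.0),
        top_k=cfg.get("top_k", 50),
        max_new_tokens=cfg.get("max_new_tokens", 100),
    )
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```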
Following are a few prompts and the answers that the model provided (I have truncated some of the output for the sake of brevity):
Prompt: smita is manoj's wife. Amita is manoj's daughter. Sachi is smita's daughter. then the relationship between amita and sachi is of
Answer:
A. Grandmother and granddaughter
B. Mother and daughter
C. Aunt and niece
D. Sister and sister
Ans: A. Grandmother and granddaughter
Prompt: Mumbai:Maharashtra, Chennai:Tamilnadu, Bhopal:Madhya Pradesh, Panaji:Goa, Srinagar
Answer:
Panaji:Goa,Srinagar:Jammu & Kashmir,Shimla:Himachal Pradesh,Thiruvananthapuram:Kerala,Bengaluru:Karnataka,Hyderabad:Telangana,Ahmedabad:Gujarat,Patna:Bihar,Chandigarh:Punjab,Lucknow:Uttar Pradesh,Dehradun:Uttarakhand,Bhubaneswar:Odisha,Kochi:Kerala,Jaipur: thing to do in Jaipur,Jaipur: Rajasthan,Jaipur: Rajasthan,Jaipur: Rajasthan
Prompt: A poem on rains in India would be like this:
Answer:
The rains have come,The rains have come,The rains have come.The rains have come, the rains have come,The rains have come, the rains have come.
Prompt: The prime minister of india is
Answer:
The prime minister of india is the chairman of the planning commission of india.
Which one of the following is the largest inland water body of India?
The correct option is C Chilka lake
Prompt: An email for requesting a leave would be :
Answer:
Dear Mr. /Mrs. (name),
I am writing to inform you that I will be going on leave from (date) to (date) due to (reason). I will be available to take calls and respond to emails during this time. If you need any urgent help, please contact (name) at (phone number).
Thank you for your understanding and cooperation.
Prompt: 1,3,9,27,81
Answer:
1,3,9,27,81,243,613,1827,5468,16384,49152,196608,786432,3145728,12582912,50439360,201784320,807112
Prompt: 1,1,2,1,2,3,1,2,3,4,1,2,3,4,5
Answer:
1,1,2,1,2,3,1,2,3,4,1,2,3,4,5,1,2,3,4,5,6,1,2,3,4,5,6,7,1,2,3,4,5,6,7,8,1,2,3,4,5,6,7,8,9,1,2,3,4,
Prompt: the woods are lovely dark and deep, but i have promises to keep
Answer:
the woods are lovely dark and deep, but i have promises to keep, and miles to go before i sleep, and miles to go before i sleep.i am a child of the night.
Prompt: a short poem on rain would be following:
Answer:
The rain is falling on the window pane.The leaves are falling from the trees.
The wind is blowing through the branches Of the trees.
Prompt: a summary of india-pakistan relations would be the following:
Answer:
On 29 December 2015, Indian and Pakistani foreign secretaries met for the first time since 2012 in Islamabad to discuss peace and security issues. In a joint statement issued at the conclusion of the meeting, both sides agreed to work together to combat terrorism and reiterated their commitment to the
Prompt: joe biden: USA, Vladimir Putin: Russia, Benjamin Netanyahu: Israel, Greta Thurnberg:
Answer:
joe biden: USA, Vladimir Putin: Russia, Benjamin Netanyahu: Israel, Greta Thurnberg: Sweden, Aung San Suu Kyi: Myanmar, Angela Merkel: Germany, Boris Johnson: UK, Donald Trump: USA, Xi Jinping: China, Recep Tayyip Erdoğan: Turkey, Imran Khan: Pakistan, Kim Jong Un: North Korea, Vladimir Putin: Russia,
Prompt: if adam is mada, secret is terces, pump is pmup then colossal is
Answer:
if adam is mada, secret is terces, pump is pmup then colossal is 1000
Prompt: java code to generate fibonacci sequence would be
Answer:
What is the Java code to generate the Fibonacci sequence?
Java Program to generate Fibonacci series using recursion
Java Program to generate Fibonacci series using iteration
Prompt: translation of 'I am fine' in hindi would be
Answer:
translation of 'I am fine' in hindi would be मैं ठीक हूँ.
можем да кажем и "я съм добре" (Bulgarian: we can also say "I am fine")
My conclusions:
1. It does fairly well on matters of "general knowledge", i.e. information that was prevalent at the time of its training.
2. It does well on simple sequence predictions, but it struggles when given off-the-track prompts.
3. Very poor performance on programming prompts; who knows, a better prompt might do the trick?
4. It fares poorly when asked logical reasoning questions.
5. The max-length parameter has a significant influence on the processing time. For a max length of 100 it took about 1.5 minutes on the g3.4xlarge, whereas for a max length of 1000 it took about 9 minutes, which looks like a fairly linear increase (a timing sketch is shown after this list).
In addition, the greater the max length, the more junk output it produced at the end of the token sequence.
6. It fared poorly when the temperature was reduced. This was unexpected, as I was under the impression that at lower temperatures it hallucinates less and becomes less creative.
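As a rough illustration of how point 5 can be measured, the generation call can be timed for different output lengths. This is a sketch, not my exact harness; since decoding is autoregressive (one token at a time), the wall-clock time grows roughly linearly with the number of generated tokens.

```python
# Sketch: time generation for increasing output lengths.
import time

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-2-13b-hf"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)
model.eval()

inputs = tokenizer("A poem on rains in India would be like this:", return_tensors="pt")
for n_tokens in (100, 500, 1000):
    start = time.perf_counter()
    with torch.no_grad():
        model.generate(**inputs, do_sample=True, temperature=0.8, max_new_tokens=n_tokens)
    print(f"max_new_tokens={n_tokens}: {time.perf_counter() - start:.1f} s")
```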
All in all, the model still has a long way to go before it can be used reliably in an industry setting. Of course, this is the 13B model; the 70B model might fare better.
On a lighter note, the most important conclusion is "Memory and processing Is All You Need, Attention comes next" :)
Next steps:
I would like to fine-tune the model. At this point in time, it throws a CUDA error that I am attempting to resolve, but so far no solution has been found.
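For context, one common route that reduces the memory needed for fine-tuning is parameter-efficient fine-tuning with LoRA via the peft library. The following is a minimal configuration sketch of that approach, an illustration only, not the script that is currently failing.

```python
# Sketch: wrap the base model with LoRA adapters so that only a small
# fraction of the parameters is trainable during fine-tuning.
from peft import LoraConfig, TaskType, get_peft_model
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-13b-hf")
lora_cfg = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    r=8,                                  # adapter rank
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],  # attention projections to adapt
)
model = get_peft_model(model, lora_cfg)
model.print_trainable_parameters()  # only the LoRA weights are trainable
```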
Has anyone carried out a similar study with the 7B or 70B model? I would love to hear the results!
References:
https://huggingface.co/blog/llama2
https://huggingface.co/meta-llama/Llama-2-13b-hf
https://sundarpuli.medium.com/how-llama-v2-could-revolutionize-the-way-we-interact-with-computers-mindfuldiscoveries-16d366be5637
https://www.digitalocean.com/community/tutorials/how-to-add-swap-space-on-ubuntu-20-04