My experiments with Code Llama

Image credit: Maginative

Back with another experiment, this time with Code Llama, the large language model from Meta AI trained specifically for code. It supports the common programming languages of today, including Python, C++, Java, PHP, TypeScript (JavaScript), C#, and Bash.

Code Llama was released on August 24, 2023 [3].

We saw in my last post that the Llama 2 13B model fared extremely poorly at code-completion tasks.

Therefore, I decided to experiment with Code Llama to see whether it gives better results and, most importantly, whether it is "usable".

This time, I used a slightly lower configuration: an EC2 g4dn.4xlarge VM running Ubuntu 22.04, with 64 GB of RAM and 16 vCPUs.

The vCPU count is somewhat misleading, and we will come back to that later in the post.

As usual, I used the models from Hugging Face (https://huggingface.co/codellama), more specifically the CodeLlama-13b-Instruct-hf model [1]. The setup was smooth and glitch-free.
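Roughly, the test harness looked like the following minimal sketch using the standard transformers API (the generation parameters here are illustrative, not my exact settings):

import time
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "codellama/CodeLlama-13b-Instruct-hf"
tokenizer = AutoTokenizer.from_pretrained(model_id)
# Weights load in float32 by default, which is what pushes RAM use
# toward the ~50 GB noted in the observations below.
model = AutoModelForCausalLM.from_pretrained(model_id)

prompt = "Write a python program to sort decimals using bubble sort"
inputs = tokenizer(prompt, return_tensors="pt")

start = time.time()
# max_new_tokens is the main knob driving processing time (see observations below)
outputs = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
print(f"Generated in {time.time() - start:.0f} s")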


Now for the warm-up phase - an "off-the-shelf" problem:

Prompt:

Write a python program to sort decimals using bubble sort

Output:

def bubble_sort_decimals(decimals):
    for i in range(len(decimals)):
        for j in range(i+1, len(decimals)):
            if decimals[i] > decimals[j]:
                decimals[i], decimals[j] = decimals[j], decimals[i]
    return decimals

You can see that it did a decent job. However, you can also see that it keeps scanning every pair even when the list is already sorted; a textbook bubble sort would track swaps and exit early, as shown below.
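Here is that early-exit variant for comparison (my own sketch, not the model's output):

def bubble_sort_decimals(decimals):
    n = len(decimals)
    for i in range(n - 1):
        swapped = False
        # Each pass bubbles the largest remaining value to the end
        for j in range(n - 1 - i):
            if decimals[j] > decimals[j + 1]:
                decimals[j], decimals[j + 1] = decimals[j + 1], decimals[j]
                swapped = True
        if not swapped:
            # No swaps in a full pass means the list is already sorted
            break
    return decimals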


Then came a twisted question, where I prompted for a modified Fibonacci sequence:

Prompt:

Write a Python program to generate a modified Fibonacci sequence 1,1,4,9,25,64,169 where each term is the square of the value of Fibonacci sequence

Output:

def modified_fibonacci(n):
    if n == 0:
        return 0
    elif n == 1:
        return 1
    elif n == 2:
        return 4
    else:
        return modified_fibonacci(n-1)**2 + modified_fibonacci(n-2)**2

It generated a recursive function, and it was correct! It proves that this model can deal with a bit of novelty.

Good job Meta!
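For comparison, a straightforward iterative version of the same sequence (my own sketch, not the model's output) simply squares each Fibonacci term:

def squared_fibonacci(n):
    """First n terms of 1, 1, 4, 9, 25, 64, 169, ...
    (the squares of the Fibonacci numbers)."""
    a, b = 1, 1
    terms = []
    for _ in range(n):
        terms.append(a * a)
        a, b = b, a + b
    return terms

print(squared_fibonacci(7))  # [1, 1, 4, 9, 25, 64, 169]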


Then my colleague Mandar Kore decided to break its back.

Prompt:

Write a Java program to generate the following outputs based on two inputs: If inputs are 2,3 output should be 8. If inputs are 3,7 output should be 27. If inputs are 4,5 output should be 32. If inputs are 5,8 output should be 60. If inputs are 6,7 output should be 72. And so on.

(Here, if the inputs are x and y, the pattern is (x-1)*x + x*y.)

Output:

public class Solution {
    public static void main(String[] args) {
        int a = 2;
        int b = 3;
        System.out.println(a * b);
    }
}

It struggled badly to identify the pattern or to provide a coherent solution. Incidentally, Mandar had tried this with ChatGPT, and it flunked the question as well!
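For the record, the pattern itself is easy to check against the sample pairs (my own sketch, in Python for brevity):

def pattern(x, y):
    # (x-1)*x + x*y, per the hint above
    return (x - 1) * x + x * y

for x, y, expected in [(2, 3, 8), (3, 7, 27), (4, 5, 32), (5, 8, 60), (6, 7, 72)]:
    assert pattern(x, y) == expected  # all five sample pairs check out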


Next, I wanted to see whether it could keep things simple and write a plain-vanilla (and deliberately inefficient) program:

Prompt:

Write a Java program to use a simple for loop to calculate a square root of a decimal number for eg 25.589 or 63.899. It's fine even if the answer comes in decimal values, but the answer has to be accurate up to 2 decimals. For eg, a square root of 25.8 is 5.07. The square root of 35.0 is 5.91

Output:

public class SquareRoot {
    public static void main(String[] args) {
        double number = 25.589;
        double root = 0;
        for (int i = 1; i <= number; i++) {
            root = i;
        }
        System.out.println(root);
    }
}

Despite clear instructions to get the answer accurate to two decimal places, it goofed up. All it had to do was increment the candidate root by 0.01.
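For comparison, the kind of brute-force loop I had in mind looks like this (my own sketch, in Python for brevity):

def sqrt_bruteforce(number):
    """Brute-force square root: step the candidate up in 0.01
    increments until the next step would overshoot."""
    root = 0.0
    while (root + 0.01) ** 2 <= number:
        root = round(root + 0.01, 2)  # round each step to avoid float drift
    return root

print(sqrt_bruteforce(25.8))  # 5.07
print(sqrt_bruteforce(35.0))  # 5.91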


However, when prompted for the square root of a negative number, it generated the correct output:

Prompt:

Write a Python program to use a simple for loop to calculate a square root of a negative number.

Output:

def square_root(num):
    if num < 0:
        return "Square root of a negative number doesn't exist"
    else:
        for i in range(num):
            if i ** 2 == num:
                return i
    return "Square root of a negative number doesn't exist"

I tried other functions as well, and as long as they were straightforward (even if novel), it did a good job. For brevity's sake I am skipping those prompts and outputs.


Other observations:

1. There is an almost non-linear relationship between the max_tokens parameter and the processing time required. This is to be expected, as was the case earlier with Llama 2.

2. When I "enabled" the GPU, the speed was actually worse, and the processing used only one core instead of the 8 available physical cores.

[Image: CPU consumption with the GPU enabled]

Am I missing something here?

3. Without the GPU, htop shows 8 fully engaged cores and 8 at zero utilization. This is somewhat misleading, as those other 8 are hyper-threaded virtual cores; see footnote [2].

[Image: CPU consumption without the GPU; note that the virtual cores show 0% utilization]


4. For about 100 max_tokens it took around 3 minutes, and for 512 it took 11+ minutes. Assuming we would like to fine-tune the model and use it for specific purposes, the cloud instance alone would cost around 900 USD/month, not to mention the higher capacity required for the fine-tuning itself. That certainly brings questions of RoI into focus.

5. In general, the 13B model consumed about 50 GB of RAM.

6. We have seen that, with a large language model, prompts are key to getting better output. So I wonder: if the prompts were in a different format, would that substantially improve the output? Food for thought (see the sketch below).
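One concrete avenue: the Instruct variants are trained on a specific instruction template, so wrapping the prompt in the Llama-2-style [INST] tags might help. A sketch, continuing the harness above (check the model card [1] for the exact template):

# Code Llama Instruct models expect Llama-2-style instruction tags
instruction = "Write a python program to sort decimals using bubble sort"
formatted = f"[INST] {instruction} [/INST]"
inputs = tokenizer(formatted, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=100)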

Conclusions:

I am pleasantly surprised by the 13B Code Llama model. It doesn't throw out the gibberish that the vanilla 13B model used to, but as far as return on investment is concerned, it still has a long way to go.

The 34B model, being a larger one, might provide better and more accurate results, but it will require higher processing capacity.

References and footnotes:

[1] https://huggingface.co/codellama/CodeLlama-13b-Instruct-hf

[2] https://github.com/UKPLab/sentence-transformers/issues/922

[3] https://about.fb.com/news/2023/08/code-llama-ai-for-coding/


Comments:

Sameer Kanse (Business Strategy | Technology-Driven Problem Solver | 0-1 Business Scaling), 1y:

Interesting observations Pranav Patil, especially on the core utilisations and the impact of prompts on RoI. I think that, just as coders now focus on efficient code, future training will focus on providing accurate prompts for optimised resource utilisation. It looks like creative writing and framing problems with brevity are going to be in-demand skills versus pure coding syntax.
