Echoes of the Forgotten Code: 21K Codebase Challenge – From GPT-3.5 to Google Gemini, Who Remembers Best?
Florin Badita
Data Mining for Startups (Series A-C) | 10k+ websites scraped, billions of data points collected, 10+ yrs exp | Do you need data that is difficult to find? Contact me - Forbes U30 / TEDx Speaker
Last week I showed how I built an advanced graph visualizer in 6 hours using ChatGPT, Mistral and Google Gemini Advanced.
Now I want to offer 3 different algorithms for visualizing the data. The codebase has 21k characters: 72 functions in 650 lines of code.
The prompt:
"we now want to add a button that will allow us to change the algorithm used for the graph, at first it should offer 3 different modes. Each mode will be its own function so we keep things modular "
This was followed by the full 21k-character JavaScript code.
I deliberately did not suggest which algorithms to use. I wanted to see which LLM would be best at understanding the task and providing relevant information.
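The prompt asks for each mode to be its own function so the code stays modular. Below is a minimal sketch of that structure. It rests on several assumptions:

- The graph has a D3-style shape: `{ nodes: [{ id }], links: [{ source, target }] }`.
- The three layouts (circle, grid, line) are hypothetical placeholders, not real graph algorithms.
- The names `LAYOUT_MODES` and `applyLayout` are illustrative only.

The point is the registry. Each mode is a separate function with the same signature, and one lookup picks the right one.

```javascript
// Sketch only: mode names and layouts are hypothetical placeholders.
// Assumed graph shape: { nodes: [{ id }], links: [{ source, target }] }.

// Mode 1: nodes evenly spaced on a circle around the canvas center.
function circleLayout(graph, width, height) {
  const n = graph.nodes.length;
  const r = Math.min(width, height) / 2;
  return graph.nodes.map((node, i) => ({
    id: node.id,
    x: width / 2 + r * Math.cos((2 * Math.PI * i) / n),
    y: height / 2 + r * Math.sin((2 * Math.PI * i) / n),
  }));
}

// Mode 2: nodes on a square-ish grid, filled row by row.
function gridLayout(graph, width, height) {
  const cols = Math.max(1, Math.ceil(Math.sqrt(graph.nodes.length)));
  const cell = width / cols;
  return graph.nodes.map((node, i) => ({
    id: node.id,
    x: (i % cols) * cell + cell / 2,
    y: Math.floor(i / cols) * cell + cell / 2,
  }));
}

// Mode 3: nodes on one horizontal line through the middle.
function lineLayout(graph, width, height) {
  const step = width / (graph.nodes.length + 1);
  return graph.nodes.map((node, i) => ({ id: node.id, x: (i + 1) * step, y: height / 2 }));
}

// Registry: adding a fourth mode means adding one entry here.
const LAYOUT_MODES = { circle: circleLayout, grid: gridLayout, line: lineLayout };

// Dispatch to the selected mode; unknown names fail loudly.
function applyLayout(mode, graph, width, height) {
  if (!Object.prototype.hasOwnProperty.call(LAYOUT_MODES, mode)) {
    throw new Error(`Unknown layout mode: ${mode}`);
  }
  return LAYOUT_MODES[mode](graph, width, height);
}
```

Each function returns plain `{ id, x, y }` objects. The existing rendering code could then copy these positions onto its nodes, whichever mode produced them.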
TLDR
Final scores below:
Tested using
I also wanted to test some other models (Llama-2-70b-Groq), but they could not handle a 21k context window.
Other models were simply useless (Code-Llama-70B-FW).
Its answer was weird. It understood that the task had something to do with the number 3, produced C# code for some strange reason, and built a program with no connection to anything in the original codebase.
Each of the 6 models is scored on 8 criteria:
Provided HTML code?
Fail: if it did not include it by default.
Pass: if it provided the HTML snippet I needed to add to the HTML file.
Provided JS code?
Fail: if it did not include it by default.
Pass: if it provided JS snippets of what I needed to add to my code.
Provided Algorithm suggestions first time?
Fail: if it included only dummy names, not possible implementations of real algorithms.
Pass: if it directly provided real algorithms that I could use.
Provided Algorithm implementation when asked?
Fail: if it could not provide real algorithms when I prompted it to include actual code.
Pass: if it provided real algorithms when I prompted it to include them.
Future suggestions provided?
Fail: if it did not include them by default.
Pass: if it provided suggestions that would let the user move forward with the task.
Provided requested code when feeding it its own future suggestions?
Fail: if it could not generate code from the suggestions it had provided earlier.
Pass: if it provided code that would let the user move forward with the task.
Provided working code?
Fail: if the code did not work.
Pass: if the suggestions actually produced working code that the user can use.
Able to provide full code implementation?
The test: I prompted it to tie its earlier algorithm suggestions together with the old codebase, pasting the whole 21k code again so it could remember it.
Fail: if it could not provide suggestions that made sense for the existing codebase.
Pass: if it could provide such suggestions.
1. OpenAI ChatGPT 3.5
?Provided HTML code
?Provided JS code
?Provided Algorithm suggestions first time?
?Future suggestions provided
"Replace the three placeholder names with the names of your three different modes (functions) for graph generation. Update the function to call the appropriate function based on the selected algorithm. Finally, attach the function to the button click event listener."
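ChatGPT 3.5's advice boils down to one click handler that picks the next mode and redraws. Below is a minimal sketch of a button that cycles through three modes. The mode names `force`/`tree`/`radial`, the `#layout-toggle` element id and the commented-out `redraw` call are all hypothetical; the real app would call its own rendering code there.

```javascript
// Sketch: cycle through a fixed list of mode names on each click.
// Mode names, the "#layout-toggle" id and redraw() are hypothetical.
function createModeSwitcher(modes, onChange) {
  let index = 0;
  return {
    current: () => modes[index],
    // Advance to the next mode (wrapping around) and notify the caller,
    // which is where the graph would be re-laid-out and redrawn.
    next() {
      index = (index + 1) % modes.length;
      onChange(modes[index]);
      return modes[index];
    },
  };
}

// DOM wiring only runs in a browser; the switcher itself works anywhere.
if (typeof document !== "undefined") {
  const button = document.querySelector("#layout-toggle");
  if (button) {
    const switcher = createModeSwitcher(["force", "tree", "radial"], (mode) => {
      button.textContent = `Layout: ${mode}`;
      // redraw(mode); // hypothetical: call the existing render code here
    });
    button.addEventListener("click", () => switcher.next());
  }
}
```

The mode state is kept separate from the DOM, so the cycling logic can be tested without a browser.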
?Provided requested code when feeding it its own future suggestions?
To test whether it could provide actual algorithms, I pasted its future suggestions back into ChatGPT 3.5.
?Provided Algorithm implementation when asked?
It suggested
?Provided working code?
I could not test this, as I was not able to integrate the code into the existing app.
?Able to provide full code implementation?
It produced full code for some parts that were not affected by the required changes, and for some parts that were. However, the script did not work after adding all the code, because parts of the original function were missing.
2. OpenAI ChatGPT 4
?Provided HTML code
?Provided JS code
?Provided Algorithm suggestions first time?
?Provided Algorithm implementation when asked?
1. Force-Directed Layout
2. Hierarchical (Tree) Layout
3. Radial Layout
However, these were not full implementations, more like sample code.
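For reference, here is what a force-directed layout involves beyond sample code. In a D3 codebase this would normally come from d3-force (`d3.forceSimulation`). The sketch below instead spells out the classic Fruchterman-Reingold idea in plain JavaScript so it runs on its own:

- Every pair of nodes repels with force k²/d.
- Every link attracts its two ends with force d²/k.
- Each step is capped by a "temperature" that cools linearly.

It assumes the same hypothetical `{ nodes: [{ id }], links: [{ source, target }] }` shape with string ids. It is not the code GPT-4 produced, and it costs O(n²) per iteration.

```javascript
// Minimal Fruchterman-Reingold-style force-directed layout (sketch).
// Assumed graph shape: { nodes: [{ id }], links: [{ source, target }] }.
function forceDirectedLayout(graph, width, height, iterations = 300) {
  const n = graph.nodes.length;
  const k = Math.sqrt((width * height) / Math.max(n, 1)); // ideal edge length
  const r0 = Math.min(width, height) / 2;
  const pos = new Map();
  // Deterministic start: nodes evenly spaced on a circle.
  graph.nodes.forEach((node, i) => {
    const angle = (2 * Math.PI * i) / Math.max(n, 1);
    pos.set(node.id, { x: width / 2 + r0 * Math.cos(angle), y: height / 2 + r0 * Math.sin(angle) });
  });
  const t0 = width / 10; // initial temperature = max step per iteration
  for (let iter = 0; iter < iterations; iter++) {
    const disp = new Map(graph.nodes.map((node) => [node.id, { x: 0, y: 0 }]));
    // Repulsion between every pair of nodes: k^2 / d.
    for (let i = 0; i < n; i++) {
      for (let j = i + 1; j < n; j++) {
        const a = graph.nodes[i].id;
        const b = graph.nodes[j].id;
        const dx = pos.get(a).x - pos.get(b).x;
        const dy = pos.get(a).y - pos.get(b).y;
        const d = Math.max(Math.hypot(dx, dy), 0.01);
        const f = (k * k) / d;
        disp.get(a).x += (dx / d) * f;
        disp.get(a).y += (dy / d) * f;
        disp.get(b).x -= (dx / d) * f;
        disp.get(b).y -= (dy / d) * f;
      }
    }
    // Attraction along each link: d^2 / k.
    for (const { source, target } of graph.links) {
      const dx = pos.get(source).x - pos.get(target).x;
      const dy = pos.get(source).y - pos.get(target).y;
      const d = Math.max(Math.hypot(dx, dy), 0.01);
      const f = (d * d) / k;
      disp.get(source).x -= (dx / d) * f;
      disp.get(source).y -= (dy / d) * f;
      disp.get(target).x += (dx / d) * f;
      disp.get(target).y += (dy / d) * f;
    }
    // Move each node at most t, keep it inside the frame; t cools linearly.
    const t = t0 * (1 - iter / iterations);
    for (const node of graph.nodes) {
      const p = pos.get(node.id);
      const v = disp.get(node.id);
      const len = Math.max(Math.hypot(v.x, v.y), 0.01);
      const step = Math.min(len, t);
      p.x = Math.min(width, Math.max(0, p.x + (v.x / len) * step));
      p.y = Math.min(height, Math.max(0, p.y + (v.y / len) * step));
    }
  }
  return graph.nodes.map((node) => ({ id: node.id, ...pos.get(node.id) }));
}
```

For two linked nodes, repulsion and attraction balance when their distance equals k. That makes k a handy sanity check for the sketch.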
?Future suggestions provided
"Make sure to adjust the function names and the logic within them according to the specific algorithms or modes you want to implement. The above code provides a flexible framework for switching between different visualization modes based on user input."
?Provided requested code when feeding it its own future suggestions?
Only partially: it gave some sample code.
?Provided working code?
?Able to provide full code implementation?
Not really. It provided the HTML and some JS, but forgot what my code looked like before.
?Able to provide full code implementation after showing old code and asking it to tie everything together?
Look at my old code and tell how to tie everything together. for brevity show just what I need to change
Here it also failed. It had already forgotten what it suggested two prompts earlier, so it only offered general advice on how to implement things.
3. Mistral Large
?Provided HTML code
?Provided JS code
?Provided Algorithm suggestions first time?
?Provided Algorithm implementation when asked?
?Future suggestions provided
"In the places where the comments 'Call the function for modeX and update the graph' are, you should call the respective function for each mode and update the graph with the new data. Since the functions for each mode are not provided in your code, I cannot give specific details on how to do this. However, you should generally update the two data variables with the new data and call the function again."
Here it failed: "Since the functions for each mode are not provided in your code, I cannot give specific details on how to do this." The model could and should have decided for itself how to call the functions.
?Provided requested code when feeding it its own future suggestions?
?Provided working code?
It understood the task and tied everything together, but the end result did not work for the 2 algorithms it provided.
It produced code and the switch button worked, but I could not get the functions to work.
?Able to provide full code implementation?
look at my old code and tell how to tie everything together. for brevity show just what I need to change
It provided full code implementation within my existing codebase.
4. Claude 3 Sonnet
?Provided HTML code
?Provided JS code
?Provided Algorithm suggestions first time?
?Provided Algorithm implementation when asked?
?Future suggestions provided
With these changes, you should now have three buttons that allow you to switch between different graph layout algorithms. You'll need to implement the actual layout algorithms in the three mode functions, but the overall structure is in place.
Note that this is a basic example, and you might need to adjust the code to fit your specific use case and data structure. Additionally, you might want to consider adding some visual feedback or loading indicators to enhance the user experience when switching between different modes.
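Claude leaves the actual layout algorithms to the user. Below is a minimal sketch of one such function, a simple radial layout:

- A breadth-first search starts from a root node.
- Each BFS depth becomes a ring around the center.
- Nodes on the same ring are spread evenly by angle.
- Nodes the root cannot reach go on one extra outer ring.

It assumes the same hypothetical `{ nodes: [{ id }], links: [{ source, target }] }` shape with string ids, and uses the first node as the root by default. Real radial tree layouts (for example d3-hierarchy's radial cluster) also spread child nodes by subtree size, which this sketch does not do.

```javascript
// Sketch of a simple radial layout: BFS depth from a root = ring index,
// nodes on the same ring spread evenly by angle.
// Assumed graph shape: { nodes: [{ id }], links: [{ source, target }] }.
function radialLayout(graph, width, height, rootId) {
  if (graph.nodes.length === 0) return [];
  const root = rootId !== undefined ? rootId : graph.nodes[0].id;
  // Undirected adjacency list built from the links.
  const adj = new Map(graph.nodes.map((node) => [node.id, []]));
  for (const { source, target } of graph.links) {
    adj.get(source).push(target);
    adj.get(target).push(source);
  }
  // Breadth-first search assigns each reachable node its depth.
  const depth = new Map([[root, 0]]);
  const queue = [root];
  while (queue.length > 0) {
    const id = queue.shift();
    for (const next of adj.get(id) || []) {
      if (!depth.has(next)) {
        depth.set(next, depth.get(id) + 1);
        queue.push(next);
      }
    }
  }
  // Group nodes by ring; unreachable nodes go on one extra outer ring.
  const maxDepth = Math.max(0, ...depth.values());
  const rings = new Map();
  for (const node of graph.nodes) {
    const d = depth.has(node.id) ? depth.get(node.id) : maxDepth + 1;
    if (!rings.has(d)) rings.set(d, []);
    rings.get(d).push(node.id);
  }
  // Ring spacing so the outermost ring touches the canvas edge.
  const outer = Math.max(...rings.keys());
  const gap = Math.min(width, height) / 2 / Math.max(outer, 1);
  const positions = [];
  for (const [d, ids] of rings) {
    ids.forEach((id, i) => {
      const angle = (2 * Math.PI * i) / ids.length;
      positions.push({
        id,
        x: width / 2 + d * gap * Math.cos(angle),
        y: height / 2 + d * gap * Math.sin(angle),
      });
    });
  }
  return positions;
}
```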
?Provided requested code when feeding it its own future suggestions?
?Provided working code?
?Able to provide full code implementation?
look at my old code and tell how to tie everything together. for brevity show just what I need to change
The code was tied together with my old codebase.
5. Google Gemini Advanced (paid version)
?Provided HTML code
?Provided JS code
?Provided Algorithm suggestions first time?
?Provided Algorithm implementation when asked?
Absolutely! Here are 3 actual graph algorithms you can implement and switch between in your visualization:
1. Force-Directed Layout
2. Community Detection (e.g., Girvan-Newman Algorithm)
3. Shortest Path (e.g., Dijkstra's Algorithm)
How to choose and implement
And then it provided only boilerplate.
I then asked for the full code, and it provided a half-baked answer.
I then asked again for implementations of 3 algorithms that would work with D3, but by then it was already too biased by its earlier poor suggestions. I love the diversity, but what is the purpose of these suggestions if I cannot use them where I need them?
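Gemini's third suggestion, Dijkstra's shortest path, is a graph analysis rather than a layout. In a visualizer it would typically be used to highlight a route between two selected nodes.

Below is a minimal plain-JavaScript sketch. It assumes:

- the same hypothetical `{ nodes, links }` shape;
- an optional, non-negative `weight` field on links, defaulting to 1;
- an undirected graph.

It uses the simple O(V²) selection loop rather than a priority queue, which is enough for a few hundred nodes.

```javascript
// Sketch: Dijkstra's shortest path on an undirected graph.
// Assumed: { nodes: [{ id }], links: [{ source, target, weight? }] },
// weights non-negative (default 1). Returns null if no path exists.
function shortestPath(graph, startId, endId) {
  const dist = new Map(graph.nodes.map((node) => [node.id, Infinity]));
  if (!dist.has(startId) || !dist.has(endId)) return null;
  const prev = new Map();
  const unvisited = new Set(dist.keys());
  const adj = new Map(graph.nodes.map((node) => [node.id, []]));
  for (const { source, target, weight = 1 } of graph.links) {
    adj.get(source).push({ id: target, w: weight });
    adj.get(target).push({ id: source, w: weight });
  }
  dist.set(startId, 0);
  while (unvisited.size > 0) {
    // Pick the unvisited node with the smallest known distance.
    let current = null;
    for (const id of unvisited) {
      if (current === null || dist.get(id) < dist.get(current)) current = id;
    }
    if (dist.get(current) === Infinity || current === endId) break;
    unvisited.delete(current);
    // Relax every edge leaving the current node.
    for (const { id, w } of adj.get(current)) {
      const alt = dist.get(current) + w;
      if (alt < dist.get(id)) {
        dist.set(id, alt);
        prev.set(id, current);
      }
    }
  }
  if (dist.get(endId) === Infinity) return null; // unreachable
  // Walk predecessors back from the end node to rebuild the path.
  const path = [endId];
  while (path[0] !== startId) path.unshift(prev.get(path[0]));
  return { distance: dist.get(endId), path };
}
```

The returned `path` is a list of node ids. The rendering code could use it to style the matching nodes and links.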
?Future suggestions provided
I will count this as a no. In the same response it suggested creating an updateVisualization function, then asked me, 8 lines further down, to provide the updateVisualization implementation.
?Provided requested code when feeding it its own future suggestions?
For some stupid reason, Google Gemini does not let the user edit old queries. You cannot go back to an earlier point in a conversation and continue the thread from there.
Poe from Quora has the same limitation. How can they be so lacking?
OpenAI ChatGPT, in contrast, allows users to do this. I use it a lot when I want to test different queries, or to go back to a point where I know the LLM still has a certain topic or concept in mind from our previous discussion. Without the ability to edit the query, you can only move forward in the conversation, and the LLM forgets more and more of what came before.
?Provided working code?
?Able to provide full code implementation?
I had to change the prompt because Google does not allow editing old chats, so it had already forgotten what we were discussing 5 prompts earlier.
Google Gemini was so lost that it was one of the worst performers.
6. DeepSeek-Coder-33B-T (via Poe)
?Provided HTML code
?Provided JS code
?Provided Algorithm suggestions first time?
?Provided Algorithm implementation when asked?
?Future suggestions provided
?Provided requested code when feeding it its own future suggestions?
?Provided working code?
?Able to provide full code implementation?
look at my old code and tell how to tie everything together. for brevity show just what I need to change
It failed at this as well.
Final results
Winner
Claude 3 Sonnet
Claude 3 Sonnet got 7/8. The only criterion it missed was providing actual working code.
It was the only one that provided the algorithms to use on the first try, instead of dummy placeholder code.
The code worked: I could click the button and it switched between the algorithms. But the algorithm implementation was missing something that I could not fix within the maximum of 5 free messages they offer, and I cannot yet subscribe to the Pro version from the EU.
Mistral Large got 6/8. I felt it was one of the most on-the-spot at understanding the task and my existing codebase, and it could tie things together.
DeepSeek-Coder-33B-T got 5/8. We had a good start, but I kept getting errors, either because the model could not handle the accumulated context or because of something else at Poe. In the end I could not get it to produce working code.
ChatGPT 3.5 got 4/8
It produced full code for some parts that were not affected by the required changes, and for some parts that were. However, the script did not work after adding all the code, because parts of the original function were missing.
ChatGPT 4 got 3/8
I usually have a good experience with GPT-4. Possibly, with a 21k context, it starts trying to remember too much and becomes less useful for coding tasks.
Google Gemini Advanced got 3/8
At times this was because it tried to be too smart.
The task was to suggest 3 different algorithms to visualize the data, and the codebase clearly showed it was D3.js. Even so, it suggested 3 different implementations in 2 different programming languages.
At one point it suggested a new function to use, and in the same response it asked me to give it the code for the function it had just created. It's not fun to have Alzheimer's.