
Echoes of the Forgotten Code: 21K Codebase Challenge – From GPT-3.5 to Google Gemini, Who Remembers Best?

Florin Badita

Data Mining for Startups (Series A-C) | 10k+ websites scraped, billions of data points collected, 10+ yrs exp | Do you need data that is difficult to find? Contact me - Forbes U30 / TEDx Speaker

Published: March 18, 2024

Last week I showed how I built an advanced graph visualizer in 6 hours using ChatGPT, Mistral and Google Gemini Advanced.

Now we want to offer 3 different algorithms for visualizing the data. The codebase has 21k characters: 72 functions in 650 lines of code.

The prompt:

"we now want to add a button that will allow us to change the algorithm used for the graph, at first it should offer 3 different modes. Each mode will be its own function so we keep things modular "

And then the full 21k-character JavaScript codebase.

I deliberately did not suggest which algorithms to use, to see which LLM would be best at understanding the task and providing the relevant information.
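The structure the prompt asks for (one function per mode, plus a button that switches between them) can be sketched as a small registry of layout functions and a switcher. This is a minimal illustration, not the author's code or any model's answer: the three modes are simple stand-in layouts, and `render`, `redraw`, `graphNodes` and the `layout-toggle` button id are hypothetical names.

```javascript
// Three stand-in layout modes, each its own function (hypothetical; swap in
// real algorithms). Each returns new node objects with x/y set.
function rowMode(nodes) {
  return nodes.map((n, i) => ({ ...n, x: i * 40, y: 0 }));
}

function columnMode(nodes) {
  return nodes.map((n, i) => ({ ...n, x: 0, y: i * 40 }));
}

function gridMode(nodes) {
  const cols = Math.ceil(Math.sqrt(nodes.length));
  return nodes.map((n, i) => ({ ...n, x: (i % cols) * 40, y: Math.floor(i / cols) * 40 }));
}

// Registry of modes: adding a fourth mode means adding one entry here.
const MODES = [
  { name: "row", layout: rowMode },
  { name: "column", layout: columnMode },
  { name: "grid", layout: gridMode },
];

// A small controller that applies the current mode and cycles to the next one.
// `render` is whatever redraws the graph (hypothetical callback).
function createModeSwitcher(modes, render) {
  let index = 0;
  const apply = (nodes) => render(modes[index].layout(nodes), modes[index].name);
  const next = (nodes) => {
    index = (index + 1) % modes.length;
    return apply(nodes);
  };
  return { current: () => modes[index].name, apply, next };
}

// Browser wiring (assumes a <button id="layout-toggle">, a `graphNodes` array
// and a `redraw` function, all hypothetical):
// const switcher = createModeSwitcher(MODES, (positioned) => redraw(positioned));
// document.getElementById("layout-toggle")
//   .addEventListener("click", () => switcher.next(graphNodes));
```

Because the layouts return new objects instead of mutating the input, switching back and forth never accumulates state from a previous mode.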

TLDR

Final scores below:

Models tested:

1. OpenAI ChatGPT 3.5
2. ChatGPT 4
3. Mistral AI Large
4. Anthropic Claude 3 Sonnet
5. Google Gemini Advanced
6. DeepSeek-Coder-33B-T, tested on Quora Poe

I also wanted to test some other models (Llama-2-70b-Groq), but they could not handle a 21k context.

Other models were simply useless (Code-Llama-70B-FW).

Code-Llama's answer was weird: it grasped that the task involved something with 3 options, produced C# code for some strange reason, and built a program with no connection to anything in the original codebase.

Each of the 6 models is measured on 8 different criteria:

Provided HTML code?

❌ if it did not include it by default

✅ if it provided the HTML snippet that I needed to add to the HTML file.

Provided JS code?

❌ if it did not include it by default

✅ if it provided the JS snippets I needed to add to my code.

Provided Algorithm suggestions first time?

❌ if it included just dummy names rather than implementations of real algorithms

✅ if it directly provided real algorithms that I could use

Provided Algorithm implementation when asked?

❌ if it could not provide real algorithms when I prompted it to include actual code

✅ if it provided real algorithms when I prompted it to include them

Future suggestions provided?

❌ if it did not include them by default

✅ if it provided suggestions that would allow the user to move forward with the task

Provided requested code when feeding it its own future suggestions?

❌ if it could not generate code from the suggestions it had provided earlier

✅ if it provided code that would allow the user to move forward with the task

Provided working code?

❌ if the code did not work

✅ if the suggestions actually produced working code that the user could use

Able to provide full code implementation?

❌ if, when prompted to tie the algorithm suggestions from the earlier prompts together with the old codebase (pasting the whole 21k code again so it could remember my old codebase), it could not provide suggestions that made sense for the existing codebase

✅ if, under the same prompt, it provided relevant suggestions that made sense for the existing codebase

1. OpenAI ChatGPT 3.5

?Provided HTML code

?Provided JS code

?Provided Algorithm suggestions first time?

?Future suggestions provided

Replace , , and with the names of your three different modes (functions) for graph generation. Update the function to call the appropriate function based on the selected algorithm. Finally, attach the function to the button click event listener.

?Provided requested code when feeding it its own future suggestions?

To test whether it could provide actual algorithms, I pasted its future suggestions back into ChatGPT 3.5.

?Provided Algorithm implementation when asked?

It suggested

Random Graph Algorithm:

Grid Graph Algorithm:

Scale-Free Graph Algorithm (using Barabási–Albert model):
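ChatGPT 3.5's generated code is not reproduced here. As a rough reference for what those three names usually mean, here is a minimal plain-JavaScript sketch of each generator. It assumes the graph is stored as `{ nodes, links }` with numeric ids and `source`/`target` fields (an assumed shape, not necessarily the author's): an Erdős-Rényi random graph G(n, p), a rows × cols grid, and a Barabási-Albert generator in which each new node attaches to m existing nodes with probability proportional to their degree.

```javascript
// Graphs are returned as { nodes, links }, with numeric ids and links that
// reference ids through source/target (an assumed shape for illustration).

// Random graph (Erdős-Rényi G(n, p)): every possible pair is linked with probability p.
function randomGraph(n, p, rng = Math.random) {
  const nodes = Array.from({ length: n }, (_, i) => ({ id: i }));
  const links = [];
  for (let i = 0; i < n; i++) {
    for (let j = i + 1; j < n; j++) {
      if (rng() < p) links.push({ source: i, target: j });
    }
  }
  return { nodes, links };
}

// Grid graph: rows x cols lattice, each node linked to its right and lower neighbor.
function gridGraph(rows, cols) {
  const nodes = [];
  const links = [];
  for (let r = 0; r < rows; r++) {
    for (let c = 0; c < cols; c++) {
      const id = r * cols + c;
      nodes.push({ id });
      if (c + 1 < cols) links.push({ source: id, target: id + 1 });
      if (r + 1 < rows) links.push({ source: id, target: id + cols });
    }
  }
  return { nodes, links };
}

// Scale-free graph (Barabási-Albert): start from a clique of m + 1 nodes, then
// attach each new node to m distinct existing nodes chosen proportionally to degree.
function barabasiAlbert(n, m, rng = Math.random) {
  if (m < 1 || n <= m) throw new RangeError("need 1 <= m < n");
  const nodes = Array.from({ length: n }, (_, i) => ({ id: i }));
  const links = [];
  const endpoints = []; // each node appears once per incident edge
  for (let i = 0; i <= m; i++) {
    for (let j = i + 1; j <= m; j++) {
      links.push({ source: i, target: j });
      endpoints.push(i, j);
    }
  }
  for (let v = m + 1; v < n; v++) {
    const targets = new Set();
    while (targets.size < m) {
      targets.add(endpoints[Math.floor(rng() * endpoints.length)]);
    }
    for (const t of targets) {
      links.push({ source: v, target: t });
      endpoints.push(v, t);
    }
  }
  return { nodes, links };
}
```

Picking from the `endpoints` array is a common shortcut for preferential attachment: a node appears once per incident edge, so a uniform pick from that array is a degree-proportional pick.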

?Provided working code?

I could not test this, as I was not able to integrate the code into the existing app.

?Able to provide full code implementation?

It produced full code for some parts that were not affected by the required changes, and for some that were, but the script did not work after I added all the code: it was missing some things from the original function.

2. OpenAI ChatGPT 4

?Provided HTML code

?Provided JS code

?Provided Algorithm suggestions first time?

?Provided Algorithm implementation when asked?

1. Force-Directed Layout

2. Hierarchical (Tree) Layout

3. Radial Layout

These were not full implementations, more like sample code.
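For illustration, the hierarchical (tree) and radial layouts GPT-4 named can both be derived from breadth-first depth: depth picks the row (tree) or the ring (radial), and the position within that layer spreads nodes across the width or around the circle. This is a simplified sketch, not GPT-4's output. It assumes a chosen root and links that reference node ids directly, ignores nodes not reachable from the root, and does not compute the compact "tidy" arrangement that D3's `d3.tree` (Reingold-Tilford) produces.

```javascript
// Breadth-first depth of every node reachable from rootId.
function bfsDepths(nodeIds, links, rootId) {
  const adj = new Map(nodeIds.map((id) => [id, []]));
  for (const { source, target } of links) {
    adj.get(source).push(target);
    adj.get(target).push(source);
  }
  const depth = new Map([[rootId, 0]]);
  const queue = [rootId];
  for (let head = 0; head < queue.length; head++) {
    const u = queue[head];
    for (const v of adj.get(u)) {
      if (!depth.has(v)) {
        depth.set(v, depth.get(u) + 1);
        queue.push(v);
      }
    }
  }
  return depth;
}

// Group node ids into layers: layers[d] holds every node at depth d.
function layersFrom(depth) {
  const layers = [];
  for (const [id, d] of depth) {
    if (!layers[d]) layers[d] = [];
    layers[d].push(id);
  }
  return layers;
}

// Hierarchical (tree) layout: one horizontal row per depth, spread across the width.
function hierarchicalLayout(nodeIds, links, rootId, { width = 800, layerGap = 80 } = {}) {
  const pos = new Map();
  layersFrom(bfsDepths(nodeIds, links, rootId)).forEach((layer, d) => {
    layer.forEach((id, i) => {
      pos.set(id, { x: ((i + 1) * width) / (layer.length + 1), y: d * layerGap });
    });
  });
  return pos;
}

// Radial layout: one ring per depth, root at the center (0, 0).
function radialLayout(nodeIds, links, rootId, { ringGap = 80 } = {}) {
  const pos = new Map();
  layersFrom(bfsDepths(nodeIds, links, rootId)).forEach((layer, d) => {
    layer.forEach((id, i) => {
      const angle = (2 * Math.PI * i) / layer.length;
      pos.set(id, { x: d * ringGap * Math.cos(angle), y: d * ringGap * Math.sin(angle) });
    });
  });
  return pos;
}
```

Both functions return a `Map` from node id to `{ x, y }`, so either can be plugged in as one of the switchable modes.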

?Future suggestions provided

"Make sure to adjust the function names and the logic within them according to the specific algorithms or modes you want to implement. The above code provides a flexible framework for switching between different visualization modes based on user input."

?Provided requested code when feeding it its own future suggestions?

Only a sample of it.

?Provided working code?

?Able to provide full code implementation?

Not really: it provided the HTML and some JS, but forgot what my code looked like before.

?Able to provide full code implementation after showing old code and asking it to tie everything together?

Look at my old code and tell how to tie everything together. for brevity show just what I need to change

Here it also failed: it had already forgotten what it suggested 2 prompts earlier, so it only offered general advice on how to implement things.

3. Mistral Large

?Provided HTML code

?Provided JS code

?Provided Algorithm suggestions first time?

?Provided Algorithm implementation when asked?

Force-Directed Layout (already implemented in your code)
Circular Layout
Hierarchical Layout
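Of Mistral's three, the circular layout is the simplest to write by hand: place the n nodes at equal angles around one circle. A minimal sketch, assuming nodes are plain objects that the renderer reads `x`/`y` from. The optional `pin` flag also sets `fx`/`fy`, which d3-force treats as fixed positions, so a running force simulation does not pull the nodes off the circle.

```javascript
// Circular layout: n nodes evenly spaced on one circle around (cx, cy).
function circularLayout(nodes, { cx = 0, cy = 0, radius = 200, pin = false } = {}) {
  const n = nodes.length;
  return nodes.map((node, i) => {
    // Start at 12 o'clock (-90 degrees); with screen y pointing down this runs clockwise.
    const angle = (2 * Math.PI * i) / n - Math.PI / 2;
    const x = cx + radius * Math.cos(angle);
    const y = cy + radius * Math.sin(angle);
    // fx/fy pin the node in a d3-force simulation; omit them to let forces move it.
    return pin ? { ...node, x, y, fx: x, fy: y } : { ...node, x, y };
  });
}
```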

?Future suggestions provided

"In the places where the comments "Call the function for modeX and update the graph" are, you should call the respective function for each mode and update the graph with the new data. Since the functions for each mode are not provided in your code, I cannot give specific details on how to do this. However, you should generally update the and variables with the new data and call the function again."

Here it failed ("Since the functions for each mode are not provided in your code, I cannot give specific details on how to do this."), because the model could and should have decided how to call the functions itself.

?Provided requested code when feeding it its own future suggestions?


?Provided working code?

It understood the task and tied everything together, but the end result did not work for the 2 algorithms it provided.

It produced code and the switch button worked, but I could not get the functions to work.

?Able to provide full code implementation?

look at my old code and tell how to tie everything together. for brevity show just what I need to change

It provided full code implementation within my existing codebase.

4. Claude 3 Sonnet

✅Provided HTML code

✅Provided JS code

✅Provided Algorithm suggestions first time?

✅Provided Algorithm implementation when asked?

✅Future suggestions provided

With these changes, you should now have three buttons that allow you to switch between different graph layout algorithms. You'll need to implement the actual layout algorithms in the , , and functions, but the overall structure is in place.

Note that this is a basic example, and you might need to adjust the code to fit your specific use case and data structure. Additionally, you might want to consider adding some visual feedback or loading indicators to enhance the user experience when switching between different modes.

✅Provided requested code when feeding it its own future suggestions?

❌Provided working code?

✅Able to provide full code implementation?

look at my old code and tell how to tie everything together. for brevity show just what I need to change

It tied the code together with my old codebase.

5. Google Gemini Advanced (paid version)

?Provided HTML code

?Provided JS code

?Provided Algorithm suggestions first time?

?Provided Algorithm implementation when asked?

1. Force-Directed Layout

2. Community Detection (e.g., Girvan-Newman Algorithm)

3. Shortest Path (e.g., Dijkstra's Algorithm)

Absolutely! Here are 3 actual graph algorithms you can implement and switch between in your visualization:

1. Force-Directed Layout

Purpose: Arranges nodes and links in an aesthetically pleasing and often meaningful way. Nodes are treated like repelling particles, and links act like springs, leading to an equilibrium layout.
Use Cases: General network visualization; social network graphs; molecule diagrams
Implementation Notes: You can often use libraries like D3.js (which you seem to be using) for its module to handle the physics calculations.
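A minimal sketch (not Gemini's output) of the spring-and-repulsion idea described above, in plain JavaScript with no D3 dependency: every pair of nodes repels with a force proportional to 1/d², every link acts as a spring toward a rest length, and each node moves a step along its net force, capped by a "temperature" that cools over time. The parameter values are arbitrary assumptions; in a D3 codebase, D3's own `d3.forceSimulation` would normally do this job.

```javascript
// Plain-JS spring-electrical layout. Nodes: { id, x?, y? }; links: { source, target } by id.
function forceLayout(nodes, links, {
  iterations = 300,  // simulation steps
  repulsion = 2000,  // strength of the 1/d^2 push between every pair
  springLength = 50, // rest length of each link
  springK = 0.05,    // spring stiffness (Hooke's law)
  step = 0.1,        // how far a node moves per unit of force
  maxStep = 10,      // initial cap on movement per iteration ("temperature")
  rng = Math.random,
} = {}) {
  // Start from existing positions if present, otherwise random ones.
  const pos = nodes.map((n) => ({ x: n.x ?? rng() * 100, y: n.y ?? rng() * 100 }));
  const index = new Map(nodes.map((n, i) => [n.id, i]));

  for (let it = 0; it < iterations; it++) {
    const fx = new Float64Array(nodes.length);
    const fy = new Float64Array(nodes.length);

    // Repulsion between every pair (O(n^2); D3 uses a Barnes-Hut approximation instead).
    for (let i = 0; i < nodes.length; i++) {
      for (let j = i + 1; j < nodes.length; j++) {
        const dx = pos[i].x - pos[j].x;
        const dy = pos[i].y - pos[j].y;
        const d2 = Math.max(dx * dx + dy * dy, 0.01); // avoid dividing by ~0
        const d = Math.sqrt(d2);
        const f = repulsion / d2;
        fx[i] += (f * dx) / d; fy[i] += (f * dy) / d;
        fx[j] -= (f * dx) / d; fy[j] -= (f * dy) / d;
      }
    }

    // Springs: pull linked ends together when longer than springLength, push apart when shorter.
    for (const { source, target } of links) {
      const i = index.get(source);
      const j = index.get(target);
      const dx = pos[j].x - pos[i].x;
      const dy = pos[j].y - pos[i].y;
      const d = Math.hypot(dx, dy) || 1e-6;
      const f = springK * (d - springLength);
      fx[i] += (f * dx) / d; fy[i] += (f * dy) / d;
      fx[j] -= (f * dx) / d; fy[j] -= (f * dy) / d;
    }

    // Move each node along its net force, capped by a temperature that cools linearly.
    const temperature = maxStep * (1 - it / iterations);
    for (let i = 0; i < nodes.length; i++) {
      let mx = step * fx[i];
      let my = step * fy[i];
      const len = Math.hypot(mx, my);
      if (len > temperature) { mx *= temperature / len; my *= temperature / len; }
      pos[i].x += mx;
      pos[i].y += my;
    }
  }
  return nodes.map((n, i) => ({ ...n, x: pos[i].x, y: pos[i].y }));
}
```

For two linked nodes the layout settles where the spring force balances the repulsion, a little beyond `springLength`.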

2. Community Detection (e.g., Girvan-Newman Algorithm)

Purpose: Finds groups of densely connected nodes (communities) within a graph.
Use Cases: Identifying social groups within networks; finding clusters in biological data; market segmentation
Implementation Notes: Libraries like NetworkX (Python) or igraph (C++, Python, R) offer community detection implementations. The Girvan-Newman algorithm works by repeatedly calculating edge betweenness and removing the edge with the highest betweenness. This process continues until the network breaks into communities.
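Since the codebase is JavaScript rather than Python, here is a minimal sketch (not Gemini's output) of the Girvan-Newman step described above: compute edge betweenness with one breadth-first search per node (Brandes-style accumulation for an unweighted graph), remove the edge with the highest score, and repeat until the graph splits into more connected components. It assumes links reference node ids directly and returns only the first split; repeating the process on the result yields finer communities.

```javascript
// Canonical key for an undirected edge (assumes ids do not contain "|").
const edgeKey = (a, b) => (String(a) < String(b) ? `${a}|${b}` : `${b}|${a}`);

// Edge betweenness of an unweighted, undirected graph (Brandes-style).
// adj: Map<id, Set<id>>. Returns Map<key, { a, b, score }>.
function edgeBetweenness(adj) {
  const scores = new Map();
  for (const s of adj.keys()) {
    // BFS from s, counting shortest paths (sigma) and recording predecessors.
    const dist = new Map([[s, 0]]);
    const sigma = new Map([[s, 1]]);
    const preds = new Map([[s, []]]);
    const order = [s];
    for (let h = 0; h < order.length; h++) {
      const v = order[h];
      for (const w of adj.get(v)) {
        if (!dist.has(w)) {
          dist.set(w, dist.get(v) + 1);
          sigma.set(w, 0);
          preds.set(w, []);
          order.push(w);
        }
        if (dist.get(w) === dist.get(v) + 1) {
          sigma.set(w, sigma.get(w) + sigma.get(v));
          preds.get(w).push(v);
        }
      }
    }
    // Walk back from the farthest nodes, crediting each edge with its share of paths.
    const delta = new Map(order.map((v) => [v, 0]));
    for (let h = order.length - 1; h >= 0; h--) {
      const w = order[h];
      for (const v of preds.get(w)) {
        const c = (sigma.get(v) / sigma.get(w)) * (1 + delta.get(w));
        const k = edgeKey(v, w);
        const e = scores.get(k) ?? { a: v, b: w, score: 0 };
        e.score += c;
        scores.set(k, e);
        delta.set(v, delta.get(v) + c);
      }
    }
  }
  return scores; // each pair is counted from both ends; irrelevant for picking the max
}

function connectedComponents(adj) {
  const seen = new Set();
  const comps = [];
  for (const s of adj.keys()) {
    if (seen.has(s)) continue;
    seen.add(s);
    const comp = [s];
    for (let h = 0; h < comp.length; h++) {
      for (const w of adj.get(comp[h])) {
        if (!seen.has(w)) { seen.add(w); comp.push(w); }
      }
    }
    comps.push(comp);
  }
  return comps;
}

// One Girvan-Newman split: remove highest-betweenness edges until the graph
// breaks into more components. nodeIds: array of ids; links: { source, target } by id.
function girvanNewman(nodeIds, links) {
  const adj = new Map(nodeIds.map((id) => [id, new Set()]));
  for (const { source, target } of links) {
    adj.get(source).add(target);
    adj.get(target).add(source);
  }
  let comps = connectedComponents(adj);
  const initial = comps.length;
  while (comps.length === initial) {
    let best = null;
    for (const e of edgeBetweenness(adj).values()) {
      if (!best || e.score > best.score) best = e;
    }
    if (!best) break; // no edges left to remove
    adj.get(best.a).delete(best.b);
    adj.get(best.b).delete(best.a);
    comps = connectedComponents(adj);
  }
  return comps;
}
```

Each returned component could then be given its own color in the visualization.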

3. Shortest Path (e.g., Dijkstra's Algorithm)

Purpose: Finds the shortest path (or minimum cost path) between two nodes in a graph.
Use Cases: Routing and navigation (think GPS systems); network optimization problems
Implementation Notes: Dijkstra's Algorithm is a classic and efficient solution. It maintains a set of visited nodes, progressively updating the shortest known distance to each node from the starting point.
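A minimal sketch (not Gemini's output) of Dijkstra's algorithm as described above: keep a set of visited nodes, repeatedly visit the unvisited node with the smallest known distance, and relax its neighbors' distances. It assumes undirected links with an optional non-negative `weight` field (defaulting to 1) and uses a linear scan instead of a priority queue, which is simpler but O(V²).

```javascript
// Dijkstra's algorithm with a visited set and a linear scan for the next node.
// links: { source, target, weight? } treated as undirected; weights must be >= 0.
function dijkstra(nodeIds, links, start) {
  const adj = new Map(nodeIds.map((id) => [id, []]));
  for (const { source, target, weight = 1 } of links) {
    adj.get(source).push([target, weight]);
    adj.get(target).push([source, weight]);
  }
  const dist = new Map(nodeIds.map((id) => [id, Infinity]));
  const prev = new Map(); // prev.get(v) = node before v on the best known path
  const visited = new Set();
  dist.set(start, 0);

  while (visited.size < nodeIds.length) {
    // Pick the unvisited node with the smallest tentative distance.
    let u = null;
    for (const id of nodeIds) {
      if (!visited.has(id) && (u === null || dist.get(id) < dist.get(u))) u = id;
    }
    if (dist.get(u) === Infinity) break; // the rest is unreachable
    visited.add(u);
    // Relax every edge out of u.
    for (const [v, w] of adj.get(u)) {
      const alt = dist.get(u) + w;
      if (alt < dist.get(v)) {
        dist.set(v, alt);
        prev.set(v, u);
      }
    }
  }
  return { dist, prev };
}

// Rebuild the path start -> target from prev; [] if target is unreachable.
function shortestPath(prev, start, target) {
  const path = [target];
  while (path[0] !== start) {
    const p = prev.get(path[0]);
    if (p === undefined) return [];
    path.unshift(p);
  }
  return path;
}
```

In a visualization, the returned path could be used to highlight the corresponding nodes and links.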

How to choose and implement

Consider Your Goal: Which aspects of the graph do you want to highlight with different algorithms? Arrangement, group structure, or optimal routes?
Libraries: D3 might be sufficient for Force-Directed Layout. For more specialized algorithms, check out these resources: NetworkX (Python): https://networkx.org/ and igraph (C++, Python, R): https://igraph.org/
Adapting Your Code: Your algorithms will likely need to modify the and data to affect how your visualization renders.

And then it provided just boilerplate code.

I then asked for the full code, and it provided a half-baked answer.

I then asked again for 3 algorithm implementations that would work with D3, but it was already too biased by its poor earlier suggestions (love the diversity). What is the purpose of providing these suggestions if I cannot use them where I need them?

?Future suggestions provided

I will count this as a no, because in the same response it suggested creating an updateVisualization function, only to ask me, 8 lines further down, to provide the updateVisualization implementation.

?Provided requested code when feeding it its own future suggestions?

For some stupid reason Google Gemini does not allow the user to edit old queries, so you cannot go back to an earlier point in a conversation and continue the thread from there.

Same with Quora's Poe: how can they be so lacking?

OpenAI ChatGPT, in contrast, allows users to do this, and I use it a lot when I want to test different queries, or to go back to a point where I know the LLM still has a certain topic or concept in mind from our previous discussion. Without the ability to edit a query, you can only move forward in the conversation, and the LLM forgets more and more of what came before.

?Provided working code?

?Able to provide full code implementation?

I had to change the prompt because Google does not allow editing old chats, so it had already forgotten what we were discussing 5 prompts earlier.

Google Gemini was so lost that it was one of the worst performers.

6. DeepSeek-Coder-33B-T on Poe

?Provided HTML code

?Provided JS code

?Provided Algorithm suggestions first time?

?Provided Algorithm implementation when asked?

?Future suggestions provided

?Provided requested code when feeding it its own future suggestions?

?Provided working code?

?Able to provide full code implementation?

look at my old code and tell how to tie everything together. for brevity show just what I need to change

It failed at this as well.

Final results

Winner

Claude 3 Sonnet

Claude 3 Sonnet got 7/8; the only thing it did not deliver was actual working code.

It was the only one that provided real algorithms to use on the first try, instead of dummy placeholder code.

The code ran: I could click the button and it would switch between the algorithms, but the algorithm implementations were missing something that I could not fix within the maximum of 5 free messages they offer, and I cannot yet subscribe to the Pro version from the EU.

Mistral Large got 6/8. I felt it was one of the most spot-on at understanding both the task and my existing codebase, and it could tie things together.

DeepSeek-Coder-33B-T got 5/8. We had a good start, but because of the errors I got (either the model could not handle the accumulated context, or something else went wrong on Poe's side), I could not actually get it to working code.

ChatGPT 3.5 got 4/8

It produced full code for some parts that were not affected by the required changes, and for some that were, but the script did not work after I added all the code: it was missing some things from the original function.

ChatGPT 4 got 3/8

I usually have a good experience with GPT-4, but it's possible that with a context as large as 21k it starts trying to remember too much and becomes less useful for coding tasks.

Google Gemini Advanced got 3/8

At times this happened because it tried to be too smart.

Even though the task was to suggest 3 different algorithms to visualize the data, and from the codebase it could see that it was a D3.js project, it suggested 3 different implementations in 2 different programming languages.

At one point it suggested a new function to use, and in the same response it asked me to give it the code for the new function it had just created. It's not fun to have Alzheimer's.
