Which AI Model Has the Most Rizz? Evaluating GitHub Models for Maximum Sauce
Asha Holla
Analytics, Automation, AI @Bloom · Data Nerd · Speaker · Technical Writer · Open Source · DE&I
GitHub Models on the marketplace is a platform that lets developers discover and test different AI models directly within GitHub. It not only allows developers to assess and choose the models that best fit their project requirements, but also provides an environment where users can experiment with different models by adjusting parameters and testing various prompts.
It features an interactive playground for experimentation, seamless integration with tools like Copilot Chat and GitHub CLI, and built-in best practices for responsible AI use.
You can experiment with a plethora of models, including newer ones like DeepSeek R1 and GPT-4o. The playground also lets you tweak parameters based on the kind of output you expect: you can limit the output tokens to get crisper responses, or adjust the temperature to control how random the responses are.
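Outside the playground UI, the same knobs can be set programmatically. The sketch below builds an OpenAI-style chat-completions request against the GitHub Models inference endpoint. Treat it as a minimal, hedged example: the endpoint URL, the `gpt-4o` model identifier, and authenticating with a `GITHUB_TOKEN` environment variable are assumptions about the service, not something this article verified.

```python
import json
import os
import urllib.request

# Assumed GitHub Models inference endpoint (OpenAI-compatible chat API).
GITHUB_MODELS_ENDPOINT = "https://models.inference.ai.azure.com/chat/completions"


def build_chat_request(model: str, prompt: str,
                       temperature: float = 1.0, max_tokens: int = 256) -> dict:
    """Mirror the playground knobs described above:
    temperature controls randomness, max_tokens caps response length."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": temperature,  # lower = more predictable lyrics
        "max_tokens": max_tokens,    # keep the remix short and punchy
    }


def send_chat_request(payload: dict) -> dict:
    """POST the payload, authenticating with a GitHub token
    (assumed to live in the GITHUB_TOKEN environment variable)."""
    req = urllib.request.Request(
        GITHUB_MODELS_ENDPOINT,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {os.environ['GITHUB_TOKEN']}",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


# Example: a lower-temperature, token-capped remix request (built, not sent).
payload = build_chat_request(
    "gpt-4o",
    "Rewrite 'Blank Space' in brainrot slang",
    temperature=0.7,
    max_tokens=200,
)
```

Dropping `temperature` toward 0 and shrinking `max_tokens` is the API-side equivalent of the playground sliders mentioned above.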
For our experiment today, let's see which of these LLMs can generate peak brainrot renditions of popular songs off the Billboard chart.
Testing GitHub’s AI Models for Maximum Sauce
To find out, I grabbed a few AI models from the GitHub Models Marketplace and put them to the test. My goal? To see which one could take a regular pop song and remix it into pure brainrot. I gave each model the same prompt and judged how well it understood the assignment, the quality of the lyrics, and how quickly it responded.
Model Showdown
Here's how the models stacked up:
1. DeepSeek R1
The model did not understand what was meant by brainrot. I supplied a simple prompt to brainrot Taylor Swift's "Blank Space," and even supplied brainrot words like "fanum tax" and "skibidi" to set the context. Here was the output:
Verdict: The model struggles to output just song lyrics, instead providing reasoning for each line, doesn’t understand "brainrot," responds slowly, and occasionally fails.
2. OpenAI GPT-4o
For the same prompt, GPT-4o delivered a good result. It faltered in a few places and messed up some rhyme schemes, but overall the output was usable.
Verdict: Promising and the right amount of brainrot
3. Meta Llama 3.1
I'm a bit on the fence about this one. It picked the right mix and variety of brainrot words, and the breakdown of the song into verse and chorus really helps set the tone.
Verdict: Impressive job on the lyrics, quick to respond, doesn't think for long - strong contender
4. Mistral Large 24.11
I'm not sure which song this model tried to summarize—it seems like a mix of random lyrics, and none of it quite aligns with any recognizable song.
Verdict: Strong start, but it ends up messing up halfway through. Wrong tool for the job.
Final Take
For peak brainrot song generation, GPT-4o had the best mix of coherence and absurdity, but Llama 3.1 had raw chaotic energy. For meme-tier remixes, I'd use Llama 3.1, but GPT-4o gets brownie points for presentation.
DeepSeek and Mistral excel on many fronts but brainrot isn't really their strong suit.
If GitHub continues expanding its model marketplace, who knows? We might one day get a dedicated brainrot AI model—until then, I’ll keep experimenting.
#ADSBlogs #AzureDeveloperCommunity