Google’s Q*?
While everyone was gung-ho about Google’s Gemini, they seemed to have overlooked AlphaCode 2, a competitive coding model fine-tuned from Gemini. This could well be the Q* moment for Google.
On Codeforces, AlphaCode 2 solved roughly twice as many problems as the prior record-setting AlphaCode system, which solved 25 percent. Mapping this to competition rankings, the team estimated that AlphaCode 2 sits, on average, at the 85th percentile. In other words, it performs better than 85 percent of participants, ranking between the ‘Expert’ and ‘Candidate Master’ categories on Codeforces.
Codeforces is a platform that hosts competitive programming contests.
Compared with other AI code generators, such as GitHub Copilot (based on OpenAI Codex), Amazon CodeWhisperer, Replit, Code Llama, EleutherAI Llemma, and Salesforce CodeGen, AlphaCode 2 shows a unique strength in competitive programming, whereas the others serve mainly as coding assistants for general coding help and basic maths problems.
For OpenAI, Q* reportedly represents a significant advance in AI capabilities: solving maths problems it hadn’t seen before, along with enhanced problem-solving abilities. Google’s AlphaCode 2, powered by Gemini, hints that Google is reaching the same level of advancement as Q*, or even going beyond it.
Enjoy the full story here.
The Need for Benchmarks?
Since the early days of LLMs, benchmarks have been the litmus test for judging their performance, at least on paper. There are plenty of them now; popular ones include HumanEval (OpenAI), AGIEval (Microsoft), MMLU, and GSM8K. However, companies often manipulate the data to place themselves at the top, and in this race a clear winner has yet to emerge.
For instance, the recent launch of Gemini and its comparison with GPT-4 on different benchmarks gives a glimpse of benchmark manipulation. Google claimed Gemini outperformed GPT-4 on the MMLU benchmark. However, it was later discovered that Google had used CoT@32 (chain-of-thought prompting with 32 samples) instead of the 5-shot setting. Read the full story to find out what happened next.
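To see why the two settings are not directly comparable, here is a minimal Python sketch. It assumes CoT@32 means sampling 32 chain-of-thought answers and taking a majority vote, which is a simplification; Google's actual procedure may differ. The answers are made-up illustrative data, not benchmark results, and `majority_vote` is a hypothetical helper.

```python
from collections import Counter


def majority_vote(sampled_answers):
    """Return the most common answer and the share of samples that chose it."""
    counts = Counter(sampled_answers)
    answer, votes = counts.most_common(1)[0]
    return answer, votes / len(sampled_answers)


# Hypothetical data: 32 sampled chain-of-thought answers to one
# multiple-choice question (assumption, for illustration only).
cot_samples = ["B"] * 19 + ["C"] * 9 + ["A"] * 4

# Hypothetical single answer produced by one 5-shot prompt.
five_shot_answer = "C"

correct_answer = "B"

cot_answer, agreement = majority_vote(cot_samples)

print("CoT@32 answer:", cot_answer, "agreement:", agreement)
print("CoT@32 correct:", cot_answer == correct_answer)
print("5-shot correct:", five_shot_answer == correct_answer)
```

In this toy case the voted answer is right even though a single attempt is wrong, which is why a score obtained with 32 samples cannot be set side by side with a 5-shot score.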
AI in Fashion with Snezhana
Born in Russia and based in New York, Snezhana Paderina is an art director working in fashion and technology. As a fashion film director, she works with CG cinematics and 3D design. English was not her first language, but she was driven to study design at an American university, and she leaned on her technical know-how to get through.
In an interview with AIM, Paderina shares her journey into the world of fashion and tech, and what the future looks like in the coming years. Read more here.
Deepfakes in Finance
Recently, Nithin Kamath, founder and CEO of Zerodha, posted a video on LinkedIn in which he talks about the risk of deepfakes in the finance industry. He highlighted how deepfakes are improving and said that it will soon become harder for people to tell whether a person online is real or AI-generated.
Surprisingly, the video in which Kamath explained this was also a deepfake! So, what is the solution? Read to find out.