Weekly review #4
Miko Pawlikowski
Follow for breakthrough ideas in tech, business & performance | Keynote speaker | Co-founder SREday.com
Friday 2024/08/23
Ideogram’s latest text-to-image model is here (1 min)
Image generation continues to be one of the fastest-growing areas of AI, with plenty of tools to choose from. Ideogram has made the move with a new iteration of its text-to-image model. Ideogram 2.0 is available for free and showcases enhanced image-text alignment, accuracy, and five new image styles.
Besides this, the company has also launched an iOS app and a beta API. Users can opt for premium features through several subscription plans. Notably, the model reportedly outperforms competitors such as Midjourney and DALL-E 3.
Research from China's Guilin University of Electronic Technology has introduced a technique based on the GSM8K dataset that allows large language models to filter irrelevant information and subsequently improve their reasoning process.
The method was tested on the GSMIR dataset with GPT-3.5-Turbo and GPT-3.5-Turbo-16k. Researchers intentionally inserted irrelevant sentences into a series of elementary school maths problems, and both models were able to identify them 74.9% of the time through the "Analysis to Filtration Prompting" (ATF) method.
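To make the idea concrete, here is a minimal Python sketch of an analyse-then-filter pipeline in the spirit of ATF. It is not the researchers' implementation: the prompt wording, the function names, and the `llm` callable (any function that takes a prompt string and returns the model's reply) are all assumptions made for illustration.

```python
import re
from typing import Callable, List, Set


def split_sentences(problem: str) -> List[str]:
    """Naive splitter: break after '.', '?' or '!' when followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.?!])\s+", problem) if s.strip()]


def build_analysis_prompt(sentences: List[str]) -> str:
    """Hypothetical analysis prompt; the wording is an assumption, not the paper's."""
    numbered = "\n".join(f"{i}. {s}" for i, s in enumerate(sentences, start=1))
    return (
        "Below is a maths word problem, one sentence per line.\n"
        f"{numbered}\n"
        "List the numbers of any sentences that are irrelevant to answering "
        "the question, separated by commas. Reply 'none' if all are relevant."
    )


def parse_indices(reply: str, n: int) -> Set[int]:
    """Pull sentence numbers out of the reply, ignoring anything out of range."""
    return {int(tok) for tok in re.findall(r"\d+", reply) if 1 <= int(tok) <= n}


def filter_problem(problem: str, llm: Callable[[str], str]) -> str:
    """Stage 1 (analysis) and stage 2 (filtration): drop the flagged sentences."""
    sentences = split_sentences(problem)
    irrelevant = parse_indices(llm(build_analysis_prompt(sentences)), len(sentences))
    return " ".join(s for i, s in enumerate(sentences, start=1) if i not in irrelevant)


def solve_with_filtering(problem: str, llm: Callable[[str], str]) -> str:
    """Ask the model to solve the cleaned-up problem instead of the original."""
    return llm("Solve step by step:\n" + filter_problem(problem, llm))
```

In practice `llm` would wrap a call to whichever chat model you use; for a quick check, a stub that always replies "2" is enough to see the second sentence of a problem being removed before the solving step.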
AI21 has unveiled Jamba 1.5 (2 min)
Jamba 1.5 combines transformers with the Structured State Space model (SSM) approach. Developed by AI21, it is an upgraded version of the model the company showcased in March, and it comes in two builds: Jamba 1.5 Mini and Jamba 1.5 Large.
Jamba 1.5 stands out as both of the models have a large context window (256K), which makes them a perfect fit for the development of agentic AI systems. Also, they're available under an open licence, suiting enterprises and developers alike.
Putting more tools in the hands of developers, Google has added a new feature to their AI Studio platform. Prompt Gallery, as they decided to call it, is integrated with pre-built prompts that get the most out of their Gemini models, covering a broad range of applications, like coding, productivity tools, and even a maths tutor.
Mark Zuckerberg and Daniel Ek, from Meta and Spotify respectively, have shared their thoughts on Europe's AI regulations. Earlier this year, rules were drafted to be applied to AI, ensuring its safe and responsible development. However, both CEOs believe these regulations could negatively affect the growing open-source landscape.
Wednesday 2024/08/21
You can now fine-tune GPT-4o (2 min)
Sky's the limit now for developers as OpenAI has finally launched a fine-tuning feature for GPT-4o. This will allow the customization of the model for an infinite list of use cases, ensuring users full ownership of their data and promoting the surge of innovative applications by offering 1M training tokens per day for free next month.
Fine-tuning is available in all paid tiers and includes a fine-tuning option for GPT-4o mini, which will also have free training tokens. As an example of the breakthroughs developers can create with GPT-4o at hand, OpenAI shares the Genie AI software engineering assistant from Cosine, which recently achieved great results in the SWE-bench benchmark.
Google Research presents HeAR (2 min)
Google Research's latest model showcases how AI can transform the health field. HeAR (Health Acoustic Representations) is a bioacoustic foundation model, trained on 300 million pieces of audio data, which can provide revealing insights into patients' well-being through audio analysis. Through this approach, over time, a sound can lead to a diagnosis.
HeAR also includes a cough model, which has been trained on 100 million cough sounds. As the model is available for custom research, companies like India's Salcit Technologies have used it to develop an application for early tuberculosis detection.
Nvidia’s on-device model for gaming (2 min)
Nvidia has presented Nemotron-4 4B Instruct, an on-device small language model developed with a focus on gaming and part of the company's ACE suite. Nemotron harnesses retrieval-augmented generation and function-calling to improve the performance of game characters.
Mecha BREAK, a game developed by Amazing Seasun Games, is currently using the model, showcasing how Nemotron enhances role-playing, creating dynamic gameplay that allows seamless interactions between characters and players.
OpenAI has revealed that Condé Nast is joining their list of partnerships, allowing content from leading publications to appear in ChatGPT, and now in their SearchGPT prototype. Through this alliance, which includes brands like Vogue, Wired, The New Yorker, and more, OpenAI continues to forge a path for AI in journalism.
Microsoft has added new versions to its Phi family of models. Phi 3.5 is multimodal, comes with an open licence, and is available in mini, MoE, and Vision versions with 3.82B, 41.9B, and 4.15B parameters respectively. The models have improved at tasks like reasoning and image and video analysis, surpassing leading models like Gemini 1.5 or Meta's Llama 3.1 in certain benchmarks.
Tuesday 2024/08/20
Build your own robot with Hugging Face (3 min)
Hugging Face's mission involves the democratisation of AI development through openness, and they've taken this effort to a field that's set to bring major advancements to the AI community: robotics. This started with the launch of their LeRobot platform a few months ago, and now it continues to grow as they've released a tutorial that guides developers through building and training AI-powered robots.
Now, developers can build their own robots and teach them to move with just a laptop. An activity that was once available only to corporations and institutions is now in the hands of anyone curious enough. We can expect big innovations to come from this approach.
Many predict that a time will come when it will be difficult to differentiate humans from AI bots on the internet. Although this remains a possibility, it hasn't happened yet, and at the moment, researchers are looking for ways to prevent it from happening. A group that includes members from OpenAI, Microsoft, MIT, and others is developing a method called "Personhood Credentials" to tackle the problem.
What they hope to accomplish with this is a way to diminish the surge of AI bots, as well as avoid identity theft as they become more capable. The digital credentials will confirm that the holder is a human without revealing more information.
OpenResearcher is the result of a joint effort by the Generative AI Research Lab (GAIR). As its name reveals, this is an open-source project with a focus on scientific research, a domain that is being transformed by the use of AI. The application works as an assistant and easily answers researchers' questions.
By combining retrieval augmentation from the internet with base knowledge from ArXiv, OpenResearcher can ask reorienting questions to guide the analysis. It also includes customised tools that accelerate the research process.
AMD is demonstrating that Nvidia isn't the only player in the AI arena. They've recently announced the acquisition of ZT Systems, a company specialising in building custom computing infrastructure for AI "hyperscalers." The deal has a valuation of $4.9 billion, and it seeks to improve the speed at which AMD develops its technologies.
After announcing it back in May and with a limited rollout following, Google has made its Imagen 3 generator available to US users through their AI Test Kitchen. The image generator has received great praise from Google, as it creates images with higher quality, better detail, and richer lighting. They've also shared a research paper that goes in-depth about the system.
Monday 2024/08/19
Nous introduces HERMES 3 (4 min)
Hermes 3 is the latest breakthrough in open-source. This model, developed by Nous Research in collaboration with Lambda Labs, is a fine-tuned version of Meta's Llama 3.1 and is available in three parameter sizes (8B, 70B, and 405B), showcasing performance that actually surpasses Llama. It has been trained with synthesised data and a mix of reinforcement learning from human feedback and Neural Magic's FP8 method.
This model has been designed with a user-first mindset, which goes beyond a creative approach, displaying great adaptability to following instructions, long-term context retention, agentic functions, and even internal monologue abilities.
LLMs have their own understanding of reality (5 min)
Research from MIT's CSAIL is giving us a glimpse into the thought process of LLMs. The researchers trained a model on over 1 million random puzzles, giving it solutions without showing how these solutions work, and the model was consequently led to come up with its own understanding of how they work.
Although the research was conducted with a small model and a simple programming language, the results are promising, as they showcase ways in which LLMs can develop a better understanding of language, which will further improve their learning capabilities.
Geekbench AI’s benchmark for ML (2 min)
Most companies are looking for ways to improve their current evaluation methods to ensure the development of AI technologies goes right. Geekbench is one of them, and it has introduced a benchmarking suite specially designed for machine learning, deep learning, and most AI workloads.
Geekbench AI is aimed at developers, who can use it to ensure their apps work optimally across platforms, and at engineers or general users who want to analyse how a specific device uses AI.
Making image generation easier and staying one step ahead of new offerings, Midjourney has recently updated its website with tools that allow users to customise their AI creations even further. The website now includes a revamped editor with a canvas extension tool, which creates a larger version of the image with new visuals, and inpainting, which transforms selected areas through prompts.
A 45-year-old ALS patient has regained his voice thanks to AI. Research by UC Davis has implemented a brain-computer interface and neural sensors to capture the brain commands involved in speech and translate them into words, which are processed by AI text-to-speech software based on old recordings of the patient's voice to make it sound exactly like him.