AI Takes a Deep Breath, Training Robots While We Sleep, Researchers Hack Copilot to Emit People's API Keys and the Coming Wave of Regulation
AI Infrastructure Alliance
We’re dedicated to bringing together the essential building blocks for the AI/ML applications of today and tomorrow.
Researchers continue to find ways to get LLMs to reason better while producing clearer results. This in-depth article on prompt engineering dives into Chain-of-Verification (COVE), a novel approach designed to keep generative AI on the straight and narrow, ensuring that it remains honest and upright in its dealings.
Generative AI, although a marvel of modern technology, is not without its flaws. It can produce inaccuracies, biases, false information, and even downright errors. This is where the COVE technique steps in, offering a systematic method for prompt engineering and verification. By employing a combination of specific verification questions and careful inspection of AI responses, COVE aims to keep these digital entities in check. But it's not just about keeping AI honest - it's also about making sure that the information they provide is as accurate and useful as possible.
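To make the idea concrete, here is a minimal Python sketch of a verification loop in the spirit of COVE. The four-step layout (draft, plan verification questions, answer them independently, revise) and the prompt wording are our assumptions about how such a pipeline is commonly arranged, not the exact method from the article. `llm` is a hypothetical stand-in for whatever model call you use, and `fake_llm` returns canned text only so the sketch runs offline.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical interface: any function that maps a prompt to a text reply.
LLM = Callable[[str], str]


def chain_of_verification(question: str, llm: LLM) -> Dict[str, object]:
    # Step 1: draft an initial answer.
    draft = llm(f"Answer the question.\nQuestion: {question}")

    # Step 2: ask the model for verification questions, one per line.
    plan = llm(
        "List verification questions, one per line, that would check the facts "
        f"in this answer.\nQuestion: {question}\nAnswer: {draft}"
    )
    checks: List[str] = [line.strip() for line in plan.splitlines() if line.strip()]

    # Step 3: answer each check on its own, without showing the draft,
    # so that mistakes in the draft are not simply repeated.
    findings: List[Tuple[str, str]] = [
        (check, llm(f"Answer briefly and factually: {check}")) for check in checks
    ]

    # Step 4: revise the draft in light of the verification answers.
    evidence = "\n".join(f"Q: {q}\nA: {a}" for q, a in findings)
    final = llm(
        "Revise the draft so it agrees with the verification answers.\n"
        f"Question: {question}\nDraft: {draft}\nVerification:\n{evidence}"
    )
    return {"draft": draft, "checks": checks, "findings": findings, "final": final}


def fake_llm(prompt: str) -> str:
    # Canned replies (assumption: illustration only, not a real model).
    if prompt.startswith("Answer the question."):
        return "Paris is the capital of France and has 10 million residents."
    if prompt.startswith("List verification questions"):
        return "What is the capital of France?\nWhat is the population of Paris?"
    if prompt.startswith("Answer briefly"):
        return "stub answer"
    return "revised answer"


result = chain_of_verification("Tell me about Paris.", fake_llm)
print(result["checks"])
print(result["final"])
```

Swapping `fake_llm` for a real model call is the only change needed to try this on live output. The revision step is where a correct draft can be altered incorrectly, so it is worth logging `draft` and `final` side by side.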
This is a significant development for AI developers and users alike. Until now, the onus has largely been on users to double-check the outputs of generative AI. But with the advent of COVE, AI makers are now incorporating checks and balances into their applications, reducing the need for users to second-guess the information they receive. This not only enhances the user experience but also increases the trustworthiness and reliability of AI systems.
However, as with any new technology, the road to perfection is paved with challenges. The verification process can sometimes lead to correct answers being altered incorrectly. This highlights the need for further research to refine and enhance the effectiveness of prompt engineering and verification techniques.
Nobody has delivered a foolproof method to keep LLMs on the rails, but new techniques are coming fast and furious, and it’s only a matter of time before we get a suite of techniques and tools that keep LLMs delivering consistently and accurately. Eventually those techniques will become a standard stack and every application will benefit from them.
Apparently telling an AI to "take a deep breath" can work wonders. DeepMind has been experimenting with optimizing the prompts given to AI models, which led to a noticeable improvement in the models' mathematical abilities. This new and exciting approach, called Optimization by PROmpting (OPRO), uses natural language to guide AI models in problem-solving.
Benj Edwards, writing for Ars Technica, digs into the mechanics of OPRO in his latest piece: “In OPRO, two large language models play different roles: a scorer LLM evaluates the objective function such as accuracy, while an optimizer LLM generates new solutions based on past results and a natural language description. Different pairings of scorer and optimizer LLMs are evaluated, including models like PaLM 2 and GPT variants. OPRO can optimize prompts for the scorer LLM by having the optimizer iteratively generate higher-scoring prompts. These scores help the system identify the best solutions, which are then added back into the 'meta-prompt' for the next round of optimization.”
OPRO is a bit like telling a story to the AI models, and they are the protagonists, working their way through the plot. The story uses the two large language models, a scorer and an optimizer, working in tandem to generate and evaluate solutions. Imagine two friends trying to solve a puzzle together, each bringing their unique skills to the table.
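The scorer-and-optimizer loop can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not DeepMind's implementation: `toy_scorer` just counts keywords instead of measuring accuracy on real math problems, `make_toy_optimizer` hands back pre-written candidates instead of calling an LLM, and the meta-prompt wording and the `top_k` parameter are ours.

```python
from typing import Callable, Iterable, List, Tuple

Scorer = Callable[[str], float]          # prompt -> score (stand-in for the scorer LLM)
Optimizer = Callable[[str], List[str]]   # meta-prompt -> new candidate prompts


def build_meta_prompt(task: str, history: List[Tuple[str, float]], top_k: int = 3) -> str:
    # Keep the top_k best prompts so far, listed in ascending score order.
    best = sorted(history, key=lambda pair: pair[1])[-top_k:]
    lines = [f"Task: {task}", "Previous instructions and their scores:"]
    lines += [f"text: {prompt}\nscore: {score:.1f}" for prompt, score in best]
    lines.append("Write a new instruction that scores higher.")
    return "\n".join(lines)


def opro(task: str, seeds: List[str], scorer: Scorer, optimizer: Optimizer,
         rounds: int = 3) -> Tuple[str, float]:
    history = [(prompt, scorer(prompt)) for prompt in seeds]
    for _ in range(rounds):
        meta_prompt = build_meta_prompt(task, history)
        seen = {prompt for prompt, _ in history}
        for candidate in optimizer(meta_prompt):
            if candidate not in seen:
                # Score each new candidate and feed it back into the history.
                history.append((candidate, scorer(candidate)))
                seen.add(candidate)
    return max(history, key=lambda pair: pair[1])


def toy_scorer(prompt: str) -> float:
    # Toy stand-in (assumption): a real scorer would run the prompt on a benchmark.
    text = prompt.lower()
    return float("step by step" in text) + float("deep breath" in text)


def make_toy_optimizer(candidates: Iterable[str]) -> Optimizer:
    pool = iter(candidates)

    def propose(meta_prompt: str) -> List[str]:
        # A real optimizer LLM would read meta_prompt; this one ignores it.
        candidate = next(pool, None)
        return [] if candidate is None else [candidate]

    return propose


best_prompt, best_score = opro(
    "grade-school math word problems",
    ["Solve the problem."],
    toy_scorer,
    make_toy_optimizer([
        "Let's think step by step.",
        "Take a deep breath and work on this problem step by step.",
    ]),
)
print(best_prompt, best_score)
```

The part worth noticing is the feedback loop: every scored prompt goes back into the history, and the best of them shape the next meta-prompt. Everything else here is scaffolding you would replace with real model calls.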
What's truly surprising is the impact of specific phrases like "let's think step by step." It's almost as if the AI models are taking the advice to heart, with a significant improvement in the accuracy of their output. The phrase "Take a deep breath and work on this problem step by step" proved to be the golden ticket for Google's PaLM 2 language model, securing an accuracy score of 80.2% on grade-school math word problems. It's like the AI models are school children, responding positively to the patient and methodical guidance of a teacher.
It's crucial to remember that large language models (LLMs) don't reason like humans: their "reasoning" abilities are derived from an extensive dataset of language phrases. Specific prompts steer LLMs toward more accurate and useful results in problem-solving. It's a bit like nudging a friend in the right direction when they're stuck on a tricky crossword puzzle. This discovery could lead to improved performance and new applications of LLMs in the future. So, the next time you're struggling with a problem, remember to "take a deep breath and work on it step by step." If it works for AI, it might just work for us too!
Anyscale, the main commercial vendor behind the Ray framework, is now enabling organizations to tweak and deploy open source large language models (LLMs) with ease. The company has also extended its partnership with Nvidia to streamline Nvidia's software for inference and training on the Anyscale Platform. With a track record of helping vendors like Instacart and Pinterest scale AI while trimming costs, Anyscale Endpoints presents a commercially supported version of Ray that offers enterprise capabilities for scaling and deploying training and inference. This service provides API access to open source LLMs, eliminating the need for organizations to deploy or manage the models themselves. Lastly, Anyscale is launching a Private Endpoints service that allows the deployment of Anyscale Endpoints within an organization's own virtual private cloud (VPC), opening up new avenues for customization and optimization.
In the world of AI artistry, OpenAI comes back to the plate with DALL-E 3, a more advanced and conscientious version of its visual art platform. This new model, which seamlessly integrates with ChatGPT, is not just smarter - it's safer. OpenAI has upped the ante on safety measures to thwart any attempts at generating explicit or hateful imagery. For now, DALL-E 3 is exclusive to ChatGPT Plus and ChatGPT Enterprise users, with a public release date still under wraps. Artists, meanwhile, have been given the power to opt out of future text-to-image AI models, a move aimed at dodging potential copyright lawsuits. OpenAI even allows artists to submit their copyrighted images for removal, ensuring DALL-E doesn't step on any creative toes. This comes in the wake of lawsuits against DALL-E's competitors and DeviantArt for alleged copyright infringement.
Toyota Research Institute (TRI) is taking robotics to the next level by developing an innovative method to teach robots new skills overnight. The technique, a fusion of traditional learning styles and diffusion models, has already been used to teach robots 60 different skills. The goal? To develop versatile robots that can adapt to various environments and navigate changes, particularly in less-structured or unstructured spaces. TRI's approach relies heavily on sight and force feedback data, and the system's neural networks continue to train even after the lights are out. The future of robotic learning, it seems, is a sleepless one!
As the world of AI chatbots evolves, we're seeing a shift from generic models like ChatGPT to more specialized ones, custom-tailored to fulfill our specific needs. The crux of this transformation lies in the data, the lifeblood of AI systems. Companies are now leveraging data insights to predict user behavior, and some are even charging for API access to their data, which could make obtaining data more costly as AI systems evolve. An intriguing solution? Synthetic data created by AI systems themselves. But it's a delicate balance - this data needs to be different enough to provide fresh insights, yet similar enough to remain accurate. The article also highlights the potential of developing smaller, specific language models that could benefit from expert feedback. So, in the not-so-distant future, we might see a landscape dotted with many of these 'little language models' rather than a few large ones. Interesting times ahead!
Also this week: