Watt's in our Query? Decoding the Energy of AI Interactions
Archana Vaidheeswaran
Building Community for AI Safety | Board Director | Machine Learning Consultant | Singapore 100 Women in Tech 2023
As we greet the New Year with aspirations and resolutions, let's add a critical one to our list: sustainability in our digital lives. With every leap in technology, like GPT-4, we marvel at the new horizons of human-like text generation and problem-solving capabilities. However, as we stand at the dawn of 2024, it's time to shine a light on a less-discussed aspect of these advancements: their environmental impact.
The mechanism that allows LLMs to converse, create, and compute is underpinned by a complex web of power consumption that stretches far beyond the data center. As the community becomes increasingly aware of our ecological responsibilities, we're introducing a game-changer: Carbon ScaleDown, an AI carbon tracker Chrome extension. It's not just a tool; it's a movement towards mindful AI usage that aligns with our planet's health.
In this blog, we'll unfold the narrative of LLMs' power dynamics, from the hefty energy demands of models like GPT-4 to the nuanced differences between text and image processing. We'll explore the role of 'inference' - the AI's day job - in the broader picture of sustainability. Most importantly, we'll showcase how our Chrome extension, built on React and Tailwind CSS and hosted on AWS, isn't just tracking the carbon footprint of your AI interactions but actively helping reduce it.
As we venture into the details of AI's environmental footprint, it's crucial to understand the genesis of our concerns. Foundational models like BERT and GPT-2, while setting benchmarks in machine learning, also highlighted the intensive energy requirements of such technologies. The training of these models is not just a marvel of computation but also a hefty draw on power resources.
Here, the "Energy and Policy Considerations for Deep Learning in NLP" paper [1] examines the carbon ledger of AI and questions the sustainability of our digital advancements. As the paper shows, BERT's training, for instance, is not a mere computational task; it's a power-intensive process. To put this into context, the BERT base model, with 110 million parameters, required 96 hours of training on 16 TPU chips. This is akin to leaving a modern LED light bulb turned on for over a decade. GPT-2, even larger with 1.5 billion parameters, demanded a week of training on 32 TPUv3 chips—a testament to its colossal energy footprint.
This power consumption isn't just about electricity bills; it's about the carbon footprint. Training one of these models can emit as much carbon as an average American's activities do in a year. When we talk about LLM training, we speak the language of kilowatt-hours and carbon emissions, which translate to real-world environmental impact.
What's less discussed in the paper is the ongoing environmental cost once these models are put to work daily by millions of users. It's this continuous use, stretching far beyond the duration of training, that deserves closer scrutiny.
Cost of Inference
Moving on from the training phase, let's discuss inference: the phase in which a trained model is used to power applications. Here's where it gets interesting: not all machine learning tasks are equal, especially regarding energy consumption.
Energy consumption varies drastically, with more complex tasks consuming more power. GPT-4, with its billions of parameters, is akin to a digital polymath, capable of composing poetry, coding, and even creating art from textual descriptions. However, this versatility comes with an energy demand that's not just a step but a leap from its predecessors.
Inference, on the other hand, is generally less power-intensive than training. Once a model is trained, running it for inference doesn’t require the same level of continuous, heavy computation. Depending on the model's complexity and task, inference can be performed on less powerful machines, including CPUs (though usually not the case for LLMs).
The training phase is responsible for the bulk of the carbon footprint associated with LLMs. The extensive use of thousands of high-powered GPUs and the duration of training contribute significantly to greenhouse gas emissions.
The carbon footprint of the inference phase is markedly lower compared to training. Since inference requires fewer computational resources, the associated emissions are correspondingly reduced. However, it's crucial to consider the frequency of inference operations. The cumulative carbon footprint can become substantial in applications where LLMs are queried incessantly.
The real-world environmental cost of using LLMs hinges on the scale and frequency of their application. Services that continuously rely on these models for real-time responses, like chatbots or content generation tools, can accumulate significant energy usage over time.
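To make this cumulative effect concrete, here is a back-of-envelope sketch. The training footprint, per-query footprint, and query volume below are illustrative assumptions, not measured values for any particular model:

```python
# Break-even estimate: after how many queries does cumulative inference
# carbon match the one-time training carbon?
# All figures are illustrative assumptions, not measurements.

TRAINING_FOOTPRINT_G = 500e6    # assume ~500 tCO2e to train a large LLM, in grams
PER_QUERY_FOOTPRINT_G = 0.5     # assume ~0.5 gCO2e per inference query
QUERIES_PER_DAY = 100e6         # assume ~100 million queries per day

breakeven_queries = TRAINING_FOOTPRINT_G / PER_QUERY_FOOTPRINT_G
breakeven_days = breakeven_queries / QUERIES_PER_DAY

print(f"Inference matches training after {breakeven_queries:.1e} queries")
print(f"At {QUERIES_PER_DAY:.0e} queries/day, that is about {breakeven_days:.0f} days")
```

Under these assumptions, a heavily used service overtakes its own training footprint within days, which is why the rest of this post focuses on inference.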
Inference at Meta
Meta has been notably transparent about the environmental impact of its AI operations. In a paper, Meta disclosed that power within its AI infrastructure is allocated in a 10:20:70 ratio across three key phases: Experimentation, Training, and Inference—with Inference consuming the lion's share.
This distribution reflects a crucial aspect of AI usage: while Experimentation and Training are intensive, they are finite phases, and Inference is a long-running process. As such, the carbon emissions from Inference accumulate over time, potentially surpassing the total emissions from the initial training of the model.
The diagram from the paper showcases the operational carbon footprint of various large-scale machine-learning tasks. The black bars represent the carbon footprint during the offline training phase. This phase has a substantial carbon impact, indicating the energy required to process and learn from massive datasets.
The orange bars, although fewer, indicate that the models undergoing online training also contribute notably to carbon emissions. Online training allows models to update and refine their learning continuously, which, while beneficial for performance, adds to the carbon footprint.
The patterned bars illustrate the carbon footprint during the inference phase. For many models, this footprint is smaller per unit of time compared to the training phases. However, because inference is ongoing, these emissions accumulate and, in many cases, eclipse the one-time training emissions, especially for heavily used models.
The Power Dynamic of AI Tasks
Imagine AI tasks as different appliances in your home. Some, like your LED bulb, sip electricity gently. Others, like your air conditioner on a hot day, gulp it down. The paper "Power Hungry Processing" [2] explains this dynamic. Text classification, for instance, is like your LED bulb - it's relatively energy-efficient. But when you have a model that generates text or images, the energy consumption jumps significantly – think of it as moving from the bulb to the air conditioner.
Reaching for the Plug: Energy Considerations for Deployment
The paper doesn't just highlight the differences in tasks. It also leads us to a vital understanding: models like GPT-4, which can perform a variety of tasks, tend to consume more energy in inference than models designed for a specific task. This difference can be staggering, especially when these models are deployed at scale, serving millions of daily active users.
This brings us to the crux of the matter. When deploying AI in the real world, we must consider the energy it uses every time it's called into action. It's like deciding whether to walk, bike, or drive to work, considering the impact on your wallet and the environment.
When we think of LLMs, we often marvel at their ability to learn. However, the real-world impact of AI comes from the 'inference' phase—when a model applies what it has learned to new data. It's like an AI's day job, interacting with users to provide answers, create content, or make decisions. And just like any job, some tasks use more resources than others.
As we have established, the environmental cost of LLMs depends on the scale of their application. For instance, energy consumption becomes crucial with the increasing use of models like ChatGPT as search engines. A single ChatGPT query might consume around 0.3 kWh, compared to a mere 0.0003 kWh for a standard Google search. In other words, a ChatGPT query consumes roughly 1,000 times more energy than a simple Google search, highlighting the significant environmental impact of frequent LLM usage.
While GPT-4's capabilities are a technological tour de force, they also underscore a critical balance between efficiency and capability. The energy used during inference can vary based on the task—whether answering a simple query or generating an intricate image. This is where the energy considerations become pivotal for users and developers alike, highlighting the need for responsible deployment and usage of such advanced AI systems.
Empowering Conscious Computing - The Genesis of Our Chrome Extension
As we've explored the varying power demands of AI models and their tasks, the need for transparency and control over our GPT carbon footprint has never been clearer. This is the cornerstone upon which we built our Chrome extension - a tool designed to measure, manage, and mitigate the energy impact of GPT interactions, specifically for image versus text queries.
Why Carbon ScaleDown?
The Chrome extension incorporates these calculations to give users real-time estimates of the carbon footprint for using ChatGPT, whether they generate text or images. This immediate feedback loop is crucial for raising awareness about AI models' environmental impact and encouraging more sustainable practices among users.
As we navigate the digital landscape, our clicks and queries leave behind an invisible trail of energy consumption. But what if we could illuminate this path and choose a greener route? Enter Carbon ScaleDown, our Chrome extension designed to bring transparency and control to your digital carbon footprint.
How does it work?
Unfortunately, we do not know much about OpenAI's inference infrastructure. Sam Altman provided some insight in a tweet, suggesting that a single prompt costs "probably single-digit cents," which gives a worst-case figure of $0.09 per request.
This Stack Exchange answer delves into how that figure is derived and what it means in more tangible terms.
The cost of processing an AI request is not just a matter of computational power; it also involves significant energy consumption. If we assume that at least half the cost of a single AI request is attributable to energy usage, and take an electricity price of $0.15 per kilowatt-hour (kWh), we can dissect the expenses further:
Cost per Request: $0.09
Proportion of Energy Cost: 50% of the total cost
Energy Price: $0.15 per kWh
Using these figures, the energy consumption per AI request can be calculated as follows:
Energy per request = (Cost per request × Energy share) / Energy price
Plugging in the numbers: ($0.09 × 0.5) / $0.15 per kWh = 0.3 kWh per request.
This translates to 300 watt-hours (Wh) per request.
To put this into a more relatable context, consider the energy required to charge a smartphone. An average charge takes about 5 Wh, so a single ChatGPT request uses as much energy as roughly 60 smartphone charges.
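The same arithmetic in a few lines of Python, as a sanity check (the cost split and electricity price are the assumptions quoted above):

```python
# Back-of-envelope energy estimate per request, from Altman's cost hint.
cost_per_request_usd = 0.09   # worst-case "single-digit cents" per request
energy_share = 0.5            # assume half the cost is electricity
price_per_kwh_usd = 0.15      # assumed electricity price

energy_kwh = cost_per_request_usd * energy_share / price_per_kwh_usd
print(f"{energy_kwh:.2f} kWh per request")         # 0.30 kWh
print(f"{energy_kwh * 1000:.0f} Wh per request")   # 300 Wh

# Relatable comparison: roughly 5 Wh to charge a smartphone.
print(f"~{energy_kwh * 1000 / 5:.0f} smartphone charges per request")
```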
Calculating the Carbon Footprint of GPT-4
The energy consumption of the LLM infrastructure over a given time frame (say, one hour) can be calculated using the formula:
Energy (kWh) = Number of servers × TDP per server (kW) × PUE × Time (hours)
From leaks and tweets, we can get an estimate of the infrastructure running ChatGPT:
Number of Hardware Units: 28,936 Nvidia A100 GPUs
TDP: 6.5 kW/server
PUE: 1.2, a measure of how efficiently a data center uses energy (as reported by Azure)
This formula gives us the total energy consumption for the ChatGPT infrastructure in one hour. Assuming 8 GPUs per server (as in a DGX A100 system, whose power draw matches the 6.5 kW figure above), 28,936 GPUs correspond to roughly 3,617 servers, or about 3,617 × 6.5 kW × 1.2 ≈ 28,200 kWh per hour.
Next, we need to determine the total number of tokens generated by ChatGPT in one hour:
Tokens per hour = (DAU × Average daily queries per user × Average tokens per query) / 24
From leaks and industry estimates, we know the following:
DAU: 13 million
Average Daily Queries/User: 15
Average Tokens/Query: 2,000
Plugging in, that's 13 million × 15 × 2,000 / 24 ≈ 16.25 billion tokens per hour. With this data, we can finally calculate the energy needed to generate each token, and from that the carbon footprint:
Energy per Token = Total hourly energy ÷ Tokens per hour
Carbon per token = Energy per token × grid carbon intensity, where the grid intensity is 240.6 gCO2e/kWh for Microsoft Azure US West
Multiplying these out puts the operational footprint of GPT-4 at roughly 0.3 gCO2e per 1,000 tokens.
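Here is the whole estimate in one Python sketch. The 8-GPUs-per-server split and the usage figures are assumptions from the leaks quoted above; depending on rounding and whether you apply PUE, the result lands between roughly 0.3 and 0.4 gCO2e per 1,000 tokens:

```python
# Rough operational-carbon estimate for ChatGPT-scale inference.
# Infrastructure and usage figures come from leaks/tweets; treat as assumptions.

NUM_GPUS = 28_936            # Nvidia A100 GPUs (leaked estimate)
GPUS_PER_SERVER = 8          # assume DGX-style 8-GPU servers
TDP_PER_SERVER_KW = 6.5      # kW per server
PUE = 1.2                    # power usage effectiveness reported by Azure
GRID_GCO2E_PER_KWH = 240.6   # Microsoft Azure US West

DAU = 13_000_000             # daily active users (estimate)
QUERIES_PER_USER = 15        # average daily queries per user
TOKENS_PER_QUERY = 2_000     # average tokens per query

# Hourly energy: servers x TDP x PUE
servers = NUM_GPUS / GPUS_PER_SERVER
energy_kwh_per_hour = servers * TDP_PER_SERVER_KW * PUE    # ~28,200 kWh

# Hourly tokens: daily tokens spread evenly over 24 hours
tokens_per_hour = DAU * QUERIES_PER_USER * TOKENS_PER_QUERY / 24

# Carbon per 1,000 tokens
kwh_per_1k_tokens = 1000 * energy_kwh_per_hour / tokens_per_hour
print(f"{kwh_per_1k_tokens * GRID_GCO2E_PER_KWH:.2f} gCO2e per 1k tokens")
```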
Estimation for DALL-E 3
For DALL-E 2, the estimated carbon footprint was 2.2 gCO2e per image. Weighing DALL-E 3's greater complexity against likely efficiency gains, we hypothesize that DALL-E 3 has a carbon footprint of at least 4 gCO2e per image.
Typical ChatGPT Emissions
So, a typical query with 1,000 tokens and two generated images will release approximately 0.3 + 2 × 4 ≈ 8.3 gCO2e of carbon, equivalent to charging a smartphone or driving about 30 meters in a gas-powered car.
Every ChatGPT conversation is made up of tokens, which are essentially chunks of text that the AI processes. We use the tiktoken library to count the tokens in each chat. We also monitor the number of images generated during your chats. This is important because image generation typically requires more computational power than text, and thus has a higher carbon footprint.
The extension calculates the carbon footprint with the data on tokens and images. It uses predefined metrics that estimate the energy consumption per token and image, considering the average energy mix used to power the ChatGPT servers. These calculations translate digital activity into measurable environmental impact, expressed in grams of CO2 equivalent (gCO2e).
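A minimal Python sketch of the kind of calculation the extension performs (the extension itself runs in the browser; the constants are the estimates derived above, and the function name is illustrative):

```python
import tiktoken

# Per-unit footprint estimates derived earlier in this post.
GCO2E_PER_1K_TEXT_TOKENS = 0.3   # GPT-4 text generation
GCO2E_PER_IMAGE = 4.0            # hypothesized DALL-E 3 figure

def chat_footprint_gco2e(messages: list[str], num_images: int = 0) -> float:
    """Estimate the carbon footprint of one chat session in gCO2e."""
    enc = tiktoken.encoding_for_model("gpt-4")
    total_tokens = sum(len(enc.encode(m)) for m in messages)
    return (total_tokens / 1000) * GCO2E_PER_1K_TEXT_TOKENS + num_images * GCO2E_PER_IMAGE

# Example: a short conversation with two generated images.
chat = ["What's the carbon footprint of AI?", "Here is an estimate..."]
print(f"{chat_footprint_gco2e(chat, num_images=2):.2f} gCO2e")
```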
All the data is accessible through a user-friendly web application, where you can visualize your cumulative impact over time. This feature aims to increase awareness and encourage more sustainable digital practices.
As such, the extension continuously recalculates the carbon footprint with each new chat. This real-time updating ensures that users have the latest information on their environmental impact.
Our Vision: A Dashboard for Change
Knowledge is power (literally), and with Carbon ScaleDown, you're armed with the knowledge to make greener choices. It's a tool that doesn't just inform you but also transforms your online behavior. You'll see the impact of choosing a text response over an image or of asking more concise questions, and with it our collective power to shape a more sustainable AI future.
This extension is just the beginning. We're working towards a comprehensive dashboard allowing you to track, analyze, and reduce your digital carbon footprint over time. Imagine being able to offset your digital activities and strive for not just a net-zero but a net-positive online presence.
Conclusion
Understanding the energy consumption and carbon emissions associated with a single AI prompt brings to light the broader issue of sustainability in the AI sector. It's not just about the financial cost of running these models; it's about their long-term impact on our planet. This realization underscores the importance of ongoing research, development, and awareness within the AI community. We must continually seek ways to optimize the efficiency of these models and reduce their environmental footprint.