AI for Business Leaders - Newsletter 23 May 2024
Daniel Karlsson, PhD
Ask me about AI: for project success | Talks | Courses | Workshops
Last week was PACKED with AI news, and so was my week with other work tasks - hence the newsletter is both late and condensed. We saw a glimpse of the future, and I will come back with more thoughts.
Let's go! Another amazing week for AI.
My take-aways:
1️⃣ AI can now have engaging conversations and make sense of the world through its eyes
2️⃣ AI is integrated into our favourite search tool, Google
3️⃣ AI or Environment? Microsoft says both
4️⃣ Human or AI? We don't know for sure anymore (Turing test)
Going into the weeds
1️⃣ AI can now have engaging conversations and make sense of the world through its eyes
Summary:
OpenAI had their spring event, showing off several amazing features - I encourage you to go to their website and watch the demos; they show next-level capabilities:
• New flagship model: GPT-4o ("o" for omni), which is truly multimodal.
What does it mean?
Multimodal refers to the ability to interact with various modalities, including text, speech, and video.
But wait… I've been communicating with ChatGPT through voice for several months now.
That's correct, but the conversation has not felt smooth and responsive until now, because the system has been using a slower architecture. In the "older" architecture, your voice is transcribed to text, the text is analyzed by the model, and the model's text response is then synthesized back to speech, i.e.:
voice → text → model analysis → text → speech.
As I understand it, the new model can now process your voice directly, making the back-and-forth conversation much more fluent. Now, it's:
voice → model analysis → speech.
This gives a response time on the order of 0.2 to 0.3 seconds, which is normal in human-to-human conversation.
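To make the architectural difference concrete, here is a minimal sketch in Python. The function names and all latency figures except the ~0.3 s end-to-end number from the text above are illustrative assumptions, not measurements of OpenAI's actual systems:

```python
# Illustrative latency budget for the two voice pipelines.
# Only the ~0.3 s end-to-end figure comes from the newsletter text;
# the per-stage numbers for the cascaded pipeline are rough assumptions.

def cascaded_pipeline_latency():
    """Old architecture: voice -> text -> model analysis -> text -> speech."""
    speech_to_text = 1.0   # transcribe the user's audio (seconds, assumed)
    model_analysis = 1.5   # text model generates a reply (assumed)
    text_to_speech = 0.5   # synthesize the reply as audio (assumed)
    return speech_to_text + model_analysis + text_to_speech

def end_to_end_pipeline_latency():
    """New architecture: voice -> model analysis -> speech (one multimodal model)."""
    model_analysis = 0.3   # the model consumes and emits audio directly
    return model_analysis

print(f"cascaded:   ~{cascaded_pipeline_latency():.1f} s per turn")
print(f"end-to-end: ~{end_to_end_pipeline_latency():.1f} s per turn")
```

The point is not the exact numbers but the structure: collapsing three sequential stages into one removes the hand-offs that made earlier voice conversations feel laggy.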
The capability will be released in alpha for Plus subscribers in the next few weeks.
• This architecture also makes it possible for the AI to see the world (and your screen) and talk with you about it.
Seeing is believing - look at the demos on OpenAI's website; they are amazing (link below).
• Faster, with improved non-English handling, and 50% cheaper for developers.
Comment: the race to the bottom in terms of cost will continue.
• GPT-4o is the top-tier GPT model, and it is free for all users.
Comment: Now that everyone has access to the best model for free, those who have been using GPT-3.5 can expect a significant performance boost. This will further accelerate usage and development.
• Desktop ChatGPT app for macOS (Windows later)
My take
I would like to bundle the news into two categories.
Let's dig a bit into multimodality; I think it will force us to re-think "computers".
• Imagine a user manual for a B2B lab instrument. Say you want to find out how to open the lid of your instrument - instead of searching a PDF, you could simply point your camera at the instrument and ask.
Since it can also view your computer screen, we can have similar interactions, such as "please create a bar chart", "could you change the color of the bars to green", and so on.
As a participant in meetings, the AI will have instant access to all previous meetings, all internal company documentation, and the internet.
Will we start talking to computers now?
Yes, I think so. As humans, we naturally opt for the path of least resistance, and speaking and pointing are quicker than typing on a keyboard. Given the progress made by OpenAI, the technology is ripe for this transition.
We see similar capability in Microsoft products (they use OpenAI models); according to rumours, Apple will use the same technology in their voice assistant (Siri), and the same line of development is visible at Google (Project Astra, see below).
Time for the rest of us to re-think.
2️⃣ AI is integrated into our favourite search tool
Summary
Google released a slew of news:
• AI Overviews in Search: Google's "AI Overviews" will soon synthesize responses to complex queries by providing concise answers from multiple web sources, while maintaining traditional search results for simpler queries.
• Gemini 1.5 Pro and Gemini 1.5 Flash: The updated Gemini models can now process text, images, and videos with enlarged context windows, accommodating up to 2 million tokens. Note: we will soon stop discussing context window size, as it is on the verge of becoming a solved issue.
• Google Workspace integration: Gemini integrates with Google Workspace tools such as Gmail, enabling advanced AI interactions for efficient email and data management.
• Project Astra - the star of the show: A universal AI assistant that can answer questions in real time using video and audio inputs. The agent can locate objects within a room, demonstrating its potential for advanced personal and professional assistance applications.
My take
What really distinguishes Gemini from GPT-4o is its large context window, i.e., the amount of information that Gemini can retain in its working memory during a chat conversation. Gemini can hold the equivalent of roughly 20 books in its working memory, which is sufficient for most research-related tasks.
This implies one less limitation to consider when using these models. Now, we can simply input (almost) everything.
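As a rough sanity check on the "20 books" figure, here is a back-of-the-envelope calculation. The tokens-per-book number is my own assumption (roughly 100,000 tokens for a full-length book), not a figure from Google:

```python
# Back-of-the-envelope: how many books fit in a 2-million-token context window?
context_window_tokens = 2_000_000   # Gemini 1.5, per the text above
tokens_per_book = 100_000           # assumed average for a full-length book

books_in_context = context_window_tokens // tokens_per_book
print(books_in_context)  # -> 20
```

With that assumption, the arithmetic lands exactly on the 20-book figure quoted above; a shorter or longer average book would shift it accordingly.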
3️⃣ AI or Environment? Microsoft says both
Summary
On January 16, 2020, Microsoft announced its carbon moonshot, pledging to be carbon negative by 2030. Then came generative AI…
Microsoft recently released its 2024 Environmental Sustainability Report, and don’t get me wrong - they are pushing hard to reach the carbon emission targets:
In 2023, the company expanded its contracted portfolio of renewable energy assets to over 19.8 gigawatts across 21 countries. To put this into perspective, a large data center might consume 0.1 gigawatt. Microsoft operates more than 200 data centers.
Both power consumption and the number of data centers are likely to increase due to advancements in AI.
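To put the numbers above side by side, here is a simple scale comparison. The per-data-center and portfolio figures come from the text; treating all 200+ data centers as "large" is my own simplifying assumption:

```python
# Rough scale comparison: Microsoft's renewable portfolio vs. data-center demand.
renewable_portfolio_gw = 19.8   # contracted renewable capacity, 2023 (from the text)
per_data_center_gw = 0.1        # a large data center (from the text)
data_centers = 200              # Microsoft operates more than 200 (from the text)

# Simplifying assumption: every data center draws as much as a large one.
total_demand_gw = per_data_center_gw * data_centers
print(f"estimated demand: ~{total_demand_gw:.1f} GW vs portfolio: {renewable_portfolio_gw} GW")
```

Under that (deliberately crude) assumption, the data-center fleet alone would roughly match the entire contracted renewable portfolio - which is exactly why growing AI demand puts the sustainability targets under pressure.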
Indirect greenhouse gas (GHG) emissions have increased by 30.9% from the 2020 baseline.
Note: this refers to so-called Scope 3 emissions, which account for over 96% of Microsoft's total emissions. These are not emissions produced directly by the company, nor the result of the energy it consumes, but indirect emissions both upstream and downstream of the company's operations, such as building materials and hardware components for data centers.
The report acknowledges the challenges presented by the increased demand for AI. This demand escalates energy consumption and complicates the achievement of sustainability targets.
Despite the AI-induced increase in carbon emissions, Microsoft's leadership says the company remains committed to its environmental goals. Amazing.
My take
Being green is a high priority for many companies, and so is being profitable. I'm fairly bullish on the outcome, since efforts to reduce energy consumption align with profitability.
However, the Microsoft report indicates they are willing to temporarily increase carbon emissions to lead the AI race.
4️⃣ Human or AI? We don't know for sure anymore
In a recent study, human participants had a 5-minute conversation with either a human or an AI, and then judged whether they thought their interlocutor was human.
• GPT-4 was judged to be human 54% of the time, and
• actual humans were judged to be human 67% of the time…
Pair this news with the recent progress from OpenAI above (1️⃣) - very natural speech, with human-like response times and expressiveness - and imagine the next generation of toys, or pretty much everything…
I wonder what we will seek in AI and in humans moving forward?
?? Conversation starter: Do you think you could distinguish a human from an AI?
That's all for now - see you next week.