#19 Amazon Nova = Commoditization?
Hey! Hope all is well. I'm digging the lull of the holidays. A few cool things still came out so let's dive in.
Amazon releases Nova foundation models
Amazon release a suite of their own models - Nova Micro, Lite, Pro, and Premier (coming soon). Text, multi-modal, long context length, agentic workflows - all the good stuff they can do it too.
Also, Canva (text/image to image) and Reel (text/image to video). It's sorta amazing how the vibe of the company affects the level of societal blowback. People were up in arms about OpenAI's Sora but this seemed to slip by or maybe the temperature has cooled.
Taking a peek at the technical report, it's pretty good at the benchmarks compared to the other leading models. Expected.
There are more benchmarks but it's all pretty similar.
Canvas benchmarks are mildly interesting - performing slightly better than the latest models like DALLE 3 and SD 3.5.
Nova Reel is competitive again Runway Gen3 and Luma 1.6
This red-teaming taxonomy is really interesting. This can be applicable to your own applications when you want to red-team your chatbots or other apps.
领英推荐
They also partnered with ActiveFence, Deloitte, Gomes Group, and Nemesys for external red teaming against Chemical, Biological, Radiological and Nuclear (CBRN) capabilities. This is another emerging field that's really interesting. Take a look at the report if you're interested!
Finally, compute. They used custom Trainium1 (TRN1) chips, A100s, and H100s. They also call out EKS, AWS FSx, and S3.
And what a fun name. Of course, you can think of supernovas but NoVA also stands for Northern Virginia. The infamous us-east data centers are in Northern Virginia.
Genie 2
Google's Genie 2 comes 9 months after Genie 1 - what they call a large-scale foundation world model. You may have heard about world models here and there. I can't find the YC podcast that talked about world models but the premise was like can we use synthetic data from engines like Unreal Engine to train models to learn things like gravity.
Conceptually, it's pretty simple. We still have the diffusion model (going from fuzzy to unfuzzy next state). We just also pass in input that changes our decoded state. Turning left looks different from turning right. And we stack them together so we can learn continuity.
The problem becomes this stacking state that needs to be retained - which I'm guessing is why video generation is only like 6 seconds long atm.
OpenAI Stories
Only found this recently but OpenAI publishes customer stories. If you're in doubt of the usefulness of generative AI, look to the source.
Whatever industry you're in, there's probably an article.
Keeping it short - and a quick plug. We're hiring at navapbc.com! Especially software engineers and engineering management. We're making government services simple, effective, and accessible. Great group of ppl. Please reach out!