Latest Updates: FREE Llama 3.2 Multimodal & FLUX.1 [schnell], NVIDIA H200s, and Enterprise Platform
Hey there!
Welcome to the first issue of Together We Build, a handpicked selection of news, product launches, novel research, and AI tools from Together AI, aimed at everyone interested in keeping up with the latest developments in generative AI and LLMs.
These last few weeks were packed with massive launches for vision and image models and major updates to our product, research, and tools.
To ensure you don't miss our future updates, subscribe to the LinkedIn newsletter.
New Models
Llama 3.2 Multimodal - With free endpoint
We partnered with Meta to launch support for the new Llama 3.2 vision models and the Llama Stack API. You can access our Llama 3.2 11B & 90B Turbo endpoints to get top speed and accuracy for vision tasks like image captioning, visual question answering, and image-text retrieval.
Even more exciting, we added a completely FREE endpoint for Llama 3.2 11B Vision!
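If you want to try the vision endpoint, here is a minimal sketch of what a request body looks like against Together's OpenAI-compatible chat completions API. The exact free-tier model identifier ("meta-llama/Llama-Vision-Free") is an assumption, so check the model catalog for the current name:

```python
import json

# Together's OpenAI-compatible chat completions endpoint.
API_URL = "https://api.together.xyz/v1/chat/completions"

def build_caption_request(image_url: str, prompt: str = "Describe this image.") -> dict:
    """Build the JSON body for a vision (image + text) chat request."""
    return {
        # NOTE: model id is an assumption; verify against the model catalog.
        "model": "meta-llama/Llama-Vision-Free",
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": prompt},
                    {"type": "image_url", "image_url": {"url": image_url}},
                ],
            }
        ],
        "max_tokens": 256,
    }

payload = build_caption_request("https://example.com/cat.jpg")
print(json.dumps(payload, indent=2))
# To send it, POST this payload with an "Authorization: Bearer <TOGETHER_API_KEY>"
# header, e.g. requests.post(API_URL, headers=headers, json=payload).
```

The same payload shape works for the 11B and 90B Turbo endpoints by swapping the model id.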
FLUX1.1 [pro] and FLUX.1 [schnell] - With free endpoint
We're thrilled to provide state-of-the-art image generation with the latest FLUX models! We added an endpoint for the powerful FLUX1.1 [pro] model, plus two endpoints for FLUX.1 [schnell]: a turbo endpoint (with the fastest performance) and a completely FREE endpoint you can use to experiment with open-source image generation at no cost.
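As a rough sketch of using the image endpoints: requests go to an OpenAI-style images API. The endpoint path and the free model id ("black-forest-labs/FLUX.1-schnell-Free") are assumptions here, so verify them in the API reference:

```python
import json

# Together's image generation endpoint (OpenAI-style images API).
API_URL = "https://api.together.xyz/v1/images/generations"

def build_image_request(prompt: str, width: int = 1024,
                        height: int = 768, steps: int = 4) -> dict:
    """Build the JSON body for a text-to-image request."""
    return {
        # NOTE: model id is an assumption; check the model catalog.
        "model": "black-forest-labs/FLUX.1-schnell-Free",
        "prompt": prompt,
        "width": width,
        "height": height,
        # [schnell] is a distilled model, so very few denoising steps suffice.
        "steps": steps,
        "n": 1,
    }

payload = build_image_request("a watercolor fox in a snowy forest")
print(json.dumps(payload, indent=2))
# POST with an "Authorization: Bearer <TOGETHER_API_KEY>" header to generate.
```

Swapping the model id to the [pro] or turbo [schnell] variant uses the same request shape.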
Qwen-2.5-7B & 72B
We also welcomed the latest large language model family from Alibaba Cloud! The 72B model rivals the capabilities of much larger models like Llama 3.1 405B and surpasses Qwen2, with higher scores on MMLU and HumanEval (85+) and on MATH (80+).
Product Updates
GPU Clusters with NVIDIA H200 and the Together Kernel Collection
Our GPU clusters took a huge leap in performance! Now we offer NVIDIA H200 Tensor Core GPUs equipped with our custom-built Together Kernel Collection, an optimized kernel stack. The result? Up to 24% speedup for operators used frequently in training, and up to 75% speedup for the fundamental operation used in FP8 inference (compared against PyTorch implementations). Fewer GPU hours = cost efficiencies = faster time to market.
Together Enterprise Platform
We launched the Together Enterprise Platform to empower organizations to manage their entire generative AI lifecycle: training, fine-tuning, and running inference on any model, in any environment. We deliver 2-3x faster inference and up to 50% lower operational costs on your existing cloud (AWS, Azure, GCP, OCI) or on-premise infrastructure.
Analytics Dashboard
You know how "your AI is only as good as your data"? Well, with our new Together Analytics (beta), we now show your usage over time, including requests, latency, and tokens per minute (TPM).
Improved Reliability & New Status Page
You've been asking for it, we shipped it! Our reliability scores have been soaring after our recent updates and fixes. We also introduced a handy new status page so you can keep track of the uptime of our different models and services.
New AI Apps
Napkins
To inspire you to build with Llama 3.2 vision models, we launched napkins.dev — a tool where you can upload a screenshot of a simple site/design and get the code in seconds. 100% free and open source!
Blinkshot
We put the FLUX models to the test with blinkshot.io — a tool that generates images as you type. This really shows the performance of our FLUX.1 [schnell] turbo endpoint. Blink once and you might miss it!
Product Descriptions
And to really showcase the flexibility of Llama 3.2 vision models, we also built a demo app where you can upload product images and get descriptions in multiple languages for your e-commerce shop.
Featured Content & Research
Linearizing LLMs with LoLCATs
Meet LoLCATs (Low-rank Linear Conversion via Attention Transfer) — new work from our research team on linearizing LLMs. LoLCATs converts existing Transformers like Llama and Mistral into state-of-the-art subquadratic variants. Now for the same cost as a LoRA finetune!
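To see why linearizing matters, here is a toy NumPy sketch of the underlying linear-attention idea (this is an illustration, not the LoLCATs method, which learns its feature maps via attention transfer): with a feature map applied to queries and keys, attention can be computed as phi(Q) @ (phi(K)^T @ V), avoiding the n x n attention matrix entirely, so cost grows linearly rather than quadratically with sequence length.

```python
import numpy as np

def feature_map(x):
    # Simple positive feature map (ELU + 1), a common stand-in;
    # LoLCATs learns its feature maps, so this is purely illustrative.
    return np.where(x > 0, x + 1.0, np.exp(x))

def linear_attention(Q, K, V):
    """Subquadratic attention: phi(Q) @ (phi(K)^T @ V), normalized per query."""
    Qf, Kf = feature_map(Q), feature_map(K)   # (n, d) each
    kv = Kf.T @ V                             # (d, d) summary -- no n x n matrix
    z = Qf @ Kf.sum(axis=0)                   # (n,) per-query normalizers
    return (Qf @ kv) / z[:, None]

def quadratic_reference(Q, K, V):
    """The same formula computed with the explicit n x n kernel matrix."""
    A = feature_map(Q) @ feature_map(K).T     # (n, n)
    return (A / A.sum(axis=1, keepdims=True)) @ V

rng = np.random.default_rng(0)
n, d = 128, 16
Q, K, V = rng.normal(size=(3, n, d))
# Both paths compute the same result; only the cost differs.
assert np.allclose(linear_attention(Q, K, V), quadratic_reference(Q, K, V))
```

The (d, d) summary `kv` is what makes the linear path cheap: it replaces the (n, n) attention matrix, which is exactly the structure that lets linearized models run in subquadratic time.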
The Mamba in the Llama: Distilling and Accelerating Hybrid Models
Our research team cooked up a thrilling read on a novel method for distilling Transformers into linear RNN architectures, focusing on the Mamba model. The results were impressive, surpassing open-source hybrid models trained on trillions of tokens.
Build Multimodal Document RAG with Llama 3.2 Vision and ColQwen2
RAG is the talk of the town, but many are just getting started with building RAG apps. So we rolled up our sleeves and ran a practical webinar where we discussed how you can perform RAG over complex PDF documents using ColPali, Llama 3.2 Vision, and ColQwen2.
Together Talks
It's hard to keep up with the growth of AI. So we are bringing the brightest minds to the virtual stage to dive into AI's biggest questions and opportunities. Check out our first episode with our founders, Percy Liang and Vipul Ved Prakash, and the second episode with our Chief Scientist, Tri Dao, and best-selling author and VP of AI & Open Source at Voltron Data, Chip Huyen.
Community Spotlight
Pika 1.5
Huge congrats to our friends at Pika for launching Pika 1.5, a powerful model you can use to create stunning footage, longer clips, and jaw-dropping moves with unreal Pikaffects. Fully trained on Together GPU Clusters!
Llama 3.1 in the wild
Quick shoutout to @KevIsDev for creating a modified version of the bolt.new repo using our Llama 3.1 endpoint. Great source of inspiration for other builders out there!
Always stay in the know—subscribe to the LinkedIn newsletter to receive our future updates.