Reka AI releases Reka Core: Understands images, videos, and audio

Reka AI, a San Francisco-based AI startup founded by researchers from DeepMind, Google and Meta, is introducing a new multimodal language model called Reka Core.

"Reka is a frontier-class multimodal language model on par with leading models in the industry today. Core was efficiently trained from scratch on thousands of GPUs over a period of a few months." - Reka AI

Available via API, on-premises, or on-device, Core is the third member of Reka’s family of language models. It understands multiple modalities, including image, audio, and video, and offers a 128K context window, strong reasoning skills, and coding ability.

According to Reka, Core is one of only two commercially available comprehensive multimodal solutions.

You can test out Reka Core in the Reka Playground.

Although Core was trained over a period of only a few months, Reka reports that it matches or beats the performance of top models from leading players in the AI space, including OpenAI, Google, and Anthropic.

"Core is comparable to GPT-4V on MMMU, outperforms Claude-3 Opus on our multimodal human evaluation conducted by an independent third party, and surpasses Gemini Ultra on video tasks. On language tasks, Core is competitive with other frontier models on well-established benchmarks." - Reka AI

The table below summarizes a comparison of Core with leading models in the market today.

Reka AI - Reka Core

Reka AI has three models: Reka Core, Flash, and Edge. All three are trained to handle and analyze multimodal inputs.

Reka Core Capabilities

  1. Multimodal (image and video) understanding. Core is not just a frontier large language model. It has powerful contextualized understanding of images, videos, and audio and is one of only two commercially available comprehensive multimodal solutions.
  2. 128K context window. Core is capable of ingesting and precisely and accurately recalling much more information.
  3. Reasoning. Core has superb reasoning abilities (including language and math), making it suitable for complex tasks that require sophisticated analysis.
  4. Coding and agentic workflow. Core is a top-tier code generator. Its coding ability, when combined with other capabilities, can empower agentic workflows.
  5. Multilingual. Core was pretrained on textual data from 32 languages. It is fluent in English as well as several Asian and European languages.
  6. Deployment flexibility. Core, like Reka's other models, is available via API, on-premises, or on-device to satisfy the deployment constraints of its customers and partners.
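
To make the 128K context window more concrete, here is a minimal Python sketch that estimates whether a piece of text would fit in a window of that size. It is only an illustration: it assumes "128K" means 128,000 tokens and uses a rough rule of thumb of about 4 characters per English token. Neither figure comes from Reka, and the helper names `estimate_tokens` and `fits_in_context` are hypothetical. Real token counts depend on the model's tokenizer.

```python
import math

# Assumption: "128K" is read as 128,000 tokens (it could also mean 131,072).
CONTEXT_WINDOW_TOKENS = 128_000
# Assumption: rough heuristic of ~4 characters per token for English text.
CHARS_PER_TOKEN = 4


def estimate_tokens(text: str) -> int:
    """Roughly estimate the token count of `text` from its character length."""
    return math.ceil(len(text) / CHARS_PER_TOKEN)


def fits_in_context(text: str, reserved_for_output: int = 1_000) -> bool:
    """Return True if the estimated prompt size leaves room for the reply."""
    return estimate_tokens(text) + reserved_for_output <= CONTEXT_WINDOW_TOKENS


if __name__ == "__main__":
    short_doc = "word " * 1_000    # 5,000 characters
    long_doc = "word " * 200_000   # 1,000,000 characters
    print(fits_in_context(short_doc))
    print(fits_in_context(long_doc))
```

Under these assumptions, a window of this size holds on the order of half a million characters of English text, so the first document fits and the second does not.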

Reka Model Showcase:

Reka displayed some impressive results for image and data analysis on their Model Showcase Page:


Reka Core Image Analysis

Reka Core Video:

Reka Core has a lot of capabilities, and one of them is understanding video. Let’s see what Core thinks of the @3body trailer.

3 Body Problem Trailer - Netflix
Reka tested its Reka Core multimodal language model on Netflix’s “3 Body Problem” and it was able to translate what’s happening onscreen into text. Credit: Reka

Reka Core Use-Cases

Some use cases of Reka’s models include:

  • Image captioning and tagging
  • Content moderation
  • Engineering
  • Engagement (sales, customer service, support)
  • Direct action / agentic workflows

Reka Core Tech Report:

State-of-the-Art Performance:

  • Reka models demonstrate state-of-the-art performance, especially Reka Core, which is competitive with the best models from other leading companies in both automatic and human evaluations across various benchmarks.

Model Details:

  • Reka Edge and Flash are smaller but powerful models with 7B and 21B parameters, respectively.
  • Reka Core is still in development but shows promising results comparable to leading models like GPT-4.
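
As a rough back-of-the-envelope illustration of what these parameter counts mean for deployment, the Python sketch below estimates the memory needed just to hold the model weights. The parameter counts come from the summary above. The numeric precisions (fp16, int8, int4) are assumptions chosen for illustration, not formats Reka has stated it uses, and the estimate ignores activations, caches, and runtime overhead.

```python
# Parameter counts for the two smaller models, as given in the summary above.
PARAMS = {"Reka Edge": 7e9, "Reka Flash": 21e9}

# Assumption: illustrative storage precisions, in bytes per parameter.
BYTES_PER_PARAM = {"fp16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_memory_gb(num_params: float, precision: str) -> float:
    """Memory for the weights alone, in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[precision] / 1e9


if __name__ == "__main__":
    for name, count in PARAMS.items():
        for precision in BYTES_PER_PARAM:
            size = weight_memory_gb(count, precision)
            print(f"{name} @ {precision}: {size:.1f} GB")
```

Under these assumptions, a 7B-parameter model needs about 14 GB for weights at 16-bit precision and about 3.5 GB at 4-bit precision, which is the kind of arithmetic that determines whether a model is practical to run on-device.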

Training Data and Architecture:

  • Utilizes a diverse mix of public and proprietary data.
  • Incorporates advanced architectural features like a modular encoder-decoder structure and supports multimodal inputs.

Training and Infrastructure:

  • Training made extensive use of Nvidia’s latest GPUs, and the report details the training processes used to optimize performance.
  • Emphasis on overcoming the challenges of scaling up training infrastructure and managing computational resources efficiently.

Evaluation and Benchmarks:

  • Comprehensive evaluation across language understanding, multimodal tasks, and specialized domains like medical reasoning.
  • Demonstrates superior capabilities in handling complex queries over long context spans and multilingual content.

User and Developer Accessibility:

  • Models are accessible for use at chat.reka.ai and showcase.reka.ai.
  • Provides APIs and platforms for developers to interact with and integrate these models into various applications.

Ongoing Development and Future Prospects:

  • Continuous improvement is highlighted, with expectations for further advancements in model capabilities and applications.
  • Discusses the balance between innovation and practical deployment in AI development.

You can try Reka Core at https://chat.reka.ai/

