Facebook's AI at Work Produces ConViT

Facebook AI says it has developed a new computer vision model called ConViT, which combines two widely used AI architectures, convolutional neural networks (CNNs) and Transformer-based models, in order to overcome some important limitations of each approach on its own.

‘ConViT’, A Computer Vision Model That Improves Vision Transformers (ViT) With Soft Convolutional Inductive Biases

By leveraging both techniques, this vision Transformer-based model can outperform existing architectures, especially in the low data regime, while achieving similar performance in the large data setting.

While debates about bias in AI usually focus on its harms, researchers have long relied on a useful kind of bias: built-in assumptions (inductive biases) that help machine learning models train efficiently.

You can read their July 2021 paper via the link below.

  • CNNs, which have proved extremely successful for vision tasks, rely on two of these inductive biases built into the architecture itself: that pixels near one another are related (locality) and that different portions of an image should be processed identically regardless of their absolute location (weight sharing).
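
As a minimal illustration of those two biases (not from the paper, and with arbitrary layer sizes), the snippet below contrasts a small convolution, which is local and weight-shared, with a fully connected layer over the same pixels, which is neither:

```python
import torch.nn as nn

# A 3x3 convolution applies the same small kernel at every spatial location:
# locality (each output sees only a 3x3 neighbourhood) plus weight sharing.
conv = nn.Conv2d(in_channels=3, out_channels=16, kernel_size=3, padding=1)

# A fully connected layer over the same 3x32x32 pixels has neither bias:
# every output depends on every pixel, with its own independent weights.
fc = nn.Linear(3 * 32 * 32, 16 * 32 * 32)

print(sum(p.numel() for p in conv.parameters()))  # 448 parameters
print(sum(p.numel() for p in fc.parameters()))    # roughly 50.3 million parameters
```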

Facebook's ConViT is significant because Vision Transformers (ViTs), which rely on more flexible self-attention layers, have recently outperformed CNNs for image classification.

ConViT is a New Computer Vision Model

The Facebook AI Research team has developed a new computer vision model called ConViT. The ConViT system combines two widely used architectures, convolutional neural networks (CNNs) and Transformer-based models, to overcome some important limitations of each approach on its own.

The resulting convolution-like ViT architecture, ConViT, outperforms DeiT on ImageNet while offering much improved sample efficiency.

The goal of ConViT was to modify vision Transformers to encourage them to act convolutionally. The researchers introduced a soft inductive bias that lets the network itself decide whether it wants to remain convolutional or not.

They did so by introducing gated positional self-attention (GPSA), in which each attention head learns a gating parameter that controls how much weight is placed on standard content-based attention versus a position-based attention initialized to mimic a convolution.
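
As a rough sketch of that gating idea, here is a minimal PyTorch-style module, assuming a simplified formulation: the class name, layer sizes, gate initialization, and especially the random stand-in for the positional scores are illustrative choices, not Facebook's actual implementation (which lives in the GitHub repository linked below).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GatedPositionalSelfAttention(nn.Module):
    """Simplified GPSA sketch: blends content and positional attention with a learned per-head gate."""

    def __init__(self, dim: int, num_heads: int, num_patches: int):
        super().__init__()
        self.num_heads = num_heads
        self.head_dim = dim // num_heads
        self.qk = nn.Linear(dim, 2 * dim, bias=False)  # content queries and keys
        self.v = nn.Linear(dim, dim, bias=False)
        # One learnable gating scalar per head; sigmoid(gate) is the weight on positional attention.
        # Initialized to 1 so the convolution-like positional attention dominates early in training.
        self.gate = nn.Parameter(torch.ones(num_heads))
        # Stand-in positional scores. ConViT derives these from relative patch positions so that,
        # at initialization, each head attends to a fixed nearby patch.
        self.register_buffer("pos_scores", torch.randn(num_heads, num_patches, num_patches))

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (batch, patches, dim)
        b, n, d = x.shape
        q, k = self.qk(x).chunk(2, dim=-1)
        q = q.view(b, n, self.num_heads, self.head_dim).transpose(1, 2)
        k = k.view(b, n, self.num_heads, self.head_dim).transpose(1, 2)
        content_attn = F.softmax(q @ k.transpose(-2, -1) / self.head_dim ** 0.5, dim=-1)
        pos_attn = F.softmax(self.pos_scores[:, :n, :n], dim=-1)
        g = torch.sigmoid(self.gate).view(1, -1, 1, 1)    # per-head blend factor in (0, 1)
        attn = (1.0 - g) * content_attn + g * pos_attn    # gated mix of the two attention maps
        v = self.v(x).view(b, n, self.num_heads, self.head_dim).transpose(1, 2)
        return (attn @ v).transpose(1, 2).reshape(b, n, d)

# Quick shape check on dummy patch embeddings.
layer = GatedPositionalSelfAttention(dim=64, num_heads=4, num_patches=196)
out = layer(torch.randn(2, 196, 64))
print(out.shape)  # torch.Size([2, 196, 64])
```

Roughly speaking, because the gate can be driven toward zero during training, each head is free to discard the convolutional behaviour once it has seen enough data, which is what makes the inductive bias "soft" rather than hard-wired.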

Github: https://github.com/facebookresearch/convit

Paper: https://arxiv.org/pdf/2103.10697.pdf

While the future of Facebook is likely the Facebook Metaverse, Facebook AI will have to get moving if it wants to challenge ByteDance (Beijing's alternative to Facebook) in shaping how the metaverse actually works.

While current AI remains deeply biased for various reasons, the researchers hope that their ConViT approach will encourage the community to explore other ways of moving from hard inductive biases to soft ones.

The success of deep learning over the last decade has largely been fueled by models with strong inductive biases, allowing efficient training across domains. What will the deep learning of the future bring?

Facebook is Hyping the Metaverse

Facebook has now successfully demonstrated a way to bridge the gap between CNNs and Transformers by presenting a new method to “softly” introduce a convolutional inductive bias into the ViT.

The Metaverse thanks you for your attention. Zuckerberg (who now holds more centralized power at Facebook since the pandemic) envisions a metaverse as an all-encompassing online world where people can game, work and communicate in a virtual environment, often using VR headsets, a vision that has until now largely been depicted in science fiction movies, not reality.

Nina Palolahti

International Business Law / Business Management #MBA #LLM #IBL

3 years

Sounds interesting! Is there any precise data on how it might affect people's interaction and behavior on the platform? :)

Udayanathan Vettikkat

CMO (Global) & Head-Global Channels, Investor, Analyst, Public Relations at SEQURETEK. 38 years of Marketing, Sales and Technology Leadership experience @Cisco, IBM, HCL Tech, Novell, NTT Netmagic, Star TV, CSS Corp

3 years

Wow

Michael Spencer

A.I. Writer, researcher and curator - full-time Newsletter publication manager.

3 years

The concentration of trillion-dollar market caps and AI talent is actually very dangerous for the future of technology: it creates unbalanced wealth inequality and makes central bank stimulus less effective, since the U.S. stock market is weighted too heavily toward just a few companies, which consolidate too much of the added liquidity. It turns out China understands what antitrust regulation actually means, not America; this not only creates an innovation bottleneck for the U.S. but also serious economic issues around how unbalanced capitalism is becoming. Data capitalism, it turns out, is risky for the entire system even as these duopolies want to maintain their surveillance capitalism dominance; the more dominant they become, the worse it is for everyone.

Jyotsna Sharma

Product Engineering/Management: Samsung/ex-Motorola/Application Development/Health/Security

3 years

I believe it's a good breakthrough, though I am still not convinced by the idea of always wearing VR headsets. Maybe more innovation to make VR glasses available as simple lenses would help!

Randall Hoogerhyde

CAD Senior Software Engineer/Project Manager | Windows/Web Software Developer | Full Stack MCPD Consultant - DISABLED | Retired 2013

3 years

Good luck with all that. Just keep trying and see where you get. You never know.

