CVPR 2024 Papers

Attending #CVPR2024 in Seattle was an incredible experience! The conference showcased a diverse range of groundbreaking papers, highlighting the latest advancements in computer vision and pattern recognition. From innovative neural network architectures to cutting-edge applications in autonomous driving and healthcare, the sessions were truly inspiring.

Key highlights included:

  • Impressive Papers: The presentations covered various topics such as generative models, visual recognition, and augmented reality, pushing the boundaries of what's possible in the field.
  • Engaging Workshops: Interactive workshops provided deep dives into specialized topics, offering hands-on experiences and valuable networking opportunities.
  • Poster Sessions: These sessions were particularly insightful, allowing for in-depth discussions with researchers and a better understanding of their work.

Overall, CVPR 2024 was an enriching event that provided a fantastic platform for learning, collaboration, and inspiration. Looking forward to applying the insights gained and staying connected with the brilliant minds I met!

About CVPR

CVPR is the foremost computer vision event of the year. Covering advances in computer vision, pattern recognition, artificial intelligence (AI), machine learning, and more, it is the field’s must-attend event for computer scientists and engineers, researchers, academia, technology-forward companies, and of course, media.


With a breadth of ways to experience the subject matter, from in-depth workshops and tutorials to research presentations and exhibits, as well as direct access to the leading scientists, technologists, and industry experts, CVPR 2024 is the most comprehensive forum to learn, debate, and get the latest details on the most innovative developments within the industry.

CVPR Papers

As a press member covering the prestigious Computer Vision and Pattern Recognition (CVPR) conference, I witnessed firsthand the immense scale and quality of this event. In 2024, CVPR saw a remarkable 11,532 paper submissions, with 2,719 making the cut. To help you navigate through this wealth of knowledge, I've created a repository featuring the crème de la crème of CVPR publications. If you don't find the paper you're looking for in my curated shortlist, I invite you to explore the full list of accepted papers for additional insights. #HatTip to Piotr Skalski who posted his list here: https://github.com/SkalskiP/top-cvpr-2024-papers

I am adding my top picks every day and will add more papers as I dive deep into all of the amazing research and development!

Full list of accepted papers: https://cvpr.thecvf.com/Conferences/2024/AcceptedPapers

3D from multi-view and sensors

SpatialTracker: Tracking Any 2D Pixels in 3D Space Yuxi Xiao, Qianqian Wang, Shangzhan Zhang, Nan Xue, Sida Peng, Yujun Shen, Xiaowei Zhou [paper] [code] Topic: 3D from multi-view and sensors Session: Fri 21 Jun 1:30 p.m. EDT — 3 p.m. EDT #84

ViewDiff: 3D-Consistent Image Generation with Text-to-Image Models Lukas Höllein, Aljaž Božič, Norman Müller, David Novotny, Hung-Yu Tseng, Christian Richardt, Michael Zollhöfer, Matthias Nießner [paper] [code] [video] Topic: 3D from multi-view and sensors Session: Wed 19 Jun 8 p.m. EDT — 9:30 p.m. EDT #20

Deep learning architectures and techniques

Florence-2: Advancing a Unified Representation for a Variety of Vision Tasks Bin Xiao, Haiping Wu, Weijian Xu, Xiyang Dai, Houdong Hu, Yumao Lu, Michael Zeng, Ce Liu, Lu Yuan [paper] [video] [demo] [colab] Topic: Deep learning architectures and techniques Session: Wed 19 Jun 8 p.m. EDT — 9:30 p.m. EDT #102

Efficient and scalable vision

MobileCLIP: Fast Image-Text Models through Multi-Modal Reinforced Training Pavan Kumar Anasosalu Vasu, Hadi Pouransari, Fartash Faghri, Raviteja Vemulapalli, Oncel Tuzel [paper] [code] [demo] Topic: Efficient and scalable vision Session: Thu 20 Jun 8 p.m. EDT — 9:30 p.m. EDT #130

Explainable computer vision

Describing Differences in Image Sets with Natural Language Lisa Dunlap, Yuhui Zhang, Xiaohan Wang, Ruiqi Zhong, Trevor Darrell, Jacob Steinhardt, Joseph E. Gonzalez, Serena Yeung-Levy [paper] [code] Topic: Explainable computer vision Session: Fri 21 Jun 8 p.m. EDT — 9:30 p.m. EDT #115

Image and video synthesis and generation

Visual Anagrams: Generating Multi-View Optical Illusions with Diffusion Models Daniel Geng, Inbum Park, Andrew Owens [paper] [code] [colab] Topic: Image and video synthesis and generation Session: Fri 21 Jun 8 p.m. EDT — 9:30 p.m. EDT #118

Low-level vision

XFeat: Accelerated Features for Lightweight Image Matching Guilherme Potje, Felipe Cadar, Andre Araujo, Renato Martins, Erickson R. Nascimento [paper] [code] [video] [demo] [colab] Topic: Low-level vision Session: Wed 19 Jun 1:30 p.m. EDT — 3 p.m. EDT #245

Robust Image Denoising through Adversarial Frequency Mixup Donghun Ryou, Inju Ha, Hyewon Yoo, Dongwan Kim, Bohyung Han [paper] [code] [video] Topic: Low-level vision Session: Wed 19 Jun 1:30 p.m. EDT — 3 p.m. EDT #250

Multi-modal learning

Improved Baselines with Visual Instruction Tuning Haotian Liu, Chunyuan Li, Yuheng Li, Yong Jae Lee [paper] [code] Topic: Multi-modal learning Session: Fri 21 Jun 8 p.m. EDT — 9:30 p.m. EDT #209

Recognition: categorization, detection, retrieval

DETRs Beat YOLOs on Real-time Object Detection Yian Zhao, Wenyu Lv, Shangliang Xu, Jinman Wei, Guanzhong Wang, Qingqing Dang, Yi Liu, Jie Chen [paper] [code] [video] Topic: Recognition: Categorization, detection, retrieval Session: Thu 20 Jun 8 p.m. EDT — 9:30 p.m. EDT #229

YOLO-World: Real-Time Open-Vocabulary Object Detection Tianheng Cheng, Lin Song, Yixiao Ge, Wenyu Liu, Xinggang Wang, Ying Shan [paper] [code] [video] [demo] [colab] Topic: Recognition: Categorization, detection, retrieval Session: Thu 20 Jun 8 p.m. EDT — 9:30 p.m. EDT #223

Object Recognition as Next Token Prediction Kaiyu Yue, Bor-Chun Chen, Jonas Geiping, Hengduo Li, Tom Goldstein, Ser-Nam Lim [paper] [code] [video] [colab] Topic: Recognition: Categorization, detection, retrieval Session: Thu 20 Jun 8 p.m. EDT — 9:30 p.m. EDT #199

Segmentation, grouping and shape analysis

RobustSAM: Segment Anything Robustly on Degraded Images Wei-Ting Chen, Yu-Jiet Vong, Sy-Yen Kuo, Sizhou Ma, Jian Wang [paper] [video] Topic: Segmentation, grouping and shape analysis Session: Wed 19 Jun 1:30 p.m. EDT — 3 p.m. EDT #378

Frozen CLIP: A Strong Backbone for Weakly Supervised Semantic Segmentation Bingfeng Zhang, Siyue Yu, Yunchao Wei, Yao Zhao, Jimin Xiao [paper] [code] [video] Topic: Segmentation, grouping and shape analysis Session: Wed 19 Jun 1:30 p.m. EDT — 3 p.m. EDT #351

Semantic-aware SAM for Point-Prompted Instance Segmentation Zhaoyang Wei, Pengfei Chen, Xuehui Yu, Guorong Li, Jianbin Jiao, Zhenjun Han [paper] [code] [video] Topic: Segmentation, grouping and shape analysis Session: Wed 19 Jun 1:30 p.m. EDT — 3 p.m. EDT #331

General Object Foundation Model for Images and Videos at Scale Junfeng Wu, Yi Jiang, Qihao Liu, Zehuan Yuan, Xiang Bai, Song Bai [paper] [code] [video] Topic: Segmentation, grouping and shape analysis Session: Wed 19 Jun 1:30 p.m. EDT — 3 p.m. EDT #350

Self-supervised or unsupervised representation learning

InternVL: Scaling up Vision Foundation Models and Aligning for Generic Visual-Linguistic Tasks Zhe Chen, Jiannan Wu, Wenhai Wang, Weijie Su, Guo Chen, Sen Xing, Muyan Zhong, Qinglong Zhang, Xizhou Zhu, Lewei Lu, Bin Li, Ping Luo, Tong Lu, Yu Qiao, Jifeng Dai [paper] [code] [demo] Topic: Self-supervised or unsupervised representation learning Session: Fri 21 Jun 8 p.m. EDT — 9:30 p.m. EDT #412

Video: low-level analysis, motion, and tracking

Matching Anything by Segmenting Anything Siyuan Li, Lei Ke, Martin Danelljan, Luigi Piccinelli, Mattia Segu, Luc Van Gool, Fisher Yu [paper] [code] [video] Topic: Video: Low-level analysis, motion, and tracking Session: Thu 20 Jun 8 p.m. EDT — 9:30 p.m. EDT #421

DiffMOT: A Real-time Diffusion-based Multiple Object Tracker with Non-linear Prediction Weiyi Lv, Yuhang Huang, Ning Zhang, Ruei-Sung Lin, Mei Han, Dan Zeng [paper] [code] Topic: Video: Low-level analysis, motion, and tracking Session: Thu 20 Jun 8 p.m. EDT — 9:30 p.m. EDT #455

Vision, language, and reasoning

Alpha-CLIP: A CLIP Model Focusing on Wherever You Want Zeyi Sun, Ye Fang, Tong Wu, Pan Zhang, Yuhang Zang, Shu Kong, Yuanjun Xiong, Dahua Lin, Jiaqi Wang [paper] [code] [video] [demo] Topic: Vision, language, and reasoning Session: Thu 20 Jun 1:30 p.m. EDT — 3 p.m. EDT #327

LISA: Reasoning Segmentation via Large Language Model Xin Lai, Zhuotao Tian, Yukang Chen, Yanwei Li, Yuhui Yuan, Shu Liu, Jiaya Jia [paper] [code] [demo] Topic: Vision, language, and reasoning Session: Thu 20 Jun 1:30 p.m. EDT — 3 p.m. EDT #413

ViP-LLaVA: Making Large Multimodal Models Understand Arbitrary Visual Prompts Mu Cai, Haotian Liu, Dennis Park, Siva Karthik Mustikovela, Gregory P. Meyer, Yuning Chai, Yong Jae Lee [paper] [code] [video] [demo] Topic: Vision, language, and reasoning Session: Thu 20 Jun 1:30 p.m. EDT — 3 p.m. EDT #317

MMMU: A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI Xiang Yue, Yuansheng Ni, Kai Zhang, Tianyu Zheng, Ruoqi Liu, Ge Zhang, Samuel Stevens, Dongfu Jiang, Weiming Ren, Yuxuan Sun, Cong Wei, Botao Yu, Ruibin Yuan, Renliang Sun, Ming Yin, Boyuan Zheng, Zhenzhu Yang, Yibo Liu, Wenhao Huang, Huan Sun, Yu Su, Wenhu Chen [paper] Topic: Vision, language, and reasoning Session: Thu 20 Jun 1:30 p.m. EDT — 3 p.m. EDT #382

Summary:

Attending CVPR 2024 in Seattle was an incredible experience. I had the privilege of meeting brilliant minds from top tech companies like Microsoft, Intel, Sony, Facebook, ByteDance, Amazon, and Snap. Additionally, I connected with researchers and professionals from over 25 countries, all contributing to the global brain trust in computer vision, pattern recognition, and generative AI.


I want to extend my heartfelt thanks to the event producers, sponsors, and PR team for making this extraordinary event possible!


#CVPR2024 #ComputerVision #AIResearch #TechConference #InnovationInTech #MachineLearning #GlobalNetworking #TechLeaders #GenerativeAI #EventHighlights #ThankYou

