CVPR 2024 Papers
Attending #CVPR2024 in Seattle was an incredible experience! The conference showcased a diverse range of groundbreaking papers, highlighting the latest advancements in computer vision and pattern recognition. From innovative neural network architectures to cutting-edge applications in autonomous driving and healthcare, the sessions were truly inspiring.
Key highlights included:
Overall, CVPR 2024 was an enriching event that provided a fantastic platform for learning, collaboration, and inspiration. Looking forward to applying the insights gained and staying connected with the brilliant minds I met!
About CVPR
CVPR is the foremost computer vision event of the year. Covering advances in computer vision, pattern recognition, artificial intelligence (AI), machine learning, and more, it is the field’s must-attend event for computer scientists and engineers, researchers, academia, technology-forward companies, and of course, media.
With a breadth of ways to experience the subject matter, from in-depth workshops and tutorials to research presentations and exhibits, as well as direct access to the leading scientists, technologists, and industry experts, CVPR 2024 is the most comprehensive forum to learn, debate, and get the latest details on the most innovative developments within the industry.
CVPR Papers
As a press member covering the Computer Vision and Pattern Recognition (CVPR) conference, I witnessed firsthand the scale and quality of this event. In 2024, CVPR received 11,532 paper submissions and accepted 2,719 of them, an acceptance rate of roughly 23.6%. To help you navigate this volume of work, I've created a repository featuring the crème de la crème of CVPR publications. If you don't find the paper you're looking for in my curated shortlist, I invite you to explore the full list of accepted papers. #HatTip to Piotr Skalski, who posted his list here: https://github.com/SkalskiP/top-cvpr-2024-papers
I am adding my top picks every day and will add more papers as I dive deep into all of the amazing research and development!
3D from multi-view and sensors
SpatialTracker: Tracking Any 2D Pixels in 3D Space Yuxi Xiao, Qianqian Wang, Shangzhan Zhang, Nan Xue, Sida Peng, Yujun Shen, Xiaowei Zhou [paper] [code] Topic: 3D from multi-view and sensors Session: Fri 21 Jun 1:30 p.m. EDT — 3 p.m. EDT #84
ViewDiff: 3D-Consistent Image Generation with Text-to-Image Models Lukas Höllein, Aljaž Božič, Norman Müller, David Novotny, Hung-Yu Tseng, Christian Richardt, Michael Zollhöfer, Matthias Nießner [paper] [code] [video] Topic: 3D from multi-view and sensors Session: Wed 19 Jun 8 p.m. EDT — 9:30 p.m. EDT #20
Deep learning architectures and techniques
Florence-2: Advancing a Unified Representation for a Variety of Vision Tasks Bin Xiao, Haiping Wu, Weijian Xu, Xiyang Dai, Houdong Hu, Yumao Lu, Michael Zeng, Ce Liu, Lu Yuan [paper] [video] [demo] [colab] Topic: Deep learning architectures and techniques Session: Wed 19 Jun 8 p.m. EDT — 9:30 p.m. EDT #102
Efficient and scalable vision
MobileCLIP: Fast Image-Text Models through Multi-Modal Reinforced Training Pavan Kumar Anasosalu Vasu, Hadi Pouransari, Fartash Faghri, Raviteja Vemulapalli, Oncel Tuzel [paper] [code] [demo] Topic: Efficient and scalable vision Session: Thu 20 Jun 8 p.m. EDT — 9:30 p.m. EDT #130
Explainable computer vision
Describing Differences in Image Sets with Natural Language Lisa Dunlap, Yuhui Zhang, Xiaohan Wang, Ruiqi Zhong, Trevor Darrell, Jacob Steinhardt, Joseph E. Gonzalez, Serena Yeung-Levy [paper] [code] Topic: Explainable computer vision Session: Fri 21 Jun 8 p.m. EDT — 9:30 p.m. EDT #115
Image and video synthesis and generation
Visual Anagrams: Generating Multi-View Optical Illusions with Diffusion Models Daniel Geng, Inbum Park, Andrew Owens [paper] [code] [colab] Topic: Image and video synthesis and generation Session: Fri 21 Jun 8 p.m. EDT — 9:30 p.m. EDT #118
Low-level vision
XFeat: Accelerated Features for Lightweight Image Matching Guilherme Potje, Felipe Cadar, Andre Araujo, Renato Martins, Erickson R. Nascimento [paper] [code] [video] [demo] [colab] Topic: Low-level vision Session: Wed 19 Jun 1:30 p.m. EDT — 3 p.m. EDT #245
Robust Image Denoising through Adversarial Frequency Mixup Donghun Ryou, Inju Ha, Hyewon Yoo, Dongwan Kim, Bohyung Han [paper] [code] [video] Topic: Low-level vision Session: Wed 19 Jun 1:30 p.m. EDT — 3 p.m. EDT #250
Multi-modal learning
Improved Baselines with Visual Instruction Tuning Haotian Liu, Chunyuan Li, Yuheng Li, Yong Jae Lee [paper] [code] Topic: Multi-modal learning Session: Fri 21 Jun 8 p.m. EDT — 9:30 p.m. EDT #209
Recognition: categorization, detection, retrieval
DETRs Beat YOLOs on Real-time Object Detection Yian Zhao, Wenyu Lv, Shangliang Xu, Jinman Wei, Guanzhong Wang, Qingqing Dang, Yi Liu, Jie Chen [paper] [code] [video] Topic: Recognition: Categorization, detection, retrieval Session: Thu 20 Jun 8 p.m. EDT — 9:30 p.m. EDT #229
YOLO-World: Real-Time Open-Vocabulary Object Detection Tianheng Cheng, Lin Song, Yixiao Ge, Wenyu Liu, Xinggang Wang, Ying Shan [paper] [code] [video] [demo] [colab] Topic: Recognition: Categorization, detection, retrieval Session: Thu 20 Jun 8 p.m. EDT — 9:30 p.m. EDT #223
Object Recognition as Next Token Prediction Kaiyu Yue, Bor-Chun Chen, Jonas Geiping, Hengduo Li, Tom Goldstein, Ser-Nam Lim [paper] [code] [video] [colab] Topic: Recognition: Categorization, detection, retrieval Session: Thu 20 Jun 8 p.m. EDT — 9:30 p.m. EDT #199
Segmentation, grouping and shape analysis
RobustSAM: Segment Anything Robustly on Degraded Images Wei-Ting Chen, Yu-Jiet Vong, Sy-Yen Kuo, Sizhou Ma, Jian Wang [paper] [video] Topic: Segmentation, grouping and shape analysis Session: Wed 19 Jun 1:30 p.m. EDT — 3 p.m. EDT #378
Frozen CLIP: A Strong Backbone for Weakly Supervised Semantic Segmentation Bingfeng Zhang, Siyue Yu, Yunchao Wei, Yao Zhao, Jimin Xiao [paper] [code] [video] Topic: Segmentation, grouping and shape analysis Session: Wed 19 Jun 1:30 p.m. EDT — 3 p.m. EDT #351
Semantic-aware SAM for Point-Prompted Instance Segmentation Zhaoyang Wei, Pengfei Chen, Xuehui Yu, Guorong Li, Jianbin Jiao, Zhenjun Han [paper] [code] [video] Topic: Segmentation, grouping and shape analysis Session: Wed 19 Jun 1:30 p.m. EDT — 3 p.m. EDT #331
General Object Foundation Model for Images and Videos at Scale Junfeng Wu, Yi Jiang, Qihao Liu, Zehuan Yuan, Xiang Bai, Song Bai [paper] [code] [video] Topic: Segmentation, grouping and shape analysis Session: Wed 19 Jun 1:30 p.m. EDT — 3 p.m. EDT #350
Self-supervised or unsupervised representation learning
InternVL: Scaling up Vision Foundation Models and Aligning for Generic Visual-Linguistic Tasks Zhe Chen, Jiannan Wu, Wenhai Wang, Weijie Su, Guo Chen, Sen Xing, Muyan Zhong, Qinglong Zhang, Xizhou Zhu, Lewei Lu, Bin Li, Ping Luo, Tong Lu, Yu Qiao, Jifeng Dai [paper] [code] [demo] Topic: Self-supervised or unsupervised representation learning Session: Fri 21 Jun 8 p.m. EDT — 9:30 p.m. EDT #412
Video: low-level analysis, motion, and tracking
Matching Anything by Segmenting Anything Siyuan Li, Lei Ke, Martin Danelljan, Luigi Piccinelli, Mattia Segu, Luc Van Gool, Fisher Yu [paper] [code] [video] Topic: Video: Low-level analysis, motion, and tracking Session: Thu 20 Jun 8 p.m. EDT — 9:30 p.m. EDT #421
DiffMOT: A Real-time Diffusion-based Multiple Object Tracker with Non-linear Prediction Weiyi Lv, Yuhang Huang, Ning Zhang, Ruei-Sung Lin, Mei Han, Dan Zeng [paper] [code] Topic: Video: Low-level analysis, motion, and tracking Session: Thu 20 Jun 8 p.m. EDT — 9:30 p.m. EDT #455
Vision, language, and reasoning
Alpha-CLIP: A CLIP Model Focusing on Wherever You Want Zeyi Sun, Ye Fang, Tong Wu, Pan Zhang, Yuhang Zang, Shu Kong, Yuanjun Xiong, Dahua Lin, Jiaqi Wang [paper] [code] [video] [demo] Topic: Vision, language, and reasoning Session: Thu 20 Jun 1:30 p.m. EDT — 3 p.m. EDT #327
LISA: Reasoning Segmentation via Large Language Model Xin Lai, Zhuotao Tian, Yukang Chen, Yanwei Li, Yuhui Yuan, Shu Liu, Jiaya Jia [paper] [code] [demo] Topic: Vision, language, and reasoning Session: Thu 20 Jun 1:30 p.m. EDT — 3 p.m. EDT #413
ViP-LLaVA: Making Large Multimodal Models Understand Arbitrary Visual Prompts Mu Cai, Haotian Liu, Dennis Park, Siva Karthik Mustikovela, Gregory P. Meyer, Yuning Chai, Yong Jae Lee [paper] [code] [video] [demo] Topic: Vision, language, and reasoning Session: Thu 20 Jun 1:30 p.m. EDT — 3 p.m. EDT #317
MMMU: A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI Xiang Yue, Yuansheng Ni, Kai Zhang, Tianyu Zheng, Ruoqi Liu, Ge Zhang, Samuel Stevens, Dongfu Jiang, Weiming Ren, Yuxuan Sun, Cong Wei, Botao Yu, Ruibin Yuan, Renliang Sun, Ming Yin, Boyuan Zheng, Zhenzhu Yang, Yibo Liu, Wenhao Huang, Huan Sun, Yu Su, Wenhu Chen [paper] Topic: Vision, language, and reasoning Session: Thu 20 Jun 1:30 p.m. EDT — 3 p.m. EDT #382
Summary:
Attending CVPR 2024 in Seattle was an incredible experience. I had the privilege of meeting brilliant minds from top tech companies like Microsoft, Intel, Sony, Facebook, ByteDance, Amazon, and Snap. Additionally, I connected with researchers and professionals from over 25 countries, all contributing to the global brain trust in computer vision, pattern recognition, and generative AI.
I want to extend my heartfelt thanks to the event producers, sponsors, and PR team for making this extraordinary event possible!
#CVPR2024 #ComputerVision #AIResearch #TechConference #InnovationInTech #MachineLearning #GlobalNetworking #TechLeaders #GenerativeAI #EventHighlights #ThankYou