CVPR 2024 Papers

Tim Reha

Creative Technologist | Electric Sports | Sales | Launches | Digital Marketing | Video | Social Media | Generative AI | GTM | Product Marketing | SEO | PR | Events

Published: June 25, 2024

Attending #CVPR2024 in Seattle was an incredible experience! The conference showcased a diverse range of groundbreaking papers, highlighting the latest advancements in computer vision and pattern recognition. From innovative neural network architectures to cutting-edge applications in autonomous driving and healthcare, the sessions were truly inspiring.

Key highlights included:

Impressive Papers: The presentations covered various topics such as generative models, visual recognition, and augmented reality, pushing the boundaries of what's possible in the field.
Engaging Workshops: Interactive workshops provided deep dives into specialized topics, offering hands-on experiences and valuable networking opportunities.
Poster Sessions: These sessions were particularly insightful, allowing for in-depth discussions with researchers and gaining a better understanding of their work.

Overall, CVPR 2024 was an enriching event that provided a fantastic platform for learning, collaboration, and inspiration. Looking forward to applying the insights gained and staying connected with the brilliant minds I met!

About CVPR

CVPR is the foremost computer vision event of the year. Covering advances in computer vision, pattern recognition, artificial intelligence (AI), machine learning, and more, it is the field’s must-attend event for computer scientists and engineers, researchers, academia, technology-forward companies, and of course, media.

With a breadth of ways to experience the subject matter, from in-depth workshops and tutorials to research presentations and exhibits, as well as direct access to the leading scientists, technologists, and industry experts, CVPR 2024 is the most comprehensive forum to learn, debate, and get the latest details on the most innovative developments within the industry.

CVPR Papers

As a press member covering the Computer Vision and Pattern Recognition (CVPR) conference, I witnessed firsthand the scale and quality of this event. In 2024, CVPR received 11,532 paper submissions, of which 2,719 were accepted (about 23.6%). To help you navigate this wealth of knowledge, I've created a repository featuring the crème de la crème of CVPR publications. If you don't find the paper you're looking for in my curated shortlist, I invite you to explore the full list of accepted papers: https://cvpr.thecvf.com/Conferences/2024/AcceptedPapers

#HatTip to Piotr Skalski, who posted his list here: https://github.com/SkalskiP/top-cvpr-2024-papers

I am adding my top picks every day and will add more papers as I dive deep into all of the amazing research and development!

3D from multi-view and sensors

SpatialTracker: Tracking Any 2D Pixels in 3D Space Yuxi Xiao, Qianqian Wang, Shangzhan Zhang, Nan Xue, Sida Peng, Yujun Shen, Xiaowei Zhou [paper] [code] Topic: 3D from multi-view and sensors Session: Fri 21 Jun 1:30 p.m. EDT — 3 p.m. EDT #84

ViewDiff: 3D-Consistent Image Generation with Text-to-Image Models Lukas Höllein, Aljaž Božič, Norman Müller, David Novotny, Hung-Yu Tseng, Christian Richardt, Michael Zollhöfer, Matthias Nießner [paper] [code] [video] Topic: 3D from multi-view and sensors Session: Wed 19 Jun 8 p.m. EDT — 9:30 p.m. EDT #20

Deep learning architectures and techniques

Florence-2: Advancing a Unified Representation for a Variety of Vision Tasks Bin Xiao, Haiping Wu, Weijian Xu, Xiyang Dai, Houdong Hu, Yumao Lu, Michael Zeng, Ce Liu, Lu Yuan [paper] [video] [demo] [colab] Topic: Deep learning architectures and techniques Session: Wed 19 Jun 8 p.m. EDT — 9:30 p.m. EDT #102

Efficient and scalable vision

MobileCLIP: Fast Image-Text Models through Multi-Modal Reinforced Training Pavan Kumar Anasosalu Vasu, Hadi Pouransari, Fartash Faghri, Raviteja Vemulapalli, Oncel Tuzel [paper] [code] [demo] Topic: Efficient and scalable vision Session: Thu 20 Jun 8 p.m. EDT — 9:30 p.m. EDT #130

Explainable computer vision

Describing Differences in Image Sets with Natural Language Lisa Dunlap, Yuhui Zhang, Xiaohan Wang, Ruiqi Zhong, Trevor Darrell, Jacob Steinhardt, Joseph E. Gonzalez, Serena Yeung-Levy [paper] [code] Topic: Explainable computer vision Session: Fri 21 Jun 8 p.m. EDT — 9:30 p.m. EDT #115

Image and video synthesis and generation

Visual Anagrams: Generating Multi-View Optical Illusions with Diffusion Models Daniel Geng, Inbum Park, Andrew Owens [paper] [code] [colab] Topic: Image and video synthesis and generation Session: Fri 21 Jun 8 p.m. EDT — 9:30 p.m. EDT #118

Low-level vision

XFeat: Accelerated Features for Lightweight Image Matching Guilherme Potje, Felipe Cadar, Andre Araujo, Renato Martins, Erickson R. Nascimento [paper] [code] [video] [demo] [colab] Topic: Low-level vision Session: Wed 19 Jun 1:30 p.m. EDT — 3 p.m. EDT #245

Robust Image Denoising through Adversarial Frequency Mixup Donghun Ryou, Inju Ha, Hyewon Yoo, Dongwan Kim, Bohyung Han [paper] [code] [video] Topic: Low-level vision Session: Wed 19 Jun 1:30 p.m. EDT — 3 p.m. EDT #250

Multi-modal learning

Improved Baselines with Visual Instruction Tuning Haotian Liu, Chunyuan Li, Yuheng Li, Yong Jae Lee [paper] [code] Topic: Multi-modal learning Session: Fri 21 Jun 8 p.m. EDT — 9:30 p.m. EDT #209

Recognition: categorization, detection, retrieval

DETRs Beat YOLOs on Real-time Object Detection Yian Zhao, Wenyu Lv, Shangliang Xu, Jinman Wei, Guanzhong Wang, Qingqing Dang, Yi Liu, Jie Chen [paper] [code] [video] Topic: Recognition: Categorization, detection, retrieval Session: Thu 20 Jun 8 p.m. EDT — 9:30 p.m. EDT #229

YOLO-World: Real-Time Open-Vocabulary Object Detection Tianheng Cheng, Lin Song, Yixiao Ge, Wenyu Liu, Xinggang Wang, Ying Shan [paper] [code] [video] [demo] [colab] Topic: Recognition: Categorization, detection, retrieval Session: Thu 20 Jun 8 p.m. EDT — 9:30 p.m. EDT #223

Object Recognition as Next Token Prediction Kaiyu Yue, Bor-Chun Chen, Jonas Geiping, Hengduo Li, Tom Goldstein, Ser-Nam Lim [paper] [code] [video] [colab] Topic: Recognition: Categorization, detection, retrieval Session: Thu 20 Jun 8 p.m. EDT — 9:30 p.m. EDT #199

Segmentation, grouping and shape analysis

RobustSAM: Segment Anything Robustly on Degraded Images Wei-Ting Chen, Yu-Jiet Vong, Sy-Yen Kuo, Sizhou Ma, Jian Wang [paper] [video] Topic: Segmentation, grouping and shape analysis Session: Wed 19 Jun 1:30 p.m. EDT — 3 p.m. EDT #378

Frozen CLIP: A Strong Backbone for Weakly Supervised Semantic Segmentation Bingfeng Zhang, Siyue Yu, Yunchao Wei, Yao Zhao, Jimin Xiao [paper] [code] [video] Topic: Segmentation, grouping and shape analysis Session: Wed 19 Jun 1:30 p.m. EDT — 3 p.m. EDT #351

Semantic-aware SAM for Point-Prompted Instance Segmentation Zhaoyang Wei, Pengfei Chen, Xuehui Yu, Guorong Li, Jianbin Jiao, Zhenjun Han [paper] [code] [video] Topic: Segmentation, grouping and shape analysis Session: Wed 19 Jun 1:30 p.m. EDT — 3 p.m. EDT #331

General Object Foundation Model for Images and Videos at Scale Junfeng Wu, Yi Jiang, Qihao Liu, Zehuan Yuan, Xiang Bai, Song Bai [paper] [code] [video] Topic: Segmentation, grouping and shape analysis Session: Wed 19 Jun 1:30 p.m. EDT — 3 p.m. EDT #350

Self-supervised or unsupervised representation learning

InternVL: Scaling up Vision Foundation Models and Aligning for Generic Visual-Linguistic Tasks Zhe Chen, Jiannan Wu, Wenhai Wang, Weijie Su, Guo Chen, Sen Xing, Muyan Zhong, Qinglong Zhang, Xizhou Zhu, Lewei Lu, Bin Li, Ping Luo, Tong Lu, Yu Qiao, Jifeng Dai [paper] [code] [demo] Topic: Self-supervised or unsupervised representation learning Session: Fri 21 Jun 8 p.m. EDT — 9:30 p.m. EDT #412

Video: low-level analysis, motion, and tracking

Matching Anything by Segmenting Anything Siyuan Li, Lei Ke, Martin Danelljan, Luigi Piccinelli, Mattia Segu, Luc Van Gool, Fisher Yu [paper] [code] [video] Topic: Video: Low-level analysis, motion, and tracking Session: Thu 20 Jun 8 p.m. EDT — 9:30 p.m. EDT #421

DiffMOT: A Real-time Diffusion-based Multiple Object Tracker with Non-linear Prediction Weiyi Lv, Yuhang Huang, Ning Zhang, Ruei-Sung Lin, Mei Han, Dan Zeng [paper] [code] Topic: Video: Low-level analysis, motion, and tracking Session: Thu 20 Jun 8 p.m. EDT — 9:30 p.m. EDT #455

Vision, language, and reasoning

Alpha-CLIP: A CLIP Model Focusing on Wherever You Want Zeyi Sun, Ye Fang, Tong Wu, Pan Zhang, Yuhang Zang, Shu Kong, Yuanjun Xiong, Dahua Lin, Jiaqi Wang [paper] [code] [video] [demo] Topic: Vision, language, and reasoning Session: Thu 20 Jun 1:30 p.m. EDT — 3 p.m. EDT #327

LISA: Reasoning Segmentation via Large Language Model Xin Lai, Zhuotao Tian, Yukang Chen, Yanwei Li, Yuhui Yuan, Shu Liu, Jiaya Jia [paper] [code] [demo] Topic: Vision, language, and reasoning Session: Thu 20 Jun 1:30 p.m. EDT — 3 p.m. EDT #413

ViP-LLaVA: Making Large Multimodal Models Understand Arbitrary Visual Prompts Mu Cai, Haotian Liu, Dennis Park, Siva Karthik Mustikovela, Gregory P. Meyer, Yuning Chai, Yong Jae Lee [paper] [code] [video] [demo] Topic: Vision, language, and reasoning Session: Thu 20 Jun 1:30 p.m. EDT — 3 p.m. EDT #317

MMMU: A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI Xiang Yue, Yuansheng Ni, Kai Zhang, Tianyu Zheng, Ruoqi Liu, Ge Zhang, Samuel Stevens, Dongfu Jiang, Weiming Ren, Yuxuan Sun, Cong Wei, Botao Yu, Ruibin Yuan, Renliang Sun, Ming Yin, Boyuan Zheng, Zhenzhu Yang, Yibo Liu, Wenhao Huang, Huan Sun, Yu Su, Wenhu Chen [paper] Topic: Vision, language, and reasoning Session: Thu 20 Jun 1:30 p.m. EDT — 3 p.m. EDT #382

Summary:

Attending CVPR 2024 in Seattle was an incredible experience. I had the privilege of meeting brilliant minds from top tech companies like Microsoft, Intel, Sony, Facebook, ByteDance, Amazon, and Snap. Additionally, I connected with researchers and professionals from over 25 countries, all contributing to the global brain trust in computer vision, pattern recognition, and generative AI.

I want to extend my heartfelt thanks to the event producers, sponsors, and PR team for making this extraordinary event possible!

Computer Vision Foundation
IEEE Computer Society
Michelle Tubb, CAE, Director of Sales and Marketing at IEEE Computer Society
Colleen Morrison, Principal at CFM Communications, LLC

#CVPR2024 #ComputerVision #AIResearch #TechConference #InnovationInTech #MachineLearning #GlobalNetworking #TechLeaders #GenerativeAI #EventHighlights #ThankYou

