IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2024


Apple is sponsoring the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), which is taking place in person from June 17 to 21 in Seattle, Washington. CVPR is the annual computer vision event comprising the main conference and several co-located workshops and short courses. Below is the schedule of our sponsored workshops and events at CVPR 2024.


Stop by the Apple booth in the Arch Building, Exhibit Hall Level 4, booth #1905, from 10:30am to 6:30pm PDT on June 19 and 20, and from 10:00am to 3:00pm PDT on June 21.


Accepted Papers

Affine-based Deformable Attention and Selective Fusion for Semi-dense Matching
Hongkai Chen, Zixin Luo, Ray Tian, Aron Wang, Lei Zhou, Xuyang Bai, Mingmin Zhen, Tian Fang, Yanghai Tsin, David McKinnon, Long Quan (The Hong Kong University of Science and Technology)

Direct2.5: Diverse Text-to-3D Generation via Multi-view 2.5D Diffusion
Yuanxun Lu (Nanjing University), Jingyang Zhang, Shiwei Li, Tian Fang, David McKinnon, Yanghai Tsin, Long Quan (The Hong Kong University of Science and Technology), Xun Cao (Nanjing University), Yao Yao (Nanjing University)

KPConvX: Modernizing Kernel Point Convolution with Kernel Attention
Hugues Thomas, Hubert Tsai, Tim Barfoot (University of Toronto), Jian Zhang

SAM-CLIP: Merging Vision Foundation Models towards Semantic and Spatial Understanding
Haoxiang Wang (University of Illinois Urbana-Champaign), Pavan Kumar Anasosalu Vasu, Fartash Faghri, Raviteja Vemulapalli, Mehrdad Farajtabar, Sachin Mehta, Mohammad Rastegari, Oncel Tuzel, Hadi Pouransari

MobileCLIP: Fast Image-Text Models through Multi-Modal Reinforced Training
Pavan Kumar Anasosalu Vasu, Hadi Pouransari, Fartash Faghri, Raviteja Vemulapalli, Oncel Tuzel

Probabilistic Speech-Driven 3D Facial Motion Synthesis: New Benchmarks, Methods, and Applications
Karren Yang, Anurag Ranjan, Rick Chang, Raviteja Vemulapalli, Oncel Tuzel

Efficient Diffusion Models without Attention
Jing Nathan Yan (Cornell University), Jiatao Gu, Alexander M. Rush (Cornell University)

HUGS: Human Gaussian Splatting
Muhammed Kocabas (Max Planck Institute for Intelligent Systems), Rick Chang, James Gabriel, Oncel Tuzel, Anurag Ranjan

HumMUSS: Human Motion Understanding using State Space Models
Arnab Mondal (McGill University), Stefano Alletto, Denis Tome


MobileCLIP: Real-Time Image-Text Models

Wednesday, June 19 – Friday, June 21, during exhibition hours

This demo shows zero-shot scene classification running in real time on an iPhone. Because these models align the image and text modalities, they can perform zero-shot image classification and image-to-text or text-to-image retrieval at high speed. The app showcases the research paper "MobileCLIP: Fast Image-Text Models through Multi-Modal Reinforced Training," presented at the same venue. The app was built by David Koski and Megan Maher Welsh, with contributions from Hugues Thomas, Mouli Sivapurapu, and Jian Zhang.
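To illustrate the idea behind zero-shot classification with aligned image-text models, here is a minimal sketch in Python. It uses toy embedding vectors standing in for real model outputs (the labels, embedding values, and `zero_shot_classify` helper are illustrative, not part of MobileCLIP itself): each candidate label is embedded as text, and the image is assigned the label whose embedding has the highest cosine similarity.

```python
import numpy as np

def zero_shot_classify(image_embedding, text_embeddings, labels):
    """Pick the label whose text embedding is most similar to the image embedding."""
    # L2-normalize both sides so the dot product equals cosine similarity.
    img = image_embedding / np.linalg.norm(image_embedding)
    txt = text_embeddings / np.linalg.norm(text_embeddings, axis=1, keepdims=True)
    similarities = txt @ img  # one cosine score per candidate label
    return labels[int(np.argmax(similarities))]

# Toy 3-dimensional embeddings; a real model would produce these with its
# image and text encoders (typically hundreds of dimensions).
labels = ["a photo of a kitchen", "a photo of a beach", "a photo of a forest"]
text_embeddings = np.array([[0.9, 0.1, 0.0],
                            [0.1, 0.9, 0.1],
                            [0.0, 0.2, 0.9]])
image_embedding = np.array([0.05, 0.15, 0.95])  # closest to the "forest" row

print(zero_shot_classify(image_embedding, text_embeddings, labels))
# → a photo of a forest
```

Because no classifier head is trained, the label set can be changed at runtime simply by embedding new prompt strings, which is what makes this approach "zero-shot."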

Flow Composer for Apple ML

Wednesday, June 19 – Friday, June 21, during exhibition hours

This demo shows Apple ML features on MacBook Pro and iPad, leveraging several technologies such as Vision, Core ML, and Core Graphics.


Alex Schwing and Philipp Kraehenbuehl are Senior Area Chairs for CVPR 2024.

Alex Toshev, Oncel Tuzel, Mehrdad Farajtabar, Hadi Pouransari, and Fartash Faghri are Area Chairs for CVPR 2024.

Fartash Faghri, Jason Ren, Jianrui Cai, Jiajia Luo, Jierui Lin, Liangchen Song, Or Dinari, Pavan Kumar Anasosalu Vasu, Peter Fu, Raviteja Vemulapalli, Haotian Zhang, Hong-You Chen, Wen Shi, Yongzhi Su, Yuyan Li, Trevine Oorloff, Yongxi Lu, and Jeff Lai are reviewers for CVPR 2024.

Anshul Shah is a co-organizer for the workshop Learning from Procedural Videos and Language: What is Next?

Jeff Bigham is a co-organizer for the VizWiz Grand Challenge Workshop.

Pau Rodriguez Lopez is a co-organizer for the Workshop on Continual Learning in Computer Vision.

Jeff Lai has a PhD dissertation selected for the Doctoral Consortium.
