Apple researchers are advancing the field of ML through fundamental research that improves the world’s understanding of this technology and helps to redefine what is possible with it. This work may lead to advancements in Apple’s products and services, and the benefits of the research extend beyond the Apple ecosystem as it is shared with the broader research community through publication, open source resources, and engagement at industry and research community events.
Next week, the 38th annual Conference on Neural Information Processing Systems (NeurIPS) will be held in Vancouver, Canada. NeurIPS is the largest annual ML and AI research conference, and Apple is proud to once again participate in this important event for the community and to support it with our sponsorship.
At the main conference and associated workshops, Apple researchers will present many papers across a variety of topics in ML. As highlighted below, this includes new works advancing privacy-preserving ML, making multimodal models more capable, improving LLM pretraining, exploring LLMs’ ability to reason, and understanding self-supervised learning.
NeurIPS attendees will be able to experience demonstrations of Apple’s ML research in our booth (#323 in West Hall A) during exhibition hours, and Apple is also sponsoring and participating in a number of affinity group-hosted events that support underrepresented groups in the ML community. A comprehensive overview of Apple’s participation in and contributions to NeurIPS 2024 can be found here, and a selection of highlights follows below.
Advancing Privacy-Preserving ML
At Apple, we believe privacy is a fundamental human right, and advancing privacy-preserving ML techniques is an important area of ongoing research. The works Apple researchers will present at NeurIPS this year include two papers related to federated learning (FL).
Researchers working on FL often conduct experiments in simulation to quickly iterate on new ideas. Apple researchers will present pfl-research: Simulation Framework for Accelerating Research in Private Federated Learning, a fast, modular, and easy-to-use Python framework for simulating FL that will enable the research community to make further progress on this topic.
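To make the framework’s purpose concrete, the sketch below shows the kind of experiment such a simulator accelerates: a minimal federated-averaging loop over simulated users, written in plain NumPy. This is an illustrative sketch of FL simulation in general, not the pfl-research API; all names and hyperparameters here are our own assumptions.

```python
# Minimal federated-averaging simulation (illustrative only; this is not
# the pfl-research API). Each simulated user trains a linear model locally,
# and the server averages the resulting weights.
import numpy as np

rng = np.random.default_rng(0)

def local_sgd(w, X, y, lr=0.1, steps=5):
    """A few steps of least-squares SGD on one user's private data."""
    for _ in range(steps):
        grad = 2 * X.T @ (X @ w - y) / len(y)
        w = w - lr * grad
    return w

# Simulated population: each user holds a small private dataset.
users = [(rng.normal(size=(20, 3)), rng.normal(size=20)) for _ in range(100)]

w_global = np.zeros(3)
for round_ in range(10):
    cohort = rng.choice(len(users), size=10, replace=False)  # sample a cohort
    local_weights = [local_sgd(w_global.copy(), *users[i]) for i in cohort]
    w_global = np.mean(local_weights, axis=0)  # server-side averaging
```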
Apple researchers will also present Private and Personalized Frequency Estimation in a Federated Setting, which describes a new approach that uses Private Federated Learning to compute personalized frequency histograms privately. Personalized frequencies of words (or tokens) are useful for next-word prediction for keyboard input on user devices. Estimating them is challenging because most users have little usage data, and diverse vocabularies, topics, and styles lead to widely varying data distributions across users. The paper presents a new technique that discovers and leverages similar subpopulations of users, and the approach is shown to outperform existing clustering-based algorithms.
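As a hedged illustration of the underlying privacy primitive (not the paper’s clustering-based algorithm), the sketch below estimates a frequency histogram by having each user add Laplace noise to their local word counts before the server aggregates them; the vocabulary, the ε value, and the one-word-per-user clipping are illustrative assumptions of ours.

```python
# Illustrative sketch of differentially private frequency estimation
# (not the paper's algorithm): each user perturbs their local word
# counts with Laplace noise before the server aggregates them.
import numpy as np

vocab = ["the", "cat", "sat", "mat"]
epsilon = 1.0  # privacy budget (illustrative)
rng = np.random.default_rng(0)

def private_counts(user_words):
    counts = np.array([user_words.count(w) for w in vocab], dtype=float)
    # Laplace mechanism; sensitivity grows with words per user, so in
    # practice contributions are clipped (here, each user sends one word).
    return counts + rng.laplace(scale=1.0 / epsilon, size=len(vocab))

users = [["the"], ["cat"], ["the"], ["mat"]]  # one clipped word per user
histogram = sum(private_counts(u) for u in users)
```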
Making Multimodal Models More Capable
Multimodal and multitask models have become increasingly powerful, but their effectiveness can be hindered by limitations in their training data. At NeurIPS, Apple ML researchers will present novel methods to surpass those limitations and enhance the performance of these models.
Large pre-trained vision-language models like CLIP have been shown to generalize well, but can still have difficulty with tasks like fine-grained classification (e.g., identifying car models) for which the visual concepts were under-represented in their pre-training data. At NeurIPS, Apple ML researchers will present Aggregate-and-Adapt Natural Language Prompts for Downstream Generalization of CLIP, which presents a new prompt-learning method for fine-tuning CLIP when annotated data is limited. With Aggregate-and-Adapted Prompt Embedding (AAPE), textual knowledge is distilled from natural language prompts (generated by humans or LLMs) to enrich under-represented concepts in the model’s training data. This approach improves the downstream generalization of CLIP, achieving strong performance on various vision-language tasks, including image-to-text retrieval, few-shot classification, image captioning, and VQA.
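The sketch below illustrates the general idea of aggregating several natural-language prompts into one class embedding, using the Hugging Face CLIP API. It is a simplified stand-in (a plain average) for AAPE, which learns the aggregation rather than averaging, and the prompt strings are hypothetical.

```python
# Sketch of aggregating natural-language prompts into a single class
# embedding for zero-/few-shot CLIP classification (illustrative; the
# paper's AAPE method learns this aggregation rather than averaging).
import torch
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

# Hypothetical LLM-generated descriptions of one fine-grained class.
prompts = [
    "a photo of a 1967 Ford Mustang fastback",
    "a classic American muscle car with a long hood",
    "a two-door coupe with round headlights",
]
inputs = processor(text=prompts, return_tensors="pt", padding=True)
with torch.no_grad():
    text_emb = model.get_text_features(**inputs)          # (3, 512)
text_emb = text_emb / text_emb.norm(dim=-1, keepdim=True)  # unit-normalize
class_embedding = text_emb.mean(dim=0)                     # aggregated prompt
```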
While multimodal and multitask foundation models like 4M show promising results, their ability to accept diverse inputs and perform diverse tasks is limited by the modalities and tasks on which they’ve been trained. At NeurIPS, Apple ML researchers and our collaborators from EPFL will present 4M-21: An Any-to-Any Vision Model for Tens of Tasks and Modalities, which shows how to significantly expand upon the capabilities of 4M by training it on tens of highly diverse modalities and by performing co-training on large-scale multimodal datasets and text corpora (see Figure 1). The resulting models are scaled up to 3 billion parameters and showcase strong out-of-the-box vision performance, any-conditional & steerable generation, cross-modal retrieval, and multi-sensory fusion capabilities.
Improving LLM Pretraining
LLMs are used in a variety of production applications, including some Apple services, and fundamental improvements to these models could have significant impact for developers and their users across the industry. At NeurIPS, the work Apple ML researchers will present includes a new technique for more efficient LLM pretraining.
LLMs are commonly trained on datasets of fixed-length token sequences, because their training infrastructure often supports only a limited sequence length. To create these datasets, documents of various lengths are combined and then split into chunks of the specified length. Because documents are randomly combined in this approach, the model may use context from an unrelated document to predict the next token, rather than context from the relevant document. Beyond being a poor learning signal, this also expends unnecessary computation. Apple researchers will present Dataset Decomposition: Faster LLM Pretraining with Variable Sequence Length Curriculum, which addresses this issue with a novel method: a dataset containing documents of various lengths is decomposed into a union of “buckets,” or subsets whose sequences share the same length; then, at training time, variable sequence lengths and batch sizes are sampled simultaneously from all buckets (see Figure 2). This enables efficient pretraining on long sequences, scales effectively with dataset size, and is shown to significantly improve model performance on standard evaluations.
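A minimal sketch of the bucketing idea follows, under our own simplifying assumptions (power-of-two chunk lengths, a toy token corpus, and a fixed token budget per batch); the paper’s actual method and training curriculum are more involved.

```python
# Sketch of dataset decomposition (illustrative): each document is split
# into power-of-two-length chunks, chunks are grouped into same-length
# buckets, and every batch is drawn from a single bucket so no sequence
# mixes tokens from unrelated documents.
from collections import defaultdict
import random

def decompose(doc_tokens, max_len=8192):
    """Split one document into power-of-two-length chunks."""
    chunks, i = [], 0
    remaining = len(doc_tokens)
    while remaining > 0:
        size = min(1 << (remaining.bit_length() - 1), max_len)
        chunks.append(doc_tokens[i:i + size])
        i += size
        remaining -= size
    return chunks

corpus = [list(range(n)) for n in (5, 17, 1000)]  # toy "documents" of token ids

buckets = defaultdict(list)  # sequence length -> list of sequences
for doc in corpus:
    for chunk in decompose(doc):
        buckets[len(chunk)].append(chunk)

def sample_batch(tokens_per_batch=1024):
    """Pick a bucket, then set the batch size so the token count is constant."""
    length = random.choice(list(buckets))
    batch_size = tokens_per_batch // length
    return random.sample(buckets[length], min(batch_size, len(buckets[length])))
```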
Exploring LLMs’ Ability to Reason
LLMs have proven capable across many tasks, but the extent to which today’s models can reason remains an important open research question. Understanding these models’ current capabilities and limitations not only enables the research community to continue to improve them, but also helps developers more intelligently leverage LLMs in their production applications.
At NeurIPS, Apple researchers will present How Far Can Transformers Reason? The Globality Barrier and Inductive Scratchpad, a paper that investigates why transformer-based models struggle with tasks that require “global reasoning,” in which learned concepts must be combined and extrapolated. The work shows that these models are unable to compose long chains of syllogisms (e.g., inferring a⇒c from a⇒b and b⇒c) because they cannot efficiently learn distributions with high globality, and the paper introduces the idea of an “inductive scratchpad” that can enable transformers to surpass these limitations.
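To make the task concrete, the toy sketch below (our illustration, not the authors’ code) checks whether a⇒d follows from a chain of implications; the intermediate deductions it records play the role the paper’s inductive scratchpad gives the model.

```python
# Illustration of the syllogism-chaining task studied in the paper (not
# the authors' code): given implication edges, decide whether start => goal
# by composing steps; the recorded chain acts like a scratchpad.
def entails(edges, start, goal):
    frontier, seen, chain = [start], {start}, []
    while frontier:
        node = frontier.pop()
        if node == goal:
            return True, chain
        for a, b in edges:
            if a == node and b not in seen:
                seen.add(b)
                chain.append((a, b))   # intermediate deduction step
                frontier.append(b)
    return False, chain

edges = [("a", "b"), ("b", "c"), ("c", "d")]
ok, steps = entails(edges, "a", "d")   # True via a=>b, b=>c, c=>d
```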
Understanding Self-Supervised Learning (SSL)
Effectively and efficiently learning representations is a fundamental goal of deep learning, as these representations can be used for many downstream tasks. By advancing the field’s understanding of how different approaches learn representations, research in this area could ultimately lead to improved performance across those downstream tasks.
At NeurIPS, Apple researchers will present How JEPA Avoids Noisy Features: The Implicit Bias of Deep Linear Self Distillation Networks, which explores the differences in how representations are learned under two leading SSL paradigms: Masked Autoencoders (MAE) and Joint Embedding Predictive Architectures (JEPA). The work shows that in a simplified linear setting where both approaches learn similar representations, JEPAs are biased toward learning “high-influence” features (i.e., features with high regression coefficients), providing a formal explanation for a phenomenon observed empirically in the field: JEPA seems to prioritize abstract features over fine-grained pixel information.
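The toy sketch below (our illustration, not the paper’s construction) contrasts the two objectives in a linear setting: MAE reconstructs the masked input itself, while JEPA predicts a target encoder’s embedding of it.

```python
# Toy contrast of MAE- and JEPA-style objectives in a linear setting
# (illustrative, not the paper's construction). x_vis / x_tgt stand in
# for the visible and masked parts of an input.
import torch

d, k = 32, 4
enc = torch.nn.Linear(d, k, bias=False)       # context encoder
tgt_enc = torch.nn.Linear(d, k, bias=False)   # target encoder (self-distillation)
dec = torch.nn.Linear(k, d, bias=False)       # pixel decoder (MAE)
pred = torch.nn.Linear(k, k, bias=False)      # latent predictor (JEPA)

x_vis = torch.randn(64, d)
x_tgt = torch.randn(64, d)

# MAE: reconstruct the masked pixels themselves.
mae_loss = ((dec(enc(x_vis)) - x_tgt) ** 2).mean()

# JEPA: predict the *embedding* of the masked part; the target encoder
# is typically updated by EMA (self-distillation), not by gradients.
with torch.no_grad():
    target = tgt_enc(x_tgt)
jepa_loss = ((pred(enc(x_vis)) - target) ** 2).mean()
```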
Demonstrating ML Research in the Apple Booth
During exhibition hours, NeurIPS attendees will be able to interact with live demos of Apple ML research in booth #323, West Hall A, including:
- MLX – an open source array framework designed for Apple silicon that enables fast and flexible ML and scientific computing on Apple hardware. The framework is optimized for Apple silicon’s unified memory architecture and leverages both the CPU and GPU. At NeurIPS, the MLX demo will show large model inference and training on device: fine-tuning a 7B parameter LLM on an iPhone, image generation with a large diffusion model on an iPad, and text generation with several large language models on a Mac with Apple silicon. (A minimal MLX code sketch follows this list.)
- MobileCLIP – a family of mobile-friendly image-text models with hybrid CNN/Transformer architectures that attain a state-of-the-art accuracy-latency tradeoff. MobileCLIP-B obtains state-of-the-art results on zero-shot classification and retrieval, as well as on understanding of relationships, attributes, and order information. At NeurIPS, visitors will be able to experience how MobileCLIP performs zero-shot scene classification in real time on an iPhone.
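For a flavor of what programming against MLX looks like, here is a minimal sketch of our own (not the booth demo code) that fits a least-squares model using MLX’s automatic differentiation and lazy evaluation:

```python
# Minimal MLX sketch (not the booth demo): gradient descent on a
# least-squares objective. MLX arrays live in unified memory, so the
# same code runs on Apple silicon CPU or GPU, and computation stays
# lazy until mx.eval() is called.
import mlx.core as mx

X = mx.random.normal((256, 8))
w_true = mx.random.normal((8,))
y = X @ w_true

def loss_fn(w):
    return mx.mean((X @ w - y) ** 2)

grad_fn = mx.grad(loss_fn)     # automatic differentiation
w = mx.zeros((8,))
for _ in range(100):
    w = w - 0.1 * grad_fn(w)
    mx.eval(w)                 # force evaluation of the lazy graph
print(loss_fn(w).item())       # should approach zero
```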
Supporting the ML Research Community
Apple is committed to supporting underrepresented groups in the ML community, and we are proud to again sponsor several affinity groups hosting events onsite at NeurIPS 2024, including Black in AI (workshop on December 10), Women in Machine Learning (WiML) (workshop on December 10), LatinX in AI (workshop on December 10), and Queer in AI (workshop on December 11, social on December 12). In addition to supporting these workshops with sponsorship, Apple employees will also be participating in each of these events, as well as others.
Learn More about Apple ML Research at NeurIPS 2024
NeurIPS is the largest and one of the most important annual ML research conferences, and Apple is proud to once again share innovative new research at the event and connect with the community attending it. The above post highlights just a handful of the works Apple ML researchers will present at NeurIPS 2024, and a comprehensive overview and schedule of our participation can be found here.