How Far Can Transformers Reason? The Locality Barrier and Inductive Scratchpad

Can Transformers predict new syllogisms by composing established ones? More generally, what type of targets can be learned by such models from scratch? Recent works show that Transformers can be Turing-complete in terms of expressivity, but this does not address learnability. This paper puts forward the notion of distribution locality to capture when weak learning is efficiently achievable by regular Transformers: the locality measures the least number of tokens required, in addition to the token histogram, to correlate nontrivially with the target. As shown experimentally, and theoretically under additional assumptions, distributions with high locality cannot be learned efficiently; in particular, syllogisms cannot be composed on long chains. Furthermore, we show that (i) an agnostic scratchpad cannot help to break the locality barrier, (ii) an educated scratchpad can help if it breaks the locality at each step, and (iii) a notion of ‘inductive scratchpad’ can both break the locality and improve out-of-distribution generalization, e.g., generalizing to almost double the input size on some arithmetic tasks.
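To make the inductive-scratchpad idea concrete, here is a minimal Python sketch (not the authors' code; the state layout and the `#` separators are illustrative assumptions) of the kind of training trace such a scratchpad could use for multi-digit addition. Instead of emitting the answer directly, the trace unrolls a sequence of states in which each state depends only on the previous carry and one digit pair, so every individual step is low-locality even though the full target is not.

```python
# A sketch of an "inductive scratchpad" trace for multi-digit addition.
# Each state records only the step index, the digit produced, and the carry,
# so the update from one state to the next is local. Format is hypothetical.

def inductive_addition_trace(a: str, b: str) -> str:
    """Build a scratchpad trace for a + b, one digit (plus carry) per step."""
    a, b = a[::-1], b[::-1]              # process least-significant digit first
    steps, digits, carry = [], [], 0
    for i in range(max(len(a), len(b))):
        da = int(a[i]) if i < len(a) else 0
        db = int(b[i]) if i < len(b) else 0
        carry, d = divmod(da + db + carry, 10)
        digits.append(str(d))
        steps.append(f"step={i} digit={d} carry={carry}")
    if carry:
        digits.append(str(carry))        # final carry becomes the leading digit
    answer = "".join(reversed(digits))
    return " # ".join(steps) + f" # answer={answer}"

print(inductive_addition_trace("857", "946"))
# step=0 digit=3 carry=1 # step=1 digit=0 carry=1 # step=2 digit=8 carry=1 # answer=1803
```

Because every step applies the same local update rule, a model trained on such traces for short inputs can, in principle, keep iterating the rule on longer inputs, which is the mechanism behind the length generalization reported for the arithmetic tasks.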

