AI

Generalizable Error Modeling for Human Data Annotation: Evidence from an Industry-Scale Search Data Annotation Program

1 Mins read

Machine learning (ML) and artificial intelligence (AI) systems rely heavily on human-annotated data for training and evaluation. A major challenge in this context is the occurrence of annotation errors, as their effects can degrade model performance. This paper presents a predictive error model trained to detect potential errors in search relevance annotation tasks for three industry-scale ML applications (music streaming, video streaming, and mobile apps). Drawing on real-world data from an extensive search relevance annotation program, we demonstrate that errors can be predicted with moderate model performance (AUC=0.65-0.75) and that model performance generalizes well across applications (i.e., a global, task-agnostic model performs on par with task-specific models). In contrast to past research, which has often focused on predicting annotation labels from task-specific features, our model is trained to predict errors directly from a combination of task features and behavioral features derived from the annotation process, in order to achieve a high degree of generalizability. We demonstrate the usefulness of the model in the context of auditing, where prioritizing tasks with high predicted error probabilities considerably increases the amount of corrected annotation errors (e.g., 40% efficiency gains for the music streaming application). These results highlight that behavioral error detection models can yield considerable improvements in the efficiency and quality of data annotation processes. Our findings reveal critical insights into effective error management in the data annotation process, thereby contributing to the broader field of human-in-the-loop ML.


Source link

Related posts
AI

How Large Language Models (LLMs) can Perform Multiple, Computationally Distinct In-Context Learning (ICL) Tasks Simultaneously

3 Mins read
Large Language Models (LLMs) have demonstrated remarkable proficiency in In-Context Learning (ICL), which is a technique that teaches them to complete tasks…
AI

Transforming software with generative AI

2 Mins read
Where exactly are we on this transformative journey? How are enterprises navigating this new terrain—and what’s still ahead? To investigate how generative…
AI

Google AI Researchers Propose 'MODEL SWARMS': A Collaborative Search Algorithm to Flexibly Adapt Diverse LLM Experts to Wide-Ranging Purposes

2 Mins read
There is a need for flexible and efficient adaptation of large language models (LLMs) to various tasks. Existing approaches, such as mixture-of-experts…

 

 

Leave a Reply

Your email address will not be published. Required fields are marked *