
Researchers from ETH Zurich and Microsoft Introduce EgoGen: A New Synthetic Data Generator That Can Produce Accurate and Rich Ground-Truth Training Data for Egocentric Perception Tasks


Understanding the world from a first-person perspective is essential in Augmented Reality (AR), but egocentric views bring unique challenges and significant visual transformations compared to third-person views. While synthetic data has greatly benefited vision models in third-person settings, its use in embodied egocentric perception tasks remains largely unexplored. A major obstacle is accurately simulating natural human movements and behaviors, which is crucial for steering embodied cameras so that they capture faithful egocentric representations of the 3D environment.

In response to this challenge, researchers at ETH Zurich and Microsoft present EgoGen, a novel synthetic data generator designed to produce precise and comprehensive ground-truth training data for egocentric perception tasks. At the core of EgoGen lies a pioneering human motion synthesis model that directly utilizes egocentric visual inputs from a virtual human to perceive the surrounding 3D environment. 

This model is augmented with collision-avoiding motion primitives and employs a two-stage reinforcement learning strategy, thereby providing a closed-loop solution where the embodied perception and movement of the virtual human are seamlessly integrated. Unlike previous approaches, their model eliminates the need for a predefined global path and directly applies to dynamic environments.
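The closed-loop idea can be illustrated with a toy 2D sketch. This is not EgoGen's code or API: every name and constant here is hypothetical, a ray-based distance sensor stands in for egocentric visual input, and a hand-written greedy rule stands in for the learned policy (the two-stage reinforcement learning is not modeled). It only shows the loop structure: sense from the agent's own viewpoint, pick a collision-avoiding motion primitive, move, and sense again, with no precomputed global path.

```python
import math

# Hypothetical motion primitives: (turn in radians, forward step length).
PRIMITIVES = [(-0.6, 0.3), (-0.3, 0.3), (0.0, 0.3), (0.3, 0.3), (0.6, 0.3)]
FOV = math.pi / 2   # assumed field of view of the toy sensor
N_RAYS = 9          # assumed number of rays across the field of view
MAX_RANGE = 2.0     # assumed sensing range
CLEARANCE = 0.4     # assumed safety margin beyond the step length


def sense(pos, heading, obstacles):
    """Egocentric observation: free distance along each ray in the field of view.

    obstacles is a list of circles given as (x, y, radius).
    """
    readings = []
    for i in range(N_RAYS):
        angle = heading - FOV / 2 + FOV * i / (N_RAYS - 1)
        dist = MAX_RANGE
        for s in range(1, 41):  # march along the ray in 40 small steps
            d = MAX_RANGE * s / 40
            px = pos[0] + d * math.cos(angle)
            py = pos[1] + d * math.sin(angle)
            if any(math.hypot(px - ox, py - oy) <= r for ox, oy, r in obstacles):
                dist = d
                break
        readings.append(dist)
    return readings


def ray_for(turn):
    """Index of the ray whose direction is closest to the given turn angle."""
    return round((turn + FOV / 2) / FOV * (N_RAYS - 1))


def choose_primitive(pos, heading, goal, readings):
    """Greedy stand-in for a learned policy: safe primitive ending closest to the goal."""
    safe = [p for p in PRIMITIVES if readings[ray_for(p[0])] > p[1] + CLEARANCE]
    if not safe:
        # Nothing looks safe: rotate in place toward the most open direction.
        turn = max(PRIMITIVES, key=lambda p: readings[ray_for(p[0])])[0]
        return (turn, 0.0)

    def dist_after(p):
        h = heading + p[0]
        nx = pos[0] + p[1] * math.cos(h)
        ny = pos[1] + p[1] * math.sin(h)
        return math.hypot(goal[0] - nx, goal[1] - ny)

    return min(safe, key=dist_after)


def rollout(start, goal, obstacles, max_steps=200):
    """Closed loop: perceive, act, repeat. Only local sensing is used, no global path."""
    pos, heading, path = start, 0.0, [start]
    for _ in range(max_steps):
        if math.hypot(goal[0] - pos[0], goal[1] - pos[1]) < 0.3:
            break
        readings = sense(pos, heading, obstacles)  # perception from the agent's viewpoint
        turn, step = choose_primitive(pos, heading, goal, readings)
        heading += turn
        pos = (pos[0] + step * math.cos(heading), pos[1] + step * math.sin(heading))
        path.append(pos)
    return path
```

Calling `rollout((0.0, 0.0), (6.0, 0.0), [(2.0, 0.0, 0.5)])` returns the list of positions visited. Because the agent re-senses at every step, the obstacle list could change between iterations without any replanning, which is the property that makes this kind of loop suitable for dynamic scenes.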

With EgoGen, one can seamlessly augment existing real-world egocentric datasets with synthetic images. Their quantitative evaluations showcase significant improvements in the performance of state-of-the-art algorithms across various tasks, including mapping and localization for head-mounted cameras, egocentric camera tracking, and human mesh recovery from egocentric views. These results underscore the efficacy of EgoGen in enhancing the capabilities of existing algorithms and highlight its potential to advance research in egocentric computer vision.

EgoGen is complemented by an easy-to-use and scalable data generation pipeline, which the researchers used to produce the synthetic data for the three tasks above. By fully open-sourcing EgoGen, they aim to provide a practical way to create realistic egocentric training data and a useful resource for egocentric computer vision research.

Furthermore, EgoGen’s versatility and adaptability make it a promising tool for applications beyond these tasks, such as human-computer interaction, virtual reality, and robotics. With its open-source release, the researchers anticipate that EgoGen will foster innovation in egocentric perception and contribute to the broader landscape of computer vision research.


Check out the Paper and Code. All credit for this research goes to the researchers of this project.



Arshad is an intern at MarktechPost. He is currently pursuing an Int. MSc in Physics at the Indian Institute of Technology Kharagpur. He believes that understanding things at a fundamental level leads to new discoveries, which in turn lead to advances in technology. He is passionate about understanding nature at a fundamental level with the help of tools such as mathematical models, ML models, and AI.



