
AgentGen: Automating Environment and Task Generation to Enhance Planning Abilities in LLM-Based Agents with 592 Environments and 7,246 Trajectories


Large Language Models (LLMs) have transformed artificial intelligence, particularly in the development of agent-based systems. These systems must interact with various environments and execute actions to achieve specific goals. Enhancing the planning capabilities of LLM-based agents has therefore become a critical area of research, because many applications involve intricate environments and demand precise task completion.

One significant challenge in this research domain is the intensive manual labor required to create diverse and extensive planning environments and tasks. Current methodologies predominantly depend on manually designed scenarios, limiting the diversity and quantity of training data available. This limitation hampers the potential of LLMs to generalize and perform well across a wide range of situations. Addressing this issue, researchers have introduced automated techniques to generate a broad spectrum of environments and planning tasks, thus enriching the training datasets for LLM-based agents.

The research team from the University of Hong Kong and Microsoft Corporation has proposed a novel framework named AGENTGEN, which utilizes LLMs to automate the generation of environments and their corresponding planning tasks. This innovative approach involves two primary stages: environment generation and task generation. Initially, the framework uses an inspiration corpus comprising diverse text segments to create detailed and varied environment specifications. Following this, AGENTGEN generates related planning tasks that range from simple to complex, ensuring a smooth progression of difficulty and facilitating effective learning for the LLMs.
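The two-stage flow described above can be sketched in a few lines of Python. This is a hypothetical illustration, not the paper's actual code: the function names, prompts, and the `llm` callable are all stand-ins.

```python
# Hypothetical sketch of AGENTGEN's two-stage pipeline. The prompts and
# function names are illustrative; the real framework's interfaces may differ.

def generate_environment(llm, inspiration_text: str) -> dict:
    """Stage 1: synthesize an environment specification from a corpus segment."""
    prompt = (
        "Using the following text as inspiration, design a planning environment "
        f"(overview, state space, action space, transition function):\n{inspiration_text}"
    )
    return {"spec": llm(prompt), "inspiration": inspiration_text}

def generate_tasks(llm, environment: dict, n_tasks: int = 20) -> list[str]:
    """Stage 2: derive planning tasks of graded difficulty for the environment."""
    prompt = (
        f"Given this environment spec:\n{environment['spec']}\n"
        f"Propose {n_tasks} planning tasks ordered from simple to complex."
    )
    return llm(prompt).splitlines()[:n_tasks]

def agentgen_pipeline(llm, corpus: list[str]) -> list[tuple[dict, list[str]]]:
    """Run both stages over every inspiration segment in the corpus."""
    dataset = []
    for segment in corpus:
        env = generate_environment(llm, segment)
        dataset.append((env, generate_tasks(llm, env)))
    return dataset
```

In practice `llm` would wrap an actual model call; here it is left abstract so the structure of the pipeline stands out.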

AGENTGEN distinguishes itself by employing a sophisticated environment generation process. The researchers designed an inspiration corpus to serve as the context for synthesizing environment specifications, which include a comprehensive overview of the environment, descriptions of the state and action spaces, and definitions of transition functions. For instance, one sample text segment might prompt the creation of an environment where the agent is a nutritionist tasked with developing a new recipe book featuring peanut butter powder. This method ensures a high level of diversity in the generated environments, creating numerous unique and challenging scenarios for agent training.
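The components of a specification listed above (overview, state space, action space, transition function) can be pictured as a simple data structure. The schema below is illustrative only, using the nutritionist example from the article as a toy instance; it is not the paper's actual format.

```python
from dataclasses import dataclass
from typing import Callable

# Illustrative schema for a generated environment specification. Field names
# mirror the components described in the article, not the paper's real schema.

@dataclass
class EnvironmentSpec:
    overview: str                           # natural-language summary of the setting
    states: list[str]                       # description of the state space
    actions: list[str]                      # description of the action space
    transition: Callable[[str, str], str]   # maps (state, action) -> next state

# Toy transition function for the nutritionist scenario.
def cook_transition(state: str, action: str) -> str:
    return "recipe_drafted" if action == "draft_recipe" else state

nutritionist_env = EnvironmentSpec(
    overview="Agent is a nutritionist writing a recipe book featuring peanut butter powder.",
    states=["start", "recipe_drafted"],
    actions=["draft_recipe", "taste_test"],
    transition=cook_transition,
)
```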

The task generation process within AGENTGEN further enhances the training data by applying a bidirectional evolution method known as BI-EVOL. This method evolves tasks in two directions: simplifying goal conditions to create easier tasks and increasing complexity to develop more challenging ones. This bidirectional approach results in a comprehensive set of planning tasks that support a gradual and effective learning curve for the LLMs. By implementing BI-EVOL, the research team generated 592 unique environments, each with 20 tasks, yielding 7,246 high-quality trajectories for training.
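The bidirectional evolution idea can be summarized in a short sketch: each seed task is rewritten in both directions, easier and harder. The prompts and the `llm` callable below are hypothetical stand-ins, not the paper's implementation.

```python
# Minimal sketch of the BI-EVOL idea: evolve each seed task in two directions,
# relaxing goal conditions (easier) and adding constraints (harder).
# The llm callable and prompt wording are illustrative assumptions.

def bi_evol(llm, seed_tasks: list[str]) -> list[str]:
    evolved = []
    for task in seed_tasks:
        easier = llm(f"Relax the goal conditions of this task so it is easier:\n{task}")
        harder = llm(f"Add constraints to this task so it is more challenging:\n{task}")
        evolved.extend([easier, task, harder])  # easy -> original -> hard
    return evolved
```

Ordering each triple from easy to hard is what gives the smooth difficulty progression the article describes.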

The efficacy of AGENTGEN was rigorously evaluated using the AgentBoard platform. The results were impressive, demonstrating significant improvements in the planning abilities of LLM-based agents. The AGENTGEN-tuned Llama-3 8B model surpassed GPT-3.5 in overall performance and, in certain tasks, even outperformed GPT-4. On in-domain tasks, the tuned model's success rate improved more than fivefold over the raw Llama-3 8B, rising from 1.67 to 11.67. AGENTGEN also showed a substantial performance gain on out-of-domain tasks, achieving a success rate of 29.1 on Alfworld, compared to 17.2 for GPT-3.5.

AGENTGEN demonstrated robust generalization capabilities across various models and tasks. The framework’s success was evident in its ability to improve the planning performance of multiple LLMs, including the smaller 7-8B models. For example, Llama-3 8B, after training with AGENTGEN, exhibited a success rate increase of 10.0 and a progress rate increase of 9.95. These results underscore the effectiveness of AGENTGEN in enhancing the capabilities of LLM-based agents, regardless of the specific model used.

In conclusion, AGENTGEN, by automating the generation of diverse environments and planning tasks, addresses the limitations of manual design and offers a scalable, efficient approach to improving agent performance. The framework's ability to generate high-quality trajectory data, and its demonstrated success on both in-domain and out-of-domain tasks, highlight its potential to reshape the training and application of LLM-based agents. AGENTGEN's contributions to agent training methodologies are poised to advance the development of intelligent systems capable of performing complex planning tasks with greater accuracy and efficiency.





Asif Razzaq is the CEO of Marktechpost Media Inc.. As a visionary entrepreneur and engineer, Asif is committed to harnessing the potential of Artificial Intelligence for social good. His most recent endeavor is the launch of an Artificial Intelligence Media Platform, Marktechpost, which stands out for its in-depth coverage of machine learning and deep learning news that is both technically sound and easily understandable by a wide audience. The platform boasts of over 2 million monthly views, illustrating its popularity among audiences.



