The ALICE Project

Introduction

Large Language Models have revolutionized code generation and execution tasks, significantly enhancing development efficiency, lowering barriers to coding, and improving code quality. However, existing frameworks struggle to perform well in complex development cycles with various industry standards that heavily rely on coordination and communication. Additionally, we need more suitable data to align the system, especially in highly interactive scenarios.

To address these issues, we developed a multi-agent collaboration system that treats humans and LLMs as agents, enabling them to interact. This system facilitates error correction before execution and allows adjustments based on existing code. To enhance code performance and execution efficiency, we defined meta-agents and meta-tools, abstracting user instructions into modular tasks that can be executed in parallel. We aligned the system by leveraging the data we obtained to achieve continuous self-improvement. Finally, we developed an efficient meta-agent collaboration system integrating interactive code generation and parallelizable system execution feedback, which we call ALICE.

ALICE has several applications:

ALICE is able to generate high-quality data through multi-turn interactions and feedback without human intervention. Most importantly, we can produce data with traces from agent strategies like ReAct and Reflexion, which are scarce but offer potential for aligning advanced LLMs.
We can generate multimodal data from code based on human instructions, which can be used to train a Sora-like model with modularized refinement.
ALICE can develop highly interactive open-ended games, generate immersive 3D scenes in VR, and act as world simulators.

Currently, we focus on code generation and execution in game engines like Unity, as they provide a fully controllable, observable, and modular virtual environment. We plan to generalize this system to broader domains in the future.

Method

We propose an efficient meta-agent collaboration framework named ALICE, designed to follow user instructions through active communication, aligning with their intentions, and generating code based on continuous feedback and interaction. This section will discuss 3 main components of ALICE, Controller LLM, Intent LLM and Execution LLM.

Controller LLM for Communication

ALICE distinguishes from traditional multi-agent collaboration frameworks by its allowance of dynamic creation and management of agents and tools.

Instruction Prompts as Agent Creation: by employing a modular approach where different agents are created to handle distinct parts of the task sequence, ALICE achieves a level of personalization and responsiveness that is more faithful to the user's intentions.
Tools as Agent Instruction: In ALICE, tools (like RAG, execution and vision feedback) are not merely supplementary; they are integral to how agents perceive and execute tasks. Tools can be used to create other tools that affect how each agent behave. Agent creation via prompt is also a core tool of the controller LLM.

We setup a controller LLM with the following fixed operations that manage and interact with a suite of intention LLMs, which can be treated as general LLM agents seen in other frameworks.

Create/Edit/Delete: modifying another controller's instruction prompt, creating or deleting a new controller by writing done its instruction prompt.
Route Setup: setup communication route for a new intention LLM to interact with other intention LLMs it controls.
Call: send instruction to one intention LLM under its control.

This configuration enables the controller LLM to act as the creator of other LLMs, which are essentially LLMs with different instruction prompts. Note that the LLMs it creates are also controllers, i.e. they are equipped with the same 3 operations as their parent controller. The controller LLMs do not do any task; instead, they are communicators with other LLMs and the user.

The following figure gives an example of a Controller LLM.

Intent LLM for Alignment

Intent LLMs are named by their nature of following personalized instructions. Each intent LLM is in charge of all the changes of a special code script, that is called a game component, and an execution LLM.

Game Components
Physic engines usually use a component-based architecture, where each object in the virtual scene can have various components attached to them that perform operations in parallel. This modular approach allows developers to compose behavior and properties by attaching different components to objects, such as renderers, scripts, colliders, or custom components. A game component is a script that an object can have, for example, a Terrain.cs script defines how an area of geographic structure is rendered, containing methods (tools) to create trees, grass and mountains. It can be treated as a class of any API library. As such, we attach a fine-tunable LLM to each API script we care about, by reading its class documentation in its declaration, attribute getter and setter, to each method (function) example usage as part of the model prompt.

The intent LLM is equipped with the following operations.

Create/Edit/Delete: modifying tools from the API it is in charge of, for example, it can add or delete a RAG or a vision feedback system for the Terrain.cs API when it see fits to reduce memory overhead or boost execution LLM's performance.
Route Setup: create inherited scripts from existing APIs, which creates new intent LLM, upon doing this, it must ask the controller LLM for the communication route of the new intention.
Call: call into the execution LLM with the task it should do.

This setup allows the intent LLM to be a creator for code scripts, especially higher-level code from existing code. This design is analagous to the idea of skill library in Voyager (Wang et al., 2023), where the agent is prompted to adapt new tools that incorporates existing atomic tools.

Execution LLM for Interactive Coding

An execution LLM controls how an API script should be used when receiving user instructions related to its component update, augmented with tools (functions) written in the given script. In practice, we give it the header file (Terrain.h).

Different from a general instruction-tuned code generation model, an interactive code execution LLM generates code via multi-turn conversation feedback from the game engine and the user. As we modularize the API separation, each execution model can criticize the given instruction for its game component by asking back the parent intent LLM for clarification in code execution with unknown parameters. For example, to add trees in a Terrain component, the user might want to further specify the tree density and range requirements before letting the model proceed with the code generation task. We collect feedback from each execution LLM and send them to the upper level to wait for the user or the controller with a broader knowledge about the APIs to respond.

The interactive code execution LLM is equipped with the operations to generate code, incorporating feedback from the tools it is given from the intent LLM, or report the final code it writes that is ready for execution.

Agent Strategies

A big advantage of the ALICE system is that it is orthogonal to popular agent strategies by its nature of design. The operations that are primarily equipped to each LLM component can be boosted by planning ahead, reasoning through, self-critized, and virtually executed as actions to better follow user instructions.

To use ReAct (Yao et al., 2023), for example, the intent LLM can be equipped with the following action space (tools): reasoning trace/thought, retrieving relevant targets/components in the simulated environment, communication with the user, and finally, answering the question. Essentially, at each trace stage, the agent can either make a reasoning thought, generate code after retrieving relevant components to the task at hand, communicate with the user to learn further specifications, or provide a final answer. The tools it have is controlled by its upper-level controller LLM for efficiency and flexibility.

ALICE example input and output. The ALICE (icon) result from left to right shows the result of GPT-3.5 before, as fine-tuning progress, and after alignment. The ground truth is not included in the training.

Welcome to ALICE

Controller LLM for Communication

Intent LLM for Alignment

Execution LLM for Interactive Coding