Technical Overview: RPGGO's Text-to-Game Framework for AI RPG

The original paper was prepared by the RPGGO development team and can be downloaded from https://arxiv.org/abs/2407.08195.


RPGGO introduced a novel text-to-game framework that leverages generative AI to democratize RPG development. The system combines two core components, the Game Building Copilot and the Zagii Engine, enabling automated game creation from text descriptions while maintaining dynamic player interactions. It targets non-professional User Generated Content (UGC) creators who want to realize their ideas but have little experience in professional coding and design.

In this article, we give an overview of the paper's key procedures and conclusions: how to build a multi-modal text-to-game system, and how the resulting games are rendered.

Core Value Propositions

Traditional development lifecycles typically demand substantial time, money, and effort. Generative AI empowers the design of characters, game mechanics, game assets, and more, effectively lowering the barrier to entry for game design and realization and bringing a broader pool of non-professional developers into game creation.

While traditional games often feature repetitive dialogues and limited storylines, generative AI is particularly good at responding dynamically and coherently to players' personalized inputs, creating interesting and unexpected interactions and bringing more freedom and customization to the player's experience.

Unlike traditional RPG development, which ships a complete game, the AI engine establishes only the starting framework of world settings, characters, and initial conditions; this initialization is what the AI engine interprets and builds upon at runtime.

The paper covers the game-building process, where the AI engine, acting as the "brain" of the game, dynamically generates content and adapts to user input, and the game-rendering process, through which it creates a personalized narrative with limited human intervention.


Figure: comparing the traditional game development process with the AI-powered process.

Core Technical Framework

RPGGO's text-to-game framework is composed of the Game Building Copilot and the Zagii Engine.

Game Building Copilot - Multi-agent controlled creation process

The Game Building Copilot utilizes multiple AI agents to help creators build worlds, whether based on existing IPs or entirely new creations. Starting from a simple user request describing the desired content, these agents work collaboratively to develop the game. Different agents specialize in distinct tasks, from generating the worldview, characters, narratives, and game mechanics to creating visual and audio content. A dedicated integration agent then combines all these outputs for the game rendering process.

Zagii Engine - Architecture of the Rendering and Interaction Process

Game rendering, powered by the self-developed Zagii Engine, is realized through several interconnected systems. A centralized Message Bus facilitates communication and information sharing among all systems, maintaining data consistency and system-wide coordination. Like a well-orchestrated team, these systems work together to create an immersive gaming experience.
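To make the coordination concrete, here is a minimal sketch of a publish/subscribe message bus of the kind described above. The class, topic names, and API are illustrative assumptions, not the Zagii Engine's actual design.

```python
# Minimal pub/sub message bus: every subsystem subscribes to the topics
# it cares about, and all subscribers see the same payload, which keeps
# shared state consistent across systems.
from collections import defaultdict
from typing import Callable

class MessageBus:
    def __init__(self):
        self._subscribers: dict[str, list[Callable[[dict], None]]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Callable[[dict], None]) -> None:
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, payload: dict) -> None:
        for handler in self._subscribers[topic]:
            handler(payload)

bus = MessageBus()
bus.subscribe("player.input", lambda p: print("Role-playing system saw:", p))
bus.subscribe("player.input", lambda p: print("Game Status Manager saw:", p))
bus.publish("player.input", {"text": "open the door"})
```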


The Game Status Manager module in this framework plays a critical role in tracking game progression and enabling the seamless introduction of new plot elements. The Perception, Memory, Thinking and Action (PMTA) framework proposed in the paper enables NPCs to make human-like decisions in complex environments.



Role-playing System:

The game agent LLM operates through the PMTA framework, composed of four essential modules: perception, memory, thinking, and action. The perception module processes environmental changes and user inputs. The memory module maintains settings and historical context. The thinking module utilizes Retrieval-Augmented Generation (RAG) to analyze situations and produce dynamic prompts for each character. The action module takes in the details above and executes behaviors. This cognitive architecture enables NPCs to exhibit intelligent and contextually appropriate responses, creating lifelike character interactions.
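A minimal sketch of such a PMTA-style agent loop follows. The class, the naive keyword recall, and the stubbed call_llm() are illustrative assumptions; the paper does not specify this implementation.

```python
# PMTA sketch: perceive -> think (recall + dynamic prompt) -> act (LLM call),
# with the response written back into memory for future turns.
from dataclasses import dataclass, field

def call_llm(prompt: str) -> str:
    return "(LLM response placeholder)"  # stand-in for a real LLM client

@dataclass
class NPCAgent:
    persona: str                                      # fixed character settings
    memory: list[str] = field(default_factory=list)   # historical context

    def perceive(self, event: str) -> str:
        """Perception: normalize environmental changes / user input."""
        return event.strip()

    def think(self, observation: str) -> str:
        """Thinking: recall relevant memories (a toy stand-in for RAG)
        and build a dynamic, character-specific prompt."""
        relevant = [m for m in self.memory
                    if any(w in m for w in observation.split())][-3:]
        return (f"You are {self.persona}.\n"
                f"Relevant memories: {relevant}\n"
                f"Current situation: {observation}\n"
                f"Respond in character.")

    def act(self, prompt: str) -> str:
        """Action: execute the behavior and store it in memory."""
        response = call_llm(prompt)
        self.memory.append(response)
        return response

npc = NPCAgent(persona="a wary tavern keeper")
print(npc.act(npc.think(npc.perceive("The player asks about the missing caravan."))))
```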

Player Assistant System:

Through multimodal LLM's text and visual processing capabilities, this system bridges player interaction with the game world. It optimizes character attributes in real-time, provides contextual gameplay suggestions through advanced prompt engineering, and assists strategic decisions via vector-based analysis of historical interactions.
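As a rough illustration of the "vector-based analysis of historical interactions," the sketch below embeds past interactions and retrieves the nearest one by cosine similarity. The toy embed() function is an assumption; a real system would use a learned sentence encoder.

```python
# Nearest-neighbor lookup over embedded interaction history: the most
# similar past interaction grounds the assistant's next suggestion.
import math

def embed(text: str) -> list[float]:
    # Toy bag-of-letters embedding, normalized to unit length.
    vec = [0.0] * 26
    for ch in text.lower():
        if "a" <= ch <= "z":
            vec[ord(ch) - ord("a")] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def cosine(a: list[float], b: list[float]) -> float:
    return sum(x * y for x, y in zip(a, b))

history = ["player bargained with the blacksmith",
           "player explored the northern ruins",
           "player fought a pack of wolves"]
query = "which past action is most like fighting?"
best = max(history, key=lambda h: cosine(embed(h), embed(query)))
print(best)
```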

Game Status Manager:

This module manages the pace and goal/task status as in a traditional game, detecting changes and completions, pushing the game forward, and leading to a final conclusion of success or failure. Two modules are introduced. One guides the LLM to comprehend global settings and integrates that information into goal-related prompts for the Cold Start. The other checks for deficiencies and refines the LLM's generation through synchronous evaluation of the game process as a Real-Time Assessment.
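The sketch below shows one way the two modules could fit together; the data model and the naive keyword-based completion check are assumptions made for illustration, where a real system would likely ask an LLM to judge completion.

```python
# Goal/task tracking sketch: cold_start_prompt() folds global settings
# into goal-related prompts; assess() performs a real-time check of
# whether a goal's completion condition has been met.
from enum import Enum

class GoalState(Enum):
    PENDING = "pending"
    COMPLETED = "completed"
    FAILED = "failed"

class GameStatusManager:
    def __init__(self, goals: dict[str, str]):
        self.goals = goals  # goal id -> natural-language completion condition
        self.state = {gid: GoalState.PENDING for gid in goals}

    def cold_start_prompt(self, world_settings: str) -> str:
        """Module 1: integrate global settings into goal-related prompts."""
        lines = [f"- {gid}: {cond}" for gid, cond in self.goals.items()]
        return f"World settings:\n{world_settings}\nActive goals:\n" + "\n".join(lines)

    def assess(self, gid: str, latest_events: str) -> GoalState:
        """Module 2: real-time assessment of goal completion
        (naive keyword check as a placeholder for an LLM judge)."""
        if self.goals[gid].lower() in latest_events.lower():
            self.state[gid] = GoalState.COMPLETED
        return self.state[gid]

mgr = GameStatusManager({"g1": "rescue the merchant"})
print(mgr.cold_start_prompt("A frontier town beset by bandits."))
print(mgr.assess("g1", "The player managed to rescue the merchant at dawn."))
```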

Emergent Narrative System:

This system is built from two parts: Real-time Narrative Generation and Interactive Narrative Consumption. In Real-time Narrative Generation, a recall mechanism produces goal-oriented prompts, and the generated narratives vary with concurrent changes in the play situation reported by the copilot and the Game Status Manager. Meanwhile, in Interactive Narrative Consumption, the NPCs' evolving narratives are measured against metrics to ensure accuracy, coherence, relevance, and storytelling performance.
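To illustrate how evolving narratives might be scored against those four criteria, here is a small LLM-as-judge sketch. The rubric prompt and the stubbed judge_llm() are assumptions; the paper names the metrics but not this mechanism.

```python
# Narrative scoring sketch: ask a judge model to rate a narrative on
# the four criteria and parse the "criterion: score" pairs it returns.
CRITERIA = ["accuracy", "coherence", "relevance", "storytelling"]

def judge_llm(prompt: str) -> str:
    # Stub standing in for a real judge-model call.
    return "accuracy: 4, coherence: 5, relevance: 4, storytelling: 3"

def score_narrative(narrative: str, game_context: str) -> dict[str, int]:
    prompt = (f"Game context:\n{game_context}\n\nNarrative:\n{narrative}\n\n"
              f"Rate 1-5 on: {', '.join(CRITERIA)}. "
              "Answer as 'criterion: score' pairs.")
    scores = {}
    for part in judge_llm(prompt).split(","):
        name, value = part.split(":")
        scores[name.strip()] = int(value)
    return scores

print(score_narrative("The keeper reveals the caravan's fate.",
                      "Goal: uncover the missing caravan."))
```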

Multi-modal Rendering System:

Operating through an event-driven pipeline architecture, this system manages real-time content generation. It synchronizes frame-level visual outputs, employs neural rendering networks for content synthesis, and maintains temporal consistency through state-tracking mechanisms.

This system utilizes diffusion models to generate images based on the current game scenery and progress. During gameplay, entities are recorded as integrated bundles of specific information and updated with additional details through perception and retrieval. Their image assets then serve as reference information for regional conditional diffusion. By combining multi-modal information with regional conditions, the generation workflow achieves semantic accuracy and feature consistency.
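One way to approximate regional conditional diffusion with off-the-shelf tooling is masked inpainting: a mask marks the region to regenerate while the rest of the scene stays fixed, and the tracked entity's attributes are folded into the prompt for feature consistency. This is a sketch under those assumptions, not the Zagii Engine's actual rendering stack; the file names and prompt are placeholders.

```python
# Regional regeneration via inpainting: only the masked region is
# re-synthesized, conditioned on a prompt built from entity attributes.
import torch
from PIL import Image
from diffusers import StableDiffusionInpaintPipeline

pipe = StableDiffusionInpaintPipeline.from_pretrained(
    "runwayml/stable-diffusion-inpainting", torch_dtype=torch.float16
).to("cuda")

scene = Image.open("scene.png").convert("RGB").resize((512, 512))
region_mask = Image.open("npc_region_mask.png").convert("L").resize((512, 512))

# Prompt assembled from tracked entity attributes (assumed upstream step).
prompt = "a wary tavern keeper in a candle-lit inn, wearing a red scarf"

updated = pipe(prompt=prompt, image=scene, mask_image=region_mask).images[0]
updated.save("scene_updated.png")
```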

Implementation & Experiment

LLM Application Optimization

The framework implemented a Dynamic Prompt Generation system with a two-layer fine-tuning strategy. The base layer utilized general game dialogue data for continuous model improvement, while the specialized layer focused on genre-specific training. This approach enabled the system to generate appropriate responses for different game types and character interactions.
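One plausible realization of the two-layer strategy is sequential adapter fine-tuning: a base adapter trained on general game dialogue, merged into the model, then a genre-specific adapter trained on top. The paper does not specify LoRA, the base model, or these hyperparameters; all of them are illustrative assumptions in the sketch below.

```python
# Two-layer fine-tuning sketch with LoRA adapters (training loops elided).
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base_model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")

# Layer 1: base adaptation on general game dialogue data.
general_cfg = LoraConfig(r=16, lora_alpha=32,
                         target_modules=["q_proj", "v_proj"],
                         task_type="CAUSAL_LM")
model = get_peft_model(base_model, general_cfg)
# ... train on general game dialogue, then fold the adapter into the weights:
model = model.merge_and_unload()

# Layer 2: genre-specific specialization (e.g., horror, romance).
genre_cfg = LoraConfig(r=8, lora_alpha=16,
                       target_modules=["q_proj", "v_proj"],
                       task_type="CAUSAL_LM")
model = get_peft_model(model, genre_cfg)
# ... train on genre-specific dialogue data.
```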

Game Creation Experiments

The experimental phase demonstrated the framework's efficiency through the deployment of 8 game templates across 6 categories. Out of 803 total games created, 746 games (93%) were completed within 24 hours, validating the system's capability for rapid game development. This high completion rate proved the framework's effectiveness in streamlining the game creation process.

Gameplay Testing

Player engagement testing involved 168 selected games that accumulated 60,301 gameplay sessions. The results showed that engagement followed the 80/20 rule, with 29 games receiving over 100 plays and 6 games exceeding 500 plays.

Conclusion

Games created by our text-to-game engine perform well in diversity, interaction, and freedom under UGC cases, which proves the feasibility of our framework. Issues such as cold starts, 2D and 3D asset generation, and an A/B testing framework remain to be solved. This research on generating a 'Single-player, Multi-NPC' game structure with AI proves that UGC-based RPGs can be built through simplified processes, democratizing development for creators. It has the potential to grow into large-scale open worlds and unlimited gaming experiences in the future.
