Multi-agent pathfinding plays a crucial role in a variety of robotic applications. Recently, deep reinforcement learning methods have been adopted to solve large-scale planning problems in a decentralized manner. Such approaches, however, face challenges such as non-stationarity and partial observability. This thesis addresses these challenges by introducing a centralized communication block into a multi-agent proximal policy optimization framework. The evaluation is conducted in a simulation-based environment with continuous state and action spaces. The simulator is built on a vectorized 2D physics engine in which the agents are bound by the laws of physics.
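The following is a minimal sketch, not the thesis implementation, of how a decentralized actor in a MAPPO-style setup can consume its local observation concatenated with a feature vector produced by a centralized communication block; the `comm_feature` tensor and all layer sizes are hypothetical, and the Gaussian head reflects the continuous action space of the physics-based simulator.

```python
import torch
import torch.nn as nn

class DecentralizedActor(nn.Module):
    """Per-agent PPO actor that also receives a shared global-context feature."""

    def __init__(self, obs_dim: int, comm_dim: int, act_dim: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + comm_dim, 128), nn.Tanh(),
            nn.Linear(128, 128), nn.Tanh(),
        )
        self.mu = nn.Linear(128, act_dim)                   # mean of the Gaussian policy
        self.log_std = nn.Parameter(torch.zeros(act_dim))   # state-independent std

    def forward(self, local_obs: torch.Tensor, comm_feature: torch.Tensor):
        h = self.net(torch.cat([local_obs, comm_feature], dim=-1))
        return torch.distributions.Normal(self.mu(h), self.log_std.exp())


# Usage: each agent samples a continuous action from its own policy, while the
# communication feature is computed once per step from the global state.
actor = DecentralizedActor(obs_dim=32, comm_dim=64, act_dim=2)
local_obs = torch.randn(1, 32)
comm_feature = torch.randn(1, 64)   # placeholder for the centralized block's output
action = actor(local_obs, comm_feature).sample()
```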
Within the framework, a World model is used to extract an abstract feature representation of the global map, leveraging the global context to enhance the training process. The feature extractor is decoupled from the agent training process, which yields a representation of the global state that is not biased by the agents' actions. Furthermore, the modular design offers the flexibility to replace the representation model with another model, or to modify tasks within the global map, without retraining the agents.
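As a hedged illustration of this design, the sketch below shows an autoencoder-based World model in PyTorch; the map resolution, layer sizes, and training loop are assumptions for the example, not the thesis configuration. The encoder's latent code serves as the global-context feature handed to the agents, and the model is pretrained on global maps alone, so the representation stays independent of policy updates.

```python
import torch
import torch.nn as nn

class WorldModelAutoencoder(nn.Module):
    def __init__(self, latent_dim: int = 64):
        super().__init__()
        # Encoder: compresses a 1x64x64 global map into a latent vector.
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 16, 4, stride=2, padding=1), nn.ReLU(),   # 64 -> 32
            nn.Conv2d(16, 32, 4, stride=2, padding=1), nn.ReLU(),  # 32 -> 16
            nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.ReLU(),  # 16 -> 8
            nn.Flatten(),
            nn.Linear(64 * 8 * 8, latent_dim),
        )
        # Decoder: reconstructs the map; needed only for the pretraining loss.
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 64 * 8 * 8), nn.ReLU(),
            nn.Unflatten(1, (64, 8, 8)),
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 16, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(16, 1, 4, stride=2, padding=1), nn.Sigmoid(),
        )

    def forward(self, global_map: torch.Tensor):
        z = self.encoder(global_map)       # global-context feature shared with the agents
        reconstruction = self.decoder(z)   # used only while training the World model
        return z, reconstruction


def pretrain(world_model, map_batches, epochs: int = 10):
    """Train the World model on global maps alone, decoupled from the agents."""
    opt = torch.optim.Adam(world_model.parameters(), lr=1e-3)
    for _ in range(epochs):
        for batch in map_batches:          # batch: (B, 1, 64, 64) global maps
            _, recon = world_model(batch)
            loss = nn.functional.mse_loss(recon, batch)
            opt.zero_grad()
            loss.backward()
            opt.step()
```

Because the autoencoder is trained in isolation, it can later be swapped for a different representation model, or reused when the tasks on the global map change, without touching the agent policies.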
The empirical study demonstrates the effectiveness of the proposed approach by comparing three proximal policy optimization-based multi-agent pathfinding frameworks. The results indicate that an autoencoder-based state representation model, used as the centralized communication model, provides sufficient global context. In addition, introducing the centralized communication block improves both the performance and the generalization capability of the agent policies.