MACRPO: Multi-agent cooperative recurrent policy optimization
dc.contributor | Aalto-yliopisto | fi |
dc.contributor | Aalto University | en |
dc.contributor.author | Kargar, Eshagh | en_US |
dc.contributor.author | Kyrki, Ville | en_US |
dc.contributor.department | Department of Electrical Engineering and Automation | en |
dc.contributor.groupauthor | Intelligent Robotics | en |
dc.date.accessioned | 2025-01-10T15:18:54Z | |
dc.date.available | 2025-01-10T15:18:54Z | |
dc.date.issued | 2024 | en_US |
dc.description.abstract | This work considers the problem of learning cooperative policies in multi-agent settings with partially observable and non-stationary environments and no communication channel. We focus on improving information sharing between agents and propose a new multi-agent actor-critic method called Multi-Agent Cooperative Recurrent Proximal Policy Optimization (MACRPO). We propose two novel ways of integrating information across agents and time in MACRPO: First, we use a recurrent layer in the critic’s network architecture and propose a new framework that constructs a meta-trajectory from the agents’ trajectories to train this recurrent layer. This allows the network to learn the cooperation and dynamics of interactions between agents and also to handle partial observability. Second, we propose a new advantage function that incorporates other agents’ rewards and value functions, with a parameter controlling the level of cooperation between agents. This control parameter is suitable for environments in which the agents cannot fully cooperate with each other. We evaluate our algorithm on three challenging multi-agent environments with continuous and discrete action spaces: Deepdrive-Zero, Multi-Walker, and the Particle environment. We compare the results with several ablations and state-of-the-art multi-agent algorithms such as MAGIC, IC3Net, CommNet, GA-Comm, QMIX, MADDPG, and RMAPPO, as well as single-agent methods with parameters shared between agents, such as IMPALA and APEX. The results show superior performance of MACRPO over the other algorithms. The code is available online at https://github.com/kargarisaac/macrpo. | en
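Note: the second contribution described in the abstract, blending other agents' rewards through a cooperation parameter, can be illustrated with a minimal sketch. This is not the authors' implementation (see the linked repository for that); the function name cooperative_advantages, the parameter name psi, the array shapes, and the GAE-style recursion are assumptions for illustration, and the sketch omits the recurrent critic and the value-function mixing described above.

import numpy as np

def cooperative_advantages(rewards, values, gamma=0.99, lam=0.95, psi=0.5):
    # Hypothetical sketch, not the MACRPO implementation.
    # rewards: array of shape (T, n_agents); values: (T + 1, n_agents), last row is a bootstrap value.
    T, n = rewards.shape
    # Blend each agent's own reward with the mean reward of the other agents,
    # with psi controlling the level of cooperation (0 = independent, 1 = fully shared).
    others_mean = (rewards.sum(axis=1, keepdims=True) - rewards) / max(n - 1, 1)
    mixed_rewards = (1.0 - psi) * rewards + psi * others_mean

    advantages = np.zeros_like(rewards, dtype=np.float64)
    gae = np.zeros(n)
    for t in reversed(range(T)):
        # Standard GAE recursion applied per agent on the blended rewards.
        delta = mixed_rewards[t] + gamma * values[t + 1] - values[t]
        gae = delta + gamma * lam * gae
        advantages[t] = gae
    return advantages

With psi = 0 this reduces to independent per-agent generalized advantage estimation; with psi = 1 each agent's advantage is driven entirely by the other agents' average reward.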
dc.description.version | Peer reviewed | en |
dc.format.extent | 15 | |
dc.format.mimetype | application/pdf | en_US |
dc.identifier.citation | Kargar, E & Kyrki, V 2024, 'MACRPO: Multi-agent cooperative recurrent policy optimization', Frontiers in Robotics and AI, vol. 11, 1394209. https://doi.org/10.3389/frobt.2024.1394209 | en
dc.identifier.doi | 10.3389/frobt.2024.1394209 | en_US |
dc.identifier.issn | 2296-9144 | |
dc.identifier.other | PURE UUID: fb8c8967-00a7-4282-83ba-8f2297ea1777 | en_US |
dc.identifier.other | PURE ITEMURL: https://research.aalto.fi/en/publications/fb8c8967-00a7-4282-83ba-8f2297ea1777 | en_US |
dc.identifier.other | PURE LINK: http://www.scopus.com/inward/record.url?scp=85213888669&partnerID=8YFLogxK | en_US |
dc.identifier.other | PURE FILEURL: https://research.aalto.fi/files/169962479/frobt-4-1394209.pdf | en_US |
dc.identifier.uri | https://aaltodoc.aalto.fi/handle/123456789/132830 | |
dc.identifier.urn | URN:NBN:fi:aalto-202501101126 | |
dc.language.iso | en | en |
dc.publisher | Frontiers Research Foundation | |
dc.relation.ispartofseries | Frontiers in Robotics and AI | |
dc.relation.ispartofseries | Volume 11 | |
dc.rights | openAccess | en |
dc.rights | CC BY | en_US |
dc.rights.uri | https://creativecommons.org/licenses/by/4.0/ | |
dc.subject.keyword | information sharing | en_US |
dc.subject.keyword | multi-agent | en_US |
dc.subject.keyword | interaction | en_US |
dc.subject.keyword | cooperative | en_US |
dc.subject.keyword | policy | en_US |
dc.subject.keyword | reinforcement learning | en_US |
dc.title | MACRPO: Multi-agent cooperative recurrent policy optimization | en |
dc.type | A1 Original article in a scientific journal | en
dc.type.version | publishedVersion |