MACRPO: Multi-agent cooperative recurrent policy optimization

dc.contributor: Aalto-yliopisto [fi]
dc.contributor: Aalto University [en]
dc.contributor.author: Kargar, Eshagh
dc.contributor.author: Kyrki, Ville
dc.contributor.department: Department of Electrical Engineering and Automation
dc.contributor.groupauthor: Intelligent Robotics
dc.date.accessioned: 2025-01-10T15:18:54Z
dc.date.available: 2025-01-10T15:18:54Z
dc.date.issued: 2024
dc.description.abstract: This work considers the problem of learning cooperative policies in multi-agent settings with partially observable and non-stationary environments and no communication channel. We focus on improving information sharing between agents and propose a new multi-agent actor-critic method called Multi-Agent Cooperative Recurrent Proximal Policy Optimization (MACRPO). MACRPO integrates information across agents and time in two novel ways. First, we use a recurrent layer in the critic's network architecture and propose a meta-trajectory framework to train this layer, which allows the network to learn the cooperation and dynamics of interactions between agents and to handle partial observability. Second, we propose a new advantage function that incorporates other agents' rewards and value functions, with a parameter controlling the level of cooperation between agents. This control parameter is suited to environments in which the agents are unable to fully cooperate with each other. We evaluate our algorithm on three challenging multi-agent environments with continuous and discrete action spaces: Deepdrive-Zero, Multi-Walker, and the Particle environment. We compare the results with several ablations and with state-of-the-art multi-agent algorithms such as MAGIC, IC3Net, CommNet, GA-Comm, QMIX, MADDPG, and RMAPPO, as well as single-agent methods with parameters shared between agents, such as IMPALA and APEX. The results show that MACRPO outperforms the other algorithms. The code is available online at https://github.com/kargarisaac/macrpo.
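The abstract's second contribution, an advantage function that mixes an agent's own advantage with the other agents' via a cooperation parameter, can be sketched as follows. This is a minimal illustration only: the abstract does not give the exact formula, so the additive mixing form and the parameter name `psi` are assumptions, not the paper's definition.

```python
import numpy as np

def cooperative_advantage(own_adv, other_advs, psi):
    """Illustrative sketch of a cooperation-weighted advantage.

    Assumed form (not taken from the paper): the agent's own per-timestep
    advantage plus `psi` times the sum of the other agents' advantages.
    psi = 0.0 -> fully self-interested; psi = 1.0 -> full cooperation.
    """
    return np.asarray(own_adv) + psi * np.sum(other_advs, axis=0)

# Toy per-timestep advantage estimates (e.g. from GAE) for three agents.
a1 = np.array([0.5, -0.2, 1.0])
a2 = np.array([0.1, 0.4, -0.3])
a3 = np.array([-0.6, 0.2, 0.8])

selfish = cooperative_advantage(a1, [a2, a3], psi=0.0)  # equals a1
shared = cooperative_advantage(a1, [a2, a3], psi=1.0)   # a1 + a2 + a3
```

Intermediate values of `psi` interpolate between the two extremes, which is the point of the control parameter for environments where full cooperation is not possible.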
dc.description.version: Peer reviewed
dc.format.extent: 15
dc.format.mimetype: application/pdf
dc.identifier.citation: Kargar, E & Kyrki, V 2024, 'MACRPO: Multi-agent cooperative recurrent policy optimization', Frontiers in Robotics and AI, vol. 11, 1394209. https://doi.org/10.3389/frobt.2024.1394209
dc.identifier.doi: 10.3389/frobt.2024.1394209
dc.identifier.issn: 2296-9144
dc.identifier.other: PURE UUID: fb8c8967-00a7-4282-83ba-8f2297ea1777
dc.identifier.other: PURE ITEMURL: https://research.aalto.fi/en/publications/fb8c8967-00a7-4282-83ba-8f2297ea1777
dc.identifier.other: PURE LINK: http://www.scopus.com/inward/record.url?scp=85213888669&partnerID=8YFLogxK
dc.identifier.other: PURE FILEURL: https://research.aalto.fi/files/169962479/frobt-4-1394209.pdf
dc.identifier.uri: https://aaltodoc.aalto.fi/handle/123456789/132830
dc.identifier.urn: URN:NBN:fi:aalto-202501101126
dc.language.iso: en
dc.publisher: Frontiers Research Foundation
dc.relation.ispartofseries: Frontiers in Robotics and AI
dc.relation.ispartofseries: Volume 11
dc.rights: openAccess
dc.rights: CC BY
dc.rights.uri: https://creativecommons.org/licenses/by/4.0/
dc.subject.keyword: information sharing
dc.subject.keyword: multi-agent
dc.subject.keyword: interaction
dc.subject.keyword: cooperative
dc.subject.keyword: policy
dc.subject.keyword: reinforcement learning
dc.title: MACRPO: Multi-agent cooperative recurrent policy optimization
dc.type: A1 Original article in a scientific journal
dc.type.version: publishedVersion
