3.8 Agent Coordination: Multi-Agent Reinforcement Learning

In scenarios where multiple agents must collaborate to achieve a common goal, the MAIN system leverages Multi-Agent Reinforcement Learning (MARL). MARL is a specialized branch of reinforcement learning where multiple agents learn to coordinate their actions in a shared environment, optimizing for collective outcomes rather than individual rewards.

MARL in the MAIN system is built on the principles of Decentralized Partially Observable Markov Decision Processes (Dec-POMDPs), which model the decision-making environment as a sequence of states and joint actions in which each agent receives only a partial, local observation. This framework allows agents to learn and adapt their strategies from limited information, mirroring real-world scenarios where each agent sees only part of the environment. By coordinating their actions, agents work to maximize the overall team reward, which is essential for tasks that require complex, multi-step collaboration.
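
As a minimal, hypothetical illustration of this setting, the sketch below runs independent Q-learning in a toy two-agent Dec-POMDP: each agent observes only its own half of a hidden state, and both receive a single shared team reward that is positive only when their joint action matches that state. The environment, agent count, and hyperparameters are assumptions made for illustration, not part of the MAIN specification.

```python
# Toy Dec-POMDP with a shared team reward (illustrative sketch only).
import random
from collections import defaultdict

class ToyDecPOMDP:
    """Two agents each see only their own half of the hidden state and earn a
    shared team reward when their joint action matches that state."""
    def reset(self):
        # Hidden global state: one bit per agent.
        self.state = (random.randint(0, 1), random.randint(0, 1))
        # Each agent observes only its own bit (partial observability).
        return self.state[0], self.state[1]

    def step(self, a0, a1):
        # Team reward: +1 only if the joint action matches the global state,
        # so neither agent can succeed alone.
        return 1.0 if (a0, a1) == self.state else 0.0

def train(episodes=5000, eps=0.1, lr=0.5):
    env = ToyDecPOMDP()
    # Independent Q-tables keyed by each agent's local observation.
    q = [defaultdict(lambda: [0.0, 0.0]) for _ in range(2)]
    for _ in range(episodes):
        obs = env.reset()
        acts = []
        for i in range(2):
            if random.random() < eps:                      # explore
                acts.append(random.randint(0, 1))
            else:                                          # exploit local Q-values
                acts.append(max((0, 1), key=lambda a: q[i][obs[i]][a]))
        r = env.step(*acts)                                # single shared team reward
        for i in range(2):
            q[i][obs[i]][acts[i]] += lr * (r - q[i][obs[i]][acts[i]])
    return q

if __name__ == "__main__":
    q_tables = train()
    print(dict(q_tables[0]))   # agent 0 learns to echo its own observation
```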

Enhanced Coordination

To further optimize agent coordination, the MAIN system incorporates enhanced mechanisms such as Leader Guiding Followers and Reward Generating and Distributing (RGD).

The Leader Guiding Followers model is inspired by hierarchical reinforcement learning: a leader agent, typically one with more information or greater computational capability, sets specific goals for follower agents. This leader-follower dynamic is facilitated by Meta-Policy Networks, which enable the leader to guide the followers by providing high-level objectives aligned with the team's overall goals. The followers, in turn, optimize their actions to meet these objectives, ensuring that the team operates cohesively and efficiently.

Given a leader agent $L$ and follower agents $F_1, F_2, \dots, F_n$, the leader's policy $\pi_L$ defines the goal $g$, and each follower's policy $\pi_{F_i}$ optimizes its actions to achieve that goal:

$$g = \pi_L(s),$$
$$\pi_{F_i}(a_i \mid s, g) = \arg\max_{a_i} \mathbb{E}\left[ R(s, a_i, g) \right],$$

where $s$ is the environmental state, $a_i$ is the action of follower agent $F_i$, and $R(s, a_i, g)$ is the reward received under goal $g$.
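
A highly simplified sketch of this leader-follower loop is shown below. The goal space, the reward shaping, and the rule-based leader policy are illustrative assumptions standing in for learned Meta-Policy Networks; they show only the flow $g = \pi_L(s)$ followed by each follower maximizing $R(s, a_i, g)$.

```python
# Leader-guiding-followers pattern (illustrative sketch, hypothetical names).
import random

ACTIONS = [0, 1, 2]

def leader_policy(state):
    """pi_L(s): map the global state to a high-level goal g."""
    return "defend" if state["threat"] > 0.5 else "gather"

def reward(state, action, goal):
    """R(s, a_i, g): reward for a follower's action under the current goal."""
    if goal == "defend":
        return 1.0 if action == 2 else 0.0           # action 2 = hold position
    return action * (1.0 - state["threat"])          # gather: yield shrinks with threat

def follower_policy(state, goal):
    """pi_{F_i}(a_i | s, g): choose the action with the best reward under g."""
    return max(ACTIONS, key=lambda a: reward(state, a, goal))

if __name__ == "__main__":
    state = {"threat": random.random()}
    g = leader_policy(state)                          # leader sets the goal
    actions = [follower_policy(state, g) for _ in range(3)]
    print(g, actions, [reward(state, a, g) for a in actions])
```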

The Reward Generating and Distributing (RGD) mechanism further enhances this coordination by ensuring that rewards are allocated based on contributions towards the team’s success. This system uses Shapley Value Calculations from cooperative game theory to fairly distribute rewards among agents, taking into account the marginal contribution of each agent to the overall outcome. By generating synthetic rewards that reflect the collective achievement and distributing them according to individual contributions, the RGD mechanism incentivizes collaboration and ensures that all agents are motivated to work towards the common goal.

Given a set of $n$ agents $N = \{1, 2, \dots, n\}$ and any subset $S \subseteq N$, the Shapley value $\phi_i$, representing the reward allocated to agent $i$, is calculated as:

$$\phi_i(v) = \sum_{S \subseteq N \setminus \{i\}} \frac{|S|!\,(n - |S| - 1)!}{n!} \left[ v(S \cup \{i\}) - v(S) \right],$$

where $v(S)$ is the contribution value of subset $S$, and $v(S \cup \{i\})$ is the contribution value when agent $i$ joins the subset.
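
For concreteness, a direct implementation of this formula is sketched below. The coalition values in the example are made up, and the exhaustive enumeration is exponential in the number of agents; a production RGD implementation would derive $v(S)$ from measured agent contributions and would likely approximate the Shapley values rather than enumerate every coalition.

```python
# Exact Shapley-value reward split over all coalitions (illustrative sketch).
from itertools import combinations
from math import factorial

def shapley_values(agents, v):
    """Compute phi_i(v) for each agent i by summing over all S subset of N\\{i}."""
    n = len(agents)
    phi = {}
    for i in agents:
        others = [a for a in agents if a != i]
        total = 0.0
        for k in range(len(others) + 1):
            for S in combinations(others, k):
                # weight = |S|! (n - |S| - 1)! / n!
                weight = factorial(len(S)) * factorial(n - len(S) - 1) / factorial(n)
                total += weight * (v(frozenset(S) | {i}) - v(frozenset(S)))
        phi[i] = total
    return phi

if __name__ == "__main__":
    # Hypothetical coalition values: agent "a" contributes most, "c" least.
    values = {
        frozenset(): 0, frozenset("a"): 6, frozenset("b"): 3, frozenset("c"): 1,
        frozenset("ab"): 10, frozenset("ac"): 7, frozenset("bc"): 5,
        frozenset("abc"): 12,
    }
    print(shapley_values(["a", "b", "c"], lambda S: values[frozenset(S)]))
    # Shares sum to v(N) = 12, e.g. {'a': 6.5, 'b': 4.0, 'c': 1.5}
```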

By combining cryptographic privacy protocols, distributed execution environments, and sophisticated multi-agent coordination strategies, the MAIN system ensures that agent operations are secure, efficient, and aligned with the overarching objectives of the Web3 ecosystem. The Agent Registration and Operation mechanisms within the MAIN system are designed to meet high standards of security, privacy, and efficiency. Through advanced theoretical models and cutting-edge technologies, the MAIN system not only ensures trustworthy execution results but also fosters an environment where agents can collaborate effectively to accomplish complex, multi-faceted tasks. This level of sophistication positions the MAIN system at the forefront of AI development in the Web3 space, offering powerful capabilities for intelligent, decentralized operations.