I was unable to verify the results reported for the MAA2C_NS algorithm on the Tag
task, even after correcting for `add_value_last_step=False` as per issue #43.
Upon cross-validation I found evidence pointing to the possibility of
swapped values between the maximum returns for shared parameters (Table 3) and
the maximum returns for non-shared parameters (Table 7).
Configuration: (i) `maa2c_ns.yaml` set according to Section C.1, subsection
MPE Predator-Prey, and Table 23 from the Supplemental material. (ii) `time_limit=25`
set in `gymma.yaml`.
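For reference, this is roughly the invocation I used, assuming the standard EPyMARL entry point (`src/main.py` with Sacred-style `with` overrides); the environment key `mpe:SimpleTag-v0` is an assumption on my part and may differ in your setup:

```shell
# Hypothetical reproduction command (entry point and env key are assumptions):
python3 src/main.py --config=maa2c_ns --env-config=gymma \
  with env_args.time_limit=25 env_args.key="mpe:SimpleTag-v0"
```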
The first consideration is that I have run experiments for both MAA2C and MAA2C_NS,
and got better results for MAA2C.
The second consideration is the consistency of results for the Tag task, as reported in the paper: "We
observe that in all environments except the matrix games, parameter sharing
improves the returns over no parameter sharing. While the average values
presented in Figure 3 do not seem statistically significant, by looking closer
at Tables 3 and 7 we observe that in several algorithm-task pairs the
improvement due to parameter sharing seems significant. Such improvements can
be observed for most algorithms in MPE tasks, especially in Speaker-Listener
and Tag."
Table A groups the results for all algorithms except COMA, for both
modalities, on the MPE tasks, and shows the variation of the results. A
positive change means that the parameter-sharing variant exceeds the
non-shared variant in maximum returns.
Table A: Maximum returns over five seeds for eight algorithms with
parameter sharing (PS), without parameter sharing (NS), and the change in
excess of returns for MPE tasks.
| Algorithm | Task | PS | NS | Change (%) |
|---|---|---:|---:|---:|
| IQL | Speaker-Listener | -18.36 | -18.61 | 1.36% |
| | Spread | -132.63 | -141.87 | 6.97% |
| | Adversary | 9.38 | 9.09 | 3.09% |
| | Tag | 22.18 | 19.18 | 13.53% |
| IA2C | Speaker-Listener | -12.6 | -17.08 | 35.56% |
| | Spread | -134.43 | -131.74 | -2.00% |
| | Adversary | 12.12 | 10.8 | 10.89% |
| | Tag | 17.44 | 16.04 | 8.03% |
| IPPO | Speaker-Listener | -13.1 | -15.56 | 18.78% |
| | Spread | -133.86 | -132.46 | -1.05% |
| | Adversary | 12.17 | 11.17 | 8.22% |
| | Tag | 19.44 | 18.46 | 5.04% |
| MADDPG | Speaker-Listener | -13.56 | -12.73 | -6.12% |
| | Spread | -141.7 | -136.73 | -3.51% |
| | Adversary | 8.97 | 8.81 | 1.78% |
| | Tag | 12.5 | 2.82 | 77.44% |
| MAA2C | Speaker-Listener | -10.71 | -13.66 | 27.54% |
| | Spread | -129.9 | -130.88 | 0.75% |
| | Adversary | 12.06 | 10.88 | 9.78% |
| | Tag | 19.95 | 26.5 | -32.83% |
| MAPPO | Speaker-Listener | -10.68 | -14.35 | 34.36% |
| | Spread | -133.54 | -128.64 | -3.67% |
| | Adversary | 11.3 | 12.04 | -6.55% |
| | Tag | 18.52 | 17.96 | 3.02% |
| VDN | Speaker-Listener | -15.95 | -15.47 | -3.01% |
| | Spread | -131.03 | -142.13 | 8.47% |
| | Adversary | 9.28 | 9.34 | -0.65% |
| | Tag | 24.5 | 18.44 | 24.73% |
| QMIX | Speaker-Listener | -11.56 | -11.59 | 0.26% |
| | Spread | -126.62 | -130.97 | 3.44% |
| | Adversary | 9.67 | 11.32 | -17.06% |
| | Tag | 31.18 | 26.88 | 13.79% |

Average Change: 7.51%
Total Change: 240.40%
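For clarity, the "Change (%)" column above appears to be the PS minus NS difference normalised by the magnitude of the PS return; a minimal sketch of that computation (the function name is mine, not from the codebase):

```python
def pct_change(ps: float, ns: float) -> float:
    """Percentage by which the PS return exceeds the NS return,
    normalised by |PS| (reproduces the 'Change (%)' column)."""
    return (ps - ns) / abs(ps) * 100.0

# Spot checks against Table A:
print(round(pct_change(22.18, 19.18), 2))    # IQL / Tag
print(round(pct_change(-18.36, -18.61), 2))  # IQL / Speaker-Listener
print(round(pct_change(19.95, 26.5), 2))     # MAA2C / Tag (negative)
```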
More strikingly, the differences are even larger when we consider only the Tag task.
Table B: Maximum returns over five seeds for the Tag task with parameter sharing (PS),
without parameter sharing (NS), the excess of returns of PS over NS, and the change in
excess of returns for the eight algorithms.
| Algorithm | PS | NS | Excess of Returns | Change (%) |
|---|---:|---:|---:|---:|
| IQL | 22.18 | 19.18 | 3 | 13.53% |
| IA2C | 17.44 | 16.04 | 1.4 | 8.03% |
| IPPO | 19.44 | 18.46 | 0.98 | 5.04% |
| MADDPG | 12.5 | 2.82 | 9.68 | 77.44% |
| MAA2C | 19.95 | 26.5 | -6.55 | -32.83% |
| MAPPO | 18.52 | 17.96 | 0.56 | 3.02% |
| VDN | 24.5 | 18.44 | 6.06 | 24.73% |
| QMIX | 31.18 | 26.88 | 4.3 | 13.79% |

Average: 2.42875 (excess), 14.09% (change)
Total: 19.43 (excess), 112.75% (change)
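As a sanity check, the aggregate rows can be reproduced directly from the per-algorithm entries of Table B (values copied from the table above, algorithm order IQL through QMIX):

```python
# Per-algorithm values from Table B (Tag task only).
excess = [3, 1.4, 0.98, 9.68, -6.55, 0.56, 6.06, 4.3]
change = [13.53, 8.03, 5.04, 77.44, -32.83, 3.02, 24.73, 13.79]

total_excess = sum(excess)               # ≈ 19.43
avg_excess = total_excess / len(excess)  # ≈ 2.42875
total_change = sum(change)               # ≈ 112.75
avg_change = total_change / len(change)  # ≈ 14.09

print(round(avg_excess, 5), round(avg_change, 2))
```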
Can you confirm that this is indeed the case, or point me in the right direction?
Thanks,