
Are values from Tables 3-7 for task MPE Tag, algorithms MAA2C and MAA2C_NS swapped? #44

gsavarela opened this issue Feb 18, 2023 · 0 comments


I was unable to verify the results reported for the MAA2C_NS algorithm on the Tag
task, even after correcting for add_value_last_step=False as per issue #43.
Cross-checking the tables, I found evidence suggesting that the maximum returns may
have been swapped between the shared-parameters results (Table 3) and the
non-shared-parameters results (Table 7).

Reproduce:

Config:

{   
    "action_selector": "soft_policies",
    "add_value_last_step": false,
    "agent": "rnn_ns",
    "agent_output_type": "pi_logits",
    "batch_size": 10,
    "batch_size_run": 10,
    "buffer_cpu_only": true,
    "buffer_size": 10,
    "checkpoint_path": "",
    "critic_type": "cv_critic_ns",
    "entropy_coef": 0.01,
    "env": "gymma",
    "env_args": {   "key": "mpe:SimpleTag-v0",
                    "pretrained_wrapper": "PretrainedTag",
                    "seed": 343532797,
                    "state_last_action": false,
                    "time_limit": 25},
    "evaluate": false,
    "gamma": 0.99,
    "grad_norm_clip": 10,
    "hidden_dim": 128,
    "hypergroup": null,
    "label": "default_label",
    "learner": "actor_critic_learner",
    "learner_log_interval": 10000,
    "load_step": 0,
    "local_results_path": "results",
    "log_interval": 250000,
    "lr": 0.0003,
    "mac": "non_shared_mac",
    "mask_before_softmax": true,
    "name": "maa2c_ns",
    "obs_agent_id": false,
    "obs_individual_obs": false,
    "obs_last_action": false,
    "optim_alpha": 0.99,
    "optim_eps": 1e-05,
    "q_nstep": 5,
    "repeat_id": 1,
    "runner": "parallel",
    "runner_log_interval": 10000,
    "save_model": false,
    "save_model_interval": 500000,
    "save_replay": false,
    "seed": 343532797,
    "standardise_returns": false,
    "standardise_rewards": true,
    "t_max": 20050000,
    "target_update_interval_or_tau": 0.01,
    "test_greedy": true,
    "test_interval": 500000,
    "test_nepisode": 100,
    "use_cuda": false,
    "use_rnn": true,
    "use_tensorboard": true
}
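
For completeness, here is a reproduction sketch, assuming EPyMARL's standard Sacred entry point src/main.py; the algorithm config name maa2c_ns and the exact override syntax are assumptions on my part and may need adjusting to your checkout:

```python
# Sketch of the reproduction command. Assumes EPyMARL's Sacred-based
# `src/main.py` entry point; `maa2c_ns` is the assumed name of the
# non-shared MAA2C algorithm config.
import subprocess

subprocess.run(
    [
        "python3", "src/main.py",
        "--config=maa2c_ns",
        "--env-config=gymma",
        "with",
        "env_args.key=mpe:SimpleTag-v0",
        "env_args.pretrained_wrapper=PretrainedTag",
        "env_args.time_limit=25",
        "add_value_last_step=False",  # correction from issue #43
        "seed=343532797",
    ],
    check=True,
)
```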

Considerations

The first consideration is that I have run experiments for both MAA2C and MAA2C_NS,
and obtained better results with MAA2C.

The second consideration is the consistency of the results reported for the Tag
task. The paper states: "We observe that in all environments except the matrix
games, parameter sharing improves the returns over no parameter sharing. While the
average values presented in Figure 3 do not seem statistically significant, by
looking closer in Tables 3 and 7 we observe that in several cases of algorithm-task
pairs the improvement due to parameter sharing seems significant. Such improvements
can be observed for most algorithms in MPE tasks, especially in Speaker-Listener
and Tag."

Table A groups the results for all algorithms except COMA, for both modalities, on
the MPE tasks, and shows the relative change. A positive change means that the
parameter-sharing variant achieves higher maximum returns than the
non-shared-parameters variant.
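
For concreteness, the change I report in Tables A and B is the difference between the PS and NS maximum returns, normalised by the magnitude of the PS return (my convention for this issue, not one taken from the paper); a minimal sketch:

```python
def change_pct(ps: float, ns: float) -> float:
    """Relative change of PS over NS, normalised by |PS| (the 'Change (%)' column below)."""
    return 100.0 * (ps - ns) / abs(ps)

# Spot-checks against Table A:
print(round(change_pct(-18.36, -18.61), 2))  # IQL / Speaker-Listener -> 1.36
print(round(change_pct(19.95, 26.50), 2))    # MAA2C / Tag            -> -32.83
```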

Table A: Maximum returns over five seeds for eight algorithms with parameter
sharing (PS) and without parameter sharing (NS), and the relative change of PS over
NS, for the MPE tasks.

| Algorithm | Task | PS | NS | Change (%) |
| --- | --- | --- | --- | --- |
| IQL | Speaker-Listener | -18.36 | -18.61 | 1.36% |
| IQL | Spread | -132.63 | -141.87 | 6.97% |
| IQL | Adversary | 9.38 | 9.09 | 3.09% |
| IQL | Tag | 22.18 | 19.18 | 13.53% |
| IA2C | Speaker-Listener | -12.6 | -17.08 | 35.56% |
| IA2C | Spread | -134.43 | -131.74 | -2.00% |
| IA2C | Adversary | 12.12 | 10.8 | 10.89% |
| IA2C | Tag | 17.44 | 16.04 | 8.03% |
| IPPO | Speaker-Listener | -13.1 | -15.56 | 18.78% |
| IPPO | Spread | -133.86 | -132.46 | -1.05% |
| IPPO | Adversary | 12.17 | 11.17 | 8.22% |
| IPPO | Tag | 19.44 | 18.46 | 5.04% |
| MADDPG | Speaker-Listener | -13.56 | -12.73 | -6.12% |
| MADDPG | Spread | -141.7 | -136.73 | -3.51% |
| MADDPG | Adversary | 8.97 | 8.81 | 1.78% |
| MADDPG | Tag | 12.5 | 2.82 | 77.44% |
| MAA2C | Speaker-Listener | -10.71 | -13.66 | 27.54% |
| MAA2C | Spread | -129.9 | -130.88 | 0.75% |
| MAA2C | Adversary | 12.06 | 10.88 | 9.78% |
| MAA2C | Tag | 19.95 | 26.5 | -32.83% |
| MAPPO | Speaker-Listener | -10.68 | -14.35 | 34.36% |
| MAPPO | Spread | -133.54 | -128.64 | -3.67% |
| MAPPO | Adversary | 11.3 | 12.04 | -6.55% |
| MAPPO | Tag | 18.52 | 17.96 | 3.02% |
| VDN | Speaker-Listener | -15.95 | -15.47 | -3.01% |
| VDN | Spread | -131.03 | -142.13 | 8.47% |
| VDN | Adversary | 9.28 | 9.34 | -0.65% |
| VDN | Tag | 24.5 | 18.44 | 24.73% |
| QMIX | Speaker-Listener | -11.56 | -11.59 | 0.26% |
| QMIX | Spread | -126.62 | -130.97 | 3.44% |
| QMIX | Adversary | 9.67 | 11.32 | -17.06% |
| QMIX | Tag | 31.18 | 26.88 | 13.79% |

  • Average change: 7.51%
  • Total change: 240.40%

The differences are even larger when we consider the Tag task alone.

Table B: Maximum returns over five seeds for the Tag task with parameter sharing
(PS), without parameter sharing (NS), the excess of returns of PS over NS, and the
relative change, for the eight algorithms.

| Algorithm | PS | NS | Excess of Returns | Change (%) |
| --- | --- | --- | --- | --- |
| IQL | 22.18 | 19.18 | 3 | 13.53% |
| IA2C | 17.44 | 16.04 | 1.4 | 8.03% |
| IPPO | 19.44 | 18.46 | 0.98 | 5.04% |
| MADDPG | 12.5 | 2.82 | 9.68 | 77.44% |
| MAA2C | 19.95 | 26.5 | -6.55 | -32.83% |
| MAPPO | 18.52 | 17.96 | 0.56 | 3.02% |
| VDN | 24.5 | 18.44 | 6.06 | 24.73% |
| QMIX | 31.18 | 26.88 | 4.3 | 13.79% |

  • Average: 2.42875 (excess of returns), 14.09% (change)
  • Total: 19.43 (excess of returns), 112.75% (change)
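
The aggregates above follow directly from the PS/NS pairs in the table; a short sketch using the same convention as before:

```python
# (algorithm, PS, NS) maximum returns on Tag, as reported in Tables 3 and 7
tag = [
    ("IQL", 22.18, 19.18), ("IA2C", 17.44, 16.04), ("IPPO", 19.44, 18.46),
    ("MADDPG", 12.50, 2.82), ("MAA2C", 19.95, 26.50), ("MAPPO", 18.52, 17.96),
    ("VDN", 24.50, 18.44), ("QMIX", 31.18, 26.88),
]

excess = [ps - ns for _, ps, ns in tag]                      # PS - NS
change = [100.0 * (ps - ns) / abs(ps) for _, ps, ns in tag]  # relative change

print(sum(excess) / len(excess), sum(excess))                # ~2.42875, ~19.43
print(sum(change) / len(change), sum(change))                # ~14.09%,  ~112.75%
```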

Can you confirm whether this is indeed the case, or point me in the right direction?

Thanks,
