Reproduce https://openreview.net/forum?id=Hk6WhagRW&noteId=Hk6WhagRW , "Emergent Communication through Negotation", ICLR 2018 anonymous submission.
- install pytorch 0.2, https://pytorch.org
- download this repo,
git clone https://github.com/asappinc/emergent_comms_negotiation
python ecn.py [--disable-comms] [--disable-proposal] [--disable-prosocial] [--enable-cuda] [--term-entropy-reg 0.5] [--utterance-entropy-reg 0.0001] [--proposal-entropy-reg 0.01] [--model-file model_saves/mymodel.dat] [--name gpu3box]
Where options are:
--enable-cuda
: use NVIDIA GPU, instead of CPU--disable-comms
: disable the comms channel--disable-proposal
: disable the proposal channel (ie agent can create proposals, but other agent cant see them)--disable-prosocial
: disable prosocial reward--term-entropy-reg VALUE
: termination policy entropy regularization--utterance-entorpy-reg VALUE
: utterance policy entropy regularization--proposal-entropy-reg VALUE
: proposal policy entropy regularization--model-file models_saves/FILENAME
: where to save the model to, and where to look for it on startup--name NAME
: this is used in the logfile name, just to make it easier to find/distinguish logfiles, no other purpose
eg if we have:
000000 4:4/0 7:5/5 9:4/4
000000 4:5/0 6:1/5 7:2/4
000000 4:0/0 7:0/5 9:1/4
ACC
r: 0.91
Then:
- each of the first 4 lines here is the action of a single agent
- the
ACC
line is the agent accepting previous proposal - each proposal line is laid out as:
[utterance] [utility 0]:[proposal 0]/[pool 0] ... etc ...
- if the agents run out of time, last line will be
[out of time]
One negotation is printed out every 3 seconds or so, using the training set; the other negotations are executed silently. There is no test set for now.
Agent sociability | Proposal | Linguistic | Both | None |
---|---|---|---|---|
Self-interested, random term | >=0.80 | |||
Prosocial, random term | ~0.91 | ~0.83 | ~0.96 | >= 0.90 |
Notes:
- prosocial runs all use termreg=0.5, uttreg=0.0001, propreg=0.01
- self-interested run uses: termreg=0.05, uttreg=0.0001, propreg=0.005
Prop? | Comm? | Soc? | Rend term? | Term reg | Utt reg | Prop reg | Subjective variance | Reward | Greedy ratios |
---|---|---|---|---|---|---|---|---|---|
Y | Y | Y | Y | 0.5 | 0.0001 | 0.01 | Low | ~0.96 | term=0.7345 utt=0.7635 prop=0.8304 |
Y | - | Y | Y | 0.5 | 0.0001 | 0.01 | Medium-High | ~0.91 | term=0.6965 utt=0.0000 prop=0.8741 |
- | Y | Y | Y | 0.5 | 0.0001 | 0.01 | High | ~0.83 | term=0.6889 utt=0.7849 prop=0.8222 |
- | - | Y | Y | 0.5 | 0.0001 | 0.01 | Very low | >= 0.90 (climbing) | term=0.7781 utt=0.0000 prop=0.6006 |
Y | Y | - | Y | 0.5 | 0.0001 | 0.01 | Very High | ~0.25 | term=0.7467 utt=0.9284 prop=0.8137 |
Y | Y | - | Y | 0.05 | 0.0001 | 0.005 | Very Low | >= 0.80 (climbing) | term=0.9820 utt=0.7040 prop=0.6523 |
proposal, comms, prosocial
Three training runs, identical settings:
Proposal, no comms, prosocial
No proposal, comms, prosocial
No proposal, no comms, prosocial
Proposal, comms, no social
Run 1, same entropy regularization as prosocial graphs:
Run 2, with reduced entropy regularization:
- install pytest, ie
conda install -y pytest
, and then:
py.test -svx
- there are also some additional tests in:
python net_tests.py
(which allow close examination of specific parts of the network, policies, and so on; but which arent really 'unit-tests' as such, since neither termination criteria, nor success criteria)
Assumptions:
- running the training on remote Ubuntu 16.04 instances
ssh
access, as userubuntu
, to these instances- remote has home directory
/home/ubuntu
- logs are stored in subdirectory
logs
of current local directory - the location of
logs
relative to~
is identical on local computer and remote computer
Setup/configuration:
- copy
instances.yaml.templ
to~/instances.yaml
, on your own machine- configure
~/instances.yaml
with:- name and ip of each instance (names are arbitrary)
- the path to your private sshkey, that can access these instances
- configure
Procedure
- run:
python merge.py --hostname [name in instances.yaml] [--logfile logs/log_20171104_1234.log] \
[--title 'my graph title'] [--y-min 75 --y-max 85]
This will:
rsync
the logs from the remote instance identified by--hostname
- if
--logfile
is specified, load the results from that logfile- else, will look for the most recent logfile, ordered by name
- plots the graph into
/tmp/out-reward.png