Official Implementation of On the Multi-turn Instruction Following for Conversational Web Agents (Paper, Datasets)
Use Python 3.11 or earlier, and install the required packages with the following command:
pip install -r requirements.txt
- Download the MT-Mind2Web dataset and place it in the data/ directory.
- Fill in the DATA_PATH and LOG_PATH variables in the src/candidate_generation/conf/config.yaml and src/action_prediction/conf/config.yaml files:

      data:
        data_path: $(DATA_PATH)
        train_split_file: data/train/*.json
        test_split_files:
          test_task: data/test_task/*.json
          test_website: data/test_website/*.json
          test_subdomain: data/test_subdomain/*.json
      hydra:
        run:
          dir: $(LOG_PATH)

- Set the environment variable OPENAI_API_KEY to your OpenAI API key.
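Before training, it can help to confirm that the setup steps above are complete. The following is a minimal sketch, not part of the repository: the `preflight` function is a hypothetical helper, and it assumes the split patterns in the config are resolved relative to DATA_PATH.

```python
import os
from pathlib import Path

# Split patterns copied from the config; assumed to be relative to DATA_PATH.
SPLIT_PATTERNS = {
    "train": "data/train/*.json",
    "test_task": "data/test_task/*.json",
    "test_website": "data/test_website/*.json",
    "test_subdomain": "data/test_subdomain/*.json",
}


def preflight(data_path, env=None):
    """Return a list of problems found; an empty list means the setup looks complete."""
    env = os.environ if env is None else env
    problems = []
    root = Path(data_path)
    for name, pattern in SPLIT_PATTERNS.items():
        # Each split needs at least one JSON file matching its pattern.
        if not any(root.glob(pattern)):
            problems.append(f"no files match {pattern} for split '{name}'")
    if not env.get("OPENAI_API_KEY"):
        problems.append("OPENAI_API_KEY is not set")
    return problems
```

Calling `preflight("/path/to/dataset")` returns an empty list when every split has at least one file and the API key is present; otherwise each entry names what is missing.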
python src/candidate_generation/train.py model=deberta-v3-base
Generate the ranks and scores for element candidates in the test-X set, where X is one of task, website, or subdomain.
python src/candidate_generation/evaluate.py \
    --model_path ${MODEL_PATH} \
    --data_path ${DATA_PATH} \
    --split_file data/test_${X}/*.json \
    --output_dir ${OUTPUT_PATH}
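Since the command has to be run once per split, the three invocations can be scripted. This is a sketch under stated assumptions: `build_eval_command` and `run_all` are hypothetical helpers, the per-split output directory layout is made up for illustration, and the split pattern is passed to the script as a literal string rather than expanded by a shell.

```python
import subprocess

SPLITS = ("task", "website", "subdomain")


def build_eval_command(split, model_path, data_path, output_root):
    """Build the argument list for one test split, mirroring the command above."""
    if split not in SPLITS:
        raise ValueError(f"unknown split: {split!r}")
    return [
        "python", "src/candidate_generation/evaluate.py",
        "--model_path", model_path,
        "--data_path", data_path,
        # Passed as a literal pattern (no shell expansion); assumed to be what the script expects.
        "--split_file", f"data/test_{split}/*.json",
        # Hypothetical layout: one output directory per split.
        "--output_dir", f"{output_root}/test_{split}",
    ]


def run_all(model_path, data_path, output_root):
    """Run the evaluation for every split, stopping at the first failure."""
    for split in SPLITS:
        subprocess.run(build_eval_command(split, model_path, data_path, output_root), check=True)
```

Keeping command construction separate from execution makes it easy to print and inspect each command before launching a long run.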
Generate the pickle files for conversational action planning.
import pickle


def load_pickle(file_path):
    with open(file_path, 'rb') as file:
        data = pickle.load(file)
    return data


def write_pickle(data, file_path):
    with open(file_path, 'wb') as file:
        pickle.dump(data, file)


# test_task_path, test_website_path, test_subdomain_path: pickle files produced by the
# evaluation step above; output_path: where to write the merged file. Set these first.
all_dict = {"scores": {}, "ranks": {}}
for file_path in [test_task_path, test_website_path, test_subdomain_path]:
    data = load_pickle(file_path)
    for key in ["scores", "ranks"]:
        # Merge per-annotation element entries from each split into one dictionary.
        for annotation_id, element in data[key].items():
            all_dict[key].setdefault(annotation_id, {}).update(element)
write_pickle(all_dict, output_path)
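To see what the merge does, here is the same loop applied to small in-memory dictionaries. This is an illustrative sketch: `merge_results` is a hypothetical wrapper, the ids and values are made up, and the inner mapping from element id to score or rank is an assumption about the pickle layout.

```python
def merge_results(results):
    """Merge several {"scores": ..., "ranks": ...} dicts into one, keyed by annotation id."""
    merged = {"scores": {}, "ranks": {}}
    for data in results:
        for key in ("scores", "ranks"):
            for annotation_id, element in data[key].items():
                # Create the annotation entry on first sight, then add its elements.
                merged[key].setdefault(annotation_id, {}).update(element)
    return merged


# Toy inputs; ids and values are invented for illustration only.
task = {"scores": {"a1": {"e1": 0.9}}, "ranks": {"a1": {"e1": 1}}}
website = {
    "scores": {"a1": {"e2": 0.4}, "a2": {"e3": 0.7}},
    "ranks": {"a1": {"e2": 2}, "a2": {"e3": 1}},
}

merged = merge_results([task, website])
print(merged["scores"])  # {'a1': {'e1': 0.9, 'e2': 0.4}, 'a2': {'e3': 0.7}}
```

Entries that share an annotation id are combined rather than overwritten, so no split's candidates are lost in the merged file.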
torchrun --nproc-per-node 4 --master_port=10086 \
    src/action_prediction/train.py \
    model=flan-t5-base \
    train.per_device_train_batch_size=8 \
    train.gradient_accumulation_steps=1 \
    train.fsdp=False \
    train.num_gpus=4 \
    train.epoch=5 \
    run_id="full" \
    ++self_map.generation=False \
    ++self_map.memory_simplification=False \
    ++self_map.memory_refinement=False \
    ++self_map.multifaceted_matching=False
python src/action_prediction/evaluate.py \
    +model_path=${MODEL_PATH} \
    model=flan-t5-base \
    +output_path=${OUTPUT_PATH} \
    +top_k=50 \
    ++self_map.generation=False \
    ++self_map.memory_simplification=False \
    ++self_map.memory_refinement=False \
    ++self_map.multifaceted_matching=False
Set any of self_map.generation, self_map.memory_simplification, self_map.memory_refinement, or self_map.multifaceted_matching to True to enable the corresponding module.
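When running several ablations, the four self_map overrides can be generated rather than typed by hand. A minimal sketch, assuming only the flag names shown in the commands above; `self_map_overrides` is a hypothetical helper, not part of the repository.

```python
MODULES = (
    "generation",
    "memory_simplification",
    "memory_refinement",
    "multifaceted_matching",
)


def self_map_overrides(enabled=()):
    """Return the four ++self_map.* overrides, with the named modules set to True."""
    enabled = set(enabled)
    unknown = enabled - set(MODULES)
    if unknown:
        raise ValueError(f"unknown self_map modules: {sorted(unknown)}")
    # Every module is always listed, so disabled ones are explicitly False.
    return [f"++self_map.{name}={name in enabled}" for name in MODULES]


print(" ".join(self_map_overrides(["generation", "memory_refinement"])))
```

The resulting strings can be appended to the training or evaluation command in place of the four hand-written overrides.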
Our code is derived from the Mind2Web project under the MIT License.
Our MT-Mind2Web dataset is made available under the CC-BY-4.0 license.
@inproceedings{self-map,
author = {Deng, Yang and Zhang, Xuan and Zhang, Wenxuan and Yuan, Yifei and Ng, See-Kiong and Chua, Tat-Seng},
booktitle = {Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)},
doi = {10.18653/v1/2024.acl-long.477},
editor = {Ku, Lun-Wei and Martins, Andre and Srikumar, Vivek},
pages = {8795--8812},
publisher = {Association for Computational Linguistics},
title = {On the Multi-turn Instruction Following for Conversational Web Agents},
url = {https://aclanthology.org/2024.acl-long.477},
year = {2024}
}