This is the code repo for the EMNLP 2024 main conference paper: ActPlan-1K: Benchmarking the Procedural Planning Ability of Visual Language Models in Household Activities.
The activity definitions are based on the BDDL language. ActPlan-1K is an extension of Behavior100: our dataset redefines activities based on the seed activities in Behavior100, using the annotation tool and annotation interface.
Details of the annotation steps:
- For each activity in Behavior100, translate the BDDL description into natural language.
- Given each activity, prompt ChatGPT for specific procedures and for situated circumstances that might happen during the process. The prompting contexts are under the folder `chatgpt/`.
- Ground the situated circumstance in the iGibson environment and annotate the initial and goal descriptions with the annotation tool, which generates a new BDDL case. Alternatively, directly modify the normal activity's BDDL file to build a counterfactual activity (careful checking is required, since the file is later used to generate scene instances for image collection).
- Convert the BDDL description into a natural language task description, which serves as prompting context (see the sketch after this list).
The collected counterfactual activity definitions are placed under the folder `./bddl/activity-definitions`.
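To illustrate the last conversion step, here is a minimal, hypothetical sketch of a template-based BDDL-to-natural-language translation. The predicate templates, the `strip_synset` helper, and the example goal literals are all illustrative; they are not the repo's actual conversion code.

```python
# Hypothetical sketch: template-based translation of BDDL goal literals
# into natural language. Predicate names follow BDDL conventions
# (ontop, inside, nextto, ...); templates and example literals are
# illustrative only.

TEMPLATES = {
    "ontop":  "{0} is on top of {1}",
    "inside": "{0} is inside {1}",
    "nextto": "{0} is next to {1}",
}

def strip_synset(term: str) -> str:
    # "basket.n.01_1" -> "basket"
    return term.split(".")[0]

def literal_to_text(literal: tuple) -> str:
    pred, *args = literal
    names = [strip_synset(a) for a in args]
    return TEMPLATES[pred].format(*names)

goal = [("inside", "candle.n.01_1", "basket.n.01_1"),
        ("ontop", "basket.n.01_1", "table.n.02_1")]

print("Goal: " + " and ".join(literal_to_text(l) for l in goal) + ".")
# -> Goal: candle is inside basket and basket is on top of table.
```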
Besides the natural language task descriptions from the BDDL files, visual information about the environments is the other key input. To acquire it, we collect images covering the main contents of each activity in its environment. The detailed procedure is as follows:
- For counterfactual activities, scene instances are first sampled with the activity definitions from the previous step, following the instructions; the sampled results are `urdf` files. For normal activities, we use the predefined activities in Behavior100, whose sampled scene instances can be downloaded directly from the iGibson2 data.
- Load the sampled counterfactual and the downloaded normal activity urdf instances into the iGibson2 simulator, following the example in the iGibson sample loader (a minimal sketch is given after this list).
- Record videos while touring the house after loading the scene instances, and select images that cover the main contents from the recorded frames. The selected images are used as additional visual input in prompting.
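The following is a minimal loading sketch, assuming iGibson 2.x. The scene id, the `urdf_file` instance name, and the exact constructor arguments are assumptions and may differ across iGibson versions; treat the iGibson sample loader as the reference.

```python
# Minimal sketch, assuming iGibson 2.x. The scene id and urdf_file name
# are illustrative; exact constructor arguments may differ by version.
from igibson.simulator import Simulator
from igibson.scenes.igibson_indoor_scene import InteractiveIndoorScene

sim = Simulator(mode="gui_interactive")  # use "headless" on a server
scene = InteractiveIndoorScene(
    "Beechwood_0_int",
    # a scene instance saved in the sampling step (hypothetical name)
    urdf_file="Beechwood_0_int_task_assembling_gift_baskets_0_0",
)
sim.import_scene(scene)

# Step the simulator while touring the house; record the tour, then
# hand-pick images that cover the main contents of the activity.
for _ in range(1000):
    sim.step()
sim.disconnect()
```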
An example ActPlan-1K instance is under the folder `./annotation`: `Beechwood_0_int/assembling_gift_baskets/0` contains the normal activity and `Beechwood_0_int/assembling_gift_baskets/1` contains the counterfactual activity. The full dataset, including all annotations and the sampled urdfs for counterfactual activities, has been released and can be downloaded.
With the natural language description and the selected image set, we prompt VLMs (e.g., GPT-4V, Claude, Gemini-pro-1.5) to generate procedural plans. The generated plans are compared against the gold plans with both human metrics and automatic metrics. We provide two automatic evaluation metrics: longest common subsequence (LCS) and a fine-tuned BLEURT score.
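As an illustration of the prompting step, here is a minimal sketch using the OpenAI Python SDK. The model name, prompt wording, and image paths are assumptions, not the exact prompts used for the paper.

```python
# Minimal sketch, assuming the OpenAI Python SDK (>= 1.0) and a
# GPT-4V-class model. Model name, task text, and paths are illustrative.
import base64
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def encode_image(path: str) -> str:
    with open(path, "rb") as f:
        return base64.b64encode(f.read()).decode("utf-8")

task_description = "Assemble gift baskets ..."  # from the BDDL translation
image_paths = ["img_0.png", "img_1.png"]        # selected tour images

content = [{"type": "text",
            "text": f"Task: {task_description}\n"
                    "Generate a step-by-step plan for this household activity."}]
for p in image_paths:
    content.append({"type": "image_url",
                    "image_url": {"url": f"data:image/png;base64,{encode_image(p)}"}})

resp = client.chat.completions.create(
    model="gpt-4-vision-preview",
    messages=[{"role": "user", "content": content}],
    max_tokens=512,
)
print(resp.choices[0].message.content)
```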
Details of the LCS metric are in the folder `./auto_lcs`, and details of the fine-tuned BLEURT metric are in the folder `./bleu-cls`.
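For reference, here is a minimal sketch of a normalized LCS score over plan steps. Normalizing by the longer plan's length is an assumption; `./auto_lcs` contains the implementation actually used, which may also normalize step strings before matching.

```python
# Minimal sketch of a normalized longest-common-subsequence score over
# plan steps. Normalizing by the longer plan's length is an assumption;
# see ./auto_lcs for the metric actually used.
def lcs_length(a: list[str], b: list[str]) -> int:
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a, 1):
        for j, y in enumerate(b, 1):
            dp[i][j] = dp[i-1][j-1] + 1 if x == y else max(dp[i-1][j], dp[i][j-1])
    return dp[len(a)][len(b)]

def lcs_score(pred: list[str], gold: list[str]) -> float:
    if not pred or not gold:
        return 0.0
    return lcs_length(pred, gold) / max(len(pred), len(gold))

gold = ["open cabinet", "grasp basket", "place basket on table"]
pred = ["grasp basket", "place basket on table", "close cabinet"]
print(lcs_score(pred, gold))  # 2/3 ~= 0.667
```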