This repository is the official implementation of GUI Odyssey.
GUI Odyssey: A Comprehensive Dataset for Cross-App GUI Navigation on Mobile Devices
Quanfeng Lu, Wenqi Shao✉️⭐️, Zitao Liu, Fanqing Meng, Boxuan Li, Botong Chen, Siyuan Huang, Kaipeng Zhang, Yu Qiao, Ping Luo✉️
✉️ Wenqi Shao (shaowenqi@pjlab.org.cn) and Ping Luo (pluo@cs.hku.hk) are the corresponding authors.
⭐️ Wenqi Shao is the project leader.
2024/06/24: The data of GUI Odyssey is released! Please check out OpenGVLab/GUI-Odyssey!
2024/06/13: The paper of GUI Odyssey is released!
GUI Odyssey is a comprehensive dataset for training and evaluating cross-app navigation agents. GUI Odyssey consists of 7,735 episodes from 6 mobile devices, spanning 6 types of cross-app tasks, 201 apps, and 1.4K app combos.
GUI Odyssey comprises six categories of navigation tasks. For each category, we construct instruction templates with items and apps selected from a predefined pool, resulting in a vast array of unique instructions for annotating GUI episodes. Human demonstrations on an Android emulator capture the metadata of each episode in a comprehensive format. After rigorous quality checks, GUI Odyssey includes 7,735 validated cross-app GUI navigation episodes.
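Template-based instruction generation can be illustrated with a toy Python sketch: one template is filled with every combination of an item and an app pair drawn from small pools. The template text, items, and app pairs below are hypothetical placeholders, not the actual GUI Odyssey templates or pools, and `instantiate` is our own illustrative helper.

```python
from itertools import product


def instantiate(template: str, items: list[str], apps: list[tuple[str, str]]) -> list[str]:
    """Fill one instruction template with every (item, app pair) combination."""
    return [
        template.format(item=item, app1=app1, app2=app2)
        for item, (app1, app2) in product(items, apps)
    ]


# Hypothetical template and pools, for illustration only; these are NOT
# the real GUI Odyssey templates, items, or apps.
template = "Search for {item} in {app1} and write it down in {app2}."
items = ["today's weather", "a pasta recipe"]
apps = [("Chrome", "Google Keep"), ("Firefox", "Notes")]

instructions = instantiate(template, items, apps)
for text in instructions:
    print(text)
```

With 2 items and 2 app pairs this yields 4 distinct instructions; the number of unique instructions grows multiplicatively with the size of each pool.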
Splits | # Episodes | # Unique Prompts | # Avg. Steps | Data location | Model |
---|---|---|---|---|---|
Total | 7,735 | 7,735 | 15.4 | GUI-Odyssey | OdysseyAgent |
Train-Random & Test-Random | 5,802 / 1,933 | 5,802 / 1,933 | 15.4 / 15.2 | random_split.json | OdysseyAgent-Random |
Train-Task & Test-Task | 6,719 / 1,016 | 6,719 / 1,016 | 15.0 / 17.6 | task_split.json | OdysseyAgent-Task |
Train-Device & Test-Device | 6,473 / 1,262 | 6,473 / 1,262 | 15.4 / 15.0 | device_split.json | OdysseyAgent-Device |
Train-App & Test-App | 6,596 / 1,139 | 6,596 / 1,139 | 15.4 / 15.3 | app_split.json | OdysseyAgent-App |
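In every row, the Train and Test episode counts add up to 7,735, which suggests that each split partitions the full dataset. The episode-weighted mean of the two average step counts should then land close to the overall 15.4. It will not match exactly, because the table rounds to one decimal. The short Python check below recomputes both quantities from the table's numbers only. It is an arithmetic sanity check and does not read the split files.

```python
# Episode counts and average steps per episode, copied from the table above.
TOTAL_EPISODES = 7735
TOTAL_AVG_STEPS = 15.4

# split name -> ((train episodes, train avg steps), (test episodes, test avg steps))
SPLITS = {
    "random": ((5802, 15.4), (1933, 15.2)),
    "task": ((6719, 15.0), (1016, 17.6)),
    "device": ((6473, 15.4), (1262, 15.0)),
    "app": ((6596, 15.4), (1139, 15.3)),
}


def combine(train, test):
    """Return (episode count, episode-weighted average steps) of train + test."""
    (n_train, steps_train), (n_test, steps_test) = train, test
    n = n_train + n_test
    return n, (n_train * steps_train + n_test * steps_test) / n


for name, (train, test) in SPLITS.items():
    n, avg = combine(train, test)
    # Counts must match exactly; averages only roughly, since the table rounds to 0.1.
    print(f"{name:>6}: {n} episodes, {avg:.2f} avg steps, "
          f"count ok={n == TOTAL_EPISODES}, avg ok={abs(avg - TOTAL_AVG_STEPS) < 0.1}")
```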
The complete GUI Odyssey dataset is hosted on Hugging Face.
Clone it from Hugging Face:
git clone https://huggingface.co/datasets/OpenGVLab/GUI-Odyssey
Then move the cloned dataset into the ./data directory. After that, the structure of ./data should look like this:
GUI-Odyssey
├── data
│ ├── annotations
│ │ └── *.json
│ ├── screenshots
│ │ └── data_*
│ │ └── *.png
│ ├── splits
│ │ ├── app_split.json
│ │ ├── device_split.json
│ │ ├── random_split.json
│ │ └── task_split.json
│ ├── format_converter.py
│ └── preprocessing.py
└── ...
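Before running the preprocessing step, you can confirm that the files ended up in the right place by checking for the entries listed in the tree above. The Python sketch below does this. The `EXPECTED` list is transcribed from that tree, and `missing_paths` is our own helper, not part of the repository. It assumes you run it from the repository root (the GUI-Odyssey folder).

```python
from pathlib import Path

# Entries transcribed from the directory tree above, relative to the repository root.
EXPECTED = [
    "data/annotations",
    "data/screenshots",
    "data/splits/app_split.json",
    "data/splits/device_split.json",
    "data/splits/random_split.json",
    "data/splits/task_split.json",
    "data/format_converter.py",
    "data/preprocessing.py",
]


def missing_paths(repo_root: str = ".") -> list[str]:
    """Return the expected entries that do not exist under repo_root (hypothetical helper)."""
    root = Path(repo_root)
    return [rel for rel in EXPECTED if not (root / rel).exists()]
```

For example, `print(missing_paths() or "layout looks complete")` lists anything that still needs to be moved.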
Then organize the screenshots folder:
cd data
python preprocessing.py
Finally, the structure of ./data should look like this:
GUI-Odyssey
├── data
│ ├── annotations
│ │ └── *.json
│ ├── screenshots
│ │ └── *.png
│ ├── splits
│ │ ├── app_split.json
│ │ ├── device_split.json
│ │ ├── random_split.json
│ │ └── task_split.json
│ ├── format_converter.py
│ └── preprocessing.py
└── ...
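Comparing the two trees, the preprocessing step appears to move every screenshot out of the screenshots/data_*/ subfolders so that all PNG files sit directly under screenshots/. The Python sketch below reproduces only that visible effect, under this assumption. `flatten_screenshots` is a hypothetical helper, not the repository's preprocessing.py, and the provided script remains the way to prepare the data.

```python
import shutil
from pathlib import Path


def flatten_screenshots(screenshots_dir: str = "screenshots") -> int:
    """Move screenshots/data_*/*.png up into screenshots/ and remove emptied folders.

    Assumption: this mirrors the before/after layouts shown above; it is NOT
    the repository's preprocessing.py. Returns the number of files moved.
    """
    root = Path(screenshots_dir)
    moved = 0
    for sub in sorted(root.glob("data_*")):
        if not sub.is_dir():
            continue
        # Materialize the listing first so moving files doesn't disturb iteration.
        for png in list(sub.glob("*.png")):
            target = root / png.name
            if target.exists():
                raise FileExistsError(f"refusing to overwrite {target}")
            shutil.move(str(png), str(target))
            moved += 1
        # Only delete the subfolder if nothing else was left in it.
        if not any(sub.iterdir()):
            sub.rmdir()
    return moved
```

Called from inside data/ (matching the `cd data` step), `flatten_screenshots()` refuses to overwrite a file with the same name rather than silently dropping a screenshot.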
Please refer to this.
Please refer to this for a quick start.
- Dataset
  - Screenshots of GUI Odyssey
  - Annotations of GUI Odyssey
  - Split files of GUI Odyssey
- Code
  - Data preprocessing code
  - Inference code
- Models
If you find GUI Odyssey useful in your project or research, please cite our paper using the following BibTeX entry. Thanks!
@misc{lu2024gui,
title={GUI Odyssey: A Comprehensive Dataset for Cross-App GUI Navigation on Mobile Devices},
author={Quanfeng Lu and Wenqi Shao and Zitao Liu and Fanqing Meng and Boxuan Li and Botong Chen and Siyuan Huang and Kaipeng Zhang and Yu Qiao and Ping Luo},
year={2024},
eprint={2406.08451},
archivePrefix={arXiv},
primaryClass={cs.CV}
}