1 School of Computer Science, Fudan University
2 Meituan, China
3 Shanghai Key Laboratory of Intelligent Information Processing, Shanghai, China
AUITestAgent is the first automatic, natural language-driven GUI testing tool for mobile apps, capable of fully automating the entire process of GUI interaction and function verification. It takes test requirements written in natural language as input, generates and conducts UI interactions, and verifies whether the UI response aligns with the expectations outlined in the requirements.
To enhance the performance of LLM-based agents in the domain-specific area of UI testing, AUITestAgent decouples GUI interaction and function verification into two separate modules, performing verification after the interaction.
In terms of implementation, AUITestAgent extracts GUI interactions from test requirements using dynamically organized agents to tackle the diversity of requirement expressions. Then, a multi-dimensional data extraction strategy is employed to retrieve data relevant to the test requirements from the interaction trace and perform verification.
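The decoupling described above can be illustrated with a minimal sketch. This is not AUITestAgent's implementation: every name here (`Step`, `Trace`, `run_interaction`, `run_verification`, `check_requirement`, and the stub functions) is hypothetical, and the stubs stand in for the LLM agents and the device driver. The sketch only shows the shape of the idea: the interaction stage produces a trace, and the verification stage reads that trace after the interaction has finished.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Step:
    """One executed UI action plus the screen text observed afterwards."""
    action: str
    screen_text: str


@dataclass
class Trace:
    """Ordered record of an interaction run; the verifier's only input."""
    steps: List[Step] = field(default_factory=list)


def run_interaction(requirement: str,
                    plan: Callable[[str], List[str]],
                    execute: Callable[[str], str]) -> Trace:
    """Stage 1: derive actions from the requirement and record each result."""
    trace = Trace()
    for action in plan(requirement):
        trace.steps.append(Step(action, execute(action)))
    return trace


def run_verification(requirement: str,
                     trace: Trace,
                     judge: Callable[[str, Trace], bool]) -> bool:
    """Stage 2: judge the finished trace; it never touches the device."""
    return judge(requirement, trace)


def check_requirement(requirement, plan, execute, judge) -> bool:
    """Run the two stages strictly in sequence."""
    trace = run_interaction(requirement, plan, execute)
    return run_verification(requirement, trace, judge)


# Hypothetical stubs standing in for the LLM agents and the device driver.
def stub_plan(requirement: str) -> List[str]:
    return ["tap:compose", "type:Hello everyone", "tap:send"]


def stub_execute(action: str) -> str:
    return "Post: Hello everyone" if action == "tap:send" else ""


def stub_judge(requirement: str, trace: Trace) -> bool:
    return "Hello everyone" in trace.steps[-1].screen_text


passed = check_requirement("Send a post with content 'Hello everyone'",
                           stub_plan, stub_execute, stub_judge)
```

Because the verifier receives only the recorded trace, each stage can be swapped or evaluated on its own, which mirrors the separate interaction and verification benchmarks used in the evaluation.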
Task: View the rating of the first scenic spot in the scenic view, check whether its rating is consistent
demo1.mp4
Task: Send a post with content 'Hello everyone' and like it, check whether it is correctly displayed, and whether the like button turns blue
demo2.mp4
We evaluate AUITestAgent’s performance with two customized benchmarks, an interaction benchmark and a verification benchmark, covering 8 widely used commercial apps (i.e., Meituan, Little Red Book, Douban, Facebook, Gmail, LinkedIn, Google Play and YouTube Music). To provide a comprehensive assessment, we categorized the difficulty of interaction tasks into three levels: easy (L1), moderate (L2), and difficult (L3). For each level, we constructed ten interaction tasks, with descriptions evenly split between English and Chinese.
Our experiments reveal that AUITestAgent accurately completes 100% of Level 1 tasks, 80% of Level 2 tasks, and 50% of Level 3 tasks. Additionally, 94% of the interactions generated by AUITestAgent align with the manually produced ground truth. These metrics demonstrate that AUITestAgent significantly outperforms existing methods in translating natural language commands to GUI interactions. Moreover, AUITestAgent achieves a recall of 90% for injected GUI functional bugs while maintaining a low false positive rate of just 4.5%. Furthermore, its success in detecting unseen bugs in Meituan underscores the practical advantages of using AUITestAgent for GUI testing in complex commercial apps.
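For readers who want to reproduce the verification metrics on their own data, the sketch below computes recall and false positive rate from confusion counts. It assumes the standard definitions (recall = TP / (TP + FN), false positive rate = FP / (FP + TN)); the paper may define its rates differently, so treat the formulas as an assumption. The counts in the usage lines are made-up illustrative numbers, not the benchmark results.

```python
def recall(true_positives: int, false_negatives: int) -> float:
    """Share of buggy cases that were flagged: TP / (TP + FN)."""
    buggy_cases = true_positives + false_negatives
    if buggy_cases == 0:
        raise ValueError("no buggy cases to evaluate")
    return true_positives / buggy_cases


def false_positive_rate(false_positives: int, true_negatives: int) -> float:
    """Share of bug-free cases that were wrongly flagged: FP / (FP + TN)."""
    clean_cases = false_positives + true_negatives
    if clean_cases == 0:
        raise ValueError("no bug-free cases to evaluate")
    return false_positives / clean_cases


# Illustrative counts only (NOT the paper's data).
example_recall = recall(true_positives=3, false_negatives=1)            # 0.75
example_fpr = false_positive_rate(false_positives=1, true_negatives=4)  # 0.2
```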
For detailed information, please refer to our paper and evaluation results.
For detailed results, please refer to the interaction benchmark.
Baseline:
For detailed results, please refer to the verification benchmark.
Since AUITestAgent is the first to focus on natural language-driven GUI function verification and there are no existing studies in this field, we constructed a verification method based on multi-turn dialogue using GPT-4o as a baseline.
If you find this work helpful to your research, please kindly consider citing our paper.
@misc{hu2024auitestagent,
title={AUITestAgent: Automatic Requirements Oriented GUI Function Testing},
author={Yongxiang Hu and Xuan Wang and Yingchuan Wang and Yu Zhang and Shiyu Guo and Chaoyi Chen and Xin Wang and Yangfan Zhou},
year={2024},
eprint={2407.09018},
archivePrefix={arXiv},
primaryClass={cs.SE}
}
AUITestAgent is joint work from Prof. Zhou’s team at Fudan University and the Meituan In-Store R&D platform. We have long been dedicated to the field of AI for full-stack front-end technology. In addition to AUITestAgent, we have developed several other technological innovations, including vision-ui, Appaction and AutoConsis.