News | Competition | Results | Dataset | Important Dates | Data Format | Evaluation Metrics | Baselines | Organizers | Contacts
Large language models (LLMs) are becoming mainstream and easily accessible, ushering in an explosion of machine-generated content across various channels, such as news, social media, question-answering forums, and educational and even academic contexts. Recent LLMs, such as GPT-4o, Claude 3.5, and Gemini 1.5 Pro, generate remarkably fluent responses to a wide variety of user queries. The articulate nature of such generated texts makes LLMs attractive for replacing human labor in many scenarios. However, this has also raised concerns about their potential misuse, such as spreading misinformation and causing disruptions in the education system. Since humans perform only slightly better than chance when classifying machine-generated vs. human-written text, there is a need to develop automatic systems to identify machine-generated text with the goal of mitigating its potential misuse.
In the COLING Workshop on MGT Detection Task 1, we adopt a straightforward binary problem formulation: determining whether a given text was generated by a machine or authored by a human. This task continues and improves upon SemEval-2024 Task 8 (Subtask A). We aim to refresh the training and testing data with generations from novel LLMs and to include new languages.
There are two subtasks:
- Subtask A: English-only MGT detection.
- Subtask B: Multilingual MGT detection.
Please check the gold labels of the dev-test and test sets on Google Drive
Updated test sets! Some participants provided valuable feedback on the test sets. To improve their quality, we removed some rows while keeping the original ids. Please check the updated test sets on Google Drive
You may see score=-1 when you submit your results to Codabench. This is normal; your accuracy and rank will appear within a few minutes during the evaluation phase. Your last submission counts.
We are excited to release the test sets for both the English and Multilingual tracks. Download the test set from Google Drive
Submit predictions in the original text order, with human label = 0 and machine label = 1. We look forward to your excellent detection results!
We have extended the test set release to October 29, 2024
We have released our training and dev sets.
The competition is held on Codabench
Official results for the test phase
Download the training and dev sets from Google Drive or from Hugging Face (English and Multilingual).
All dates are AoE.
- 27th August, 2024: Training/dev set release
- 20th October, 2024 (extended to 29th October): Test set release and evaluation phase starts
- 25th October, 2024 (extended to 2nd November): Evaluation phase closes
- 28th October, 2024 (extended to 5th November): Leaderboard made public
- 15th November, 2024: System description paper submission
A prediction file must be a single JSONL file containing predictions for all texts. The entry for each text must include the fields "id" and "label".
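For illustration, the first few lines of a prediction file could look like this (the ids shown are hypothetical; label 0 = human, 1 = machine):

```
{"id": 0, "label": 1}
{"id": 1, "label": 0}
{"id": 2, "label": 1}
```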
The format checkers verify that your prediction file complies with the expected format. They are located in the format_checker module.
python3 format_checker.py --prediction_file_path=<path_to_your_results_files>
Note that the format checker cannot verify whether your prediction file contains predictions for all test instances, because it does not have access to the test file.
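As an illustration only, a minimal check along these lines would validate each line of the file (a sketch of the idea, not the actual format_checker implementation):

```python
import json

def check_format(prediction_file_path):
    """Return True if every line is a JSON object with an "id" and a binary "label"."""
    with open(prediction_file_path, encoding="utf-8") as f:
        for line_no, line in enumerate(f, start=1):
            try:
                obj = json.loads(line)
            except json.JSONDecodeError:
                print(f"Line {line_no}: not valid JSON")
                return False
            if "id" not in obj or obj.get("label") not in (0, 1):
                print(f"Line {line_no}: missing 'id' or 'label' not in {{0, 1}}")
                return False
    return True
```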
The scorers for the subtasks are located in the scorer modules.
The scorer will report the official evaluation metric and other metrics for a given prediction file.
The official evaluation metric is the macro F1-score. The scorer also reports accuracy and micro-F1.
The following command runs the scorer:
python3 scorer.py --gold_file_path=<path_to_gold_labels> --prediction_file_path=<path_to_your_results_file>
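For reference, here is a minimal sketch of the scoring logic, assuming both the gold and prediction files are JSONL with "id" and "label" fields (the released scorer.py may differ in details):

```python
import argparse
import json

from sklearn.metrics import accuracy_score, f1_score

def read_labels(path):
    """Read a JSONL file and return a dict mapping id -> label."""
    with open(path, encoding="utf-8") as f:
        return {obj["id"]: obj["label"] for obj in map(json.loads, f)}

if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("--gold_file_path", required=True)
    parser.add_argument("--prediction_file_path", required=True)
    args = parser.parse_args()

    gold = read_labels(args.gold_file_path)
    pred = read_labels(args.prediction_file_path)

    # Align predictions to gold ids; a missing prediction raises a KeyError.
    y_true = [gold[i] for i in gold]
    y_pred = [pred[i] for i in gold]

    print("macro-F1:", f1_score(y_true, y_pred, average="macro"))  # official metric
    print("micro-F1:", f1_score(y_true, y_pred, average="micro"))
    print("accuracy:", accuracy_score(y_true, y_pred))
```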
Running the Transformer baseline:
python3 baseline.py --train_file_path <path_to_train_file> --dev_file_path <path_to_development_file> --test_file_path <path_to_test_file> --prediction_file_path <path_to_save_predictions> --model <path_to_model>
The result for the English track using RoBERTa is 81.63.
The result for the multilingual track using XLM-R is 65.46.
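For orientation, below is a minimal sketch of what such a fine-tuning baseline could look like with Hugging Face Transformers. The file names, field names ("text", "label"), and hyperparameters are illustrative assumptions; the released baseline.py may differ:

```python
import json

from datasets import Dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

MODEL = "roberta-base"  # use "xlm-roberta-base" for the multilingual track

def load_jsonl(path):
    # Load a JSONL file with "text" and "label" fields into a datasets.Dataset.
    with open(path, encoding="utf-8") as f:
        return Dataset.from_list([json.loads(line) for line in f])

tokenizer = AutoTokenizer.from_pretrained(MODEL)

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

train = load_jsonl("train.jsonl").map(tokenize, batched=True)  # hypothetical path
dev = load_jsonl("dev.jsonl").map(tokenize, batched=True)      # hypothetical path

model = AutoModelForSequenceClassification.from_pretrained(MODEL, num_labels=2)

training_args = TrainingArguments(
    output_dir="baseline_out",
    per_device_train_batch_size=16,
    num_train_epochs=3,
    learning_rate=2e-5,
)

# Passing the tokenizer enables dynamic padding via the default data collator.
trainer = Trainer(model=model, args=training_args,
                  train_dataset=train, eval_dataset=dev, tokenizer=tokenizer)
trainer.train()

# Binary predictions: argmax over the two logits (0 = human, 1 = machine).
preds = trainer.predict(dev).predictions.argmax(axis=-1)
```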
- Preslav Nakov, Mohamed bin Zayed University of Artificial Intelligence, UAE
- Iryna Gurevych, Mohamed bin Zayed University of Artificial Intelligence, UAE; Technical University of Darmstadt, Germany
- Nizar Habash, New York University Abu Dhabi, UAE
- Alham Fikri Aji, Mohamed bin Zayed University of Artificial Intelligence, UAE
- Yuxia Wang, Mohamed bin Zayed University of Artificial Intelligence, UAE
- Artem Shelmanov, Mohamed bin Zayed University of Artificial Intelligence, UAE
- Ekaterina Artemova, Toloka AI, Netherlands
- Osama Mohammed Afzal, Mohamed bin Zayed University of Artificial Intelligence, UAE
- Jonibek Mansurov, Mohamed bin Zayed University of Artificial Intelligence, UAE
- Zhuohan Xie, Mohamed bin Zayed University of Artificial Intelligence, UAE
- Jinyan Su, Cornell University, USA
- Akim Tsvigun, Nebius AI, Netherlands
- Giovanni Puccetti, Institute of Information Science and Technology “A. Faedo”, Italy
- Rui Xing, Mohamed bin Zayed University of Artificial Intelligence, UAE
- Tarek Mahmoud, Mohamed bin Zayed University of Artificial Intelligence, UAE
- Jiahui Geng, Mohamed bin Zayed University of Artificial Intelligence, UAE
- Masahiro Kaneko, Mohamed bin Zayed University of Artificial Intelligence, UAE
- Ryuto Koike, Tokyo Institute of Technology, Japan
- Fahad Shamshad, Mohamed bin Zayed University of Artificial Intelligence, UAE
Website: https://genai-content-detection.gitlab.io
Email: genai-content-detection@googlegroups.com