DAIGT | CATCH THE AI

I am pleased to present our graduation project, built by a team of nine members: "AI-Generated Media (Audio, Image, and Text) Detection", or as we call it, "Catch the AI".

Catch The AI is a complete system that lets you create an account and keep a record of the media you have previously checked. With it you can analyze different media types (text, audio, and image), each handled by its own self-contained model.
Register now so you can catch the AI:

  • Demo: catchtheai
  • Main repo on GitHub: private repo
  • All DAIGT models (RoBERTa, DeBERTa, DistilBERT, BERT, and FeedForwardWithRoBERTaDeBERTa) can be tried on Hugging Face Spaces: DAIGT Space. A local-usage sketch follows this list.
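
For anyone who would rather run a detector locally instead of through the Space, a call along the lines of the sketch below would work with any of the fine-tuned checkpoints. The model id is a placeholder, not the project's published name; substitute the actual weights from the DAIGT Space.

```python
# Minimal local-inference sketch using the transformers pipeline API.
# "your-username/daigt-deberta" is a hypothetical checkpoint id; replace it
# with the real fine-tuned DAIGT weights.
from transformers import pipeline

detector = pipeline("text-classification", model="your-username/daigt-deberta")

sample = "Large language models can produce remarkably fluent essays."
print(detector(sample))  # e.g. [{'label': 'AI', 'score': 0.97}]
```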



🎯 About

Detect AI-Generated Text (DAIGT).

This repository is my part of the graduation project: the Detect AI-Generated Text (DAIGT) model.
About this repo: here I present everything related to my work on the text model, from data collection, through the models we experimented with, to the final model.

↪️ Problem definition

     One of the goals of Large Language Models (LLMs) is to produce text similar to what humans write. With the many LLMs now ready for use, such as GPT-4 and Gemini, plus the open-source models you can train on your own data and tasks, such as Mistral, and with how widely they are used, it is becoming difficult to tell these texts apart from human-written ones.

     This affects many things. For example, students using LLMs for homework: it distorts their academic level, making it hard for a teacher to judge where students really stand, so the teacher's assessment of them will be wrong. Another example, which we ran into during the data-collection stage, is the loss of trust in articles: we kept wondering who wrote a given article. If we assume it was written by an LLM and our suspicion is wrong, this reduces the quality of the data the model will be trained on.

These are some simple examples of the cases where you would want the DAIGT model, and "Catch The AI", to step in and help.


↪️ DAIGT Solution

     After many stumbles and experiments to obtain data and a suitable architecture capable of achieving our goal of a robust, generalizable model, we trained many models: some from scratch (Bi-LSTM, Conv1D, etc.) with different tokenizers and embeddings, such as ELMo and the BERT tokenizer, and fine-tuned pre-trained models such as Mistral-7B, BERT, DistilBERT, RoBERTa, and DeBERTa.
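
As a rough illustration of how the pre-trained encoders were fine-tuned, a setup along the following lines would apply. The dataset files, column names, and hyperparameters are placeholders, not the project's exact configuration, which lives in the notebooks linked below.

```python
# Illustrative fine-tuning sketch for one of the pre-trained encoders (here
# RoBERTa) on a binary human-vs-AI text dataset.
from datasets import load_dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

tokenizer = AutoTokenizer.from_pretrained("roberta-base")
model = AutoModelForSequenceClassification.from_pretrained("roberta-base", num_labels=2)

# Hypothetical CSV files with "text" and "label" columns (0 = human, 1 = AI).
data = load_dataset("csv", data_files={"train": "train.csv", "validation": "valid.csv"})

def tokenize(batch):
    # Truncate long essays; dynamic padding is handled by the Trainer's default collator.
    return tokenizer(batch["text"], truncation=True, max_length=512)

data = data.map(tokenize, batched=True)

args = TrainingArguments(
    output_dir="daigt-roberta",        # illustrative output path
    num_train_epochs=3,
    per_device_train_batch_size=16,
    learning_rate=2e-5,
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=data["train"],
    eval_dataset=data["validation"],
    tokenizer=tokenizer,
)
trainer.train()
```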

  • Here is an explanation of the final architecture (a minimal code sketch follows the diagram below):
         In the DAIGT model we relied on the two models that proved most effective on data they had not been trained on: RoBERTa and DeBERTa. We therefore decided to use them together in an ensemble, combining them through a feed-forward layer (ReLU activation, 32 neurons) trained on the outputs of RoBERTa and DeBERTa.

(Architecture diagrams of the RoBERTa + DeBERTa ensemble)
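
A minimal sketch of that ensemble head is shown below. It assumes the two backbones emit two-class logits and that the 32-neuron feed-forward layer takes their concatenation; the checkpoint names are base models, not the project's fine-tuned weights, so treat this as an illustration of the structure rather than the exact implementation.

```python
# Sketch of the DAIGT ensemble head: a 32-neuron feed-forward layer with ReLU
# that combines the outputs of the RoBERTa and DeBERTa classifiers.
# Checkpoint names and the exact features fed to the head are assumptions.
import torch
import torch.nn as nn
from transformers import AutoModelForSequenceClassification


class DaigtEnsemble(nn.Module):
    def __init__(self,
                 roberta_ckpt="roberta-base",                 # placeholder checkpoint
                 deberta_ckpt="microsoft/deberta-v3-base"):   # placeholder checkpoint
        super().__init__()
        self.roberta = AutoModelForSequenceClassification.from_pretrained(roberta_ckpt, num_labels=2)
        self.deberta = AutoModelForSequenceClassification.from_pretrained(deberta_ckpt, num_labels=2)
        # Feed-forward head: maps the concatenated logits (2 + 2) through
        # 32 ReLU units to a single AI-probability logit.
        self.head = nn.Sequential(
            nn.Linear(4, 32),
            nn.ReLU(),
            nn.Linear(32, 1),
        )

    def forward(self, roberta_inputs, deberta_inputs):
        r_logits = self.roberta(**roberta_inputs).logits        # (batch, 2)
        d_logits = self.deberta(**deberta_inputs).logits        # (batch, 2)
        features = torch.cat([r_logits, d_logits], dim=-1)      # (batch, 4)
        return torch.sigmoid(self.head(features)).squeeze(-1)   # P(AI-generated)
```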

❓ How can you catch AI-generated text?

⏲️ Time-line: Timeline. All results and notebooks can be accessed here.

🗞️ All details about the final version are here: Document of the text model

🔗 Links to the notebooks and dataset (final version):

The data was collected from different domains on Kaggle and Hugging Face; you can access it through this link:


CATCH THE AI Team

We did not just work as a team; we were a family. These people are truly skilled and creative. Follow them and look out for their wonderful projects, from which they learn a lot and help a lot of people. ❤️

Romani Nasrat Shawqi Abdalla Mohammed
Mohannad Ayman Mohammed Abdeldayem
Ahmed Abo-Elkassem Sara Reda
Reham Mostafa Rawan Aziz

📞 Contact:

Gmail | Facebook | Instagram | LinkedIn | GitHub | Kaggle

