In this paper, we describe our system entry for SemEval 2022 Task 8, Multilingual News Article Similarity, in which we leverage the knowledge of pre-trained language models to evaluate the overall similarity between a given pair of articles. Our system uses a Sentence Transformer based approach to compute contextualized embeddings, applies cosine similarity to them, and renormalises the result to obtain the final score. We further fine-tune the model on the provided dataset using the Cosine Similarity Loss (details are provided in Section 3). We also leverage the metadata provided with the articles by concatenating the 'Title' with the textual content to improve performance. We evaluate model performance using the Pearson correlation score in both the multilingual and translated-to-English settings. Our proposed approach in the multilingual setting is ranked 19th on the official SemEval 2022 Task 8 leaderboard, with a Pearson correlation score of 0.721. In Section 4, we also discuss other approaches we experimented with before arriving at our final model.
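The scoring pipeline summarised above (cosine similarity between two article embeddings, renormalisation, evaluation by Pearson correlation) can be illustrated with a minimal sketch. It is not the system's actual code. The function names, the 1 to 4 target range, the direction of the mapping (higher cosine gives a lower score), and the linear form of the renormalisation are all assumptions made for illustration. The loss function assumes the Cosine Similarity Loss is the squared error between the cosine similarity and the target label, which is the default behaviour of the loss of that name in the sentence-transformers library. The embeddings here are plain lists of floats standing in for model outputs.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def renormalise(cos, low=1.0, high=4.0):
    """Hypothetical renormalisation: map a cosine in [-1, 1] linearly onto
    [low, high], with the most similar pair (cos = 1) mapped to `low`.
    The range and direction are assumptions, not taken from the paper."""
    return high - (cos + 1.0) / 2.0 * (high - low)


def cosine_similarity_loss(u, v, label):
    """Squared error between the cosine similarity and a target label
    (assumed form of the Cosine Similarity Loss for a single pair)."""
    return (cosine_similarity(u, v) - label) ** 2


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def with_title(title, text):
    """Prepend the article title to its body text before embedding
    (the newline separator is an illustrative choice)."""
    return title + "\n" + text


if __name__ == "__main__":
    # Toy embeddings standing in for Sentence Transformer outputs.
    emb_a = [0.2, 0.7, 0.1]
    emb_b = [0.25, 0.6, 0.2]
    score = renormalise(cosine_similarity(emb_a, emb_b))
    print(f"predicted similarity score: {score:.3f}")
```

In practice the embeddings would come from a multilingual Sentence Transformer applied to the output of `with_title`, and `pearson` would be computed between predicted and gold scores over the whole evaluation set.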
abhinav-bohra/Multilingual-News-Article-Similarity