Multilingual-News-Article-Similarity


In this paper, we describe our system entry for SemEval 2022 Task 8 on Multilingual News Article Similarity, in which we leverage the knowledge of pre-trained language models to evaluate the overall similarity between a given pair of articles. Our system uses a Sentence Transformer-based approach to compute contextualized embeddings for each article, applies cosine similarity to the pair of embeddings, and renormalises the result to obtain the final score. We further fine-tune the model on the provided dataset using the Cosine Similarity Loss (details are provided in Section 3). We also leverage the metadata provided with the articles by concatenating the 'Title' with the textual content in order to improve performance. We evaluate model performance using the Pearson correlation score in both the multilingual and the translated-to-English settings. Our proposed approach in the multilingual setting is ranked 19th on the official SemEval 2022 Task 8 leaderboard, with a Pearson correlation score of 0.721. In addition to our final approach, Section 4 discusses other approaches we experimented with before arriving at the final model.
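
The scoring steps described above (title concatenation, cosine similarity, renormalisation, and Pearson evaluation) can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the function names, the `". "` separator used to join title and text, and the target range of 1 to 4 (with 1 taken as most similar) are all assumptions, and the embeddings are assumed to have already been produced by some sentence-embedding model.

```python
import math
from typing import Sequence


def with_title(title: str, text: str) -> str:
    """Prepend the title to the article body (separator is an assumption)."""
    return f"{title}. {text}"


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # treat an all-zero vector as unrelated to everything
    return dot / (norm_u * norm_v)


def renormalise(cos: float, low: float = 1.0, high: float = 4.0) -> float:
    """Linearly map a cosine in [-1, 1] onto [low, high].

    Assumption: a cosine of 1 (identical direction) maps to `low`,
    i.e. lower scores mean more similar articles.
    """
    return low + (1.0 - cos) * (high - low) / 2.0


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation between predicted and reference scores."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    # Toy embeddings standing in for real model output.
    emb_a = [0.2, 0.7, 0.1]
    emb_b = [0.25, 0.6, 0.2]
    score = renormalise(cosine_similarity(emb_a, emb_b))
    print(f"predicted similarity score: {score:.3f}")
```

In practice the toy vectors would be replaced by embeddings of `with_title(title, text)` for each article, and `pearson` would be computed over the predicted and gold scores for the whole evaluation set.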