
https://matthieuvion.github.io/lmd_viz/ 236k Le Monde comments on Ukraine, used as a proxy to measure people's engagement. Semantic search and SBERT model testing via Sentence-Transformers / Faiss


matthieuvion/lmd_viz


Le Monde, 1 year of comments - Ukraine Invasion

License: MIT · Made with Python

The notebook-as-article is available here: https://matthieuvion.github.io/lmd_viz/

As a reader of Le Monde (and of its comments section ;)), I regularly came across familiar subscribers' names. One of them in particular (more on that in the notebook) would manually keep track of, count and cite "pro-Russian" contributors directly in the comments. That made me want to collect the comments and analyze them in a more data-science-oriented fashion.

You might also want to check out the custom API and tools I used to build the 200k+ comment dataset, or simply download it (Parquet, 40 MB), on the sibling repo: lmd_ukr

Quick scroll-through (preview)


demo gif

Cool things


  • The source/analysis notebook (lmd_viz/playground.ipynb) is rendered as an HTML article with Quarto: github.io/lmd_viz.
  • Most data operations are done with Polars instead of pandas, with plenty of annotations to make them reusable.
  • Aggregations include cohort analysis, and I like those visualizations.
  • Curated and benchmarked a few SBERT models for document embedding, plus efficient semantic search via a Faiss index.
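Cohort analysis groups authors by when they first commented and then tracks how many of each group are still active in later months. The notebook does this with Polars; the sketch below only illustrates the logic in plain Python, under the assumption that each comment is an (author, date) pair and that a cohort is the calendar month of an author's first comment. The helpers `month_of` and `cohort_table`, and the toy data, are hypothetical and not taken from the notebook.

```python
from collections import defaultdict
from datetime import date


def month_of(d):
    """Reduce a date to a (year, month) pair."""
    return (d.year, d.month)


def cohort_table(comments):
    """Build a cohort activity table from (author, date) pairs.

    Returns {cohort_month: {months_since_first_comment: distinct_active_authors}}.
    """
    comments = list(comments)  # iterated twice below

    # 1. Each author's cohort is the month of their first comment.
    first_month = {}
    for author, d in comments:
        m = month_of(d)
        if author not in first_month or m < first_month[author]:
            first_month[author] = m

    # 2. Collect the distinct authors of each cohort active N months later.
    active = defaultdict(lambda: defaultdict(set))
    for author, d in comments:
        cohort_year, cohort_month = first_month[author]
        year, month = month_of(d)
        offset = (year - cohort_year) * 12 + (month - cohort_month)
        active[(cohort_year, cohort_month)][offset].add(author)

    # 3. Replace author sets by their sizes, sorted by cohort then offset.
    return {
        cohort: {offset: len(authors) for offset, authors in sorted(offsets.items())}
        for cohort, offsets in sorted(active.items())
    }


# Hypothetical toy comments: (author, comment date).
comments = [
    ("alice", date(2022, 2, 24)),
    ("alice", date(2022, 3, 1)),
    ("bob", date(2022, 2, 25)),
    ("carol", date(2022, 3, 10)),
    ("carol", date(2022, 5, 2)),
]
table = cohort_table(comments)
for cohort, row in table.items():
    print(cohort, row)
```

In a dataframe library the same idea usually becomes a per-author minimum date joined back onto the comments, followed by a count of unique authors per (cohort, offset) pair; each row of the result is one line of a cohort heatmap.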

Facts


  • Around 2% of Le Monde subscribers have engaged with the conflict (i.e. commented on it).
  • The usual author distribution shape: a few hardcore posters vs. the rest. Two people posted more than 2k comments in a single year.
  • Honestly, I found paraphrase-multilingual-mpnet-base-v2 to be a very good baseline for semantic search on French content.
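Semantic search over SBERT embeddings comes down to nearest-neighbour search by cosine similarity. The sketch below assumes L2-normalized embeddings searched with an exact (flat) inner-product index. In a real pipeline, `SentenceTransformer.encode(..., normalize_embeddings=True)` with a model such as paraphrase-multilingual-mpnet-base-v2 would produce the vectors and `faiss.IndexFlatIP` could perform the search, but whether the notebook uses that exact index type is an assumption. To stay dependency-free, toy 3-dimensional vectors stand in for real embeddings and the search is a brute-force loop; `l2_normalize`, `build_index` and `search` are hypothetical helpers.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length so that the dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def build_index(embeddings):
    """Store unit-normalized document embeddings (the 'add' step of a flat index)."""
    return [l2_normalize(v) for v in embeddings]


def search(index, query, k=3):
    """Exhaustive top-k search by inner product, as a flat (non-approximate) index does."""
    q = l2_normalize(query)
    scored = [(sum(a * b for a, b in zip(doc, q)), i) for i, doc in enumerate(index)]
    scored.sort(key=lambda pair: pair[0], reverse=True)  # stable sort: ties keep index order
    top = scored[:k]
    return [score for score, _ in top], [i for _, i in top]


# Toy 3-d vectors standing in for real sentence embeddings (hypothetical data).
docs = [
    "Les sanctions contre la Russie",
    "La météo à Paris",
    "Sanctions économiques et énergie",
]
toy_embeddings = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.7, 0.7, 0.0]]
index = build_index(toy_embeddings)
scores, ids = search(index, [1.0, 0.1, 0.0], k=2)
for score, i in zip(scores, ids):
    print(f"{score:.3f}  {docs[i]}")
```

Normalizing once at indexing time means each query costs only dot products; a flat index gives exact results, while Faiss's approximate index types trade some recall for speed on larger corpora.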
