Web app showing the most mentioned cities on r/digitalnomad.
See scripts/README.md for how the data was downloaded and processed.
City mentions were identified in two steps: a spaCy model was run on comments to find spans of text referring to geopolitical entities, and each span was then looked up in GeoNames. Both steps, but especially the first, introduce errors, so the mention counts and city rankings will change if the spaCy model is improved. Only comments were scanned, not posts.
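The two-step process above can be sketched roughly as follows. This is an illustration, not the code in scripts/: the function names, the `en_core_web_sm` model name, and the choice to count every matched span (rather than once per comment) are all assumptions, and the GeoNames lookup is passed in as a callable so the sketch does not guess at how the real query is made.

```python
from collections import Counter
from typing import Callable, Iterable, List, Optional


def spacy_gpe_extractor(model_name: str = "en_core_web_sm") -> Callable[[str], List[str]]:
    """Build a function that returns the GPE entity texts found in a string.

    The model name is an assumption; the project may use a different one.
    """
    import spacy  # imported lazily so the rest of the sketch runs without spaCy

    nlp = spacy.load(model_name)

    def extract(text: str) -> List[str]:
        # "GPE" is spaCy's label for geopolitical entities (countries, cities, states).
        return [ent.text for ent in nlp(text).ents if ent.label_ == "GPE"]

    return extract


def count_city_mentions(
    comments: Iterable[str],
    extract_gpe_spans: Callable[[str], List[str]],
    lookup_city: Callable[[str], Optional[str]],
) -> Counter:
    """Count mentions per city across comments.

    lookup_city is a hypothetical stand-in for the GeoNames query: it maps a
    text span to a canonical city name, or None if the span is not a city.
    """
    counts: Counter = Counter()
    for text in comments:
        for span in extract_gpe_spans(text):
            city = lookup_city(span)
            if city is not None:
                counts[city] += 1  # assumption: every matched span counts once
    return counts
```

With spaCy and a model installed, this could be called as `count_city_mentions(comments, spacy_gpe_extractor(), my_lookup)`, where `my_lookup` wraps whatever GeoNames source is available. Errors in either callable propagate directly into the counts, which is why the ranking depends on model quality.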
The data covers comments from the beginning of the subreddit until around June 9, 2022. After downloading the data from Pushshift, I realized that Pushshift can contain multiple comments with the same comment ID, probably due to comment edits. My data includes only one entry per comment ID.
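A minimal sketch of collapsing duplicate comment IDs is shown below. It assumes each comment is a dict with an `id` field, and it keeps the last entry seen for each ID; which duplicate this project actually kept is not specified, so that choice is an assumption.

```python
from typing import Any, Dict, Iterable, List


def dedupe_by_id(comments: Iterable[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Return one comment per ID, keeping the last entry seen for each ID."""
    by_id: Dict[str, Dict[str, Any]] = {}
    for comment in comments:
        # A later entry with the same ID overwrites the earlier one.
        by_id[comment["id"]] = comment
    # Dicts preserve insertion order, so results follow first appearance of each ID.
    return list(by_id.values())
```

Keeping the first entry instead would only require skipping IDs already present in `by_id`.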
- react-globe.gl
- Favicon icon from icons8.com
- Lifted some code from pislagz/spacex-live