
lrl

For script work concerning low resource languages. This does not include visualisations, Semantic Web data, or other random scripts that can be found in my other repositories.

facebook-scraper

The Facebook scripts in here are for harvesting data from Facebook groups non-automatically, by querying AJAX endpoints manually and saving the page source from the browser. It is not an automatic data collection scheme, nor a scraper, which makes it legal (afaik). A paper based on this work is currently in progress.
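Once pages have been saved by hand, they still need to be reduced to text before any language work can start. The sketch below shows one way to do that with Python's standard `html.parser`. It is not the code in this repository, and it assumes nothing about Facebook's actual markup: it simply keeps visible text and drops `<script>` and `<style>` contents. The names `TextExtractor` and `extract_text` are hypothetical.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text chunks from a saved HTML page (hypothetical helper)."""

    # Text inside these tags is not visible content, so it is skipped.
    SKIP_TAGS = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0  # > 0 while inside a skipped tag
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and self._skip_depth == 0:
            self.chunks.append(text)


def extract_text(html: str) -> list[str]:
    """Return the non-empty visible text chunks of an HTML string, in order."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()  # flush any buffered text
    return parser.chunks
```

A saved page could then be processed with something like `extract_text(open("saved_page.html", encoding="utf-8").read())`, where the filename is only a placeholder.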

maltese-dict

I have developed a GUI and a terminal dictionary program built on word lists I have access to: one from the internet, and a cleaned-up copy available via METASHARE under a CC BY-NC-SA license. I will presumably keep working on this throughout my time in Malta.
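The core of a word-list dictionary is exact lookup plus prefix completion for the terminal or GUI search box. The sketch below assumes a tab-separated `headword<TAB>gloss` format, which is a guess: the real word lists and their format are not described here. `WordList`, `lookup` and `complete` are hypothetical names, not the program's API.

```python
import bisect
from collections import defaultdict


class WordList:
    """Hypothetical in-memory dictionary built from 'headword<TAB>gloss' lines."""

    def __init__(self, lines):
        entries = defaultdict(list)
        for line in lines:
            line = line.strip()
            if not line or "\t" not in line:
                continue  # skip blank or malformed lines
            headword, gloss = line.split("\t", 1)
            entries[headword.strip().lower()].append(gloss.strip())
        self.entries = dict(entries)
        # Sorted by code point, not Maltese alphabetical order.
        self.keys = sorted(self.entries)

    def lookup(self, word):
        """Return all glosses for a headword (case-insensitive), or []."""
        return self.entries.get(word.lower(), [])

    def complete(self, prefix, limit=10):
        """Return up to `limit` headwords that start with `prefix`."""
        prefix = prefix.lower()
        # Keys sharing a prefix are contiguous in sorted order.
        start = bisect.bisect_left(self.keys, prefix)
        matches = []
        for key in self.keys[start:]:
            if not key.startswith(prefix) or len(matches) >= limit:
                break
            matches.append(key)
        return matches
```

Building the sorted key list once keeps each completion query to a binary search plus a short scan, which is enough for interactive use on a word list of dictionary size.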

maltese-*

In development. These are for courses at the University of Malta. The first is a stemmer, based on the NLTK stemmers (Snowball, ISRI). The second is a morphological analyser for broken plural nouns, based on previous work by Farrugia. The third is a chunker and basic code-switching identifier, based on the work done in facebook-scraper and on Fabri's theoretical research on Maltese compounds.
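Snowball-style stemmers work by stripping the longest matching suffix while keeping a minimum stem length. The sketch below illustrates only that general technique. It does not use NLTK, it is not the course stemmer, and the suffix list is a hypothetical placeholder rather than a vetted inventory of Maltese morphology. Broken plurals, which change the word internally, are out of reach for suffix stripping of this kind.

```python
# Hypothetical, illustrative suffixes only; NOT a linguistic claim about Maltese.
HYPOTHETICAL_SUFFIXES = ("ijiet", "iet", "at", "i")
MIN_STEM = 3  # never leave a stem shorter than this (assumed value)


def stem(word, suffixes=HYPOTHETICAL_SUFFIXES, min_stem=MIN_STEM):
    """Strip the longest matching suffix, keeping at least `min_stem` characters."""
    word = word.lower()
    # Try longer suffixes first so "ijiet" wins over "iet" and "i".
    for suffix in sorted(suffixes, key=len, reverse=True):
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem:
            return word[: -len(suffix)]
    return word
```

Checking longer suffixes first and enforcing a minimum stem length are the two guards that stop short words from being over-stemmed.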