Releases: fergiemcdowall/stopword
Even more Ukrainian stopwords
What's Changed
- Update stopwords_ukr.js by @imposeren in #330
Full Changelog: v3.1.2...v3.1.3
Thanks to @imposeren for an even better Ukrainian stopword list.
Better Ukrainian stopword list
Improved Ukrainian stopword list, thanks to @imposeren
What's Changed
- build(deps-dev): bump standard from 17.1.0 to 17.1.1 by @dependabot in #326
- build(deps-dev): bump standard from 17.1.1 to 17.1.2 by @dependabot in #327
- build(deps-dev): bump rollup from 2.79.1 to 2.79.2 by @dependabot in #328
- Update stopwords_ukr.js by @imposeren in #329
New Contributors
- @imposeren made their first contribution in #329
Full Changelog: v3.1.1...v3.1.2
v3.1.1 - Swapped Danish and Swedish with versions with better licenses
- Replaced the CC BY-SA licensed Danish and Swedish lists with lists from MIT-licensed libraries, so they are in line with the license of the stopword library.
- Updated Swedish test since one additional word was now defined as a stopword
- Updated 3rd-party.txt with the 3rd party license texts needed for the minified dist files.
Full Changelog: v3.0.1...v3.1.1
v3.0.1 Treeshaking for `module` and `browser`
In package.json, `module` and `browser` now point to `./src/stopword.js` to make the library tree-shakeable, and `jsdelivr` points to `./dist/stopword.umd.min.js` so CDN usage is not affected.
The stopword module has grown over the years, so tree shaking makes it possible to avoid loading languages that are not used.
Non-breaking in most cases, but the change is big enough to bump to a new major version, and some corner cases might break.
Typo in Hungarian word
amelybol -> amelyből
Thanks to @dsdenes
Bug-fix: correct module-pointer to .mjs-file from package.json
One word per line in stopword files - better diffs
- Better diffs for small word changes in existing stopword files. Minified dist files are as compact as before.
- Renamed some 3rd party license files so GitHub is less likely to be confused about which license applies to the library.
Fixing three number words in porBr - Brazilian Portuguese
Changing three numbers from Portuguese to Brazilian Portuguese
Number | por | porBr |
---|---|---|
19 | Dezanove | Dezenove |
16 | Dezasseis | Dezesseis |
17 | Dezassete | Dezessete |
Thanks to rodfeal for spotting the error and the PR 🎉
Breaking changes! Import destructuring + 3 letter language codes + lots more
Breaking changes:
- Import destructuring. Only ESM cannot use the old sw. prefix; CJS can, and UMD works like before, if you prefer that. If you're using CJS without defining a stopword language (using the default English stopword list), you should be fine.
- 3-letter ISO 639-3 language codes (swapping from ISO 639-1). This is done to make room for more languages in general, and in the short term to fit several Sami languages.
Documentation to be almost backwards compatible:
- What to do to still use ISO-639-1 codes.
- What to do to still use sw.-prefix for function and variables (arrays of stopwords)
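A minimal sketch of both workarounds, assuming the language arrays are exported under their ISO 639-3 names (`eng`, `swe`). The `sw` object below is a tiny stand-in so the snippet runs without the package installed; with the real library under CJS it would be `const sw = require('stopword')`. The sample words are illustrative only, not the real lists.

```javascript
// Stand-in for the library's exports so this sketch runs on its own.
// With the real package (CJS) this would be: const sw = require('stopword')
const sw = {
  eng: ['a', 'the', 'of'], // tiny sample, not the real English list
  swe: ['och', 'att', 'en'] // tiny sample, not the real Swedish list
}

// Option 1: rename while destructuring, so existing code keeps its ISO 639-1 names.
const { eng: en, swe: sv } = sw

// Option 2: keep the sw. prefix and alias after import.
const english = sw.eng

console.log(en === sw.eng, sv === sw.swe, english === en)
```

Both options only create new names for the same arrays, so nothing is copied and either name can be passed wherever a stopword list is expected.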
And lots more:
- 5 languages added (stopword lists): Ukrainian, Lithuanian, Kurdish, Malay and Gujarati (Thanks to stopwords-iso).
- Using batr for building CJS, ESM and UMD + testing (StandardJS, Playwright, AvaJS and Rollup-stuff in one devDep)
- UI-tests for demo (testing UMD) + ESM and CJS tests
- Minified builds and all licenses (stopword + 3rd party) in one file, pointed to from minified. 62 languages in 130 kb
- Numbers 0-9 in different scripts moved to their own "language". Numbers should be handled by regex, which words-n-numbers can do easily, but we're keeping this as a way to also remove the numbers 0-9.
- From TravisCI to GitHub Workflow for CI
- For testing new languages added, we're using words-n-numbers to extract words (and/or numbers)
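The point about handling numbers with a regex can be sketched roughly as follows. This is a plain JavaScript illustration, not how words-n-numbers or stopword does it; `isNumberToken` and `removeNumberTokens` are hypothetical helper names, and the input is assumed to be already split into tokens.

```javascript
// Hypothetical helpers, not part of the stopword or words-n-numbers API.
// \p{Nd} matches a decimal digit in any script and needs the `u` flag.
const isNumberToken = (token) => /^\p{Nd}+$/u.test(token)

// Keep every token that is not made up entirely of digits.
const removeNumberTokens = (tokens) => tokens.filter((token) => !isNumberToken(token))

// '٣' is the Arabic-Indic digit three; 'b2b' is kept because it is not all digits.
const tokens = ['route', '66', 'has', '٣', 'lanes', 'b2b']
const cleaned = removeNumberTokens(tokens)

console.log(cleaned)
```

A regex like this covers numbers of any length, whereas a list of the digits 0-9 only removes single-digit tokens.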
A leaner, more structured and more robust version
- Now building CJS-, ESM- and UMD-scripts with minified alternatives.
- import/require destructuring now possible. Old style will also work if you want the `sw.` prefix for functions and arrays.
- ISO-639-3 language codes (swapping from ISO-639-1). Room for more languages.
- Codes for language lists that aren't fully standard start with the 3 characters that follow the standard, followed by 2 characters in camelCase.
- lggo -> lgg. We meant the 'o' for 'official', but lgg is the official in ISO-639-3, so the unofficial is now lggNd (the language array without diacritics)
- If you want to use the old codes from ISO-639-1, you can either rename on import or alias after import, e.g. `const en = eng`
- Better license handling and visibility for third-party libraries. There is now an accumulated license file with all third-party licenses listed. This is referenced in the minified scripts.
- Moved to `batr` test library (rollup + standardjs + playwright dependencies all in one)
- CI-testing moved from TravisCI to GitHub workflow / actions.