Changelog

0.4.0 (2024-11-01)

A significant update that fixes various issues and adds enhancements. It also provides scripts for automated corpus updates (see README).

Features

add stanza secondary_pipeline (1524851)

Bug Fixes

various

Documentation

various

0.3.1 (2024-06-21)

Bug Fixes

stanza pipeline fix bad var name, refactor (a43c170)

0.3.0 (2024-06-20)

Features

redo stanza pipeline (070f60a)
use stanza, deprecated fasttext for langid (2e2d5e5)
working stanza pipeline (aa52528)

Bug Fixes

add mupdf exception for pdf extraction (5f51358)
conll to vert fix mwt handling (6731d04)
improve stanza pipeline (e84e8bf)
reduce export_text chunksize to 10000 (caac056)
remove old stanza pipeline (86a77d1)
update df.applymap to df.map (2a5393f)
update gitignore (60ca73b)
update pd.Timestamp format (f00606e)
wip redo stanza pipeline (aa681b3)

Documentation

fix docstring (fc43853)
update main deps (aad5949)
update readme (5657223)
update readme (b35e30c)
update readme (fd0cc1f)

0.2.2 (2023-09-16)

Bug Fixes

add date args for export_text (ea10b93)
move corpus attributes to config yml (e0594cd)
update freeling pipeline init_locale func (7d81c49)

Documentation

update readme (0199f41)

0.2.1 (2023-07-17)

Bug Fixes

add FreeLing EN pipeline (783ac38)
add pipeline/compare_vert script (c0db3e4)
fix changelog release number (10eebe7)

0.2.0 (2023-07-07)

This release makes significant changes and is not backward compatible with previous versions. See README.md for the current workflow.

Version 0.2.0 has pipelines for building Spanish and French corpora with FreeLing. An English pipeline is currently being redesigned and will be integrated soon.

Corpora can now be built from both HTML and PDF content on ReliefWeb.

Features

added FreeLing NLP
added language identification with fastText
added PDF extraction module
pipeline/ is now used for the final steps of corpus creation

Bug Fixes

Various bug fixes and small improvements

0.1.1 (2022-12-14)

Includes various bug fixes and incremental improvements for making/managing a corpus.

Bug Fixes

corpus: drop empty vert content before insert (1c1a2c4)
corpus: export_attribute 'parameters' arg (c908447)
corpus: remove drop_attr arg (879b441)
corpus: update vertical content when outdated (2641324)
corpus: use quoteattr, fix sql query syntax (70db0c6)
corpus: vertical docstrings, 'update' arg (866eddf)
db: add _about table (e437afd)
db: add_missing_columns method (b80181a)
source: add manual override to _set_wait (56ca19c)
source: add_missing_cols & drop fields_id (f64682e)
source: date.changed:asc - he-alike params (6481199)
source: rw - replace run method with one (fc95364)
source: rw, abort insert if empty df (701fd38)
source: rw, add all, new methods (03f8ef6)
source: rw, automatically set wait (9bfc159)
source: rw, improve set limit behavior (d2ab0fd)
source: rw, set default limit to 1000 (526aacc)
source: update rw-en, rw-es API parameters (93bbc38)
source: update variables (06bdadf)
source: use SystemExit, fix if/else behavior (26e99ab)
util: add clean_xml and xml_quoteattr methods (e835d66)
util: add logging to convert.py (108b052)
util: nan_to_none return a series of [None] (a37fa62)
util: use UTC time for timestamps (9d1504e)
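
Several of the entries above concern quoting attribute values before they are written into vertical content. The sketch below shows roughly what that involves, using `quoteattr` from the Python standard library. The `make_doc_tag` helper and the attribute names are hypothetical, used only for illustration; this is not the project's actual `xml_quoteattr` implementation.

```python
from xml.sax.saxutils import quoteattr


def make_doc_tag(attrs: dict) -> str:
    """Build an opening <doc> tag with safely quoted attribute values.

    Hypothetical helper: the tag name and attribute names are assumptions.
    """
    # quoteattr escapes &, < and > and picks a quote character that does
    # not clash with quotes already present in the value.
    parts = [f"{name}={quoteattr(str(value))}" for name, value in attrs.items()]
    return "<doc " + " ".join(parts) + ">"


tag = make_doc_tag({"id": 101, "title": 'Floods & "flash" alerts'})
print(tag)
```

Because the title contains double quotes, `quoteattr` wraps it in single quotes instead of escaping them, so the tag stays well-formed without a hand-written escaping step.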

Documentation

corpus: standardize docstrings (dfc2ad1)
db: standardize docstrings (593c790)
source: standardize docstrings (511a1d1)
standardize docstrings (e222e06)
util: standardize docstrings (9b172e6)

0.1.0 (2022-11-29)

Initial release
