This repository has been archived by the owner on Dec 14, 2023. It is now read-only.
forked from browsermt/bergamot-translator
Attempt to generate artifacts through CircleCI #39
Closed
jerinphilip wants to merge 33 commits into mozilla:main from browsermt:tgt-mozilla-collapse-bindings
* Switch to wasm branch for this example
* Load marian model from a byte array
* Sanitise executable names
* Change marian branch
* Update marian branch that loads binary models
* Example of loading model as a byte array
* Add the byte array loading files
* Die on misaligned memory
* Remove the unused argument
* Allow loading without a ptr parameter so that we don't break emc workflow
* Merging two Services
* Moving stop() logic to destructor
* We have WITH_PTHREADS back
* string based constructor on Service
* Removing now empty service_base.* files
* Hiding away pcqueue_ construction
  Ugliest ifdefs I have done in my life.
* Another ifdef to hide pcqueue header file
* Missing semicolons in WITH_PTHREADS path
* Fixing async_translate residue argument from copy
* Adding comments
* Initialize batchtranslator only at one place
  To reduce tax for bytebuffer loads, initialize batchtranslator only at one place.
* #ifdef WITH_PTHREADS -> #ifndef WASM_HIDE_THREADS
  Sane platform (non WASM) is default. This truly only hide-threads from compilation path and not switch unswitch pthreads (-lpthread).
* Review comments: Rearranging destructor, fix wrong comment
* Move loadVocabularies to service.cpp and put in anonymous namespace
* Prettifying diff: Removing unwanted empty lines
* Indicate in comments multithreaded has numWorkers translators
* Typo fix: bergamot_translator -> bergamot-translator
* Safety guards to avoid pcqueue illegal init
* Add WASM_HIDE_THREADS as a global WASM_COMPILE_FLAG
* Compile Defs: WASM_HIDE_THREADS -> __EMSCRIPTEN__
* Removing dead CMakeLists.txt code following __EMSCRIPTEN__
* Compile defs: __EMSCRIPTEN__ -> WASM
* A script to patch the wasm artifacts to use wormhole via APIs that instantiate WASM module
* Updated README
* Load just production ready models
* Shallow clone bergamot-models repo since it has such a large history
* Improved wasm test_page
  - test page can load all 5 language pairs
  - Use intgemm.alpha* models
* Refactor the code that patches wasm artifacts to enable wormhole

Co-authored-by: Andre Natal <anatal@gmail.com>
Co-authored-by: Motin <motin@motin.eu>
Adds doxygen configurations, additional sphinx which consumes the doxygen files to generate developer API, compatible with marian-nmt/marian-dev.
- Earlier it was using 'wasm' branch
- CMakefile changes
- Github workflow change
- Naming follows <target-arch>-<nature-of-marian>-<runner-os> (wasm|native)-(full_marian|custom_marian)-(ubuntu|mac)
* Draft adjustments to API
* Adjustments to docs
* Let's call the word + sentence ranges annotations
* Editing confusing comment on size()
* Fixing compilation for template adjustments for SentenceRanges
* string_view template hacks
  This commit shifts AnnotatedBlob into a templated type and gets the troubled part to compile. All to manage absl::string_view and std::string_view. Objective: marian::bergamot stays C++ 11 to pluck and put in marian code, bergamot-translator somehow flexes C++17. Simplify development in one place.
* Fixing the wiring: Gets source to build
  Runtime errors exist, but AnnotatedBlobs are consistent.
* Bugfix: Matching old-state after factoring AnnotatedBlob in
* Removing vocabs_ from Response. (For the umpteenth time).
* Alignment API ready in marian::bergamot::Response
* Wiring alignments up to TranslationResult
* Adjustment to get alignments; bergamot-translator-app has alignments available
* Accessing words instead of Ids
  This code sets up access of word string_views from annotations instead of printing Ids. However, we have segfault. This is likely due to targetRanges not being set, pending from #25. Could also be a rogue EOS token which we're filtering for in string_view annotations, but not so in alignments.
* Switching to browsermt/marian-dev@jp/decode-string-view for targetTokenRanges
* Target word byte range annotations available
  Issues corresponding to #25 should be resolved. There is still a segfault. Could be due to EOS. Pending investigation.
* Bugfix: Tokens for alignments are now through. Was not EOS.
* browsermt/marian-dev@master
  ByteRange changes work downstream and has been merged to master. Updating submodule to point to master.
* Style and documentation enhancements: response.cpp
* Style and documentation enhancements: TranslationResult.h
* Descriptions for SentenceRanges templating
* Switching to marian-dev@wasm-sync
* AnnotatedBlob can be copy-ctord/copy-assigned
* TranslationResult: Empty ctor + WASM Bindings
  Allows empty construction of TranslationResult. Using this empty constructor, WASM bindings are adjusted. Unsure of the results, maybe @abhi-agg can test.
* Cosmetic: SentenceRangesT -> Annotation
  - SentenceRangesT is renamed to AnnotationT;
  - Further comments to explain heavily templated files.
* Response: Cleaning up unused members and adding docs
* Adding quality scores - attempt
* Stub QualityScores
  This adjustment adds capability to get "scores", which should potentially indicate how confident (at least relative in a target-sentence) should be. This enables writing the code forward for TranslationResult, and an example quality-score people can be pointed at.
  - These are not between [0,1] yet.
  - In addition, guards to check out-of-bounds access have been placed so illegal accesses are caught early on during development.
* Removing token debug statements
* Reworking Annotation without templates
  mozilla#8 provides ByteRanges.
  - This ByteRange data-type is used in Annotation and converted to marian::string_view(=absl::string-view) on demand.
  - Since Annotation[using ByteRange] is not bound to anything else, it can be unit tested. A unit test is added (originally to test independently for integration after).
  - Annotation with ByteRange is now propagated across marian::bergamot and functionality matched to how it was previously working.
  This eliminates the string-view conversion and template code.
* Nit: Removing std::endl flushes
* Bring TranslationResult and Response closer
  Helps #53. In preparation, the data-export types for Quality and Alignment are pushed down to Response from TranslationResult and computed during construction. This brings TranslationResult closer to Response, paving way to avoid having two TranslationResults. histories_ only remain for marian-decoder replacement usage, which can be removed in a separate PR.
* Clean up hacks originally added for a unit-test to compile
* Moving Annotation functions to cpp and documenting header file
* Shifting alignments, qualityScore testing capability into main-mts
* Restore Unified API files to previous state
* Adaptations to fix Response with Quality, Alignments to connect to old Unified API
* Missing reset on TranslationResultBindings
* Cleaning up Response documentation to reflect newer code
* Minor adjustments to get build back after main sync
* Marian seems to make available Catch somehow
* Disable COMPILE_BERGAMOT_TESTS for WASM
* Add COMPILE_BERGAMOT_TESTS as a CMakeDependent option
* Use the COMPILE_TESTS flag instead to skip macos.yml
* Trigger unit-tests on GitHub runners for Annotation
* Reordering enable_testing() to before inclusion of test directory
* doc constructs required to operate with alignments
  Documents with doxygen compatible documentation for Response, AnnotatedBlob, Annotation, ByteRange. Incorporates doxygen compatible documentation for
* Updates ByteRange consistent with general C++
  Also little documentation enhancements in the process.
* Updating marian-dev@9337105
* Copy-paste documentation because lazy
* Turn off autoformat and manually edit to fix style changes
* AnnotatedBlob -> AnnotatedText; blob -> text
* text.text in test app renamed
* text of text -> blob of text in places of documentation
* Try to fix gcc missingness in CI
* Updated marian-dev submodule
  - cmake changes required after the submodule update
* Added workflows for building custom marian on mac and ubuntu
* Renamed cmake option
  - Renamed USE_WASM_COMPATIBLE_SOURCES to USE_WASM_COMPATIBLE_SOURCE
  - Use proper compile definitions
Contains "hack" that must go immediately by editing TranslationModel, to come in following commit.
* add shortlist_memory and update service-cli-bytearray test
* update marian-dev
* address review comments
* fix compilation and test failures and further address review comments
* small update on marian-dev (based on browsermt/marian-dev PR#28)
* update marian-dev with upstream
* code refactoring according to review
* fix marian-dev submodule conflicts
* switch MemoryGift to AlignedVector
* copy aligned.h from kpu/intgemm for AlignedVector
* changes based on memory ownership and AlignedVector
* fix BatchTranslator inits
* small fixes according to review comments
* update submodule marian-dev to master
* update submodule marian-dev with upstream

Co-authored-by: Kenneth Heafield <kpu@users.noreply.github.com>
* Removes vocabs and propagates fixes for breaks
* Prettify diff: Undoing comment shuffles due to merge conflict edits
* 20% of time actual work, 80% prettifying diff
* Histories members -> poof!
  We however have Histories in constructor, which we will remove out of the way soon.

Co-authored-by: Kenneth Heafield <kpu@users.noreply.github.com>
* Consistent api between the two versions of the executables in app folder
* Remove shared ptrs
* Rudimentary validator for binary files
* Changing Annotation to adhere to [begin, end)
* Stronger unit tests on sentences + num words, num sentences
* Hotfix with empty string view from EOS
* No more absolving empty-sentence; Added tests now defined behaviour
* Uncommenting important section in unit test
* Ensure empty string view default, beginning at end so marker points
* Further strengthen and comment unit-tests, mark exactly where empty sentence is happening
* Review comments: Dummy sentence + docs
  - What should be a simple fast accessor is turning into compute. Normally the way to deal with this, for better or worse, is to put 0 at the beginning of sentenceEndIds_. (Putting 0 at the beginning of sentenceEndIds_)
  - Indices into what? Mentioned to be flatByteRanges_.
* Documentation updates
* More changes to docs

Co-authored-by: abhi-agg <66322306+abhi-agg@users.noreply.github.com>
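The [begin, end) convention and the leading 0 in sentenceEndIds_ described in the commit above can be illustrated with a minimal sketch. This is not the repository's code: `AnnotationSketch`, `addSentence`, `word`, and `slice` are hypothetical names introduced here, and only `ByteRange`, `flatByteRanges_`, and `sentenceEndIds_` come from the commit message. The point it tries to show is that, with a 0 sentinel in front, the word count of a sentence becomes a plain subtraction and an empty sentence needs no special case.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Half-open byte range [begin, end) into the underlying text.
struct ByteRange {
  std::size_t begin;
  std::size_t end;
  std::size_t size() const { return end - begin; }
};

// Hypothetical, simplified stand-in for the Annotation discussed above.
class AnnotationSketch {
 public:
  // The leading 0 is a sentinel: sentence i spans the flat indices
  // [sentenceEndIds_[i], sentenceEndIds_[i + 1]).
  AnnotationSketch() : sentenceEndIds_{0} {}

  // Append one sentence given the byte ranges of its words (may be empty).
  void addSentence(const std::vector<ByteRange>& words) {
    flatByteRanges_.insert(flatByteRanges_.end(), words.begin(), words.end());
    sentenceEndIds_.push_back(flatByteRanges_.size());
  }

  std::size_t numSentences() const { return sentenceEndIds_.size() - 1; }

  // No branch for the first sentence thanks to the sentinel.
  std::size_t numWords(std::size_t sentenceIdx) const {
    assert(sentenceIdx < numSentences());
    return sentenceEndIds_[sentenceIdx + 1] - sentenceEndIds_[sentenceIdx];
  }

  ByteRange word(std::size_t sentenceIdx, std::size_t wordIdx) const {
    assert(wordIdx < numWords(sentenceIdx));
    return flatByteRanges_[sentenceEndIds_[sentenceIdx] + wordIdx];
  }

 private:
  std::vector<ByteRange> flatByteRanges_;    // all word ranges, sentence after sentence
  std::vector<std::size_t> sentenceEndIds_;  // indices into flatByteRanges_
};

// Materialise a range as a string; size() is end - begin under [begin, end).
std::string slice(const std::string& text, ByteRange r) {
  return text.substr(r.begin, r.size());
}
```

Under this layout an empty sentence is simply two equal consecutive entries in sentenceEndIds_, so numWords returns 0 for it without extra logic.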
@jerinphilip Can we discard and close this PR?
@abhi-agg Leave it open until the collapse is complete. I can reuse this with force pushes to keep generating CircleCI artifacts.
Adds regression-tests to the workflow for native minimal/custom marian and full builds. Co-authored-by: abhi-agg <66322306+abhi-agg@users.noreply.github.com>
Co-authored-by: Kenneth Heafield <kpu@users.noreply.github.com>
- AlignedMemory is AlignedVector<char> now instead of AlignedVector<const void*> - This solves the issue of allocating 8x of the actual required memory for loading files as bytes
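A rough illustration of where an 8x figure can come from, assuming the buffer was sized by the file's byte count while its element type was a pointer. The helper `bytesAllocated` is hypothetical and uses `std::vector` as a stand-in for AlignedVector; the factor equals `sizeof(const void*)`, which is 8 only on a typical 64-bit target.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: bytes held by a buffer that is sized by the file's
// byte count, for a given element type. std::vector stands in for
// AlignedVector here; this is an assumption, not the repository's type.
template <typename Element>
std::size_t bytesAllocated(std::size_t fileSizeInBytes) {
  std::vector<Element> buffer(fileSizeInBytes);  // one element per file byte
  return buffer.size() * sizeof(Element);
}
```

With `Element = char` the buffer holds exactly as many bytes as the file; with `Element = const void*` it holds `sizeof(const void*)` times as many.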
Windows still failing but getting closer
Fixes #101. ResponseBuilder is called with empty histories to trigger a valid but mostly-empty response.
* Control validating the config options via a boolean flag
  - parseOptions() function now validates the parsed options based on the validate argument
* Minor syntactic fix
* Bindings to load model and shortlist files as bytes
* Modified wasm test page for byte based loading of files
* Updates wasm README for byte loading based usage of TranslationModel
jerinphilip force-pushed the tgt-mozilla-collapse-bindings branch from 11fd195 to f2939c3 (April 29, 2021 12:58)
Sorry about any spam. Please ignore this PR (the source will be reviewed at parent).
mozilla-extensions/firefox-translations#83 seems to be using a preconfigured artifact from a shell-script. I can update the shell-script to use the CI system if I am able to generate the artifact (I hope). Syncing a few draft changes to use the automation here to generate artifacts and update the download scripts over there.