Release 4.0.0beta #2993
* added release/check_wheels.py * added preamble * Update release/check_wheels.py Co-Authored-By: Radim Řehůřek <me@radimrehurek.com> * respond to review comments
* git add HACKTOBERFEST.md * clarify contributions * respond to review comments * add link to HACKTOBERFEST.md from README.md * typo * include comments from Gordon
* Probably fixes #2534 * Uppercase P * Added comment
* Disable Py2.7 builds under Travis and AppVeyor * use Py3.7.4 image under CircleCI * tweak circleci config.yml * patch tox.ini * more fixes to get docs building under tox * s/python3.7/python3/ * delay annoy ImportError until actual use * bring back Pattern * simplify invokation of pip command * add install_numpy_scipy.py * fixup * use sys.executable * adjust version in install_wheels.py * adjust travis.yml * adjust version in install_wheels.py back * add logging statements * use version_info instead of sys.version * fixup
It belongs at the top. People should see it immediately without having to scroll down to an older release.
* Change interlinks format to list of tuples. Fixes #2635

This commit fixes the issue in #2635

This commit changes the interlinks storage in the `segment_wiki` script from dictionary to a list of tuples.

We can process the test wikidata used in the test suite of gensim to inspect the new behavior.

```
python gensim/scripts/segment_wiki.py -i \
    -f ~/Downloads/enwiki-latest-pages-articles1.xml-p000000010p000030302-shortened.bz2 \
    -o ~/Downloads/enwiki-latest.json.gz
```

We get the following output:

```
$ cat ~/Downloads/enwiki-latest.json.gz | zcat | head -1 | jq -r '.interlinks[] | [.[0], .[1]] | @TSV' | sort | head
-ism	-ism
1848 Revolution	1848 Revolution
1917 October Revolution	1917 October Revolution
6 February 1934 crisis	February 1934 riots
A. S. Neill	A. S. Neill
AK Press	AK Press
Abu Hanifa	Abu Hanifa
Adolf Brand	Adolf Brand
Adolf Brand	Adolf Brand
Adolf Hitler	Hitler
```

All tests pass for the related test file.

```
python -m unittest gensim.test.test_scripts
/Users/smishra/miniconda3/envs/TwitterNER/lib/python3.7/bz2.py:131: ResourceWarning: unclosed file <_io.BufferedReader name='/Users/smishra/workspace/codes/python/gensim/gensim/test/test_data/enwiki-latest-pages-articles1.xml-p000000010p000030302-shortened.bz2'>
  self._buffer = None
ResourceWarning: Enable tracemalloc to get the object allocation traceback
.....
----------------------------------------------------------------------
Ran 5 tests in 6.298s

OK
```

* Updated docstrings
* Fixed flake8 issue of long line in docsrtring
* Fixed comments and replaces assertTrue with assertEqual
* Fixed unittest comment and checks for wikicorpus
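The commit message above does not spell out why a list of tuples is preferable to a dictionary. One plausible reading (an assumption here, not something the commit states) is that a dictionary keyed by link target keeps only one anchor text per target, so repeated links collapse. The sketch below uses hypothetical sample pairs, loosely modelled on the sample output, to show the difference between the two containers. It does not call `segment_wiki` itself.

```python
# Hypothetical (target, anchor text) pairs, loosely modelled on the sample
# output in the commit message. This is NOT real output of segment_wiki.
links = [
    ("Adolf Brand", "Adolf Brand"),
    ("Adolf Brand", "Adolf Brand"),
    ("Adolf Hitler", "Hitler"),
    ("Adolf Hitler", "Adolf Hitler"),
]

# Dictionary storage: one entry per target, so a later anchor text
# silently overwrites an earlier one for the same target.
as_dict = {}
for target, anchor in links:
    as_dict[target] = anchor

# List-of-tuples storage: every occurrence is kept, in document order.
as_list = list(links)

print(len(as_dict))             # 2
print(len(as_list))             # 4
print(as_dict["Adolf Hitler"])  # Adolf Hitler (the "Hitler" anchor was lost)
```

The repeated `Adolf Brand` row in the sample output is consistent with this reading: duplicates survive in the list form.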
* Update makefile to point to new subdirectory * Update layout.html to show new documentation sections * introduce sphinx gallery * reorganize gallery * trim tut3.rst * git add docs/to_python.py * git add gallery/010_tutorials/run_doc2vec_lee.py * minor layout tweak * add downloader api howto * add fasttext tutorial and howto * use pprint in fasttext tutorial * add summarization tutorial * git add gallery/020_howtos/run_howto_compare_lda.py * add fasttext thumbnails * adding core concepts tutorial * add summarization plot * update notebook to use 20newsgroups * update notebook * improve notebook * update howtos * fix distance metrics tutorial * improve distance_metrics.ipynb * git add gallery/010_tutorials/run_distance_metrics.py * git add gallery/020_howtos/run_news_classification.py * move downloader API to tutorials section * add docs/src/auto_examples so bindr can pick up the notebooks * minor changes * git add gallery/010_tutorials/run_lda.py * more minor changes * More minor changes * git add gallery/010_tutorials/run_word2vec.py * updated notebooks * git add gallery/010_tutorials/run_wmd.py * add image * move parts of intro.rst to core concepts tutorial * move README.txt to wiki * get rid of fasttext wrapper tutorial * update top-level heading * more minor changes * minor updates * improve Doc2Vec tutorial, move explanations from IMDB * git add gallery/020_howtos/run_doc2vec_imdb.py * git st * fix notebook paths for bindr * rename gallery to documentation * git add binder/requirements.txt * git add auto_examples/000_core/requirements.txt * adding requirements.txt for binder * removing requirements files added in desperation * update conf.py * remove temporary files from git branch * rm images * merge "getting started" into "core concepts" * add some clarifying text * add Jupyter notebook * Revert "get rid of fasttext wrapper tutorial" This reverts commit 3ec0a46. 
* get rid of fasttext wrapper guide * git add auto_examples/ * minor fixes * fix typo * add listing of corpora and models * get rid of binder * git add gallery/020_howtos/run_doc.py * more instructions for authorship * improve linkage between core tutorials * add highlighting * move downloader to howto * restore support and about sections * sync toolbars * Add installation instructions to top page * clean up html * add wordcloud-based thumbnails * updated notebooks * update script * add sphinx-gallery to doc dependencies * include memory_profiler in docs_testenv * git add README.rst * use proper temporary file * reorganize tutorials section * clarify version control in README.rst * git rm 020_howtos/saved_model_wrapper * move pivoted document normalization to tutorials section * fix ordering in howto section * add images * add annoy to doc dependencies * update gitignore * disable tox spinner * turn off progress bar for pip * fix labels * naming fixes * git rm docs/notebooks/gensim\ Quick\ Start.ipynb * git rm docs/notebooks/Corpora_and_Vector_Spaces.ipynb * git rm gensim\ Quick\ Start.ipynb * git rm docs/notebooks/Topics_and_Transformations.ipynb * git rm docs/notebooks/Similarity_Queries.ipynb * git rm docs/notebooks/summarization_tutorial.ipynb * git rm docs/notebooks/distance_metrics.ipynb * git rm docs/notebooks/word2vec.ipynb * git rm docs/notebooks/doc2vec-lee.ipynb * git rm docs/notebooks/gensim_news_classification.ipynb * git rm docs/notebooks/lda_training_tips.ipynb * git rm docs/notebooks/doc2vec-IMDB.ipynb * git rm docs/notebooks/annoytutorial.ipynb * git rm tutorial.rst tut1.rst tut2.rst tut3.rst * minor update to layout.html * git rm changes_080.rst * minor tweaks to gallery and surrounding docs * remove cruft from run_doc2vec_imdb.py * update doc howto * fixup * git add requirements_docs.txt * more dependencies in requirements_docs.txt * re-enable LDA howto * add missing images * add built LDA howto * port tutorials.md to gallery * WIP: cleaning up 
docs * language clean up + pin exact versions in doc requirements * git add redirects.csv test_redirects.py * remove gensim_numfocus namespace qualifier * doc cleanup in Other resources * fix redirects * regenerated tutorials * Added tools/check_gallery.py * committing unsuccessful attempt to fix a tutorial before deleting it * remove tutorials that don't work * index page fixes * add install anchor * Update redirects.csv * link fixes from local testing * replace easy_install with pip * renamed run_040_compare_lda.py to run_compare_lda.py * minor fixes * more fixes from website testing * updating wordcloud images * add pandas to requirements_docs.txt * !! * more dependency + code fixes * update upload path to "live" website * update test_redirects.py * git rm redirects.csv test_redirects.py
* Fix links to documentation in README.md * Update README.md
* Remove native Python implementations of Cython extensions Fix #2511 * remove print statement in tox.ini * remove print statement in tox.ini * fix flake8 issues * fix missing imports * adjust exception message * bring back FAST_VERSION variable * fixup: missing parens * disable progress bar for tox * respond to review comments * remove C/C++ sources generated from Cython files * update setup.py * remove duplicate line in setup.py * fix numpy bootstrapping * update tox.ini * handle cython dependency in setup.py * fixup in setup.py: lowercase c * more cython sourcery * fix tox.ini * Fix merge artifact in setup.py * fix merge artifact * disable pip progress bar under CircleCI
* document accessing model's vocabulary * update images
… (DTM) documentation * improve & corrected gensim documentation (#2637) * more descriptive explanation of top_chain_var
- uncomment next year
* Speed up word2vec binary model loading (#2642) * Add correctness tests for optimized word2vec model loading (#2642) * Include remarks of Radim to code speeding up vectors loading (#2671) * Include remarks of Michael to code speeding up vectors loading (#2671) * Refactor _load_word2vec_format into a few functions for better readability * Clean-up _add_word_to_result function
…orpus' is empty (#2672) * [Issue-2670] Bug fix: Initialize doc_no2 because it is not set when 'corpus' is empty * [Issue-2670] Add: unittests should fail on invalid input (generator and empty corpus) * [Issue-2670] Add: Fix unittest for generator * [Issue-2670] Fix unittest tox:flake8 errors * [Issue-2670] Fix: empty corpus def in unittest * [Issue-2670] Fix: empty corpus and generator unittests * [Issue-2670] Fix: empty corpus and generator unittests
* move install_wheels script * git add continuous_integration/check_wheels.py * bump versions for numpy and scipy * update old requirements.txt * add file header * get rid of install_wheels.py hack * fixup: update travis.yml * Update continuous_integration/check_wheels.py Co-Authored-By: Radim Řehůřek <me@radimrehurek.com> * Update continuous_integration/check_wheels.py Co-Authored-By: Radim Řehůřek <me@radimrehurek.com> Co-authored-by: Radim Řehůřek <me@radimrehurek.com>
* Find largest by absolute value * Add helper function to simplify code & add unit test for it
* force python int before calling islice. islice don't accept numpy int * add test to check islice error * it makes test to fail * make sure that islice receives a python int * fix typo
OK, that What's the "merge artifact" in 3918704 though? |
I'm not 100% sure. I resolved the merge conflicts manually, and then ran a diff to check that I got them all right, and that artifact came out. The lines themselves come from |
If I understand correctly, git added two Scary, but the diff is the definitive answer. As long as the diff is fine, we're good to go. @mpenkov can you finish the release? I'll switch the website symlink right after + tweet. Thanks. |
In my opinion, this was done exactly backwards, and has resulted in a seriously messed-up project The commit labeled "Release 4.0.0beta" has an unhelpful title & gigantic commit message & reports as touching 500+ files.
This process was something else, more complicated, harder to review and understand, for unclear benefits. Anything that was in the weird orphaned -3.8.3 branch should have been forgotten as a one-time error, not carried forward to confuse things for 4.0.0-beta. Cleanly releasing off (I can't tell if the current state of |
Mid-release, as far as I can tell. @mpenkov is fighting with the CI wheels I guess, which is a mid-release step that fails randomly and can take several hours. The step of changing the self-reported version in I already switched https://radimrehurek.com/gensim/ because I expected the release to be done by now. @mpenkov what's the ETA? Re. branches: yes, 3.8.2 was a botched up release, which led to a hot-fix release 3.8.3 which was badly merged, and now that affects 4.0.0. Ditching that orphaned branch would have been my choice too but apparently since I saw that including those orphaned commits actually auto-closed some (orphaned!) tickets… which is a bonus. |
We're still mid-release. There are a number of unforeseen issues blocking the wheel builds. It's hard for me to give an ETA because I haven't encountered these issues before, and my bandwidth during the working week is limited. The problems I can see include:
The way to proceed is, for each issue:
If either of you is able to handle any of the above, please go ahead. Otherwise, I'll poke at it when I have time during the week.
@piskvorky Hang on, the 3.8.3 branch is still an orphan, right? The only trace of that in develop is the CHANGELOG, and we merged that intentionally in #2831. So yeah, we messed up 3.8.2, but I don't see how 3.8.3 was badly merged. |
@gojomo I had a look at one of the problems, and it's being caused by this:
At this stage it's probably too late to fix this in gensim (because we'll have to repeat the above release procedure) so should we bump the numpy version that we build the wheels against? If yes, then to what? |
What I mean about merges is that if Ideally, nothing that happened wrong in 3.8.2 or 3.8.3 should have any impact now, because I guess I just don't understand any necessary process that could create the giant commits/PRs as shown here in the Github web interface. (Maybe it's a Github issue, but even reading the explanation atop this issue, I can't imagine why a couple commits ago, WRT to the numpy |
It left But as long as no "bad changes" (as verified via |
Looking around, I don't think the merge did that. I'm not sure what did, but a single changelog entry wouldn't have caused this. I've identified another issue with the wheels on Windows: pip is being stupid and somehow failing to install Cython when installing the wheel:
This is despite it being able to cythonize the files prior to building the wheel. The wheel itself is getting built and uploaded, but the above error is crashing the build and preventing builds for other Python versions from proceeding.
|
Why would installing a binary wheel need Cython at all? |
That's a good question. We typically only require cython if the extensions aren't built. In this case, the extensions are getting built, so cython shouldn't be required. Yet another mystery to unravel. |
This is becoming a bit too much to manage for a single PR (that has been merged anyway), so I've opened separate tickets to deal with the immediate problems: |
Yes. Seeing how the other contributions went under the previous stewardship, I should have expected this part would be of similar quality. And focused more attention there too, prior to the release. Kudos to @gojomo for insisting on beta. |
I've solved the previous issues, but uncovered two more: piskvorky/gensim-wheels#10 Currently, all Linux test runs of the wheels are passing (except 3.7). The Windows runs are still failing. |
@mpenkov thanks a lot! What are the takeaways from releasing 4.0.0beta: anything to improve in our process / documentation? Do we expect the next release to go more smoothly? |
@mpenkov I tried on Linux, and How do I install EDIT: OSX installs EDIT2: Windows install |
Figured it out: the Linux virtualenv had py2.7, so pip installed the latest Gensim where py2.7 worked = 3.8.3. So, a feature. No problem here :) |
Yeah, I can think of two:
|
A practice I've liked on previous projects was for each auto-build to actually create the exact same distribution artifacts as a full release - and ideally, also, run the tests from an install of those artifacts. (And keep all of them around, at least for a little while.) Then, there's no separate release build or release repo: an official release is just a tiny labeling change - a version-label-bump of a few lines - then grabbing the exact same artifacts for uploading to official distribution points. |
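One way to make the "exact same artifacts" idea above checkable is to compare content hashes of the file that was tested in CI and the file about to be uploaded. The following is a minimal sketch under that assumption. The helper names `sha256_of` and `is_same_artifact` are hypothetical and are not part of gensim's release tooling.

```python
import hashlib


def sha256_of(path):
    """Return the hex SHA-256 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fin:
        for chunk in iter(lambda: fin.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_same_artifact(ci_artifact, release_artifact):
    """True if the file to be published is byte-identical to the tested CI build."""
    return sha256_of(ci_artifact) == sha256_of(release_artifact)
```

A release script could refuse to upload a wheel whenever `is_same_artifact` returns `False` for the tested and to-be-published paths.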
We normally merge release branches onto master locally, using the git CLI, but there were some conflicts, and I thought it prudent to deal with them in a separate PR, where we can comment and discuss, etc.
The conflicts are caused by the non-standard way that we handled the previous release (3.8.3). Unlike regular releases, which branch off develop, the 3.8.3 release branched off the 3.8.2 tag on master (?) - the first commit was afaf76f. This branch is still around: https://github.com/RaRe-Technologies/gensim/compare/release-3.8.3.
The sole purpose of 3.8.3 was to temporarily bring back Py2.7 support to gensim (we removed it prematurely in 3.8.2). All work done on the 3.8.3 branch was related to Py2.7, so it was never merged to develop or master. Regular work continued during the release, and we did merge develop into the release-3.8.3 branch (see c3d95ab).
Shortly after the release of 3.8.3, people pointed out that it was missing from the change log on the develop branch. We expected this, because we never merged release-3.8.3 into develop (see above for the reason), but it was confusing for people. So, we updated the develop changelog to mention 3.8.3 in #2831.
I couldn't work out what the cause of the conflicts was, but there were only a small number of them:
I resolved them in favor of the release branch. You can check that the differences between release-4.0.0beta and develop are minimal, so there are no merge artifacts in those conflicted files:
I think we're ready to proceed, so: