Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Set default for USE_SENTENCEPIECE to on #536

Closed
ugermann opened this issue Oct 30, 2019 · 13 comments
Closed

Set default for USE_SENTENCEPIECE to on #536

ugermann opened this issue Oct 30, 2019 · 13 comments

Comments

@ugermann
Copy link
Collaborator

Currently, the default for USE_SENTENCEPIECE is off. I propose to set this to on by default.

@snukky
Copy link
Member

snukky commented Oct 30, 2019

I think that compiling with SentencePiece requires protobuf, so we would need to increase minimum dependencies.

@kpu
Copy link
Member

kpu commented Oct 30, 2019

Shouldn't this be automatic based on what's installed?

@emjotde
Copy link
Member

emjotde commented Oct 30, 2019

Let's wait with this. I want to update the way we install dependencies to be consistent across Linux and Windows. CMake has nice functionality for this and can essentially create a local install of all the deps. I was planning to enable SentencePiece by default when this is done.

@ugermann
Copy link
Collaborator Author

Just a word of warning: Between version 3.5 and 3.6 of CMake, someone brilliantly decided to rename the prefix for ProtoBuf-related variables from PROTOBUF_ to Protobuf_,
cf. https://cmake.org/cmake/help/v3.5/module/FindProtobuf.html
vs. https://cmake.org/cmake/help/v3.6/module/FindProtobuf.html. Ubuntu 16.04 uses 3.5 by default.

@alvations
Copy link
Collaborator

Ubuntu 16.04 should really be in the museum and should unsupported after April 2019's end of life =)

And also cmake on default ubuntu is usually archaic and a pain with CUDA, c.f. #510 (comment)

@ugermann
Copy link
Collaborator Author

ugermann commented Nov 6, 2019

Ubuntu LTS versions are supported for five years, so EOL for U16.04 is in 2021. As if the situations wasn't chaotic enough already, cmake 3.8 upgraded CUDA from FindCUDA to a full-fleged language specification, which also raises compatibility issues and may require an overhaul of Marian's cmake setup. Well-documented an well-tested pull requests are always welcome.

@ugermann
Copy link
Collaborator Author

ugermann commented Nov 6, 2019

I think that compiling with SentencePiece requires protobuf, so we would need to increase minimum dependencies.

Apparently newer versions of SentencePiece come with built-in protobuf-lite: google/sentencepiece@7b19d68

@ugermann
Copy link
Collaborator Author

ugermann commented Nov 7, 2019

@alviations: Can you elaborate on "pain"? We use U16.04 with standard cmake 3.5 and up-to-date versions of CUDA (10.1). Running cmake3.5 with properly set -DCUDA_TOOLKIT_ROOT_DIR usually does the trick.

@alvations
Copy link
Collaborator

Most probably the pain comes when we use c++17 instead of c++11.

Somehow on our system, we use Ubuntu 18.04 and CUDA 10.1 through the default cuda docker images docker pull nvidia/cuda:10.1-cudnn7-runtime-ubuntu18.04 and the default cmake there throws error about not finding the CUBLAS automatically, we didn't set the -DCUDA_TOOLKIT_ROOT_DIR but usually, just upgrading the cmake to 3.14 or 3.15 actually works well in automatically finding the CUBLAS without setting the -DCUDA_TOOLKIT_ROOT_DIR

@ugermann
Copy link
Collaborator Author

ugermann commented Nov 7, 2019

@alvations: Can you share your Dockerfile? We might be able to add that to the regression tests.

@emjotde
Copy link
Member

emjotde commented Nov 7, 2019

Also this: #526

@emjotde
Copy link
Member

emjotde commented Nov 7, 2019

I will take a look at the updated SentencePiece soon. The lighter dependency on Protobuf was actually my request, but they also changed training semantics to something weird, hence I put off merging for a while.

@snukky
Copy link
Member

snukky commented Jan 6, 2021

SentencePiece has been updated and is enabled by default. Closing.

@snukky snukky closed this as completed Jan 6, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants