fix kernel generation for Spark Yarn // TOREE-97 #141

Open — wants to merge 1 commit into master
10 changes: 10 additions & 0 deletions etc/pip_install/toree/toreeapp.py
@@ -35,6 +35,8 @@

PYTHON_PATH = 'PYTHONPATH'
SPARK_HOME = 'SPARK_HOME'
HADOOP_CONF_DIR = 'HADOOP_CONF_DIR'
SPARK_CONF_DIR = 'SPARK_CONF_DIR'
TOREE_SPARK_OPTS = '__TOREE_SPARK_OPTS__'
TOREE_OPTS = '__TOREE_OPTS__'
DEFAULT_INTERPRETER = 'DEFAULT_INTERPRETER'
@@ -57,6 +59,12 @@ class ToreeInstall(InstallKernelSpec):
spark_home = Unicode(os.getenv(SPARK_HOME, '/usr/local/spark'), config=True,
Member: In general, do we actually want default values here? I'm assuming we don't have a standard default location for deploying Spark/Hadoop; maybe it would be better to use the env variables when they are available and otherwise ignore them, or, for required ones, throw an error?

Author: If they're set to empty, it's just as if they're not set, so they're probably better left empty.
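The fallback behavior being discussed can be sketched with a small hypothetical helper (`env_or_default` is not part of the patch; it only illustrates the semantics the author describes, where an empty variable behaves the same as an unset one):

```python
import os

def env_or_default(name, default=None):
    """Return the env variable's value, falling back to `default`
    when the variable is unset OR set to the empty string."""
    value = os.environ.get(name)
    return value if value else default

# Unset: the default wins.
os.environ.pop("DEMO_SPARK_HOME", None)
assert env_or_default("DEMO_SPARK_HOME", "/usr/local/spark") == "/usr/local/spark"

# Empty: treated the same as unset, as the author notes.
os.environ["DEMO_SPARK_HOME"] = ""
assert env_or_default("DEMO_SPARK_HOME", "/usr/local/spark") == "/usr/local/spark"

# Set to a real path: the env variable wins.
os.environ["DEMO_SPARK_HOME"] = "/opt/spark"
assert env_or_default("DEMO_SPARK_HOME") == "/opt/spark"
```

Note that `os.getenv(NAME, default)`, as used in the patch, only falls back when the variable is *unset*; an empty string passes through, which is why the reviewers lean toward leaving the defaults out.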

help='''Specify where the spark files can be found.'''
)
hadoop_conf_dir = Unicode(os.getenv(HADOOP_CONF_DIR, '/usr/local/hadoop'), config=True,
help='''Specify where the hadoop config files can be found.'''
)
spark_conf_dir = Unicode(os.getenv(SPARK_CONF_DIR, '/usr/local/spark'), config=True,
help='''Specify where the spark config files can be found.'''
Member: Should the default value actually be /usr/local/spark/conf?

Author: Sure, it has to be the directory where spark-env.sh is found.

)
kernel_name = Unicode('Apache Toree', config=True,
help='Install the kernel spec with this name. This is also used as the base of the display name in jupyter.'
)
@@ -105,6 +113,8 @@ def create_kernel_json(self, location, interpreter):
TOREE_SPARK_OPTS : self.spark_opts,
TOREE_OPTS : self.toree_opts,
SPARK_HOME : self.spark_home,
HADOOP_CONF_DIR : self.hadoop_conf_dir,
SPARK_CONF_DIR : self.spark_conf_dir,
PYTHON_PATH : '{0}/python:{0}/python/lib/{1}'.format(self.spark_home, py4j_zip),
PYTHON_EXEC : self.python_exec
}
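With this patch applied, the `env` section of the generated kernel.json picks up the two new variables. A minimal sketch of the resulting dict (all values here are illustrative placeholders, not what `toreeapp.py` necessarily emits — the py4j zip name and paths in particular are assumptions):

```python
import json

# Illustrative values only; the real ones come from the toreeapp.py
# options shown in the diff above.
spark_home = "/usr/local/spark"
py4j_zip = "py4j-0.10.7-src.zip"  # hypothetical py4j version

env = {
    "__TOREE_SPARK_OPTS__": "--master yarn",   # example Spark opts for YARN
    "__TOREE_OPTS__": "",
    "SPARK_HOME": spark_home,
    "HADOOP_CONF_DIR": "/usr/local/hadoop/conf",  # new in this PR
    "SPARK_CONF_DIR": "/usr/local/spark/conf",    # new in this PR
    "PYTHONPATH": "{0}/python:{0}/python/lib/{1}".format(spark_home, py4j_zip),
    "PYTHON_EXEC": "python",
}
print(json.dumps({"env": env}, indent=2))
```

Exporting HADOOP_CONF_DIR into the kernel's environment is what lets `spark-submit` locate the YARN ResourceManager configuration, which is the point of TOREE-97.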