Documentation for build with newer versions of Spark #20

Open
Dav-v opened this issue Apr 16, 2020 · 4 comments
@Dav-v

Dav-v commented Apr 16, 2020

Hi all,
maybe this would be more appropriate on the axs-spark repo, but it's not possible to open issues there so I'm posting here.
I would like to install AXS on a standalone cluster with a more recent version of Spark (2.4.5 or even 3.0-preview). Is there documentation explaining how to prepare the distribution? I noticed that axs-spark has branches for Spark versions such as 2.3.0, 2.4.3, and 3.0-preview, but there is an AXS release only for Spark 2.4.0. I see that @stevenstetzler is testing an automatic pipeline to create an AXS distribution with Spark 3.0-preview. Would it be possible to do it manually until that is ready?
Thanks a lot,
Davide Viero

@stevenstetzler
Member

Yes, I'll put the instructions here and we should add documentation on the website on how to build from source as well.

Building AXS essentially means merging a few pieces of AXS into Spark before building Spark from source. You can find documentation on building Spark from source here: https://spark.apache.org/docs/latest/building-spark.html. You'll need to download and install Maven, which lets you compile Java projects from source, including their dependencies. From the Spark documentation: "Building Spark using Maven requires Maven 3.5.4 and Java 8"

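First, clone this repository and check out the branch/tag/commit you want:

git clone https://github.com/astronomy-commons/axs.git
cd axs
git checkout master
cd .. # return to the parent directory so both clones sit side by side

Next, do the same with the axs-spark repository:

git clone https://github.com/astronomy-commons/axs-spark
cd axs-spark
git checkout axs-3.0.0-preview
cd .. # the steps below are run from the directory that contains axs and axs-spark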
1. Build AxsUtilities.jar

cd axs/AxsUtilities
mvn package # runs Maven to compile the AxsUtilities project; pom.xml configures the build
cd ../.. # return to the directory that contains axs and axs-spark

The created jar will be in axs/AxsUtilities/target.
2. Merge axs and Spark

cp -r ./axs/axs ./axs-spark/python/. # adds the Python components of axs to Spark's PYTHONPATH
cp -r ./axs/AxsUtilities/target/*.jar ./axs-spark/python/axs/. # adds the compiled AXS jar for use in Spark
3. Build Spark from source

cd axs-spark
./dev/make-distribution.sh --name AXS-Custom-Build --tgz -Phadoop-2.7 -Pmesos -Pyarn -Phive -Phive-thriftserver -Pkubernetes

This will build Spark from source and produce a tar file (--tgz) called spark-3.0.0-preview.tgz or something like that. -Phadoop-2.7 builds Spark along with the Hadoop 2.7 binaries; Hadoop can also be an external library on your system. -Pmesos -Pyarn -Pkubernetes build Spark with scheduling support for Mesos, YARN, and Kubernetes. -Phive -Phive-thriftserver enable Hive support, which AXS depends on for storing catalog metadata.
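
For reference, the steps above could be collected into one script, run from an empty working directory. This is only a sketch under a few assumptions: the `run` helper and the `DRY_RUN`, `AXS_REF`, `SPARK_REF`, and `DIST_NAME` variables are made up for illustration, and `git -C` and `mvn -f` stand in for the `cd` commands so that all paths stay relative to the working directory. By default it only prints the commands; set `DRY_RUN=0` to actually run them.

```shell
#!/bin/sh
# Sketch: the manual AXS build steps as one script (hypothetical helper names).
set -eu

AXS_REF="${AXS_REF:-master}"                 # branch/tag/commit of axs
SPARK_REF="${SPARK_REF:-axs-3.0.0-preview}"  # branch of axs-spark
DIST_NAME="${DIST_NAME:-AXS-Custom-Build}"   # value passed to --name
DRY_RUN="${DRY_RUN:-1}"                      # 1 = print only, 0 = execute

# Print the command in dry-run mode, otherwise execute it in the current shell.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    printf '+ %s\n' "$*"
  else
    "$@"
  fi
}

build_axs() {
  # Clone both repositories side by side and check out the requested refs.
  run git clone https://github.com/astronomy-commons/axs.git
  run git -C axs checkout "$AXS_REF"
  run git clone https://github.com/astronomy-commons/axs-spark
  run git -C axs-spark checkout "$SPARK_REF"

  # 1. Build the AxsUtilities jar (output lands in axs/AxsUtilities/target).
  run mvn -f axs/AxsUtilities/pom.xml package

  # 2. Merge the Python package and the jar into the Spark source tree.
  run cp -r axs/axs axs-spark/python/
  run cp axs/AxsUtilities/target/*.jar axs-spark/python/axs/

  # 3. Build the Spark distribution with the same profiles as above.
  run cd axs-spark
  run ./dev/make-distribution.sh --name "$DIST_NAME" --tgz \
    -Phadoop-2.7 -Pmesos -Pyarn -Phive -Phive-thriftserver -Pkubernetes
}

build_axs
```

Reading the printed commands first is a cheap way to confirm the branch names and profiles before committing to a full Spark build.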

@stevenstetzler
Member

Also, if you don't want to go through building from source, these distributions should have Spark 3.0.0 support:

From @ctslater : https://epyc.astro.washington.edu/~ctslater/axs-spark-3.0.0-preview-axsdistfix.tar.gz
From one of our preliminary automated builds: https://github.com/stevenstetzler/axs/releases/download/v3.0.0-preview/axs-distribution.tgz
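
If you go with one of the prebuilt tarballs, downloading and unpacking could be scripted roughly as below. `fetch_dist` and `unpack_dist` are hypothetical helper names, and the directory layout inside the archives is not stated in this thread, so the sketch reports the first top-level entry of the archive instead of assuming a directory name.

```shell
# Sketch: download and unpack a prebuilt AXS distribution (hypothetical helpers).

# Unpack a tarball into a target directory and print the path of the
# top-level entry it contains (a likely candidate for SPARK_HOME).
unpack_dist() {
  local tarball="$1" dest="$2"
  mkdir -p "$dest"
  tar -xzf "$tarball" -C "$dest"
  # First path component of the first archive entry.
  local top
  top="$(tar -tzf "$tarball" | awk -F/ 'NR == 1 { print $1 }')"
  printf '%s\n' "$dest/$top"
}

# Download a tarball with curl (-f: fail on HTTP errors, -L: follow redirects),
# then unpack it next to the download.
fetch_dist() {
  local url="$1" dest="$2"
  local tarball="$dest/$(basename "$url")"
  mkdir -p "$dest"
  curl -fL -o "$tarball" "$url"
  unpack_dist "$tarball" "$dest"
}
```

Calling `fetch_dist` with one of the URLs above and a target directory such as `"$HOME/axs"` would print the unpacked path, which can then be checked by hand before pointing SPARK_HOME at it.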

@Dav-v
Author

Dav-v commented Apr 18, 2020

Great, thanks a lot. It would indeed be very useful to put this on the AXS documentation pages. It took me a while to find out that axs-spark exists, since it is mentioned neither in the documentation nor in the README. The repository astronomy-commons/axs is more visible and easier to find on Google than astronomy-commons/axs-spark, so it would be nice to explain the relationship between them in the documentation for future users.

Also, if you don't want to go through building from source, these distributions should have Spark 3.0.0 support:

From @ctslater : https://epyc.astro.washington.edu/~ctslater/axs-spark-3.0.0-preview-axsdistfix.tar.gz
From one of our preliminary automated builds: https://github.com/stevenstetzler/axs/releases/download/v3.0.0-preview/axs-distribution.tgz

Thanks, I'll install this version then

@stargaser

Thanks very much @stevenstetzler for posting these build instructions. I've successfully built the 3.0.0 preview at IPAC.

Two minor hiccups: the build did not work with Java 11, but it did work with Java 8. Building the YARN and Mesos parts kept failing until I copied some certificates from an existing Java 8 installation to the OpenJDK that I was using with Maven.
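
Given the Java 11 failure, a quick version check before starting the build might save some time. The following is a sketch only: `java_major` is a hypothetical helper, and it assumes `java -version` prints a quoted version string such as "1.8.0_242" or "11.0.6" on stderr.

```shell
# Sketch: warn if the java on PATH is not Java 8 (hypothetical helper).

# Extract the major version from a string like "1.8.0_242" or "11.0.6".
java_major() {
  local v="$1"
  case "$v" in
    1.*) v="${v#1.}" ;;   # legacy numbering: 1.8.0_242 -> 8.0_242
  esac
  # Drop everything from the first '.', '_' or '-' onward.
  printf '%s\n' "${v%%[._-]*}"
}

if command -v java >/dev/null 2>&1; then
  # java -version writes to stderr; the version is the first quoted field.
  ver="$(java -version 2>&1 | awk -F'"' '/version/ { print $2; exit }')"
  if [ "$(java_major "$ver")" = "8" ]; then
    echo "Java 8 found ($ver)"
  else
    echo "Warning: found Java '$ver'; this build was reported to work with Java 8"
  fi
else
  echo "java not found on PATH"
fi
```

The check only reports what it finds and does not change which JDK Maven uses.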
