Small bug in databricks installation script #965

Merged 2 commits on Oct 28, 2019
SETUP.md — 6 changes: 4 additions & 2 deletions
@@ -186,9 +186,10 @@ SPARK_WORKER_OPTS="-Dspark.worker.cleanup.enabled=true, -Dspark.worker.cleanup.a
* Databricks Runtime version 4.3 (Apache Spark 2.3.1, Scala 2.11) or greater
* Python 3

An example of how to create an Azure Databricks workspace and an Apache Spark cluster within the workspace can be found [here](https://docs.microsoft.com/en-us/azure/azure-databricks/quickstart-create-databricks-workspace-portal). To use deep learning models and GPUs, you may set up a GPU-enabled cluster. For more details on this topic, see the [Azure Databricks deep learning guide](https://docs.azuredatabricks.net/applications/deep-learning/index.html).
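If you prefer the command line to the portal, the workspace can also be provisioned with the Azure CLI. The sketch below is only an illustration (it is not part of this repository's scripts); the resource group, workspace name, region, and SKU are placeholder values, and the `databricks` CLI extension may need to be added first.

```{shell}
# Illustrative only: names, region, and SKU are placeholders.
az extension add --name databricks            # one-time: add the Databricks CLI extension
az group create --name recommenders-rg --location eastus
az databricks workspace create \
    --resource-group recommenders-rg \
    --name recommenders-ws \
    --location eastus \
    --sku standard
```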

### Repository installation

You can set up the repository as a library on Databricks either manually or by running an [installation script](scripts/databricks_install.py). Both options assume you have access to a provisioned Databricks workspace and cluster, and that you have the appropriate permissions to install libraries.
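A typical invocation of the script looks roughly like the sketch below; the cluster id is a placeholder, and the exact arguments may differ between versions of the script, so check `--help` first.

```{shell}
cd Recommenders
# List the options the script actually supports (it uses argparse, so --help is available).
python ./scripts/databricks_install.py --help
# Hypothetical example: install the repo as an egg onto the given cluster.
python ./scripts/databricks_install.py <CLUSTER_ID>
```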

<details>
@@ -200,7 +201,7 @@ This option utilizes an installation script to do the setup, and it requires add
> * Set up CLI authentication for the [Azure Databricks CLI (command-line interface)](https://docs.azuredatabricks.net/user-guide/dev-tools/databricks-cli.html#install-the-cli). Details on how to create a token and set up authentication can be found [here](https://docs.azuredatabricks.net/user-guide/dev-tools/databricks-cli.html#set-up-authentication). Very briefly, you can install and configure your environment with the following commands.
>
> ```{shell}
- > conda activate reco-pyspark
+ > conda activate reco_pyspark
> databricks configure --token
> ```
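> A quick way to check that the token was configured correctly is to list a workspace path you can read (any readable path will do), for example:
>
> ```{shell}
> databricks workspace ls /Users
> ```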
>
@@ -249,6 +250,7 @@ To install the repo manually onto Databricks, follow the steps:
cd Recommenders
zip -r Recommenders.egg .
```

3. Once your cluster has started, go to the Databricks workspace, and select the `Home` button.
4. Your `Home` directory should appear in a panel. Right-click within your directory and select `Import`.
5. In the pop-up window, there is an option to import a library, where it says: `(To import a library, such as a jar or egg, click here)`. Select `click here`.
scripts/databricks_install.py — 7 changes: 4 additions & 3 deletions
@@ -52,7 +52,7 @@
MMLSPARK_INFO = {
    "maven": {
        "coordinates": "Azure:mmlspark:0.17",
-         "repo": "https://mvnrepository.com/artifact"
+         "repo": "https://mvnrepository.com/artifact",
    }
}
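For context (this is not code from the script itself), a maven spec in this shape is what the Databricks Libraries REST API expects. A rough curl sketch of installing the same package on a cluster, with host, token, and cluster id as placeholders:

```{shell}
# Placeholders: set DATABRICKS_HOST, DATABRICKS_TOKEN, and the cluster id yourself.
curl -X POST "${DATABRICKS_HOST}/api/2.0/libraries/install" \
  -H "Authorization: Bearer ${DATABRICKS_TOKEN}" \
  -d '{
        "cluster_id": "<your-cluster-id>",
        "libraries": [
          {"maven": {"coordinates": "Azure:mmlspark:0.17",
                     "repo": "https://mvnrepository.com/artifact"}}
        ]
      }'
```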

@@ -71,6 +71,7 @@

## Additional dependencies met below.


def create_egg(
    path_to_recommenders_repo_root=os.getcwd(),
    local_eggname="Recommenders.egg",
@@ -227,10 +228,10 @@ def prepare_for_operationalization(

# make sure path_to_recommenders is on sys.path to allow for import
sys.path.append(args.path_to_recommenders)
- from scripts.generate_conda_file import PIP_BASE
+ from scripts.generate_conda_file import PIP_BASE, CONDA_BASE

## depend on PIP_BASE:
- PYPI_RECO_LIB_DEPS = [PIP_BASE["tqdm"], PIP_BASE["papermill"]]
+ PYPI_RECO_LIB_DEPS = [PIP_BASE["tqdm"], CONDA_BASE["papermill"]]

PYPI_O16N_LIBS = [
"azure-cli==2.0.56",