
Needed: DataLab integration with Google BigTable, Google DataProc (Spark) #41

Open
joshreuben456 opened this issue Nov 25, 2016 · 1 comment

Comments

@joshreuben456

We use Jupyter notebooks to access BigTable data like so:

from google.cloud import bigtable
from google.cloud import happybase

# project_id, instance_id, and table_name are placeholders for our values
client = bigtable.Client(project=project_id, admin=True)
instance = client.instance(instance_id)
connection = happybase.Connection(instance=instance)
table = connection.table(table_name)

# scan() is a method (note the parentheses); it yields (row_key, row_dict) pairs
rows = []
for key, row in table.scan():
    rows.append((key, row))

(we then convert these rows into Pandas DataFrames)
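A minimal sketch of that conversion step, assuming the scanned cells are UTF-8 encoded (column names and values come back from HappyBase as bytes):

import pandas as pd

# Build a DataFrame from the (key, row_dict) pairs collected above,
# decoding the byte keys/values that HappyBase returns
df = pd.DataFrame(
    [{col.decode("utf-8"): val.decode("utf-8") for col, val in row.items()}
     for _, row in rows],
    index=[key.decode("utf-8") for key, _ in rows],
)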

Regarding DataLab and DataProc integration: Jupyter/Spark integration (see http://blog.insightdatalabs.com/jupyter-on-apache-spark-step-by-step/) is well established in data science, so how can we leverage DataLab notebooks over Spark jobs running on DataProc (e.g. stepwise PySpark job definitions, visualising job results)? A sketch of the kind of workflow we have in mind follows.
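This is only an illustrative sketch, not working DataLab code: it assumes a SparkContext named sc is already available in the kernel (as it is when Jupyter is attached to a Spark cluster), and the GCS path is a hypothetical example.

# Assumes an existing SparkContext `sc` from a Spark-enabled kernel
import pandas as pd

# Step 1: load and parse raw data (hypothetical bucket path)
lines = sc.textFile("gs://my-bucket/events/*.csv")
records = lines.map(lambda line: line.split(","))

# Step 2: aggregate, e.g. count events per first-column key
counts = records.map(lambda fields: (fields[0], 1)).reduceByKey(lambda a, b: a + b)

# Step 3: collect the (small) result and visualise it in the notebook
result = pd.DataFrame(counts.collect(), columns=["key", "count"])
result.plot(kind="bar", x="key", y="count")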

Also, how do we leverage IPython Parallel (https://ipyparallel.readthedocs.io/en/latest/) and the Jupyter clusters notebook extension in DataLab? The standard ipyparallel pattern we would want to reproduce is sketched below.
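A minimal ipyparallel sketch, assuming a controller and engines have already been started outside the notebook (e.g. with `ipcluster start -n 4`):

import ipyparallel as ipp

# Connect to the running cluster and get a direct view over all engines
rc = ipp.Client()
dview = rc[:]

# Fan a function out across the engines and gather the results
squares = dview.map_sync(lambda x: x * x, range(16))
print(squares)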

@chmeyers
Contributor

For Datalab/Dataproc integration, take a look at:
https://github.com/GoogleCloudPlatform/dataproc-initialization-actions/tree/master/datalab
This is not yet completely documented, but the engineering work is in place.
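For reference while the docs are pending, the usual initialization-action pattern from that repo looks roughly like this (the cluster name is a placeholder; check the linked README for the authoritative command and flags):

# Create a Dataproc cluster with the Datalab initialization action installed
gcloud dataproc clusters create my-datalab-cluster \
    --initialization-actions gs://dataproc-initialization-actions/datalab/datalab.sh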
