GCP SparkR Example #8240

Merged

roelhogervorst (Contributor) commented Apr 10, 2020

Show how to schedule R and SparkR jobs on a Dataproc cluster.
Dataproc already supports this kind of job,
but it was not clear how to submit one from Airflow.
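
For readers who want a concrete picture, here is a minimal sketch of what such a job definition might look like. It assumes the Dataproc job payload accepts a `spark_r_job` entry with a `main_r_file_uri`, as in the Dataproc Job resource. The helper `build_sparkr_job`, the project ID, cluster name, and bucket path are hypothetical and are not taken from this PR.

```python
def build_sparkr_job(project_id, cluster_name, main_r_file_uri):
    """Build a Dataproc job dict that runs an R script through SparkR.

    Hypothetical helper for illustration; it only assembles the payload.
    """
    return {
        # Project that owns the job.
        "reference": {"project_id": project_id},
        # Existing cluster the job should run on.
        "placement": {"cluster_name": cluster_name},
        # The R script to execute, stored in Cloud Storage.
        "spark_r_job": {"main_r_file_uri": main_r_file_uri},
    }


# All values below are placeholders (assumptions, not from the PR).
SPARKR_JOB = build_sparkr_job(
    project_id="my-project",
    cluster_name="example-cluster",
    main_r_file_uri="gs://my-bucket/hello_world.R",
)
```

In a DAG, a dict like `SPARKR_JOB` would typically be passed as the `job` argument of `DataprocSubmitJobOperator` from `airflow.providers.google.cloud.operators.dataproc`, together with the project and region. The exact argument names depend on the provider version, so check them against the installed release.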

This is a replacement for #7864


Make sure to mark the boxes below before creating PR: [x]


In case of fundamental code change, Airflow Improvement Proposal (AIP) is needed.
In case of a new dependency, check compliance with the ASF 3rd Party License Policy.
In case of backwards incompatible changes please leave a note in UPDATING.md.
Read the Pull Request Guidelines for more information.

Allows you to schedule R and SparkR jobs on a Dataproc cluster.
Dataproc already supports this kind of job,
but it was not clear how to submit one from Airflow.
boring-cyborg bot added the provider:google label (Google (including GCP) related issues) on Apr 10, 2020

turbaszek (Member) left a comment

Can you please check the R code of the job? With the suggested changes I was able to run this example. It may also be necessary to update the Dataproc client to google-cloud-dataproc==0.7.0.

codecov-io commented Apr 10, 2020

Codecov Report

Merging #8240 into master will decrease coverage by 0.65%.
The diff coverage is 100.00%.


@@            Coverage Diff             @@
##           master    #8240      +/-   ##
==========================================
- Coverage   88.33%   87.68%   -0.66%     
==========================================
  Files         936      940       +4     
  Lines       45319    45359      +40     
==========================================
- Hits        40034    39774     -260     
- Misses       5285     5585     +300     

| Impacted Files | Coverage Δ |
|---|---|
| ...ders/google/cloud/example_dags/example_dataproc.py | 100.00% <100.00%> (ø) |
| ...flow/providers/apache/cassandra/hooks/cassandra.py | 21.25% <0.00%> (-72.50%) ⬇️ |
| ...w/providers/apache/hive/operators/mysql_to_hive.py | 35.84% <0.00%> (-64.16%) ⬇️ |
| airflow/kubernetes/volume_mount.py | 44.44% <0.00%> (-55.56%) ⬇️ |
| airflow/providers/postgres/operators/postgres.py | 47.82% <0.00%> (-52.18%) ⬇️ |
| airflow/providers/redis/operators/redis_publish.py | 50.00% <0.00%> (-50.00%) ⬇️ |
| airflow/kubernetes/volume.py | 52.94% <0.00%> (-47.06%) ⬇️ |
| airflow/providers/mongo/sensors/mongo.py | 53.33% <0.00%> (-46.67%) ⬇️ |
| airflow/kubernetes/pod_launcher.py | 47.18% <0.00%> (-45.08%) ⬇️ |
| airflow/providers/mysql/operators/mysql.py | 55.00% <0.00%> (-45.00%) ⬇️ |

... and 52 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update e945439...933cd6e. Read the comment docs.

turbaszek merged commit efcffa3 into apache:master on Apr 16, 2020
boring-cyborg bot commented Apr 16, 2020

Awesome work, congrats on your first merged pull request!
