Tasks for loading GA data into Snowflake (PART 2) #722
base: master
Just a few comments.
@@ -4,23 +4,23 @@
 -r base.txt

 argparse==1.2.1 # Python Software Foundation License
-boto3==1.4.8 # Apache 2.0
+boto3==1.9.131 # Apache 2.0
As I recall, we chose 1.4.8 because it was the smallest upgrade that added support for the European regions that open-source users needed.
 graphitesend==0.10.0 # Apache
 html5lib==1.0b3 # MIT
 isoweek==1.3.3 # BSD
 numpy==1.11.3 # BSD
 paypalrestsdk==1.9.0 # Paypal SDK License
-psycopg2==2.6.2 # LGPL
+psycopg2==2.8.1 # LGPL
Not sure what actually requires this, but I assume it's for vertica-python. If so, perhaps it doesn't need to be pinned in the *.in file, but only pinned in *.txt?
Even if it were for vertica-python, why would that make pinning this in default.in not required?
I only upgraded it because I literally could not run `make upgrade` to even generate the *.txt files on Ubuntu 18.04, due to a bug somewhere in the dependency resolution in pip-tools or in some random setup.py.
        """
        Overriding so that we can pass `self.fs` to the new marker.
        """
        return GCSMarkerTarget(path=path, client=self.fs)
It's not clear what this is doing. We get a GCSMarkerTarget somehow, and then use it to create another GCSMarkerTarget? Is this some kind of "clone" operation?
Yes, I introduced `new_with_credentials()` to MarkerMixin to allow markers to construct new markers (something we already do), optionally passing along any credentials that would otherwise be dropped in the case of GCSMarkerTarget. The result is typically a marker target that is identical in every way, except with `/_SUCCESS` appended to the `path` attribute. In retrospect, such an object should never exist, and I probably should not have built on top of that functionality.
The more correct thing to do would be to construct a new target object that is the non-marker analogue of the current class. What do you think is the best way to handle this?
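For readers following the thread, the pattern under discussion can be sketched roughly as below. This is a minimal illustration under stated assumptions, not the code in this PR: only the name `new_with_credentials()` and the idea of passing a client along come from the discussion, while `MarkerTarget`, `GCSLikeMarkerTarget`, the `client` attribute, and `success_marker()` are hypothetical stand-ins with no Luigi dependency.

```python
class MarkerTarget:
    """Hypothetical stand-in for a marker target; not the pipeline's real class."""

    def __init__(self, path, client=None):
        self.path = path
        self.client = client  # credentials-bearing client; may be None

    def new_with_credentials(self, path):
        """Build a new target of the same class. The base version passes no client."""
        return type(self)(path=path)

    def success_marker(self):
        """Return the target for the _SUCCESS flag under this path (hypothetical helper)."""
        return self.new_with_credentials(self.path.rstrip('/') + '/_SUCCESS')


class GCSLikeMarkerTarget(MarkerTarget):
    """Hypothetical GCS-style target whose client must survive the copy."""

    def new_with_credentials(self, path):
        # Override so the client (and its credentials) is not dropped.
        return GCSLikeMarkerTarget(path=path, client=self.client)


client = object()  # placeholder for a real storage client
marker = GCSLikeMarkerTarget('gs://bucket/out', client=client).success_marker()
print(marker.path)
```

In this sketch the base class silently loses the client when it builds the new target, which is the behavior the override is meant to avoid.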
This is part 2 of the GA loading pipeline which DOES depend on a Luigi upgrade. DE-1374 (PART 2)
This PR is now really old. You have a few more old PRs in flight: https://github.com/pulls?q=is%3Aopen+is%3Apr+archived%3Afalse+author%3Apwnage101+sort%3Aupdated-asc+org%3Aedx+org%3Aopenedx Do you want to keep them all open?
This is the second part of the work to add a pipeline to load GA360 data into Snowflake.
This part of the code change DOES depend on a Luigi upgrade, and is NOT ready for merging until we have done the following things:
luigi>=2.7.6
google-cloud-bigquery==1.11.2
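One way to check these two constraints against an environment is sketched below. It is only an illustration: `parse_version` and `prerequisites_met` are hypothetical helpers, the version parsing assumes purely numeric dotted versions, and the caller is assumed to supply the installed versions as a plain dict.

```python
def parse_version(text):
    """Turn '2.7.6' into (2, 7, 6). Assumes purely numeric dotted versions."""
    return tuple(int(part) for part in text.split('.'))


def prerequisites_met(installed):
    """Check the two constraints listed above.

    installed: dict mapping package name to its installed version string.
    """
    luigi_ok = parse_version(installed.get('luigi', '0')) >= (2, 7, 6)
    bigquery_ok = installed.get('google-cloud-bigquery') == '1.11.2'
    return luigi_ok and bigquery_ok


print(prerequisites_met({'luigi': '2.8.0', 'google-cloud-bigquery': '1.11.2'}))
```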
Other PRs:
Analytics Pipeline Pull Request
Make sure that the following steps are done before merging: