Tasks for loading GA data into Snowflake (PART 1) #721
b51a455 to 35301c3
I haven't done a full review, but I expect the S3 changes not to work in general. Run the acceptance tests, for example.
Codecov Report
@@ Coverage Diff @@
## master #721 +/- ##
==========================================
- Coverage 75.18% 75.18% -0.01%
==========================================
Files 203 203
Lines 22888 22911 +23
==========================================
+ Hits 17209 17226 +17
- Misses 5679 5685 +6
35301c3 to d8d2b04
Currently running acceptance tests against this branch: http://jenkins-ci.analytics.edx.org/view/ad-hoc/job/edx-analytics-pipeline-acceptance-test-manual/1727/console
Acceptance tests passed. One test failed initially, but it was apparently flaky because I re-ran that module and it passed: http://jenkins-ci.analytics.edx.org/view/ad-hoc/job/edx-analytics-pipeline-acceptance-test-manual/1729/console
@@ -162,3 +159,29 @@ def open(self, mode='r'):
         if not hasattr(self, 's3_client'):
             self.s3_client = ScalableS3Client()
         return AtomicS3File(safe_path, self.s3_client, policy=DEFAULT_KEY_ACCESS_POLICY)


 def canonicalize_s3_url(url):
Is it possible to provide some unit tests for this, since you're modifying test_s3_util.py?
All of my remaining suggestions are optional. 👍
d692598 to d5719e3
It's actually good that I added the canonicalize_s3_url unit test, because it caught what would have been a major flaw: a typo in the code would have made it reject "s3" as a URL scheme.
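For readers without the diff at hand, here is a minimal sketch of the kind of check being discussed. It is not the code from this PR: the function name `canonicalize_s3_url_sketch`, the `S3_SCHEME_VARIANTS` tuple, and the choice to raise `ValueError` are all assumptions made for illustration, and it uses Python 3's `urllib.parse`. It treats a few scheme variants as S3 and rewrites them to plain "s3".

```python
from urllib.parse import urlparse

# Assumption: this list of accepted scheme variants is illustrative only;
# the real function in the PR may accept a different set.
S3_SCHEME_VARIANTS = ('s3', 's3n', 's3+https')


def canonicalize_s3_url_sketch(url):
    """Return the URL with its scheme rewritten to plain 's3' (hypothetical helper)."""
    parsed = urlparse(url)
    if parsed.scheme not in S3_SCHEME_VARIANTS:
        # Assumption: rejecting unknown schemes with ValueError is a choice made for this sketch.
        raise ValueError('Not an S3 URL: {}'.format(url))
    # Rebuild from bucket (netloc) and key (path); query strings are ignored in this sketch.
    return 's3://{}{}'.format(parsed.netloc, parsed.path)


def test_plain_s3_scheme_is_accepted():
    # The simplest case is the one a scheme-list typo would break.
    assert canonicalize_s3_url_sketch('s3://bucket/a/b') == 's3://bucket/a/b'


def test_variant_scheme_is_rewritten():
    assert canonicalize_s3_url_sketch('s3+https://bucket/a/b') == 's3://bucket/a/b'
```

A test for the plain "s3" case looks redundant, but it is the one that would fail if the list of accepted schemes were mistyped, which is the kind of flaw described above.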
d5719e3 to d3c5ac1
This is part 1 of the GA loading pipeline which DOES NOT depend on a Luigi upgrade. DE-1374 (PART 1)
d3c5ac1 to 9f64895
This is the first part of the work to add a pipeline to load GA360 data into Snowflake.
This part of the code change DOES NOT depend on a Luigi upgrade, and is ready for review/merging now.
Other PRs:
Analytics Pipeline Pull Request
Make sure that the following steps are done before merging: