Set up integration testing and plug in some statistics #31

Closed
scholtzan wants to merge 10 commits

Conversation

scholtzan (Collaborator)

This sets up end-to-end testing for pensieve. I created a test project on GCP so that some data can be written to BigQuery. This is also useful for running metrics- and statistics-related tests against some actual (mock) data.
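For context, a minimal sketch of what a BigQuery-backed integration test along these lines could look like. This is not code from the PR; the project, dataset, and table names are illustrative, and it assumes google-cloud-bigquery credentials are available (e.g. via GOOGLE_APPLICATION_CREDENTIALS).

```python
# Sketch of a BigQuery integration test against a dedicated GCP test project.
# All names here are hypothetical, not taken from this PR.
import pytest
from google.cloud import bigquery

PROJECT = "pensieve-integration-test"  # hypothetical test project
DATASET = "test_dataset"


@pytest.fixture
def client():
    return bigquery.Client(project=PROJECT)


def test_mock_data_round_trip(client):
    table_id = f"{PROJECT}.{DATASET}.mock_clients_daily"
    # Load a couple of mock rows; the schema is autodetected from the JSON.
    rows = [
        {"client_id": "a", "branch": "control", "active_hours": 1.5},
        {"client_id": "b", "branch": "treatment", "active_hours": 0.5},
    ]
    job_config = bigquery.LoadJobConfig(
        autodetect=True,
        write_disposition="WRITE_TRUNCATE",
    )
    client.load_table_from_json(rows, table_id, job_config=job_config).result()

    # The table should now exist and contain exactly the rows we wrote.
    table = client.get_table(table_id)
    assert table.num_rows == 2
```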

tdsmith (Contributor) commented Apr 8, 2020

Interesting -- I wonder if this integration testing belongs in mozanalysis, tbh.

scholtzan (Collaborator, Author)

It's to check if tables/views get created by pensieve as expected and to see if metrics and statistics are calculated. Ideally, integration testing would be done in both projects.

pensieve/experimenter.py (review thread, resolved)
pensieve/analysis.py (review thread, outdated, resolved)
command: |
  export GOOGLE_APPLICATION_CREDENTIALS="/tmp/gcp.json"
  echo "$GCLOUD_SERVICE_KEY" > "$GOOGLE_APPLICATION_CREDENTIALS"
  venv/bin/pytest pensieve/tests/integration/
scholtzan (Collaborator, Author)

I tried using a pytest marker for this, but for some reason it gets ignored. As a workaround, I moved all tests that require GCP authentication into a separate directory. This makes it easier for folks to skip these tests locally if they are not logged in to GCP.
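One way to make that opt-out automatic for local runs is a conftest in the integration directory that skips when no credentials are configured. This is a sketch, not part of this PR, and it only checks the GOOGLE_APPLICATION_CREDENTIALS variable that the CI config above exports (credentials obtained via gcloud login would also be skipped).

```python
# pensieve/tests/integration/conftest.py -- hypothetical sketch.
import os

import pytest


@pytest.fixture(autouse=True)
def _require_gcp_credentials():
    # Skip every test in this directory when no GCP credentials are available,
    # so local runs without a service account key do not fail.
    if not os.environ.get("GOOGLE_APPLICATION_CREDENTIALS"):
        pytest.skip("GOOGLE_APPLICATION_CREDENTIALS is not set")
```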

tdsmith (Contributor)

How do you mean that it gets ignored? Markers don't change what executes unless you specifically include or exclude them.

scholtzan (Collaborator, Author)

Right, it should work by adding -m integration or -m "not integration", but even when I set these options, all tests still get executed.
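For reference, the standard setup that makes -m filtering work looks roughly like this; the marker name and file locations are only illustrative, not taken from this PR.

```python
# conftest.py (repository root) -- register the marker so pytest recognizes it
# and --strict-markers can catch typos.
def pytest_configure(config):
    config.addinivalue_line(
        "markers", "integration: tests that need GCP credentials"
    )


# pensieve/tests/integration/test_example.py -- apply the marker to the tests
# that should be selectable with -m.
import pytest


@pytest.mark.integration
def test_creates_expected_tables():
    ...
```

With that in place, `pytest -m integration` selects only the marked tests and `pytest -m "not integration"` deselects them; if everything still runs, one thing worth checking is whether the marker is actually applied to the test functions being collected.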


table_id = f"{self.project}.{self.dataset}.statistics_{result_table}"

job = self.bigquery.client.load_table_from_dataframe(df_statistics_results, table_id)
scholtzan (Collaborator, Author) commented Apr 9, 2020

Currently, uploading STRUCT / RECORD fields via load_table_from_dataframe is only supported with pyarrow 0.17.0, which is not published on PyPI yet (but should be soon).
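One possible interim workaround, sketched here under the assumption that df_statistics_results is a pandas DataFrame with dict-valued columns (this mirrors the snippet above and is not something this PR does), is to go through load_table_from_json, which serializes nested RECORD values without needing pyarrow:

```python
# Hypothetical workaround for nested STRUCT/RECORD columns.
from google.cloud import bigquery

job_config = bigquery.LoadJobConfig(autodetect=True)
rows = df_statistics_results.to_dict(orient="records")
job = self.bigquery.client.load_table_from_json(rows, table_id, job_config=job_config)
job.result()  # wait for the load job to finish
```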

tdsmith (Contributor)

We should put together a proposal for what these tables should look like.

scholtzan closed this Apr 22, 2020