fix: Remove hard-coded integration test setup for AWS & GCP #2970

Merged: 8 commits, Jul 26, 2022
2 changes: 0 additions & 2 deletions .github/workflows/pr_integration_tests.yml
Expand Up @@ -175,8 +175,6 @@ jobs:
if: ${{ always() }} # this will guarantee that the step won't be canceled and resources won't leak
env:
FEAST_SERVER_DOCKER_IMAGE_TAG: ${{ needs.build-docker-image.outputs.DOCKER_IMAGE_TAG }}
FEAST_USAGE: "False"
IS_TEST: "True"
SNOWFLAKE_CI_DEPLOYMENT: ${{ secrets.SNOWFLAKE_CI_DEPLOYMENT }}
SNOWFLAKE_CI_USER: ${{ secrets.SNOWFLAKE_CI_USER }}
SNOWFLAKE_CI_PASSWORD: ${{ secrets.SNOWFLAKE_CI_PASSWORD }}
Expand Down
3 changes: 1 addition & 2 deletions .github/workflows/unit_tests.yml
Expand Up @@ -70,13 +70,12 @@ jobs:
run: make install-python-ci-dependencies
- name: Test Python
env:
IS_TEST: "True"
SNOWFLAKE_CI_DEPLOYMENT: ${{ secrets.SNOWFLAKE_CI_DEPLOYMENT }}
SNOWFLAKE_CI_USER: ${{ secrets.SNOWFLAKE_CI_USER }}
SNOWFLAKE_CI_PASSWORD: ${{ secrets.SNOWFLAKE_CI_PASSWORD }}
SNOWFLAKE_CI_ROLE: ${{ secrets.SNOWFLAKE_CI_ROLE }}
SNOWFLAKE_CI_WAREHOUSE: ${{ secrets.SNOWFLAKE_CI_WAREHOUSE }}
run: FEAST_USAGE=False pytest -n 8 --cov=./ --cov-report=xml --color=yes sdk/python/tests
run: pytest -n 8 --cov=./ --cov-report=xml --color=yes sdk/python/tests
- name: Upload coverage to Codecov
uses: codecov/codecov-action@v1
with:
Expand Down
59 changes: 53 additions & 6 deletions CONTRIBUTING.md
Expand Up @@ -139,7 +139,7 @@ There are two sets of tests you can run:
#### Local integration tests
For this approach of running tests, you'll need to have docker set up locally: [Get Docker](https://docs.docker.com/get-docker/)

It leverages a file-based offline store to test against emulated versions of Datastore, DynamoDB, and Redis, using ephemeral containers.

These tests create new temporary tables / datasets locally only, and they are cleaned up when the containers are torn down.
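To make the ephemeral-container idea concrete, here is a minimal sketch of the pattern using the `testcontainers` library; the exact classes and images the test suite uses may differ:

```python
from testcontainers.redis import RedisContainer

# Start a throwaway Redis instance; it (and any keys written to it)
# is destroyed when the context manager exits.
with RedisContainer("redis:6") as redis:
    host = redis.get_container_host_ip()
    port = redis.get_exposed_port(6379)
    connection_string = f"{host}:{port},db=0"
    # ... point the Redis online store config at connection_string
    # and run tests against it.
```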

Expand All @@ -161,17 +161,48 @@ To test across clouds, on top of setting up Redis, you also need GCP / AWS / Sno
gcloud auth login
gcloud auth application-default login
```
- When you run `gcloud auth application-default login`, you should see some output of the form:
```
Credentials saved to file: [$HOME/.config/gcloud/application_default_credentials.json]
```
- You should add `export GOOGLE_APPLICATION_CREDENTIALS="$HOME/.config/gcloud/application_default_credentials.json"` to your .zshrc or .bashrc.
3. Add `export GCLOUD_PROJECT=[your project]` to your .zshrc or .bashrc.
4. Running `gcloud config list` should give you something like this:
```sh
$ gcloud config list
[core]
account = [your email]
disable_usage_reporting = True
project = [your project]

Your active configuration is: [default]
```
5. Export GCP-specific environment variables:
```sh
export GCS_REGION='[your GCS region, e.g. US]'
export GCS_STAGING_LOCATION='[your gcs staging location]'
```
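To sanity-check the GCP setup, a quick snippet like the following (assuming `google-auth` is installed, which Feast's GCP extras pull in) should resolve your project without raising:

```python
import google.auth

# Resolves application-default credentials; raises DefaultCredentialsError
# if the gcloud steps above were not completed.
credentials, project = google.auth.default()
print(f"Authenticated for project: {project}")
```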

**AWS**
1. TODO(adchia): flesh out setting up AWS login (or create helper script)
2. To run the AWS Redshift and DynamoDB integration tests, you will have to export your own AWS credentials. Namely,

```sh
export AWS_REGION='[your aws region]'
export AWS_CLUSTER_ID='[your aws cluster id]'
export AWS_USER='[your aws user]'
export AWS_DB='[your aws database]'
export AWS_STAGING_LOCATION='[your s3 staging location uri]'
export AWS_IAM_ROLE='[redshift and s3 access role]'
export AWS_LAMBDA_ROLE='[your aws lambda execution role]'
export AWS_REGISTRY_PATH='[your aws registry path]'
```
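Similarly, a quick way to confirm that your AWS credentials resolve before running the suite (a sketch assuming `boto3` is installed):

```python
import boto3

# Prints the AWS account ID the integration tests would run under;
# raises if no valid credentials are found.
identity = boto3.client("sts").get_caller_identity()
print(identity["Account"])
```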

**Snowflake**
1. See https://signup.snowflake.com/ to set up a trial.
2. Then to run successfully, you'll need some environment variables set up:
```sh
export SNOWFLAKE_CI_DEPLOYMENT='[snowflake_deployment]'
export SNOWFLAKE_CI_USER='[your user]'
export SNOWFLAKE_CI_PASSWORD='[your pw]'
export SNOWFLAKE_CI_ROLE='[your CI role e.g. SYSADMIN]'
Expand All @@ -180,12 +211,28 @@ export SNOWFLAKE_CI_WAREHOUSE='[your warehouse]'

Then run `make test-python-integration`. Note that for Snowflake / GCP / AWS, this will create new temporary tables / datasets.

#### Running specific provider tests or running your test against specific online or offline stores

1. If you don't need to have your test run against all of the providers (`gcp`, `aws`, and `snowflake`) or against all of the online stores, you can tag your test with the specific providers or stores it needs (`@pytest.mark.universal_offline_stores` or `@pytest.mark.universal_online_stores`, each accepting an `only` parameter). The `only` parameter selects the specific offline providers and online stores that your test will run against. Example:

```python
# Only parametrizes this test with the sqlite online store
@pytest.mark.universal_online_stores(only=["sqlite"])
def test_feature_get_online_features_types_match():
    ...
```

2. You can also filter tests to run by using pytest's CLI filtering. Instead of using the make commands to test Feast, you can filter tests by name with the `-k` parameter. The parametrized integration tests are all uniquely identified by their provider and online store, so the `-k` option can select only the tests that you need to run. For example, to run only Redshift-related tests, you can use the following command:

```sh
python -m pytest -n 8 --integration -k Redshift sdk/python/tests
```
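If a test needs to pin the offline side as well, the two markers can be combined (a sketch; the test name is hypothetical, and the `universal_offline_stores` marker is assumed to be available in your checkout):

```python
import pytest


# Run this test only against the file offline store and the Redis online store.
@pytest.mark.universal_offline_stores(only=["file"])
@pytest.mark.universal_online_stores(only=["redis"])
def test_my_feature_retrieval():
    ...
```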

#### (Experimental) Run full integration tests against containerized services
Testing across clouds requires existing accounts on GCP / AWS / Snowflake and may incur costs when using these services.

For this approach of running tests, you'll need to have docker set up locally: [Get Docker](https://docs.docker.com/get-docker/)

It's possible to run some integration tests against emulated local versions of these services, using ephemeral containers.
These tests create new temporary tables / datasets locally only, and they are cleaned up when the containers are torn down.

The services with containerized replacements currently implemented are:
Expand Down
12 changes: 5 additions & 7 deletions README.md
Expand Up @@ -135,11 +135,10 @@ pprint(feature_vector)

## 📦 Functionality and Roadmap

The list below contains the functionality that contributors are planning to develop for Feast.

* Items below that are in development (or planned for development) will be indicated in parentheses.
* We welcome contribution to all items in the roadmap!
* Have questions about the roadmap? Go to the Slack channel to ask on #feast-development.

* **Data Sources**
* [x] [Snowflake source](https://docs.feast.dev/reference/data-sources/snowflake)
Expand Down Expand Up @@ -185,9 +184,8 @@ The list below contains the functionality that contributors are planning to deve
* [x] Kubernetes (See [guide](https://docs.feast.dev/how-to-guides/running-feast-in-production#4.3.-java-based-feature-server-deployed-on-kubernetes))
* **Feature Serving**
* [x] Python Client
* [x] REST Feature Server (Python) (See [RFC](https://docs.google.com/document/d/1iXvFhAsJ5jgAhPOpTdB3j-Wj1S9x3Ev\_Wr6ZpnLzER4/edit))
* [x] REST / gRPC Feature Server (Go) (Alpha release. See [docs](https://docs.feast.dev/reference/feature-servers/go-feature-retrieval))
* [x] gRPC Feature Server (Java) (Alpha release. See [#1497](https://github.com/feast-dev/feast/issues/1497))
* [x] [Python feature server](https://docs.feast.dev/reference/feature-servers/python-feature-server)
* [x] [Go feature server](https://docs.feast.dev/reference/feature-servers/go-feature-server)
* **Data Quality Management (See [RFC](https://docs.google.com/document/d/110F72d4NTv80p35wDSONxhhPBqWRwbZXG4f9mNEMd98/edit))**
* [x] Data profiling and validation (Great Expectations)
* **Feature Discovery and Governance**
Expand All @@ -196,7 +194,7 @@ The list below contains the functionality that contributors are planning to deve
* [x] Model-centric feature tracking (feature services)
* [x] Amundsen integration (see [Feast extractor](https://github.com/amundsen-io/amundsen/blob/main/databuilder/databuilder/extractor/feast_extractor.py))
* [x] DataHub integration (see [DataHub Feast docs](https://datahubproject.io/docs/generated/ingestion/sources/feast/))
* [x] Feast Web UI (Alpha release. See [docs](https://docs.feast.dev/reference/alpha-web-ui))

## 🎓 Important Resources

Expand Down
15 changes: 9 additions & 6 deletions sdk/python/tests/conftest.py
Expand Up @@ -13,6 +13,7 @@
# limitations under the License.
import logging
import multiprocessing
import os
import socket
from contextlib import closing
from datetime import datetime, timedelta
Expand All @@ -24,13 +25,15 @@
import pytest
from _pytest.nodes import Item

from feast import FeatureStore
from feast.wait import wait_retry_backoff
from tests.data.data_creator import create_basic_driver_dataset
from tests.integration.feature_repos.integration_test_repo_config import (
os.environ["FEAST_USAGE"] = "False"
os.environ["IS_TEST"] = "True"
Comment on lines +28 to +29 (reviewer comment): I think we should wrap these somehow so that we set them when running tests and restore the old values once we're done running tests. Pytest should have a way of making this possible.
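A minimal sketch of what the reviewer is suggesting, using a session-scoped autouse pytest fixture to set the variables and restore the previous values afterwards (illustrative only, not part of this PR; for per-test overrides, pytest's built-in `monkeypatch.setenv` provides the same set-and-restore behavior):

```python
import os

import pytest


@pytest.fixture(scope="session", autouse=True)
def _feast_test_env():
    # Remember the previous values so they can be restored after the run.
    keys = ("FEAST_USAGE", "IS_TEST")
    old = {key: os.environ.get(key) for key in keys}
    os.environ["FEAST_USAGE"] = "False"
    os.environ["IS_TEST"] = "True"
    yield
    for key, value in old.items():
        if value is None:
            os.environ.pop(key, None)  # the variable was unset before the tests
        else:
            os.environ[key] = value
```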

from feast import FeatureStore # noqa: E402
from feast.wait import wait_retry_backoff # noqa: E402
from tests.data.data_creator import create_basic_driver_dataset # noqa: E402
from tests.integration.feature_repos.integration_test_repo_config import ( # noqa: E402
IntegrationTestRepoConfig,
)
from tests.integration.feature_repos.repo_configuration import (
from tests.integration.feature_repos.repo_configuration import ( # noqa: E402
AVAILABLE_OFFLINE_STORES,
AVAILABLE_ONLINE_STORES,
OFFLINE_STORE_TO_PROVIDER_CONFIG,
Expand All @@ -39,7 +42,7 @@
construct_test_environment,
construct_universal_test_data,
)
from tests.integration.feature_repos.universal.data_sources.file import (
from tests.integration.feature_repos.universal.data_sources.file import ( # noqa: E402
FileDataSourceCreator,
)

Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -65,8 +65,6 @@
)

DYNAMO_CONFIG = {"type": "dynamodb", "region": "us-west-2"}
# Port 12345 will be chosen as default for redis node configuration because Redis Cluster is started off of nodes
# 6379 -> 6384. This causes conflicts in CLI integration tests so we manually keep them separate.
REDIS_CONFIG = {"type": "redis", "connection_string": "localhost:6379,db=0"}
REDIS_CLUSTER_CONFIG = {
"type": "redis",
Expand Down Expand Up @@ -390,7 +388,10 @@ def construct_test_environment(

feature_server = AwsLambdaFeatureServerConfig(
enabled=True,
execution_role_name="arn:aws:iam::402087665549:role/lambda_execution_role",
execution_role_name=os.getenv(
"AWS_LAMBDA_ROLE",
"arn:aws:iam::402087665549:role/lambda_execution_role",
),
)

else:
Expand All @@ -402,9 +403,12 @@ def construct_test_environment(
if (
test_repo_config.python_feature_server and test_repo_config.provider == "aws"
) or test_repo_config.registry_location == RegistryLocation.S3:
aws_registry_path = os.getenv(
"AWS_REGISTRY_PATH", "s3://feast-integration-tests/registries"
)
registry: Union[
str, RegistryConfig
] = f"s3://feast-integration-tests/registries/{project}/registry.db"
] = f"{aws_registry_path}/{project}/registry.db"
else:
registry = RegistryConfig(
path=str(Path(repo_dir_name) / "registry.db"),
Expand Down
Original file line number Diff line number Diff line change
@@ -1,3 +1,4 @@
import os
import uuid
from typing import Dict, List, Optional

Expand Down Expand Up @@ -53,7 +54,10 @@ def teardown(self):

def create_offline_store_config(self):
return BigQueryOfflineStoreConfig(
location="US", gcs_staging_location="gs://feast-export/"
location=os.getenv("GCS_REGION", "US"),
gcs_staging_location=os.getenv(
"GCS_STAGING_LOCATION", "gs://feast-export/"
),
)

def create_data_source(
Expand Down
Original file line number Diff line number Diff line change
@@ -1,3 +1,4 @@
import os
import uuid
from typing import Dict, List, Optional

Expand All @@ -24,16 +25,23 @@ class RedshiftDataSourceCreator(DataSourceCreator):

def __init__(self, project_name: str, *args, **kwargs):
super().__init__(project_name)
self.client = aws_utils.get_redshift_data_client("us-west-2")
self.s3 = aws_utils.get_s3_resource("us-west-2")
self.client = aws_utils.get_redshift_data_client(
os.getenv("AWS_REGION", "us-west-2")
)
self.s3 = aws_utils.get_s3_resource(os.getenv("AWS_REGION", "us-west-2"))

self.offline_store_config = RedshiftOfflineStoreConfig(
cluster_id="feast-integration-tests",
region="us-west-2",
user="admin",
database="feast",
s3_staging_location="s3://feast-integration-tests/redshift/tests/ingestion",
iam_role="arn:aws:iam::402087665549:role/redshift_s3_access_role",
cluster_id=os.getenv("AWS_CLUSTER_ID", "feast-integration-tests"),
region=os.getenv("AWS_REGION", "us-west-2"),
user=os.getenv("AWS_USER", "admin"),
database=os.getenv("AWS_DB", "feast"),
s3_staging_location=os.getenv(
"AWS_STAGING_LOCATION",
"s3://feast-integration-tests/redshift/tests/ingestion",
),
iam_role=os.getenv(
"AWS_IAM_ROLE", "arn:aws:iam::402087665549:role/redshift_s3_access_role"
),
)

def create_data_source(
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -11,6 +11,7 @@
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import os
import time
from datetime import datetime, timedelta
from tempfile import mkstemp
Expand Down Expand Up @@ -75,12 +76,17 @@ def feature_store_with_gcs_registry():

@pytest.fixture
def feature_store_with_s3_registry():
aws_registry_path = os.getenv(
"AWS_REGISTRY_PATH", "s3://feast-integration-tests/registries"
)
return FeatureStore(
config=RepoConfig(
registry=f"s3://feast-integration-tests/registries/{int(time.time() * 1000)}/registry.db",
registry=f"{aws_registry_path}/{int(time.time() * 1000)}/registry.db",
project="default",
provider="aws",
online_store=DynamoDBOnlineStoreConfig(region="us-west-2"),
online_store=DynamoDBOnlineStoreConfig(
region=os.getenv("AWS_REGION", "us-west-2")
),
offline_store=FileOfflineStoreConfig(),
)
)
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -11,6 +11,7 @@
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import os
import time
from datetime import timedelta
from tempfile import mkstemp
Expand Down Expand Up @@ -63,8 +64,11 @@ def gcs_registry() -> Registry:

@pytest.fixture
def s3_registry() -> Registry:
aws_registry_path = os.getenv(
"AWS_REGISTRY_PATH", "s3://feast-integration-tests/registries"
)
registry_config = RegistryConfig(
path=f"s3://feast-integration-tests/registries/{int(time.time() * 1000)}/registry.db",
path=f"{aws_registry_path}/{int(time.time() * 1000)}/registry.db",
cache_ttl_seconds=600,
)
return Registry(registry_config, None)
Expand Down
2 changes: 1 addition & 1 deletion setup.py
Expand Up @@ -59,7 +59,7 @@
"mmh3",
"numpy>=1.22,<3",
"pandas>=1.4.3,<2",
"pandavro==1.5.*",
"pandavro==1.5.*", # For some reason pandavro higher than 1.5.* only support pandas less than 1.3.
"protobuf>3.20,<4",
"proto-plus>=1.20.0,<2",
"pyarrow>=4,<9",
Expand Down