Destinations still have "basic normalization" field in configuration #4710

Closed
cgardens opened this issue Jul 12, 2021 · 6 comments

Comments

@cgardens
Contributor

Environment

  • Airbyte version: 0.27.1
  • Destination Connector and version: Postgres 0.3.6, Snowflake 0.3.9
  • Severity: Medium
  • Step where error happened: Configuration time

Current Behavior

At configuration time, at least two of our destinations still allow a user to configure basic normalization. I saw this in the most recent versions of Postgres and Snowflake; I didn't look past those, so there may be others.
[Screenshot: Screen Shot 2021-07-12 at 3.01.44 PM]

Expected Behavior

The changes to connection settings that moved normalization setup into the connection mean that a user should no longer configure this field as part of the configuration for these destinations.

Steps to Reproduce

  1. Create a Postgres destination on the most recent version of Airbyte (the connector spec can also be checked directly, as in the sketch below).
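
For reference, the same check can be done outside the UI by asking a published connector image for its spec and testing for the offending property. A minimal sketch, assuming jq is installed and that the plain spec command works without the workspace mounts or network flags used elsewhere in this thread:

# Report whether the published image still advertises basic_normalization
# in its connectionSpecification (expected: true for 0.3.6, false once fixed).
docker run --rm -i airbyte/destination-postgres:0.3.6 spec \
  | grep '"type":"SPEC"' \
  | jq '.spec.connectionSpecification.properties | has("basic_normalization")'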

@ChristopheDuong is this a known issue? Is it an easy fix, or is it going to be a headache because of migrations?

@cgardens cgardens added the type/bug Something isn't working label Jul 12, 2021
@ChristopheDuong
Contributor

@ChristopheDuong is this a known issue? Is it an easy fix, or is it going to be a headache because of migrations?

No, I think @subodh1810 made a migration to remove these in the past

@subodh1810
Contributor

This is weird, and maybe I know what's going on: as part of this PR https://github.com/airbytehq/airbyte/pull/3624/files we deleted the basic_normalization attribute from all the connectors, but we never published the images for all of those connectors! :/

Here is the output of docker run --rm --init -i -v airbyte_workspace:/data -v /tmp/airbyte_local:/local -w /data/4e476dac-fb67-4911-b0b0-188b88825fb6/0 --network host --log-driver none airbyte/destination-postgres:dev spec:

2021-07-13 09:11:18 INFO i.a.i.d.p.PostgresDestination(main):83 - {} - starting destination: class io.airbyte.integrations.destination.postgres.PostgresDestination
2021-07-13 09:11:18 INFO i.a.i.b.IntegrationRunner(run):81 - {} - Running integration: io.airbyte.integrations.destination.postgres.PostgresDestination
2021-07-13 09:11:18 INFO i.a.i.b.IntegrationCliParser(parseOptions):135 - {} - integration args: {spec=null}
2021-07-13 09:11:18 INFO i.a.i.b.IntegrationRunner(run):85 - {} - Command: SPEC
2021-07-13 09:11:18 INFO i.a.i.b.IntegrationRunner(run):86 - {} - Integration config: IntegrationConfig{command=SPEC, configPath='null', catalogPath='null', statePath='null'}
{"type":"SPEC","spec":{"documentationUrl":"https://docs.airbyte.io/integrations/destinations/postgres","connectionSpecification":{"$schema":"http://json-schema.org/draft-07/schema#","title":"Postgres Destination Spec","type":"object","required":["host","port","username","database","schema"],"additionalProperties":false,"properties":{"host":{"title":"Host","description":"Hostname of the database.","type":"string","order":0},"port":{"title":"Port","description":"Port of the database.","type":"integer","minimum":0,"maximum":65536,"default":5432,"examples":["5432"],"order":1},"database":{"title":"DB Name","description":"Name of the database.","type":"string","order":2},"schema":{"title":"Default Schema","description":"The default schema tables are written to if the source does not specify a namespace. The usual value for this field is \"public\".","type":"string","examples":["public"],"default":"public","order":3},"username":{"title":"User","description":"Username to use to access the database.","type":"string","order":4},"password":{"title":"Password","description":"Password associated with the username.","type":"string","airbyte_secret":true,"order":5},"ssl":{"title":"SSL Connection","description":"Encrypt data using SSL.","type":"boolean","default":false,"order":6}}},"supportsIncremental":true,"supportsNormalization":true,"supportsDBT":true,"supported_destination_sync_modes":["overwrite","append","append_dedup"]}}
2021-07-13 09:11:19 INFO i.a.i.b.IntegrationRunner(run):122 - {} - Completed integration: io.airbyte.integrations.destination.postgres.PostgresDestination
2021-07-13 09:11:19 INFO i.a.i.d.p.PostgresDestination(main):85 - {} - completed destination: class io.airbyte.integrations.destination.postgres.PostgresDestination

As you can see above, I am using the dev image and there is no basic_normalization attribute.
But when I use the 0.3.6 tag via docker run --rm --init -i -v airbyte_workspace:/data -v /tmp/airbyte_local:/local -w /data/4e476dac-fb67-4911-b0b0-188b88825fb6/0 --network host --log-driver none airbyte/destination-postgres:0.3.6 spec, the output is:

2021-07-13 09:11:47 INFO i.a.i.d.p.PostgresDestination(main):83 - {} - starting destination: class io.airbyte.integrations.destination.postgres.PostgresDestination
2021-07-13 09:11:47 INFO i.a.i.b.IntegrationRunner(run):78 - {} - Running integration: io.airbyte.integrations.destination.postgres.PostgresDestination
2021-07-13 09:11:47 INFO i.a.i.b.IntegrationCliParser(parseOptions):135 - {} - integration args: {spec=null}
2021-07-13 09:11:47 INFO i.a.i.b.IntegrationRunner(run):82 - {} - Command: SPEC
2021-07-13 09:11:47 INFO i.a.i.b.IntegrationRunner(run):83 - {} - Integration config: IntegrationConfig{command=SPEC, configPath='null', catalogPath='null', statePath='null'}
{"type":"SPEC","spec":{"documentationUrl":"https://docs.airbyte.io/integrations/destinations/postgres","connectionSpecification":{"$schema":"http://json-schema.org/draft-07/schema#","title":"Postgres Destination Spec","type":"object","required":["host","port","username","database","schema"],"additionalProperties":false,"properties":{"host":{"title":"Host","description":"Hostname of the database.","type":"string","order":0},"port":{"title":"Port","description":"Port of the database.","type":"integer","minimum":0,"maximum":65536,"default":5432,"examples":["5432"],"order":1},"database":{"title":"DB Name","description":"Name of the database.","type":"string","order":2},"schema":{"title":"Default Schema","description":"The default schema tables are written to if the source does not specify a namespace. The usual value for this field is \"public\".","type":"string","examples":["public"],"default":"public","order":3},"username":{"title":"User","description":"Username to use to access the database.","type":"string","order":4},"password":{"title":"Password","description":"Password associated with the username.","type":"string","airbyte_secret":true,"order":5},"ssl":{"title":"SSL Connection","description":"Encrypt data using SSL.","type":"boolean","default":false,"order":6},"basic_normalization":{"title":"Basic Normalization","type":"boolean","default":true,"description":"Whether or not to normalize the data in the destination. See <a href=\"https://docs.airbyte.io/architecture/basic-normalization\">basic normalization</a> for more details.","examples":[true,false],"order":7}}},"supportsIncremental":true,"supportsNormalization":true,"supportsDBT":true,"supported_destination_sync_modes":["overwrite","append","append_dedup"]}}
2021-07-13 09:11:48 INFO i.a.i.b.IntegrationRunner(run):118 - {} - Completed integration: io.airbyte.integrations.destination.postgres.PostgresDestination
2021-07-13 09:11:48 INFO i.a.i.d.p.PostgresDestination(main):85 - {} - completed destination: class io.airbyte.integrations.destination.postgres.PostgresDestination

which contains the basic_normalization attribute.
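
A quick way to see the difference between the two tags (same assumptions as the sketch in the issue description: jq installed, spec runs fine without the workspace mounts) is to list the spec properties of each image side by side:

# List the top-level connectionSpecification properties for both tags;
# only the 0.3.6 output should include basic_normalization.
for tag in dev 0.3.6; do
  echo "== airbyte/destination-postgres:${tag} =="
  docker run --rm -i "airbyte/destination-postgres:${tag}" spec \
    | grep '"type":"SPEC"' \
    | jq -r '.spec.connectionSpecification.properties | keys[]'
done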

@subodh1810
Contributor

These are the connectors that we need to publish (a post-publish check is sketched after the list):

postgres
snowflake
redshift
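
Once new versions are published, the same spec check can be repeated across all three images in one pass. A sketch only; TAG is a placeholder for whatever versions actually get released, not a real tag:

# Hypothetical post-publish verification; expects "false" for every image.
TAG="x.y.z"  # placeholder: substitute the newly published version of each connector
for image in destination-postgres destination-snowflake destination-redshift; do
  echo "== airbyte/${image}:${TAG} =="
  docker run --rm -i "airbyte/${image}:${TAG}" spec \
    | grep '"type":"SPEC"' \
    | jq '.spec.connectionSpecification.properties | has("basic_normalization")'
done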

@subodh1810
Contributor

Oh, we don't have to do Snowflake because Sherif already did it today (#4713), and in the new version basic_normalization is not there:
docker run --rm --init -i -v airbyte_workspace:/data -v /tmp/airbyte_local:/local -w /data/75336fbb-9267-4eec-b195-d2c24a3eb3d2/0 --network host --log-driver none airbyte/destination-snowflake:0.3.10 spec

2021-07-13 09:37:39 INFO i.a.i.d.s.SnowflakeDestination(main):81 - {} - starting destination: class io.airbyte.integrations.destination.snowflake.SnowflakeDestination
2021-07-13 09:37:39 INFO i.a.i.b.IntegrationRunner(run):81 - {} - Running integration: io.airbyte.integrations.destination.snowflake.SnowflakeDestination
2021-07-13 09:37:39 INFO i.a.i.b.IntegrationCliParser(parseOptions):135 - {} - integration args: {spec=null}
2021-07-13 09:37:39 INFO i.a.i.b.IntegrationRunner(run):85 - {} - Command: SPEC
2021-07-13 09:37:39 INFO i.a.i.b.IntegrationRunner(run):86 - {} - Integration config: IntegrationConfig{command=SPEC, configPath='null', catalogPath='null', statePath='null'}
{"type":"SPEC","spec":{"documentationUrl":"https://docs.airbyte.io/integrations/destinations/snowflake","connectionSpecification":{"$schema":"http://json-schema.org/draft-07/schema#","title":"Snowflake Destination Spec","type":"object","required":["host","role","warehouse","database","schema","username","password"],"additionalProperties":false,"properties":{"host":{"description":"Host domain of the snowflake instance (must include the account, region, cloud environment, and end with snowflakecomputing.com).","examples":["accountname.us-east-2.aws.snowflakecomputing.com"],"type":"string","title":"Host","order":0},"role":{"description":"The role you created for Airbyte to access Snowflake.","examples":["AIRBYTE_ROLE"],"type":"string","title":"Role","order":1},"warehouse":{"description":"The warehouse you created for Airbyte to sync data into.","examples":["AIRBYTE_WAREHOUSE"],"type":"string","title":"Warehouse","order":2},"database":{"description":"The database you created for Airbyte to sync data into.","examples":["AIRBYTE_DATABASE"],"type":"string","title":"Database","order":3},"schema":{"description":"The default Snowflake schema tables are written to if the source does not specify a namespace.","examples":["AIRBYTE_SCHEMA"],"type":"string","title":"Default Schema","order":4},"username":{"description":"The username you created to allow Airbyte to access the database.","examples":["AIRBYTE_USER"],"type":"string","title":"Username","order":5},"password":{"description":"Password associated with the username.","type":"string","airbyte_secret":true,"title":"Password","order":6},"loading_method":{"type":"object","title":"Loading Method","description":"Loading method used to send data to Snowflake.","order":7,"oneOf":[{"title":"Standard Inserts","additionalProperties":false,"description":"Uses <pre>INSERT</pre> statements to send batches of records to Snowflake. Easiest (no setup) but not recommended for large production workloads due to slow speed.","required":["method"],"properties":{"method":{"type":"string","enum":["Standard"],"default":"Standard"}}},{"title":"AWS S3 Staging","additionalProperties":false,"description":"Writes large batches of records to a file, uploads the file to S3, then uses <pre>COPY INTO table</pre> to upload the file. Recommended for large production workloads for better speed and scalability.","required":["method","s3_bucket_name","access_key_id","secret_access_key"],"properties":{"method":{"type":"string","enum":["S3 Staging"],"default":"S3 Staging","order":0},"s3_bucket_name":{"title":"S3 Bucket Name","type":"string","description":"The name of the staging S3 bucket. Airbyte will write files to this bucket and read them via <pre>COPY</pre> statements on Snowflake.","examples":["airbyte.staging"],"order":1},"s3_bucket_region":{"title":"S3 Bucket Region","type":"string","default":"","description":"The region of the S3 staging bucket to use if utilising a copy strategy.","enum":["","us-east-1","us-east-2","us-west-1","us-west-2","af-south-1","ap-east-1","ap-south-1","ap-northeast-1","ap-northeast-2","ap-northeast-3","ap-southeast-1","ap-southeast-2","ca-central-1","cn-north-1","cn-northwest-1","eu-central-1","eu-west-1","eu-west-2","eu-west-3","eu-south-1","eu-north-1","sa-east-1","me-south-1"],"order":2},"access_key_id":{"type":"string","description":"The Access Key Id granting allow one to access the above S3 staging bucket. 
Airbyte requires Read and Write permissions to the given bucket.","title":"S3 Key Id","airbyte_secret":true,"order":3},"secret_access_key":{"type":"string","description":"The corresponding secret to the above access key id.","title":"S3 Access Key","airbyte_secret":true,"order":4}}},{"title":"GCS Staging","additionalProperties":false,"description":"Writes large batches of records to a file, uploads the file to GCS, then uses <pre>COPY INTO table</pre> to upload the file. Recommended for large production workloads for better speed and scalability.","required":["method","project_id","bucket_name","credentials_json"],"properties":{"method":{"type":"string","enum":["GCS Staging"],"default":"GCS Staging","order":0},"project_id":{"title":"GCP Project ID","type":"string","description":"The name of the GCP project ID for your credentials.","examples":["my-project"],"order":1},"bucket_name":{"title":"GCS Bucket Name","type":"string","description":"The name of the staging GCS bucket. Airbyte will write files to this bucket and read them via <pre>COPY</pre> statements on Snowflake.","examples":["airbyte-staging"],"order":2},"credentials_json":{"title":"Google Application Credentials","type":"string","description":"The contents of the JSON key file that has read/write permissions to the staging GCS bucket. You will separately need to grant bucket access to your Snowflake GCP service account. See the <a href=\"https://cloud.google.com/iam/docs/creating-managing-service-account-keys#creating_service_account_keys\">GCP docs</a> for more information on how to generate a JSON key for your service account.","airbyte_secret":true,"multiline":true,"order":3}}}]}}},"supportsIncremental":true,"supportsNormalization":true,"supportsDBT":true,"supported_destination_sync_modes":["overwrite","append","append_dedup"]}}
2021-07-13 09:37:39 INFO i.a.i.b.IntegrationRunner(run):122 - {} - Completed integration: io.airbyte.integrations.destination.snowflake.SnowflakeDestination
2021-07-13 09:37:39 INFO i.a.i.d.s.SnowflakeDestination(main):83 - {} - completed destination: class io.airbyte.integrations.destination.snowflake.SnowflakeDestination

@subodh1810
Contributor

subodh1810 commented Jul 13, 2021

Redshift:
docker run --rm --init -i -v airbyte_workspace:/data -v /tmp/airbyte_local:/local -w /data/5d9bde02-d4fd-48a0-97ef-e4fd89979171/0 --network host --log-driver none airbyte/destination-redshift:0.3.9 spec
Output for the current version (0.3.9) contains basic_normalization:

2021-07-13 09:45:28 INFO i.a.i.d.r.RedshiftDestination(main):97 - {} - starting destination: class io.airbyte.integrations.destination.redshift.RedshiftDestination
2021-07-13 09:45:28 INFO i.a.i.b.IntegrationRunner(run):78 - {} - Running integration: io.airbyte.integrations.destination.redshift.RedshiftDestination
2021-07-13 09:45:28 INFO i.a.i.b.IntegrationCliParser(parseOptions):135 - {} - integration args: {spec=null}
2021-07-13 09:45:28 INFO i.a.i.b.IntegrationRunner(run):82 - {} - Command: SPEC
2021-07-13 09:45:28 INFO i.a.i.b.IntegrationRunner(run):83 - {} - Integration config: IntegrationConfig{command=SPEC, configPath='null', catalogPath='null', statePath='null'}
{"type":"SPEC","spec":{"documentationUrl":"https://docs.airbyte.io/integrations/destinations/redshift","connectionSpecification":{"$schema":"http://json-schema.org/draft-07/schema#","title":"Redshift Destination Spec","type":"object","required":["host","port","database","username","password","schema"],"additionalProperties":false,"properties":{"host":{"description":"Host Endpoint of the Redshift Cluster (must include the cluster-id, region and end with .redshift.amazonaws.com)","type":"string","title":"Host"},"port":{"description":"Port of the database.","type":"integer","minimum":0,"maximum":65536,"default":5439,"examples":["5439"],"title":"Port"},"username":{"description":"Username to use to access the database.","type":"string","title":"Username"},"password":{"description":"Password associated with the username.","type":"string","airbyte_secret":true,"title":"Password"},"database":{"description":"Name of the database.","type":"string","title":"Database"},"schema":{"description":"The default schema tables are written to if the source does not specify a namespace. Unless specifically configured, the usual value for this field is \"public\".","type":"string","examples":["public"],"default":"public","title":"Default Schema"},"basic_normalization":{"type":"boolean","default":true,"description":"Whether or not to normalize the data in the destination. See <a href=\"https://docs.airbyte.io/architecture/basic-normalization\">basic normalization</a> for more details.","title":"Basic Normalization","examples":[true,false]},"s3_bucket_name":{"title":"S3 Bucket Name","type":"string","description":"The name of the staging S3 bucket to use if utilising a COPY strategy. COPY is recommended for production workloads for better speed and scalability. See <a href=\"https://docs.aws.amazon.com/redshift/latest/dg/c_loading-data-best-practices.html\">AWS docs</a> for more details.","examples":["airbyte.staging"]},"s3_bucket_region":{"title":"S3 Bucket Region","type":"string","default":"","description":"The region of the S3 staging bucket to use if utilising a copy strategy.","enum":["","us-east-1","us-east-2","us-west-1","us-west-2","af-south-1","ap-east-1","ap-south-1","ap-northeast-1","ap-northeast-2","ap-northeast-3","ap-southeast-1","ap-southeast-2","ca-central-1","cn-north-1","cn-northwest-1","eu-central-1","eu-north-1","eu-south-1","eu-west-1","eu-west-2","eu-west-3","sa-east-1","me-south-1"]},"access_key_id":{"type":"string","description":"The Access Key Id granting allow one to access the above S3 staging bucket. Airbyte requires Read and Write permissions to the given bucket.","title":"S3 Key Id","airbyte_secret":true},"secret_access_key":{"type":"string","description":"The corresponding secret to the above access key id.","title":"S3 Access Key","airbyte_secret":true},"part_size":{"type":"integer","minimum":10,"maximum":100,"examples":["10"],"description":"Optional. Increase this if syncing tables larger than 100GB. Only relevant for COPY. Files are streamed to S3 in parts. This determines the size of each part, in MBs. As S3 has a limit of 10,000 parts per file, part size affects the table size. This is 10MB by default, resulting in a default limit of 100GB tables. Note, a larger part size will result in larger memory requirements. A rule of thumb is to multiply the part size by 10 to get the memory requirement. 
Modify this with care.","title":"Stream Part Size"}}},"supportsIncremental":true,"supportsNormalization":true,"supportsDBT":true,"supported_destination_sync_modes":["overwrite","append","append_dedup"]}}
2021-07-13 09:45:29 INFO i.a.i.b.IntegrationRunner(run):118 - {} - Completed integration: io.airbyte.integrations.destination.redshift.RedshiftDestination
2021-07-13 09:45:29 INFO i.a.i.d.r.RedshiftDestination(main):99 - {} - completed destination: class io.airbyte.integrations.destination.redshift.RedshiftDestination

Dev: docker run --rm --init -i -v airbyte_workspace:/data -v /tmp/airbyte_local:/local -w /data/5d9bde02-d4fd-48a0-97ef-e4fd89979171/0 --network host --log-driver none airbyte/destination-redshift:dev spec
The output doesn't contain basic_normalization:

2021-07-13 09:46:34 INFO i.a.i.d.r.RedshiftDestination(main):97 - {} - starting destination: class io.airbyte.integrations.destination.redshift.RedshiftDestination
2021-07-13 09:46:34 INFO i.a.i.b.IntegrationRunner(run):81 - {} - Running integration: io.airbyte.integrations.destination.redshift.RedshiftDestination
2021-07-13 09:46:34 INFO i.a.i.b.IntegrationCliParser(parseOptions):135 - {} - integration args: {spec=null}
2021-07-13 09:46:34 INFO i.a.i.b.IntegrationRunner(run):85 - {} - Command: SPEC
2021-07-13 09:46:34 INFO i.a.i.b.IntegrationRunner(run):86 - {} - Integration config: IntegrationConfig{command=SPEC, configPath='null', catalogPath='null', statePath='null'}
{"type":"SPEC","spec":{"documentationUrl":"https://docs.airbyte.io/integrations/destinations/redshift","connectionSpecification":{"$schema":"http://json-schema.org/draft-07/schema#","title":"Redshift Destination Spec","type":"object","required":["host","port","database","username","password","schema"],"additionalProperties":false,"properties":{"host":{"description":"Host Endpoint of the Redshift Cluster (must include the cluster-id, region and end with .redshift.amazonaws.com)","type":"string","title":"Host"},"port":{"description":"Port of the database.","type":"integer","minimum":0,"maximum":65536,"default":5439,"examples":["5439"],"title":"Port"},"username":{"description":"Username to use to access the database.","type":"string","title":"Username"},"password":{"description":"Password associated with the username.","type":"string","airbyte_secret":true,"title":"Password"},"database":{"description":"Name of the database.","type":"string","title":"Database"},"schema":{"description":"The default schema tables are written to if the source does not specify a namespace. Unless specifically configured, the usual value for this field is \"public\".","type":"string","examples":["public"],"default":"public","title":"Default Schema"},"s3_bucket_name":{"title":"S3 Bucket Name","type":"string","description":"The name of the staging S3 bucket to use if utilising a COPY strategy. COPY is recommended for production workloads for better speed and scalability. See <a href=\"https://docs.aws.amazon.com/redshift/latest/dg/c_loading-data-best-practices.html\">AWS docs</a> for more details.","examples":["airbyte.staging"]},"s3_bucket_region":{"title":"S3 Bucket Region","type":"string","default":"","description":"The region of the S3 staging bucket to use if utilising a copy strategy.","enum":["","us-east-1","us-east-2","us-west-1","us-west-2","af-south-1","ap-east-1","ap-south-1","ap-northeast-1","ap-northeast-2","ap-northeast-3","ap-southeast-1","ap-southeast-2","ca-central-1","cn-north-1","cn-northwest-1","eu-central-1","eu-north-1","eu-south-1","eu-west-1","eu-west-2","eu-west-3","sa-east-1","me-south-1"]},"access_key_id":{"type":"string","description":"The Access Key Id granting allow one to access the above S3 staging bucket. Airbyte requires Read and Write permissions to the given bucket.","title":"S3 Key Id","airbyte_secret":true},"secret_access_key":{"type":"string","description":"The corresponding secret to the above access key id.","title":"S3 Access Key","airbyte_secret":true},"part_size":{"type":"integer","minimum":10,"maximum":100,"examples":["10"],"description":"Optional. Increase this if syncing tables larger than 100GB. Only relevant for COPY. Files are streamed to S3 in parts. This determines the size of each part, in MBs. As S3 has a limit of 10,000 parts per file, part size affects the table size. This is 10MB by default, resulting in a default limit of 100GB tables. Note, a larger part size will result in larger memory requirements. A rule of thumb is to multiply the part size by 10 to get the memory requirement. Modify this with care.","title":"Stream Part Size"}}},"supportsIncremental":true,"supportsNormalization":true,"supportsDBT":true,"supported_destination_sync_modes":["overwrite","append","append_dedup"]}}
2021-07-13 09:46:34 INFO i.a.i.b.IntegrationRunner(run):122 - {} - Completed integration: io.airbyte.integrations.destination.redshift.RedshiftDestination
2021-07-13 09:46:34 INFO i.a.i.d.r.RedshiftDestination(main):99 - {} - completed destination: class io.airbyte.integrations.destination.redshift.RedshiftDestination

@cgardens
Contributor Author

Nice! Thanks for fixing it.
