BigQuery Destination : Fix GCS processing of Facebook data #9415

Merged on Jan 13, 2022 (8 commits)
Original file line number Diff line number Diff line change
Expand Up @@ -2,7 +2,7 @@
"destinationDefinitionId": "079d5540-f236-4294-ba7c-ade8fd918496",
"name": "BigQuery (denormalized typed struct)",
"dockerRepository": "airbyte/destination-bigquery-denormalized",
- "dockerImageTag": "0.2.2",
+ "dockerImageTag": "0.2.3",
"documentationUrl": "https://docs.airbyte.io/integrations/destinations/bigquery",
"icon": "bigquery.svg"
}
Original file line number Diff line number Diff line change
Expand Up @@ -2,7 +2,7 @@
"destinationDefinitionId": "22f6c74f-5699-40ff-833c-4a879ea40133",
"name": "BigQuery",
"dockerRepository": "airbyte/destination-bigquery",
- "dockerImageTag": "0.6.2",
+ "dockerImageTag": "0.6.3",
"documentationUrl": "https://docs.airbyte.io/integrations/destinations/bigquery",
"icon": "bigquery.svg"
}
Original file line number Diff line number Diff line change
Expand Up @@ -2,7 +2,7 @@
"destinationDefinitionId": "4816b78f-1489-44c1-9060-4b19d5fa9362",
"name": "S3",
"dockerRepository": "airbyte/destination-s3",
- "dockerImageTag": "0.2.2",
+ "dockerImageTag": "0.2.4",
"documentationUrl": "https://docs.airbyte.io/integrations/destinations/s3",
"icon": "s3.svg"
}
Original file line number Diff line number Diff line change
Expand Up @@ -13,13 +13,13 @@
- name: BigQuery
destinationDefinitionId: 22f6c74f-5699-40ff-833c-4a879ea40133
dockerRepository: airbyte/destination-bigquery
- dockerImageTag: 0.6.2
+ dockerImageTag: 0.6.3
documentationUrl: https://docs.airbyte.io/integrations/destinations/bigquery
icon: bigquery.svg
- name: BigQuery (denormalized typed struct)
destinationDefinitionId: 079d5540-f236-4294-ba7c-ade8fd918496
dockerRepository: airbyte/destination-bigquery-denormalized
- dockerImageTag: 0.2.2
+ dockerImageTag: 0.2.3
documentationUrl: https://docs.airbyte.io/integrations/destinations/bigquery
icon: bigquery.svg
- name: Cassandra
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -188,7 +188,7 @@
supportsDBT: false
supported_destination_sync_modes:
- "append"
- - dockerImage: "airbyte/destination-bigquery:0.6.2"
+ - dockerImage: "airbyte/destination-bigquery:0.6.3"
spec:
documentationUrl: "https://docs.airbyte.io/integrations/destinations/bigquery"
connectionSpecification:
Expand Down Expand Up @@ -378,7 +378,7 @@
- "overwrite"
- "append"
- "append_dedup"
- - dockerImage: "airbyte/destination-bigquery-denormalized:0.2.2"
+ - dockerImage: "airbyte/destination-bigquery-denormalized:0.2.3"
spec:
documentationUrl: "https://docs.airbyte.io/integrations/destinations/bigquery"
connectionSpecification:
Expand Down Expand Up @@ -3408,7 +3408,7 @@
supported_destination_sync_modes:
- "append"
- "overwrite"
- - dockerImage: "airbyte/destination-s3:0.2.3"
+ - dockerImage: "airbyte/destination-s3:0.2.4"
spec:
documentationUrl: "https://docs.airbyte.io/integrations/destinations/s3"
connectionSpecification:
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -16,5 +16,5 @@ ENV APPLICATION destination-bigquery-denormalized

COPY --from=build /airbyte /airbyte

- LABEL io.airbyte.version=0.2.2
+ LABEL io.airbyte.version=0.2.3
LABEL io.airbyte.name=airbyte/destination-bigquery-denormalized
Original file line number Diff line number Diff line change
Expand Up @@ -22,6 +22,7 @@ dependencies {

integrationTestJavaImplementation project(':airbyte-integrations:bases:standard-destination-test')
integrationTestJavaImplementation project(':airbyte-integrations:connectors:destination-bigquery-denormalized')
+ integrationTestJavaImplementation files(project(':airbyte-integrations:bases:base-normalization').airbyteDocker.outputs)

implementation files(project(':airbyte-integrations:bases:base-java').airbyteDocker.outputs)
}
Original file line number Diff line number Diff line change
Expand Up @@ -29,7 +29,7 @@ public static JsonNode getSchema() {
+ " \"string\"\n"
+ " ]\n"
+ " },\n"
- + " \"permissions\": {\n"
+ + " \"permission-list\": {\n"
+ " \"type\": [\n"
+ " \"array\"\n"
+ " ],\n"
Expand Down Expand Up @@ -135,7 +135,7 @@ public static JsonNode getSchemaWithInvalidArrayType() {
+ " \"string\"\n"
+ " ]\n"
+ " },\n"
- + " \"permissions\": {\n"
+ + " \"permission-list\": {\n"
+ " \"type\": [\n"
+ " \"array\"\n"
+ " ],\n"
Expand Down Expand Up @@ -167,7 +167,7 @@ public static JsonNode getData() {
"{\n"
+ " \"name\": \"Andrii\",\n"
+ " \"accepts_marketing_updated_at\": \"2021-10-11T06:36:53-07:00\",\n"
- + " \"permissions\": [\n"
+ + " \"permission-list\": [\n"
+ " {\n"
+ " \"domain\": \"abs\",\n"
+ " \"grants\": [\n"
Expand Down Expand Up @@ -266,7 +266,7 @@ public static JsonNode getDataWithEmptyObjectAndArray() {
return Jsons.deserialize(
"{\n"
+ " \"name\": \"Andrii\",\n"
- + " \"permissions\": [\n"
+ + " \"permission-list\": [\n"
+ " {\n"
+ " \"domain\": \"abs\",\n"
+ " \"items\": {},\n" // empty object
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -16,5 +16,5 @@ ENV APPLICATION destination-bigquery

COPY --from=build /airbyte /airbyte

- LABEL io.airbyte.version=0.6.2
+ LABEL io.airbyte.version=0.6.3
LABEL io.airbyte.name=airbyte/destination-bigquery
Original file line number Diff line number Diff line change
Expand Up @@ -16,5 +16,5 @@ ENV APPLICATION destination-s3

COPY --from=build /airbyte /airbyte

- LABEL io.airbyte.version=0.2.3
+ LABEL io.airbyte.version=0.2.4
LABEL io.airbyte.name=airbyte/destination-s3
Original file line number Diff line number Diff line change
Expand Up @@ -15,7 +15,7 @@ protected String applyDefaultCase(final String input) {

@Override
public String getIdentifier(final String name) {
- return checkFirsCharInStreamName(convertStreamName(name));
+ return replaceForbiddenCharacters(checkFirsCharInStreamName(convertStreamName(name)));
}

private String checkFirsCharInStreamName(final String name) {
Expand All @@ -26,4 +26,8 @@ private String checkFirsCharInStreamName(final String name) {
}
}

+ private String replaceForbiddenCharacters(final String name) {
+ return name.replace("-", "_");
+ }
+
}
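
The change above adds a final pass that swaps hyphens for underscores in the identifier. A minimal standalone sketch of that one step follows. The class name `HyphenReplacementSketch` is hypothetical, and `convertStreamName` and `checkFirsCharInStreamName` are left out because their bodies are not shown in this diff.

```java
class HyphenReplacementSketch {

  // Same one-line body as the method added in the diff:
  // String.replace substitutes every occurrence, not just the first.
  public static String replaceForbiddenCharacters(final String name) {
    return name.replace("-", "_");
  }

  public static void main(final String[] args) {
    // A hyphenated field name like the one used in the updated test fixtures.
    System.out.println(replaceForbiddenCharacters("permission-list"));
    // Names without hyphens pass through unchanged.
    System.out.println(replaceForbiddenCharacters("ads_insights"));
  }
}
```

With this step in place, a name such as `permission-list` would come out as `permission_list`, while names that contain no hyphen are returned as they are.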
Original file line number Diff line number Diff line change
Expand Up @@ -175,7 +175,7 @@ public Schema getAvroSchema(final JsonNode jsonSchema,
// Omit the namespace for root level fields, because it is directly assigned in the builder above.
// This may not be the correct choice.
? null
- : (fieldNamespace == null ? fieldName : (fieldNamespace + "." + fieldName));
+ : (fieldNamespace == null ? stdName : (fieldNamespace + "." + stdName));
fieldBuilder.type(parseJsonField(subfieldName, subfieldNamespace, subfieldDefinition, appendExtraProps, addStringToLogicalTypes))
.withDefault(null);
}
Expand Down
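
One plausible reading of this hunk, assuming `stdName` holds the standardized (hyphen-free) form of `fieldName`: the Avro specification requires each name, and each dot-separated part of a namespace, to match `[A-Za-z_][A-Za-z0-9_]*`, so a namespace built from a raw name such as `permission-list` would not be valid. The sketch below checks that rule with a plain regex rather than the Avro library. `AvroNamespaceSketch`, `childNamespace`, and `isValidNamespace` are hypothetical names, not part of the connector.

```java
import java.util.regex.Pattern;

class AvroNamespaceSketch {

  // Name rule from the Avro specification: a letter or underscore first,
  // then letters, digits, or underscores.
  private static final Pattern AVRO_NAME = Pattern.compile("[A-Za-z_][A-Za-z0-9_]*");

  // Mirrors the inner ternary in the hunk: the child namespace is the parent
  // namespace plus the name, or the name alone when there is no parent.
  public static String childNamespace(final String fieldNamespace, final String name) {
    return fieldNamespace == null ? name : fieldNamespace + "." + name;
  }

  // A namespace passes when every dot-separated part is a valid name.
  public static boolean isValidNamespace(final String namespace) {
    for (final String part : namespace.split("\\.", -1)) {
      if (!AVRO_NAME.matcher(part).matches()) {
        return false;
      }
    }
    return true;
  }

  public static void main(final String[] args) {
    final String raw = childNamespace("root", "permission-list");
    final String std = childNamespace("root", "permission_list");
    System.out.println(raw + " valid: " + isValidNamespace(raw));
    System.out.println(std + " valid: " + isValidNamespace(std));
  }
}
```

Under that assumption, building the namespace from the standardized name keeps nested record namespaces valid even when the source field name contains a hyphen.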
28 changes: 15 additions & 13 deletions docs/integrations/destinations/bigquery.md
Original file line number Diff line number Diff line change
Expand Up @@ -153,6 +153,7 @@ Therefore, Airbyte BigQuery destination will convert any invalid characters into

| Version | Date | Pull Request | Subject |
|:--------| :--- | :--- | :--- |
| 0.6.3 | 2022-01-12 | [\#9415](https://github.com/airbytehq/airbyte/pull/9415) | BigQuery Destination : Fix GCS processing of Facebook data |
| 0.6.2 | 2022-01-10 | [\#9121](https://github.com/airbytehq/airbyte/pull/9121) | Fixed check method for GCS mode to verify if all roles assigned to user |
| 0.6.1 | 2021-12-22 | [\#9039](https://github.com/airbytehq/airbyte/pull/9039) | Added part_size configuration to UI for GCS staging |
| 0.6.0 | 2021-12-17 | [\#8788](https://github.com/airbytehq/airbyte/issues/8788) | BigQuery/BigQuery denorm Destinations : Add possibility to use different types of GCS files |
Expand All @@ -169,21 +170,22 @@ Therefore, Airbyte BigQuery destination will convert any invalid characters into

### bigquery-denormalized

| Version | Date | Pull Request | Subject |
|:--------|:-----------|:-----------------------------------------------------------| :--- |
| 0.2.3 | 2022-01-12 | [\#9415](https://github.com/airbytehq/airbyte/pull/9415) | BigQuery Destination : Fix GCS processing of Facebook data |
| 0.2.2 | 2021-12-22 | [\#9039](https://github.com/airbytehq/airbyte/pull/9039) | Added part_size configuration to UI for GCS staging |
| 0.2.1 | 2021-12-21 | [\#8574](https://github.com/airbytehq/airbyte/pull/8574) | Added namespace to Avro and Parquet record types |
| 0.2.0 | 2021-12-17 | [\#8788](https://github.com/airbytehq/airbyte/pull/8788) | BigQuery/BigQuery denorm Destinations : Add possibility to use different types of GCS files |
| 0.1.11 | 2021-12-16 | [\#8816](https://github.com/airbytehq/airbyte/issues/8816) | Update dataset locations |
| 0.1.10 | 2021-11-09 | [\#7804](https://github.com/airbytehq/airbyte/pull/7804) | handle null values in fields described by a $ref definition |
| 0.1.9 | 2021-11-08 | [\#7736](https://github.com/airbytehq/airbyte/issues/7736) | Fixed the handling of ObjectNodes with $ref definition key |
| 0.1.8 | 2021-10-27 | [\#7413](https://github.com/airbytehq/airbyte/issues/7413) | Fixed DATETIME conversion for BigQuery |
| 0.1.7 | 2021-10-26 | [\#7240](https://github.com/airbytehq/airbyte/issues/7240) | Output partitioned/clustered tables |
| 0.1.6 | 2021-09-16 | [\#6145](https://github.com/airbytehq/airbyte/pull/6145) | BigQuery Denormalized support for date, datetime & timestamp types through the json "format" key |
| 0.1.5 | 2021-09-07 | [\#5881](https://github.com/airbytehq/airbyte/pull/5881) | BigQuery Denormalized NPE fix |
| 0.1.4 | 2021-09-04 | [\#5813](https://github.com/airbytehq/airbyte/pull/5813) | fix Stackoverflow error when receive a schema from source where "Array" type doesn't contain a required "items" element |
| 0.1.3 | 2021-08-07 | [\#5261](https://github.com/airbytehq/airbyte/pull/5261) | 🐛 Destination BigQuery\(Denormalized\): Fix processing arrays of records |
| 0.1.2 | 2021-07-30 | [\#5125](https://github.com/airbytehq/airbyte/pull/5125) | Enable `additionalPropertities` in spec.json |
| 0.1.1 | 2021-06-21 | [\#3555](https://github.com/airbytehq/airbyte/pull/3555) | Partial Success in BufferedStreamConsumer |
| 0.1.0 | 2021-06-21 | [\#4176](https://github.com/airbytehq/airbyte/pull/4176) | Destination using Typed Struct and Repeated fields |

45 changes: 23 additions & 22 deletions docs/integrations/destinations/s3.md
Original file line number Diff line number Diff line change
Expand Up @@ -222,26 +222,27 @@ Under the hood, an Airbyte data stream in Json schema is first converted to an A
## CHANGELOG

| Version | Date | Pull Request | Subject |
|:--------| :--- | :--- | :--- |
| 0.2.4 | 2022-01-12 | [\#9415](https://github.com/airbytehq/airbyte/pull/9415) | BigQuery Destination : Fix GCS processing of Facebook data |
| 0.2.3 | 2022-01-11 | [\#9367](https://github.com/airbytehq/airbyte/pull/9367) | Avro & Parquet: support array field with unknown item type; default any improperly typed field to string. |
| 0.2.2 | 2021-12-21 | [\#8574](https://github.com/airbytehq/airbyte/pull/8574) | Added namespace to Avro and Parquet record types |
| 0.2.1 | 2021-12-20 | [\#8974](https://github.com/airbytehq/airbyte/pull/8974) | Release a new version to ensure there is no excessive logging. |
| 0.2.0 | 2021-12-15 | [\#8607](https://github.com/airbytehq/airbyte/pull/8607) | Change the output filename for CSV files - it's now `bucketPath/namespace/streamName/timestamp_epochMillis_randomUuid.csv` |
| 0.1.16 | 2021-12-10 | [\#8562](https://github.com/airbytehq/airbyte/pull/8562) | Swap dependencies with destination-jdbc. |
| 0.1.15 | 2021-12-03 | [\#8501](https://github.com/airbytehq/airbyte/pull/8501) | Remove excessive logging for Avro and Parquet invalid date strings. |
| 0.1.14 | 2021-11-09 | [\#7732](https://github.com/airbytehq/airbyte/pull/7732) | Support timestamp in Avro and Parquet |
| 0.1.13 | 2021-11-03 | [\#7288](https://github.com/airbytehq/airbyte/issues/7288) | Support Json `additionalProperties`. |
| 0.1.12 | 2021-09-13 | [\#5720](https://github.com/airbytehq/airbyte/issues/5720) | Added configurable block size for stream. Each stream is limited to 10,000 by S3 |
| 0.1.11 | 2021-09-10 | [\#5729](https://github.com/airbytehq/airbyte/pull/5729) | For field names that start with a digit, a `_` will be appended at the beginning for the`Parquet` and `Avro` formats. |
| 0.1.10 | 2021-08-17 | [\#4699](https://github.com/airbytehq/airbyte/pull/4699) | Added json config validator |
| 0.1.9 | 2021-07-12 | [\#4666](https://github.com/airbytehq/airbyte/pull/4666) | Fix MinIO output for Parquet format. |
| 0.1.8 | 2021-07-07 | [\#4613](https://github.com/airbytehq/airbyte/pull/4613) | Patched schema converter to support combined restrictions. |
| 0.1.7 | 2021-06-23 | [\#4227](https://github.com/airbytehq/airbyte/pull/4227) | Added Avro and JSONL output. |
| 0.1.6 | 2021-06-16 | [\#4130](https://github.com/airbytehq/airbyte/pull/4130) | Patched the check to verify prefix access instead of full-bucket access. |
| 0.1.5 | 2021-06-14 | [\#3908](https://github.com/airbytehq/airbyte/pull/3908) | Fixed default `max_padding_size_mb` in `spec.json`. |
| 0.1.4 | 2021-06-14 | [\#3908](https://github.com/airbytehq/airbyte/pull/3908) | Added Parquet output. |
| 0.1.3 | 2021-06-13 | [\#4038](https://github.com/airbytehq/airbyte/pull/4038) | Added support for alternative S3. |
| 0.1.2 | 2021-06-10 | [\#4029](https://github.com/airbytehq/airbyte/pull/4029) | Fixed `_airbyte_emitted_at` field to be a UTC instead of local timestamp for consistency. |
| 0.1.1 | 2021-06-09 | [\#3973](https://github.com/airbytehq/airbyte/pull/3973) | Added `AIRBYTE_ENTRYPOINT` in base Docker image for Kubernetes support. |
| 0.1.0 | 2021-06-03 | [\#3672](https://github.com/airbytehq/airbyte/pull/3672) | Initial release with CSV output. |