fix: Iceberg partitioning not working with partition transform functions #2694


LeonLuttenberger
Contributor

Feature or Bugfix

  • Bugfix

Detail

  • Fix Iceberg partitioning issue when using partition transform functions
  • Added unit test
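To make the bug class concrete: a partition entry such as `day(ts)` or `bucket(16, id)` names a transform, not a DataFrame column, so any check that compares partition entries with the DataFrame's columns first has to recover the underlying column name. The sketch below illustrates that step only. `partition_source_column` is a hypothetical helper written for this example, not the code changed in this pull request, and it assumes the Athena Iceberg transform forms `year/month/day/hour(col)`, `bucket(N, col)`, and `truncate(L, col)`.

```python
import re

# Hypothetical helper for illustration; not the implementation from this PR.
# Matches "func(args)" style partition transforms such as day(ts).
_TRANSFORM_RE = re.compile(r"^\s*(?P<func>\w+)\s*\((?P<args>.*)\)\s*$")


def partition_source_column(partition_expr: str) -> str:
    """Return the column name that a partition expression refers to."""
    match = _TRANSFORM_RE.match(partition_expr)
    if match is None:
        # Plain column name, no transform applied.
        return partition_expr.strip()
    args = [arg.strip() for arg in match.group("args").split(",")]
    # In bucket(N, col) and truncate(L, col) the column is the last argument;
    # in day(col) and similar transforms it is the only argument.
    return args[-1]


partition_cols = ["day(ts)", "bucket(16, id)", "region"]
print([partition_source_column(p) for p in partition_cols])
# ['ts', 'id', 'region']
```

With the column names recovered, they can be validated against `df.columns` while the original expressions are passed through unchanged to the table definition.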

Relates

#2686

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@LeonLuttenberger LeonLuttenberger marked this pull request as ready for review February 29, 2024 21:03
Contributor

@kukushking kukushking left a comment

Thanks!

@malachi-constant
Contributor

AWS CodeBuild CI Report

  • CodeBuild project: GitHubCodeBuild8756EF16-4rfo0GHQ0u9a
  • Commit ID: 9297197
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

@malachi-constant
Contributor

AWS CodeBuild CI Report

  • CodeBuild project: GitHubDistributedCodeBuild6-jWcl5DLmvupS
  • Commit ID: 9297197
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

@LeonLuttenberger LeonLuttenberger merged commit e81fd99 into main Mar 1, 2024
19 checks passed
@LeonLuttenberger LeonLuttenberger deleted the 2686-iceberg-partitioning-based-on-transformed-dataframe-columns-not-supported branch March 1, 2024 17:03
sawyerh referenced this pull request in sawyerh/highlights Jun 1, 2024
[![Mend Renovate](https://app.renovatebot.com/images/banner.svg)](https://renovatebot.com)

This PR contains the following updates:

| Package | Change | Age | Adoption | Passing | Confidence |
|---|---|---|---|---|---|
| [aws-lambda-powertools](https://togithub.com/aws-powertools/powertools-lambda-python) ([changelog](https://togithub.com/aws-powertools/powertools-lambda-python/releases)) | `2.34.2` -> `2.38.1` | [![age](https://developer.mend.io/api/mc/badges/age/pypi/aws-lambda-powertools/2.38.1?slim=true)](https://docs.renovatebot.com/merge-confidence/) | [![adoption](https://developer.mend.io/api/mc/badges/adoption/pypi/aws-lambda-powertools/2.38.1?slim=true)](https://docs.renovatebot.com/merge-confidence/) | [![passing](https://developer.mend.io/api/mc/badges/compatibility/pypi/aws-lambda-powertools/2.34.2/2.38.1?slim=true)](https://docs.renovatebot.com/merge-confidence/) | [![confidence](https://developer.mend.io/api/mc/badges/confidence/pypi/aws-lambda-powertools/2.34.2/2.38.1?slim=true)](https://docs.renovatebot.com/merge-confidence/) |
| [awswrangler](https://aws-sdk-pandas.readthedocs.io/) ([source](https://togithub.com/aws/aws-sdk-pandas)) | `3.6.0` -> `3.7.3` | [![age](https://developer.mend.io/api/mc/badges/age/pypi/awswrangler/3.7.3?slim=true)](https://docs.renovatebot.com/merge-confidence/) | [![adoption](https://developer.mend.io/api/mc/badges/adoption/pypi/awswrangler/3.7.3?slim=true)](https://docs.renovatebot.com/merge-confidence/) | [![passing](https://developer.mend.io/api/mc/badges/compatibility/pypi/awswrangler/3.6.0/3.7.3?slim=true)](https://docs.renovatebot.com/merge-confidence/) | [![confidence](https://developer.mend.io/api/mc/badges/confidence/pypi/awswrangler/3.6.0/3.7.3?slim=true)](https://docs.renovatebot.com/merge-confidence/) |
| [pandas](https://pandas.pydata.org) ([source](https://togithub.com/pandas-dev/pandas)) | `2.2.1` -> `2.2.2` | [![age](https://developer.mend.io/api/mc/badges/age/pypi/pandas/2.2.2?slim=true)](https://docs.renovatebot.com/merge-confidence/) | [![adoption](https://developer.mend.io/api/mc/badges/adoption/pypi/pandas/2.2.2?slim=true)](https://docs.renovatebot.com/merge-confidence/) | [![passing](https://developer.mend.io/api/mc/badges/compatibility/pypi/pandas/2.2.1/2.2.2?slim=true)](https://docs.renovatebot.com/merge-confidence/) | [![confidence](https://developer.mend.io/api/mc/badges/confidence/pypi/pandas/2.2.1/2.2.2?slim=true)](https://docs.renovatebot.com/merge-confidence/) |
| [pytest-mock](https://togithub.com/pytest-dev/pytest-mock) ([changelog](https://pytest-mock.readthedocs.io/en/latest/changelog.html)) | `3.12.0` -> `3.14.0` | [![age](https://developer.mend.io/api/mc/badges/age/pypi/pytest-mock/3.14.0?slim=true)](https://docs.renovatebot.com/merge-confidence/) | [![adoption](https://developer.mend.io/api/mc/badges/adoption/pypi/pytest-mock/3.14.0?slim=true)](https://docs.renovatebot.com/merge-confidence/) | [![passing](https://developer.mend.io/api/mc/badges/compatibility/pypi/pytest-mock/3.12.0/3.14.0?slim=true)](https://docs.renovatebot.com/merge-confidence/) | [![confidence](https://developer.mend.io/api/mc/badges/confidence/pypi/pytest-mock/3.12.0/3.14.0?slim=true)](https://docs.renovatebot.com/merge-confidence/) |
| [tenacity](https://togithub.com/jd/tenacity) | `8.2.3` -> `8.3.0` | [![age](https://developer.mend.io/api/mc/badges/age/pypi/tenacity/8.3.0?slim=true)](https://docs.renovatebot.com/merge-confidence/) | [![adoption](https://developer.mend.io/api/mc/badges/adoption/pypi/tenacity/8.3.0?slim=true)](https://docs.renovatebot.com/merge-confidence/) | [![passing](https://developer.mend.io/api/mc/badges/compatibility/pypi/tenacity/8.2.3/8.3.0?slim=true)](https://docs.renovatebot.com/merge-confidence/) | [![confidence](https://developer.mend.io/api/mc/badges/confidence/pypi/tenacity/8.2.3/8.3.0?slim=true)](https://docs.renovatebot.com/merge-confidence/) |

---

### Release Notes

<details>
<summary>aws-powertools/powertools-lambda-python
(aws-lambda-powertools)</summary>

### [`v2.38.1`](https://togithub.com/aws-powertools/powertools-lambda-python/blob/HEAD/CHANGELOG.md#v2381---2024-05-17)

[Compare Source](https://togithub.com/aws-powertools/powertools-lambda-python/compare/v2.38.0...v2.38.1)

#### \[v2.38.1] - 2024-05-17

### [`v2.38.0`](https://togithub.com/aws-powertools/powertools-lambda-python/blob/HEAD/CHANGELOG.md#v2380---2024-05-17)

[Compare Source](https://togithub.com/aws-powertools/powertools-lambda-python/compare/v2.37.0...v2.38.0)

#### \[v2.38.0] - 2024-05-17

### [`v2.37.0`](https://togithub.com/aws-powertools/powertools-lambda-python/blob/HEAD/CHANGELOG.md#v2370---2024-04-18)

[Compare Source](https://togithub.com/aws-powertools/powertools-lambda-python/compare/v2.36.0...v2.37.0)

#### \[v2.37.0] - 2024-04-18

### [`v2.36.0`](https://togithub.com/aws-powertools/powertools-lambda-python/blob/HEAD/CHANGELOG.md#v2360---2024-03-27)

[Compare Source](https://togithub.com/aws-powertools/powertools-lambda-python/compare/v2.35.1...v2.36.0)

#### \[v2.36.0] - 2024-03-27

### [`v2.35.1`](https://togithub.com/aws-powertools/powertools-lambda-python/blob/HEAD/CHANGELOG.md#v2351---2024-03-08)

[Compare Source](https://togithub.com/aws-powertools/powertools-lambda-python/compare/v2.35.0...v2.35.1)

#### \[v2.35.1] - 2024-03-08

### [`v2.35.0`](https://togithub.com/aws-powertools/powertools-lambda-python/blob/HEAD/CHANGELOG.md#v2350---2024-03-06)

[Compare Source](https://togithub.com/aws-powertools/powertools-lambda-python/compare/v2.34.2...v2.35.0)

#### \[v2.35.0] - 2024-03-06

</details>

<details>
<summary>aws/aws-sdk-pandas (awswrangler)</summary>

### [`v3.7.3`](https://togithub.com/aws/aws-sdk-pandas/releases/tag/3.7.3): AWS SDK for pandas 3.7.3

[Compare Source](https://togithub.com/aws/aws-sdk-pandas/compare/3.7.2...3.7.3)

#### Bug fixes 🐛

- Iceberg schema evolution fails for map, array and struct types by [@LeonLuttenberger](https://togithub.com/LeonLuttenberger) in [#2755](https://togithub.com/aws/aws-sdk-pandas/issues/2755)
- trickle down `s3_output` in `athena.to_iceberg` by [@jaidisido](https://togithub.com/jaidisido) in [#2767](https://togithub.com/aws/aws-sdk-pandas/issues/2767)
- respect order of columns in `to_iceberg` by [@jaidisido](https://togithub.com/jaidisido) in [#2768](https://togithub.com/aws/aws-sdk-pandas/issues/2768)
- add PyArrow `fixed_size_binary` dtype support by [@jaidisido](https://togithub.com/jaidisido) in [#2775](https://togithub.com/aws/aws-sdk-pandas/issues/2775)
- Opensearch serverless vector search collections - remove default `_id` by [@kukushking](https://togithub.com/kukushking) in [#2784](https://togithub.com/aws/aws-sdk-pandas/issues/2784)
- missing keys in `list_to_arrow_table` by [@kukushking](https://togithub.com/kukushking) in [#2778](https://togithub.com/aws/aws-sdk-pandas/issues/2778)
- prevent `athena.to_iceberg` overwrite to delete table in order to preserve Iceberg transactions history by [@erwan-simon](https://togithub.com/erwan-simon) in [#2776](https://togithub.com/aws/aws-sdk-pandas/issues/2776)

#### Documentation 📚

- simplify README, remove AWS Glue for Ray references by [@jaidisido](https://togithub.com/jaidisido) in [#2750](https://togithub.com/aws/aws-sdk-pandas/issues/2750)
- fix YAML formatting in Ray Remote tutorial by [@LeonLuttenberger](https://togithub.com/LeonLuttenberger) in [#2770](https://togithub.com/aws/aws-sdk-pandas/issues/2770)

#### Security Dependency Updates 🛡️

- bump `idna` from 3.6 to 3.7 by [@dependabot](https://togithub.com/dependabot) in [#2772](https://togithub.com/aws/aws-sdk-pandas/issues/2772)
- bump `aiohttp` from 3.9.3 to 3.9.4 by [@dependabot](https://togithub.com/dependabot) in [#2777](https://togithub.com/aws/aws-sdk-pandas/issues/2777)

#### New Contributors 👋

- [@erwan-simon](https://togithub.com/erwan-simon) made their first contribution in [#2776](https://togithub.com/aws/aws-sdk-pandas/issues/2776)

**Full Changelog**: aws/aws-sdk-pandas@3.7.2...3.7.3

### [`v3.7.2`](https://togithub.com/aws/aws-sdk-pandas/releases/tag/3.7.2): AWS SDK for pandas 3.7.2

[Compare Source](https://togithub.com/aws/aws-sdk-pandas/compare/3.7.1...3.7.2)

#### Features/Enhancements 🚀

- Add support for DeltaLake's DynamoDB lock mechanism by [@LeonLuttenberger](https://togithub.com/LeonLuttenberger) in [#2705](https://togithub.com/aws/aws-sdk-pandas/issues/2705)

#### Bug fixes 🐛

- `wr.athena.to_iceberg` - Insert query has mismatched column types [#2678](https://togithub.com/aws/aws-sdk-pandas/issues/2678) by [@GalvFionic](https://togithub.com/GalvFionic) in [#2715](https://togithub.com/aws/aws-sdk-pandas/issues/2715)
- allow `s3_output` in `athena.to_iceberg` by [@jaidisido](https://togithub.com/jaidisido) in [#2727](https://togithub.com/aws/aws-sdk-pandas/issues/2727)
- replace deprecated `np.split_array` by [@jaidisido](https://togithub.com/jaidisido) in [#2735](https://togithub.com/aws/aws-sdk-pandas/issues/2735)
- Athena `to_iceberg` fails with non-lowercase column names by [@LeonLuttenberger](https://togithub.com/LeonLuttenberger) in [#2736](https://togithub.com/aws/aws-sdk-pandas/issues/2736)
- Support Ray 2.10 by [@kukushking](https://togithub.com/kukushking) in [#2741](https://togithub.com/aws/aws-sdk-pandas/issues/2741)

#### New Contributors

- [@GalvFionic](https://togithub.com/GalvFionic) made their first contribution in [#2715](https://togithub.com/aws/aws-sdk-pandas/issues/2715)

**Full Changelog**: aws/aws-sdk-pandas@3.7.1...3.7.2

### [`v3.7.1`](https://togithub.com/aws/aws-sdk-pandas/releases/tag/3.7.1): AWS SDK for pandas 3.7.1

[Compare Source](https://togithub.com/aws/aws-sdk-pandas/compare/3.7.0...3.7.1)

#### Bug fixes 🐛

- fix breaking change in `_create_table` by [@jaidisido](https://togithub.com/jaidisido) in [https://github.com/aws/aws-sdk-pandas/pull/2711](https://togithub.com/aws/aws-sdk-pandas/pull/2711)
- pin pyarrow to version 8 and above by [@jaidisido](https://togithub.com/jaidisido) in [https://github.com/aws/aws-sdk-pandas/pull/2709](https://togithub.com/aws/aws-sdk-pandas/pull/2709)

#### Documentation 📚

- fix `redshift.to_sql` doc indentation error by [@LeonLuttenberger](https://togithub.com/LeonLuttenberger) in [https://github.com/aws/aws-sdk-pandas/pull/2706](https://togithub.com/aws/aws-sdk-pandas/pull/2706)

**Full Changelog**: aws/aws-sdk-pandas@3.7.0...3.7.1

### [`v3.7.0`](https://togithub.com/aws/aws-sdk-pandas/releases/tag/3.7.0): AWS SDK for pandas 3.7.0

[Compare Source](https://togithub.com/aws/aws-sdk-pandas/compare/3.6.0...3.7.0)

#### Breaking changes 💥

Lake Formation Governed tables are being phased out and we are dropping support ([#2692](https://togithub.com/aws/aws-sdk-pandas/issues/2692)).

#### Features/Enhancements 🚀

- support parquet client encryption ([#2642](https://togithub.com/aws/aws-sdk-pandas/issues/2642)) by [@Marwen94](https://togithub.com/Marwen94) in [https://github.com/aws/aws-sdk-pandas/pull/2674](https://togithub.com/aws/aws-sdk-pandas/pull/2674)

#### Bug fixes 🐛

- Index columns removed on s3.to_parquet by [@robert-schmidtke](https://togithub.com/robert-schmidtke) in [https://github.com/aws/aws-sdk-pandas/pull/2655](https://togithub.com/aws/aws-sdk-pandas/pull/2655)
- Missing timezone metadata by [@kukushking](https://togithub.com/kukushking) in [https://github.com/aws/aws-sdk-pandas/pull/2682](https://togithub.com/aws/aws-sdk-pandas/pull/2682)
- remove enforced openpyxl engine constraint by [@jaidisido](https://togithub.com/jaidisido) in [https://github.com/aws/aws-sdk-pandas/pull/2696](https://togithub.com/aws/aws-sdk-pandas/pull/2696)
- Iceberg partitioning not working with partition transform functions by [@LeonLuttenberger](https://togithub.com/LeonLuttenberger) in [https://github.com/aws/aws-sdk-pandas/pull/2694](https://togithub.com/aws/aws-sdk-pandas/pull/2694)
- remove awswrangler README from `site-packages` folder by [@AlJohri](https://togithub.com/AlJohri) in [https://github.com/aws/aws-sdk-pandas/pull/2698](https://togithub.com/aws/aws-sdk-pandas/pull/2698)
- indent categories in pyarrow_additional_kwargs correctly by [@jaidisido](https://togithub.com/jaidisido) in [https://github.com/aws/aws-sdk-pandas/pull/2701](https://togithub.com/aws/aws-sdk-pandas/pull/2701)

#### New Contributors

- [@Marwen94](https://togithub.com/Marwen94) made their first contribution in [https://github.com/aws/aws-sdk-pandas/pull/2674](https://togithub.com/aws/aws-sdk-pandas/pull/2674)
- [@AlJohri](https://togithub.com/AlJohri) made their first contribution in [https://github.com/aws/aws-sdk-pandas/pull/2698](https://togithub.com/aws/aws-sdk-pandas/pull/2698)

**Full Changelog**: aws/aws-sdk-pandas@3.6.0...3.7.0

</details>

<details>
<summary>pandas-dev/pandas (pandas)</summary>

### [`v2.2.2`](https://togithub.com/pandas-dev/pandas/compare/v2.2.1...v2.2.2)

[Compare Source](https://togithub.com/pandas-dev/pandas/compare/v2.2.1...v2.2.2)

</details>

<details>
<summary>pytest-dev/pytest-mock (pytest-mock)</summary>

### [`v3.14.0`](https://togithub.com/pytest-dev/pytest-mock/blob/HEAD/CHANGELOG.rst#3140-2024-03-21)

[Compare Source](https://togithub.com/pytest-dev/pytest-mock/compare/v3.13.0...v3.14.0)

- [#415](https://github.com/pytest-dev/pytest-mock/pull/415): `MockType` and `AsyncMockType` can be imported from `pytest_mock` for type annotation purposes.

- [#420](https://github.com/pytest-dev/pytest-mock/issues/420): Fixed a regression which would cause `mocker.patch.object` not to be properly cleared between tests.

### [`v3.13.0`](https://togithub.com/pytest-dev/pytest-mock/blob/HEAD/CHANGELOG.rst#3130-2024-03-21)

[Compare Source](https://togithub.com/pytest-dev/pytest-mock/compare/v3.12.0...v3.13.0)

- [#417](https://github.com/pytest-dev/pytest-mock/pull/417): `spy` now has `spy_return_list`, which is a list containing all the values returned by the spied function.
- `pytest-mock` now requires `pytest>=6.2.5`.
- [#410](https://github.com/pytest-dev/pytest-mock/pull/410): pytest-mock's `setup.py` file is removed. If you relied on this file, e.g. to install pytest using `setup.py install`, please see [Why you shouldn't invoke setup.py directly](https://blog.ganssle.io/articles/2021/10/setup-py-deprecated.html#summary) for alternatives.

</details>

<details>
<summary>jd/tenacity (tenacity)</summary>

### [`v8.3.0`](https://togithub.com/jd/tenacity/releases/tag/8.3.0)

[Compare Source](https://togithub.com/jd/tenacity/compare/8.2.3...8.3.0)

### New Features

- Added a new stop function: `stop_before_delay`, which will stop
execution if the next sleep time would cause the overall delay to exceed
the specified delay. Useful for use cases where you have some upper bound
on retry times that you must not exceed, so returning before that timeout
is preferable to returning after that timeout.
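
The difference between stopping after a delay and stopping before it can be sketched in plain Python. This is an illustration of the rule described above, not tenacity's implementation: the function names `stops_after_delay`, `stops_before_delay`, and `last_attempt_time` are hypothetical, and the simulation assumes a fixed wait between attempts and attempts that fail instantly.

```python
def stops_after_delay(elapsed: float, max_delay: float) -> bool:
    """Stop only once the elapsed time has already reached the limit."""
    return elapsed >= max_delay


def stops_before_delay(elapsed: float, next_sleep: float, max_delay: float) -> bool:
    """Stop if sleeping once more would reach or pass the limit."""
    return elapsed + next_sleep >= max_delay


def last_attempt_time(wait: float, max_delay: float, before: bool) -> float:
    """Simulate always-failing attempts and return when the final one ran."""
    elapsed = 0.0
    while True:
        # An attempt runs at time `elapsed` and fails immediately.
        if before:
            stop = stops_before_delay(elapsed, wait, max_delay)
        else:
            stop = stops_after_delay(elapsed, max_delay)
        if stop:
            return elapsed
        elapsed += wait  # sleep before the next attempt


print(last_attempt_time(2.0, 5.0, before=False))  # 6.0
print(last_attempt_time(2.0, 5.0, before=True))   # 4.0
```

With a 2-second wait and a 5-second bound, the "after" rule makes its final attempt at 6 seconds, past the bound, while the "before" rule gives up at 4 seconds and stays within it.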

### Bug Fixes

-   Preserve `__defaults__` and `__kwdefaults__` through retry decorator
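
For context, `functools.wraps` does not copy default-argument metadata by default, so a generic `*args, **kwargs` wrapper reports no defaults. The sketch below shows one way a decorator can carry them over by extending the `assigned` tuple. The `passthrough` decorator and `fetch` function are hypothetical names for this example, and this is not necessarily how tenacity's retry decorator does it.

```python
import functools


def passthrough(fn):
    """Hypothetical decorator that also copies default-argument metadata."""
    # Extend the standard set of copied attributes with the two dunders.
    assigned = functools.WRAPPER_ASSIGNMENTS + ("__defaults__", "__kwdefaults__")

    @functools.wraps(fn, assigned=assigned)
    def wrapper(*args, **kwargs):
        return fn(*args, **kwargs)

    return wrapper


@passthrough
def fetch(url, retries=3, *, timeout=1.5):
    return url, retries, timeout


print(fetch.__defaults__)    # (3,)
print(fetch.__kwdefaults__)  # {'timeout': 1.5}
print(fetch("example"))      # ('example', 3, 1.5)
```

Tools that read `__defaults__` or `__kwdefaults__` directly then see the same values on the decorated function as on the original.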

### Other Notes

-   Add a "test" extra

</details>

---

### Configuration

📅 **Schedule**: Branch creation - "before 4am on the first day of the
month" (UTC), Automerge - At any time (no schedule defined).

🚦 **Automerge**: Disabled by config. Please merge this manually once you
are satisfied.

♻ **Rebasing**: Whenever PR becomes conflicted, or you tick the
rebase/retry checkbox.

👻 **Immortal**: This PR will be recreated if closed unmerged. Get
[config help](https://togithub.com/renovatebot/renovate/discussions) if
that's undesired.

---


This PR has been generated by [Mend
Renovate](https://www.mend.io/free-developer-tools/renovate/). View
repository job log
[here](https://developer.mend.io/github/sawyerh/highlights).

Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
Successfully merging this pull request may close these issues.

Iceberg partitioning based on transformed DataFrame columns not supported?
4 participants