Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix loading issues with dehydrated load #121

Merged
merged 3 commits into from
Jun 13, 2022
Merged

Conversation

lossyrob
Copy link
Member

@lossyrob lossyrob commented Jun 3, 2022

This moves away from using csv to read the dehydrated, direct table output and fixes an issue with loading quoted values.

The CSV reader has a field size limit that can be reached with large items, producing:

_csv.Error: field larger than field limit (131072)

This can be worked around by setting a larger value through a method like:

def set_csv_field_size_limit() -> None:
    """Set the field size limit to the maximum allowed.
    Otherwise csv can raise _csv.Error: field larger than field limit (131072)
    """
    field_size_limit = sys.maxsize
    while True:
        try:
            csv.field_size_limit(field_size_limit)
            break
        except OverflowError:
            field_size_limit = int(field_size_limit / 10)

However, the fact we use the CSV reader unquoted tab-delimited also seems sus - what happens if an Item description has a tab in it? This PR contains changes that moves away from using the csv reader and ensures this wouldn't be an issue.

Also, loading a dumped item table from the PC found another issue - items with escape quoted values (e.g. WKT2) were causing errors in loading. This replaces the troublesome quote character on reading in the table data.

@lossyrob lossyrob force-pushed the fix/rde/csv-field-length branch 2 times, most recently from 94b7e85 to 74e6f41 Compare June 3, 2022 02:29
@lossyrob lossyrob changed the title Fix issue with csv field size limit for loading dehydrated items Fix loading issues with dehydrated load Jun 3, 2022
@lossyrob lossyrob force-pushed the fix/rde/csv-field-length branch from 0215dd1 to d504c09 Compare June 3, 2022 03:38
lossyrob added 2 commits June 2, 2022 23:39
This fixes a failure to load JSON with quoted
values in the projection information. The added
test fails without these changes. Moves away from using
the csv package and instead splits on tabs while guarding
against the case where the Item content contains tab
characters.
Copy link
Collaborator

@bitner bitner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like we need to add the test data for the pypgstac tests to get CI passing. I't bombing on loading chloris.json as a collection. Once the tests are passing I am 👍 to merge.

@lossyrob
Copy link
Member Author

Whoops - just added.

@bitner bitner merged commit 4fd02b4 into main Jun 13, 2022
@lossyrob lossyrob deleted the fix/rde/csv-field-length branch June 13, 2022 15:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants