A system test for load_table_from_dataframe() consistently fails on master branch #61
Comments
@shollyman Have there been any changes on the backend in the last week or so? The system tests passed fine when #58 was initially submitted, but now fail on the latest master. Specifically, have there been any changes to the handling of the TIMESTAMP_MICROS logical type?
Let's grab the Avro file that's created indirectly by the test, and use it to file an issue with the backend team to help them reproduce it: https://issuetracker.google.com/issues/new?component=187149&template=0. The Kokoro log shows the failure. However, this is the value for the min allowed datetime, and the query engine agrees. Peter, can you take care of this?
OK, I'll intercept the file the test generates and submits to the backend, and open a ticket in the issue tracker. Edit: the ticket: https://issuetracker.google.com/issues/151765076
This is interesting. When exploring the Parquet file that is uploaded to the backend (the one attached to the issue tracker issue), I noticed the following:

```
>>> import fastparquet as fp
>>> filename = "/path/to/iss_61.parquet"
>>> pfile = fp.ParquetFile(filename)
>>> pfile.to_pandas()
                         dt_col
0 1754-08-30 22:43:41.128654848
1                           NaT
2 1816-03-30 05:56:08.066276376
```

The timestamps are incorrect; they should be the min and max supported DATETIME values.

Update: reading the same file with pyarrow produces the correct values:

```
>>> import pyarrow.parquet
>>> filename = "/path/to/iss_61.parquet"
>>> pfile_pyarrow = pyarrow.parquet.ParquetFile(filename)
>>> pyarrow_table = pfile_pyarrow.read()
>>> pyarrow_table.to_pydict()
{'dt_col': [datetime.datetime(1, 1, 1, 0, 0),
            None,
            datetime.datetime(9999, 12, 31, 23, 59, 59, 999999)]}
```

However, converting the pyarrow table to pandas mangles the values again:

```
>>> pyarrow_table.to_pandas()
                         dt_col
0 1754-08-30 22:43:41.128654848
1                           NaT
2 1816-03-30 05:56:08.066276376
```

The pyarrow docs mention that "it is not possible to convert all column types unmodified". I don't know how the Parquet file is read on the backend, but if pandas is used, circumventing it might be the solution.
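For context (my own note, not from the thread): pandas stores timestamps as signed 64-bit nanosecond counts, so its representable range only spans roughly 1677 to 2262. Both 0001-01-01 and 9999-12-31 fall far outside that window, which is why any conversion that goes through `datetime64[ns]` must either raise or silently corrupt these values. A quick check:

```python
import pandas as pd

# datetime64[ns] counts nanoseconds from the Unix epoch in a signed
# 64-bit integer, so the representable range is narrow:
print(pd.Timestamp.min)  # 1677-09-21 00:12:43.145224193
print(pd.Timestamp.max)  # 2262-04-11 23:47:16.854775807
```

Any DATETIME column restricted to this range would round-trip through pandas without corruption.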
@plamut The weird values from Arrow to pandas are because of https://issues.apache.org/jira/browse/ARROW-5359 (this seems like incorrect behavior in the short term, since I think this should raise an error by default).
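As a sanity check (my own arithmetic, not from the ticket), the garbled 1754-08-30 value can be reproduced with plain integer math: converting 0001-01-01 from microseconds to nanoseconds since the epoch overflows a signed 64-bit integer, and the wrapped value decodes to exactly the timestamp seen above:

```python
from datetime import datetime, timedelta

EPOCH = datetime(1970, 1, 1)

# Microseconds from the Unix epoch back to 0001-01-01 (a large negative
# integer; floor division of timedeltas is exact).
us = (datetime(1, 1, 1) - EPOCH) // timedelta(microseconds=1)

# Converting to nanoseconds overflows int64; emulate the two's-complement
# wraparound that a 64-bit multiplication would produce.
ns_wrapped = (us * 1000 + 2**63) % 2**64 - 2**63

# Decode the wrapped nanosecond count back into a datetime
# (truncated to microsecond precision).
garbled = EPOCH + timedelta(microseconds=ns_wrapped // 1000)
print(garbled)  # 1754-08-30 22:43:41.128654
```

The same wraparound applied to 9999-12-31 yields the spurious 1816 timestamp.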
@emkornfield I see, thanks for the info. This might spare some debugging time on our end.
https://issues.apache.org/jira/browse/ARROW-5359 is marked as "Fixed" now. The internal ticket https://issuetracker.google.com/issues/151765076 has not been addressed.
https://issues.apache.org/jira/plugins/servlet/mobile#issue/ARROW-9768 might also be related |
@emkornfield https://jira.apache.org/jira/browse/ARROW-2587 is also marked as "Fixed" now. Any update on the internal ticket? The test is still failing.
Internal issue 166476249 covers loading DATETIME columns from Parquet files (#56). @HemangChothani It sounds like we should update this system test to avoid DATETIME columns until the backend can support them.
Are we still blocked on this? |
I'll check again today from the client's perspective (but I don't have insight into the status on the backend). Update: What's the priority of this on the backend, anyway? P1, or lower than that? (To align this ticket's priority with it.)
To work around this, we added CSV as a serialization format. But yes, we are blocked on the backend, as it doesn't support DATETIME in Parquet files yet. Issue 166476249 is marked as P1, but no one has touched it yet, so I suspect it's being treated as lower priority.
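To illustrate why the CSV path sidesteps the problem (a simplified sketch, not the actual client code; the column name mirrors the example above): when the values are kept as Python datetime objects and serialized to text, no `datetime64[ns]` conversion ever happens, so the full DATETIME range survives:

```python
import io
from datetime import datetime

import pandas as pd

# Keep the out-of-ns-range values as Python objects so pandas never
# coerces them to datetime64[ns].
df = pd.DataFrame(
    {
        "dt_col": pd.Series(
            [datetime(1, 1, 1), None, datetime(9999, 12, 31, 23, 59, 59, 999999)],
            dtype=object,
        )
    }
)

# Serialize to CSV text; the datetimes are written as plain strings.
buf = io.StringIO()
df.to_csv(buf, index=False)
print(buf.getvalue())
```

The backend then parses the text itself, with no nanosecond intermediate on the client side.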
Closing as a duplicate of #56 |
A system test, test_load_table_from_dataframe_w_explicit_schema(), consistently fails on the latest master branch, both under Python 2.7 and Python 3.8 (example Kokoro run). It is also consistently reproducible locally.

Ticket in the Google issue tracker: https://issuetracker.google.com/issues/151765076