Support nanosecond timestamps in parquet (NVIDIA#10063)
Closes rapidsai/cudf#9393

This PR adds support for nanosecond timestamps in Parquet. It was intended to cover both duration and timestamp types, but nanosecond durations remain unchanged because `pyarrow` does not fully support them: it truncates nanosecond durations to microseconds (see [here](https://github.com/apache/arrow/blob/release-7.0.0/cpp/src/arrow/python/datetime.cc#L259-L264)). The PR introduces `LogicalType` handling on both the reader and writer sides, and it includes code cleanups such as moving `CompactProtocolReader` to its own file.
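
The truncation mentioned above can be illustrated with a minimal sketch. It uses only `std::chrono` and is not the `pyarrow` or cudf code path; the helper name `truncate_ns_to_us` is hypothetical and exists only for this example.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>

// Hypothetical helper (not part of cudf or Arrow): converts a nanosecond
// count to whole microseconds, discarding the sub-microsecond remainder.
constexpr std::int64_t truncate_ns_to_us(std::int64_t ns)
{
  return std::chrono::duration_cast<std::chrono::microseconds>(
           std::chrono::nanoseconds{ns})
    .count();
}

// Small self-check showing that the last three digits are lost.
inline void check_truncation()
{
  constexpr std::int64_t ns = 1234567891;  // 1.234567891 s in nanoseconds
  constexpr std::int64_t us = truncate_ns_to_us(ns);
  assert(us == 1234567);          // the trailing 891 ns are dropped
  assert(us * 1000 != ns);        // converting back does not restore them
}
```

A value that passes through a microsecond-only representation therefore cannot round-trip at nanosecond precision, which is presumably why this PR leaves nanosecond durations alone.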

Authors:
  - Yunsong Wang (https://github.com/PointKernel)

Approvers:
  - Robert Maynard (https://github.com/robertmaynard)
  - Vukasin Milovanovic (https://github.com/vuule)
  - Devavret Makkar (https://github.com/devavret)
  - Karthikeyan (https://github.com/karthikeyann)

URL: rapidsai/cudf#10063
PointKernel authored Mar 21, 2022
1 parent 21ed251 commit 40baeb4
Showing 17 changed files with 762 additions and 562 deletions.
2 changes: 1 addition & 1 deletion cpp/CMakeLists.txt
@@ -300,12 +300,12 @@ add_library(
src/io/orc/stripe_init.cu
src/io/orc/timezone.cpp
src/io/orc/writer_impl.cu
src/io/parquet/compact_protocol_reader.cpp
src/io/parquet/compact_protocol_writer.cpp
src/io/parquet/page_data.cu
src/io/parquet/chunk_dict.cu
src/io/parquet/page_enc.cu
src/io/parquet/page_hdr.cu
src/io/parquet/parquet.cpp
src/io/parquet/reader_impl.cu
src/io/parquet/writer_impl.cu
src/io/statistics/orc_column_statistics.cu
@@ -1,5 +1,5 @@
/*
- * Copyright (c) 2018-2020, NVIDIA CORPORATION.
+ * Copyright (c) 2018-2022, NVIDIA CORPORATION.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
@@ -14,8 +14,11 @@
* limitations under the License.
*/

#include "parquet.hpp"
#include "compact_protocol_reader.hpp"

#include <algorithm>
#include <cstddef>
#include <tuple>

namespace cudf {
namespace io {
@@ -198,7 +201,8 @@ bool CompactProtocolReader::read(TimestampType* t)
bool CompactProtocolReader::read(TimeUnit* u)
{
auto op = std::make_tuple(ParquetFieldUnion(1, u->isset.MILLIS, u->MILLIS),
- ParquetFieldUnion(2, u->isset.MICROS, u->MICROS));
+ ParquetFieldUnion(2, u->isset.MICROS, u->MICROS),
+ ParquetFieldUnion(3, u->isset.NANOS, u->NANOS));
return function_builder(this, op);
}

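The hunk above registers a third union member, `NANOS`, under Thrift field ID 3, next to `MILLIS` (1) and `MICROS` (2). The following is a minimal standalone sketch of that ID-to-unit mapping. It does not use cudf's `ParquetFieldUnion` or `function_builder`; the names `time_unit_sketch`, `unit_from_field_id`, and `ticks_per_second` are hypothetical and exist only for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Illustrative stand-in for the Parquet TimeUnit union; not cudf's type.
enum class time_unit_sketch { UNKNOWN, MILLIS, MICROS, NANOS };

// Maps a union field ID to a unit, mirroring the IDs used in the diff:
// 1 = MILLIS, 2 = MICROS, 3 = NANOS (the member this commit adds).
constexpr time_unit_sketch unit_from_field_id(int field_id)
{
  return field_id == 1   ? time_unit_sketch::MILLIS
         : field_id == 2 ? time_unit_sketch::MICROS
         : field_id == 3 ? time_unit_sketch::NANOS
                         : time_unit_sketch::UNKNOWN;
}

// Number of ticks per second for each unit; 0 for an unrecognized unit.
constexpr std::int64_t ticks_per_second(time_unit_sketch u)
{
  return u == time_unit_sketch::MILLIS   ? 1000
         : u == time_unit_sketch::MICROS ? 1000000
         : u == time_unit_sketch::NANOS  ? 1000000000
                                         : 0;
}

// Small self-check: field ID 3 resolves to nanosecond resolution.
inline void check_time_unit_sketch()
{
  assert(unit_from_field_id(3) == time_unit_sketch::NANOS);
  assert(ticks_per_second(unit_from_field_id(3)) == 1000000000);
}
```

In this sketch, a reader that only knows field IDs 1 and 2 would map a nanosecond column's `TimeUnit` to `UNKNOWN`, which is the gap the one-line change closes on the reader side.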