Add read_avro and list_avro_columns for rework on Splittable Avro support (#399)

This PR is part of the effort to rework Dataset so that large files are first read into Tensors, which speeds up performance. See 382 and 366 for related discussions.

Summary:

1) read_avro can read an Avro file within the range [offset, offset+length] (Splittable).
2) We use the primitive read_avro C++ ops to read in big chunks and then wire them up with tf.data.Dataset.
3) read_avro can also be used in other places.
4) AvroDataset automatically infers the dtype in eager mode; in graph mode, the user has
   to specify the dtype in kwargs.
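
As a rough illustration of the "splittable" idea in item 1, the sketch below computes non-overlapping (offset, length) pairs that together cover a file of a given size. The helper name `split_ranges` and the chunk size are hypothetical and not part of this PR. How read_avro aligns a byte range to Avro block boundaries is not shown here.

```python
from typing import List, Tuple


def split_ranges(file_size: int, chunk_size: int) -> List[Tuple[int, int]]:
    """Return (offset, length) pairs covering [0, file_size) without overlap.

    Hypothetical helper for illustration only; not part of tensorflow_io.
    """
    if file_size < 0:
        raise ValueError("file_size must be non-negative")
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    ranges = []
    offset = 0
    while offset < file_size:
        # The last chunk may be shorter than chunk_size.
        length = min(chunk_size, file_size - offset)
        ranges.append((offset, length))
        offset += length
    return ranges
```

Each pair could then be handed to a range-aware reader, so that chunks are read independently and combined afterwards.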

Signed-off-by: Yong Tang <yong.tang.github@outlook.com>
yongtang authored Aug 4, 2019
1 parent a8506f6 commit 77ee1da
Showing 8 changed files with 467 additions and 362 deletions.
2 changes: 1 addition & 1 deletion tensorflow_io/avro/BUILD
@@ -10,7 +10,7 @@ load(
cc_library(
name = "avro_ops",
srcs = [
-"kernels/avro_input.cc",
+"kernels/avro_kernels.cc",
"ops/avro_ops.cc",
],
copts = tf_io_copts(),
6 changes: 6 additions & 0 deletions tensorflow_io/avro/__init__.py
@@ -15,18 +15,24 @@
"""Avro Dataset.
@@AvroDataset
+@@list_avro_columns
+@@read_avro
"""

from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

from tensorflow_io.avro.python.ops.avro_ops import AvroDataset
+from tensorflow_io.avro.python.ops.avro_ops import list_avro_columns
+from tensorflow_io.avro.python.ops.avro_ops import read_avro

from tensorflow.python.util.all_util import remove_undocumented

_allowed_symbols = [
"AvroDataset",
+"list_avro_columns",
+"read_avro",
]

remove_undocumented(__name__, allowed_exception_list=_allowed_symbols)
248 changes: 0 additions & 248 deletions tensorflow_io/avro/kernels/avro_input.cc

This file was deleted.
