Revamp R package to be compatible with most recent version of tensorflow-io #381

Closed
13 tasks
terrytangyuan opened this issue Jul 26, 2019 · 9 comments

@terrytangyuan
Member

terrytangyuan commented Jul 26, 2019

It looks like the R package is pretty out of date. Here are some additional data sources that we support and that should be added in R as well:

  • tensorflow_io.bigquery: Google Cloud BigQuery support.
  • tensorflow_io.text: Pcap network packet capture file support, Text file Dataset and TextSequence output.
  • tensorflow_io.azure: Microsoft Azure Storage support.
  • tensorflow_io.gcs: GCS Configuration support.
  • tensorflow_io.prometheus: Prometheus observation data support.
  • tensorflow_io.avro: Apache Avro Dataset.
  • tensorflow_io.audio: WAV file Dataset.
  • tensorflow_io.grpc: gRPC server Dataset, support for streaming Numpy input.
  • tensorflow_io.hdf5: HDF5 file Dataset.
  • Improved batching support for many Datasets, see Add batch support for dataset at the creation #191
  • tensorflow_io.kafka: Kafka Output support.
  • tensorflow_io.cifar: CIFAR file format support.
  • tensorflow_io.bigtable: Google Cloud Bigtable support.

@yongtang @BryanCutler and others, please add anything I missed here.

@yongtang
Member

I think we could also use this as a chance to clean up or even rethink the API, at least for the Dataset part. There have been quite a few discussions about caching and performance, and I think we could factor those into our thinking about the API as well.

@yongtang
Member

@terrytangyuan @BryanCutler Some API discussions. With the upcoming 2.0, I think we could think about cleaning up the APIs. I am in favor of removing unnecessary user input where possible. However, due to the graph vs. eager mismatch between 1.x and 2.0, it is hard to make an API that fits 1.x and 2.0 at the same time.

For example, with ParquetDataset in PR #384, it is possible that in 2.0 the user only needs to provide the filename and column fields, and we find out all the other needed information internally. In 1.x, however, the user has to provide the filename, column, dtype, and shape fields.

I am thinking about providing a default API for 2.0, and using **kwargs to let the user supply the arguments that 1.x needs:

import tensorflow as tf

class ParquetDataset(tf.compat.v2.data.Dataset):

    def __init__(self, filename, column, **kwargs):
        if tf.executing_eagerly():
            # In eager mode we find out dtype and shape internally.
            ...
        else:
            # In graph mode the user has to provide dtype and shape through kwargs.
            dtype = kwargs["dtype"]
            shape = kwargs["shape"]
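
The same dispatch can be sketched without TensorFlow so that the control flow runs on its own. This is only an illustration: the function name `resolve_column_spec`, the `eager` flag (standing in for `tf.executing_eagerly()`), and the `infer` callback (standing in for whatever would read the file's metadata) are all hypothetical and not part of tensorflow-io.

```python
def resolve_column_spec(filename, column, eager, infer=None, **kwargs):
    """Return (dtype, shape) for a column.

    `eager` stands in for tf.executing_eagerly(); `infer` stands in for
    whatever would read the file's metadata. Both are assumptions made
    for illustration only.
    """
    if eager:
        # Eager mode: dtype and shape are discovered from the file itself.
        if infer is None:
            raise ValueError("eager mode needs a way to inspect the file")
        return infer(filename, column)
    # Graph mode: the caller must pass dtype and shape explicitly.
    missing = [name for name in ("dtype", "shape") if name not in kwargs]
    if missing:
        raise TypeError(
            "graph mode requires keyword argument(s): " + ", ".join(missing)
        )
    return kwargs["dtype"], kwargs["shape"]
```

Checking for the missing keys up front gives graph-mode users a readable error instead of a bare `KeyError`, which may matter if the 1.x-only arguments are documented less prominently.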

Also, the batch_size and batch_mode probably fit into rebatch() for "last step before provided to tf.keras" (see #382 (comment)), so we could drop them from Dataset's __init__, but this could be a separate discussion.
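
To make the idea of batching as a separate final step concrete, here is a pure-Python sketch of regrouping already-batched data into a new batch size. It is not tensorflow-io's `rebatch()`; the function below is a hypothetical stand-in that works on plain lists, only to show that batch size can be chosen after the dataset is constructed.

```python
def rebatch(batches, batch_size):
    """Regroup an iterable of lists into lists of length `batch_size`.

    The final batch may be shorter if the elements do not divide evenly.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be a positive integer")
    buffer = []
    for batch in batches:
        for element in batch:
            buffer.append(element)
            if len(buffer) == batch_size:
                yield buffer
                buffer = []
    if buffer:
        # Emit whatever is left over as a final, possibly shorter, batch.
        yield buffer
```

With this shape, a dataset constructor would not need `batch_size` at all; the caller applies the regrouping once, right before handing data to the model.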

@terrytangyuan
Member Author

@yongtang Thanks for bringing this up. I am fine with using **kwargs to support multiple versions of TF. This would simplify the deprecation process once we decide to drop 1.x support.

@terrytangyuan
Member Author

This would also be easier to support in R through R's ... syntax, though we may need to add better documentation on what is and isn't supported.

@yongtang
Member

yongtang commented Jul 31, 2019

@terrytangyuan I am also thinking about deprecating the CIFAR format. Unlike MNIST, which is used by many datasets and which many people generate themselves, the CIFAR format is not seen that often. It probably makes sense to deprecate it.
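
One common way to stage such a deprecation in Python is to keep the entry point working while emitting a `DeprecationWarning`. The sketch below uses only the standard `warnings` module; `load_cifar` is a hypothetical placeholder, not an actual tensorflow-io function, and the reading logic is elided.

```python
import warnings


def load_cifar(filename):
    """Hypothetical stand-in for a CIFAR reader that is being phased out."""
    warnings.warn(
        "CIFAR format support is deprecated and may be removed in a future release",
        DeprecationWarning,
        stacklevel=2,  # point the warning at the caller, not at this function
    )
    # The real reading logic is elided; return the filename so the sketch runs.
    return filename
```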

@BryanCutler
Member

kwargs sound fine to me to support graph mode options

@terrytangyuan
Member Author

@yongtang Yea deprecating is fine to me. I don’t see it quite often either.

@terrytangyuan terrytangyuan self-assigned this Dec 19, 2019
@terrytangyuan
Member Author

@yongtang A lot of changes have already landed in the Python API. Let's outline what needs to be updated in the R package here, and we'll revamp everything together.

@terrytangyuan terrytangyuan changed the title Add R wrappers for the recently supported data sources Revamp R package to be compatible with most recent version of tensorflow-io Dec 19, 2019
@terrytangyuan
Member Author

This was done in #886 but I'll revisit to make sure everything is updated and works.


3 participants