Fix empty container creation #173

Merged

FrancoisWagner
Contributor

The commit "OCF writer lazily writes header on first write instead of file open" changed the container writer to write the header lazily, on the first write. As a result, it is no longer possible to create a valid empty container, even though the constructor documentation still says:

//  A schema string must be passed to ensure that a correct header is written even if no records are written. This
//  is required to produce valid empty Avro container files.

Before the fix:

avro getschema /tmp/testv10.avro
log4j:WARN No appenders could be found for logger (org.apache.hadoop.metrics2.lib.MutableMetricsFactory).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
Exception in thread "main" java.io.IOException: Not a data file.
        at org.apache.avro.file.DataFileStream.initialize(DataFileStream.java:102)
        at org.apache.avro.file.DataFileReader.<init>(DataFileReader.java:97)
        at org.apache.avro.tool.DataFileGetSchemaTool.run(DataFileGetSchemaTool.java:47)
        at org.apache.avro.tool.Main.run(Main.java:87)
        at org.apache.avro.tool.Main.main(Main.java:76)
Caused by: java.io.EOFException
        at org.apache.avro.io.BinaryDecoder$InputStreamByteSource.readRaw(BinaryDecoder.java:827)
        at org.apache.avro.io.BinaryDecoder.doReadBytes(BinaryDecoder.java:349)
        at org.apache.avro.io.BinaryDecoder.readFixed(BinaryDecoder.java:302)
        at org.apache.avro.io.Decoder.readFixed(Decoder.java:150)
        at org.apache.avro.file.DataFileStream.initialize(DataFileStream.java:100)
        ... 4 more

With fix:

avro getschema /tmp/testv10.avro
log4j:WARN No appenders could be found for logger (org.apache.hadoop.metrics2.lib.MutableMetricsFactory).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
{
  "type" : "record",
  "name" : "DummyRecord",
  "fields" : [ {
    "name" : "id",
    "type" : [ "null", "long" ],
    "default" : null
  }, {
    "name" : "name",
    "type" : [ "null", "string" ],
    "default" : null
  } ]
}

@FrancoisWagner
Contributor Author

@actgardner I couldn't find where the Writer's unit tests live. Let me know if you want me to add a unit test somewhere.

@actgardner
Owner

This LGTM @FrancoisWagner, I'll take a look at adding tests in the future. Thanks!

@actgardner merged commit 14aa90d into actgardner:master on Feb 3, 2022