Parquet

Version 1.11.0

Release Notes - Parquet - Version 1.11.0

Bug

PARQUET-138 - Parquet should allow a merge between required and optional schemas
PARQUET-952 - Avro union with single type fails with 'is not a group'
PARQUET-1128 - [Java] Upgrade the Apache Arrow version to 0.8.0 for SchemaConverter
PARQUET-1281 - Jackson dependency
PARQUET-1285 - [Java] SchemaConverter should not convert from TimeUnit.SECOND AND TimeUnit.NANOSECOND of Arrow
PARQUET-1293 - Build failure when using Java 8 lambda expressions
PARQUET-1296 - Travis kills build after 10 minutes, because "no output was received"
PARQUET-1297 - [Java] SchemaConverter should not convert from Timestamp(TimeUnit.SECOND) and Timestamp(TimeUnit.NANOSECOND) of Arrow
PARQUET-1303 - Avro reflect @Stringable field write error if field not instanceof CharSequence
PARQUET-1304 - Release 1.10 contains breaking changes for Hive
PARQUET-1305 - Backward incompatible change introduced in 1.8
PARQUET-1309 - Parquet Java uses incorrect stats and dictionary filter properties
PARQUET-1311 - Update README.md
PARQUET-1317 - ParquetMetadataConverter throw NPE
PARQUET-1341 - Null count is suppressed when columns have no min or max and use unsigned sort order
PARQUET-1344 - Type builders don't honor new logical types
PARQUET-1368 - ParquetFileReader should close its input stream for the failure in constructor
PARQUET-1371 - Time/Timestamp UTC normalization parameter doesn't work
PARQUET-1407 - Data loss on duplicate values with AvroParquetWriter/Reader
PARQUET-1417 - BINARY_AS_SIGNED_INTEGER_COMPARATOR fails with IOBE for the same arrays with the different length
PARQUET-1421 - InternalParquetRecordWriter logs debug messages at the INFO level
PARQUET-1440 - Parquet-tools: Decimal values stored in an int32 or int64 in the parquet file aren't displayed with their proper scale
PARQUET-1441 - SchemaParseException: Can't redefine: list in AvroIndexedRecordConverter
PARQUET-1456 - Use page index, ParquetFileReader throw ArrayIndexOutOfBoundsException
PARQUET-1460 - Fix javadoc errors and include javadoc checking in Travis checks
PARQUET-1461 - Third party code does not compile after parquet-mr minor version update
PARQUET-1470 - Inputstream leakage in ParquetFileWriter.appendFile
PARQUET-1472 - Dictionary filter fails on FIXED_LEN_BYTE_ARRAY
PARQUET-1475 - DirectCodecFactory's ParquetCompressionCodecException drops a passed in cause in one constructor
PARQUET-1478 - Can't read spec compliant, 3-level lists via parquet-proto
PARQUET-1480 - INT96 to avro not yet implemented error should mention deprecation
PARQUET-1485 - Snappy Decompressor/Compressor may cause direct memory leak
PARQUET-1488 - UserDefinedPredicate throw NPE
PARQUET-1496 - [Java] Update Scala for JDK 11 compatibility
PARQUET-1497 - [Java] javax annotations dependency missing for Java 11
PARQUET-1498 - [Java] Add instructions to install thrift via homebrew
PARQUET-1510 - Dictionary filter skips null values when evaluating not-equals.
PARQUET-1514 - ParquetFileWriter Records Compressed Bytes instead of Uncompressed Bytes
PARQUET-1527 - [parquet-tools] cat command throw java.lang.ClassCastException
PARQUET-1529 - Shade fastutil in all modules where used
PARQUET-1531 - Page row count limit causes empty pages to be written from MessageColumnIO
PARQUET-1533 - TestSnappy() throws OOM exception with Parquet-1485 change
PARQUET-1534 - [parquet-cli] Argument error: Illegal character in opaque part at index 2 on Windows
PARQUET-1544 - Possible over-shading of modules
PARQUET-1550 - CleanUtil does not work in Java 11
PARQUET-1555 - Bump snappy-java to 1.1.7.3
PARQUET-1596 - PARQUET-1375 broke parquet-cli's to-avro command
PARQUET-1600 - Fix shebang in parquet-benchmarks/run.sh
PARQUET-1615 - getRecordWriter shouldn't hardcode CREAT mode when new ParquetFileWriter
PARQUET-1637 - Builds are failing because default jdk changed to openjdk11 on Travis
PARQUET-1644 - Clean up some benchmark code and docs.
PARQUET-1691 - Build fails due to missing hadoop-lzo

New Feature

PARQUET-1201 - Column indexes
PARQUET-1253 - Support for new logical type representation
PARQUET-1388 - Nanosecond precision time and timestamp - parquet-mr

Improvement

PARQUET-1135 - upgrade thrift and protobuf dependencies
PARQUET-1280 - [parquet-protobuf] Use maven protoc plugin
PARQUET-1321 - LogicalTypeAnnotation.LogicalTypeAnnotationVisitor#visit methods should have a return value
PARQUET-1335 - Logical type names in parquet-mr are not consistent with parquet-format
PARQUET-1336 - PrimitiveComparator should implements Serializable
PARQUET-1365 - Don't write page level statistics
PARQUET-1375 - Upgrade to supported version of Jackson
PARQUET-1383 - Parquet tools should indicate UTC parameter for time/timestamp types
PARQUET-1390 - [Java] Upgrade to Arrow 0.10.0
PARQUET-1399 - Move parquet-mr related code from parquet-format
PARQUET-1410 - Refactor modules to use the new logical type API
PARQUET-1414 - Limit page size based on maximum row count
PARQUET-1418 - Run integration tests in Travis
PARQUET-1435 - Benchmark filtering column-indexes
PARQUET-1444 - Prefer ArrayList over LinkedList
PARQUET-1445 - Remove Files.java
PARQUET-1462 - Allow specifying new development version in prepare-release.sh
PARQUET-1466 - Upgrade to the latest guava 27.0-jre
PARQUET-1474 - Less verbose and lower level logging for missing column/offset indexes
PARQUET-1476 - Don't emit a warning message for files without new logical type
PARQUET-1487 - Do not write original type for timezone-agnostic timestamps
PARQUET-1489 - Insufficient documentation for UserDefinedPredicate.keep(T)
PARQUET-1490 - Add branch-specific Travis steps
PARQUET-1492 - Remove protobuf install in travis build
PARQUET-1499 - [parquet-mr] Add Java 11 to Travis
PARQUET-1500 - Remove the Closables
PARQUET-1502 - Convert FIXED_LEN_BYTE_ARRAY to arrow type in logicalTypeAnnotation if it is not null
PARQUET-1503 - Remove Ints Utility Class
PARQUET-1504 - Add an option to convert Parquet Int96 to Arrow Timestamp
PARQUET-1505 - Use Java 7 NIO StandardCharsets
PARQUET-1506 - Migrate from maven-thrift-plugin to thrift-maven-plugin
PARQUET-1507 - Bump Apache Thrift to 0.12.0
PARQUET-1509 - Update Docs for Hive Deprecation
PARQUET-1513 - HiddenFileFilter Streamline
PARQUET-1518 - Bump Jackson2 version of parquet-cli
PARQUET-1530 - Remove Dependency on commons-codec
PARQUET-1542 - Merge multiple I/O to one time I/O when read footer
PARQUET-1557 - Replace deprecated Apache Avro methods
PARQUET-1558 - Use try-with-resource in Apache Avro tests
PARQUET-1576 - Upgrade to Avro 1.9.0
PARQUET-1577 - Remove duplicate license
PARQUET-1578 - Introduce Lambdas
PARQUET-1579 - Add Github PR template
PARQUET-1580 - Page-level CRC checksum verification for DataPageV1
PARQUET-1601 - Add zstd support to parquet-cli to-avro
PARQUET-1604 - Bump fastutil from 7.0.13 to 8.2.3
PARQUET-1605 - Bump maven-javadoc-plugin from 2.9 to 3.1.0
PARQUET-1606 - Fix invalid tests scope
PARQUET-1607 - Remove duplicate maven-enforcer-plugin
PARQUET-1616 - Enable Maven batch mode
PARQUET-1650 - Implement unit test to validate column/offset indexes
PARQUET-1654 - Remove unnecessary options when building thrift
PARQUET-1661 - Upgrade to Avro 1.9.1
PARQUET-1662 - Upgrade Jackson to version 2.9.10
PARQUET-1665 - Upgrade zstd-jni to 1.4.0-1
PARQUET-1669 - Disable compiling all libraries when building thrift
PARQUET-1671 - Upgrade Yetus to 0.11.0
PARQUET-1682 - Maintain forward compatibility for TIME/TIMESTAMP
PARQUET-1683 - Remove unnecessary string converting in readFooter method
PARQUET-1685 - Truncate the stored min and max for String statistics to reduce the footer size

Test

PARQUET-1536 - [parquet-cli] Add simple tests for each command

Wish

PARQUET-1552 - upgrade protoc-jar-maven-plugin to 3.8.0
PARQUET-1673 - Upgrade parquet-mr format version to 2.7.0

Task

PARQUET-968 - Add Hive/Presto support in ProtoParquet
PARQUET-1294 - Update release scripts for the new Apache policy
PARQUET-1434 - Release parquet-mr 1.11.0
PARQUET-1436 - TimestampMicrosStringifier shows wrong microseconds for timestamps before 1970
PARQUET-1452 - Deprecate old logical types API
PARQUET-1551 - Support Java 11 - top-level JIRA
PARQUET-1570 - Publish 1.11.0 to maven central
PARQUET-1585 - Update old external links in the code base
PARQUET-1645 - Bump Apache Avro to 1.9.1
PARQUET-1649 - Bump Jackson Databind to 2.9.9.3
PARQUET-1687 - Update release process

Version 1.10.1

Release Notes - Parquet - Version 1.10.1

Bug

PARQUET-1510 - Dictionary filter skips null values when evaluating not-equals.
PARQUET-1309 - Parquet Java uses incorrect stats and dictionary filter properties

Version 1.10.0

Release Notes - Parquet - Version 1.10.0

Bug

PARQUET-196 - parquet-tools command to get rowcount & size
PARQUET-357 - Parquet-thrift generates wrong schema for Thrift binary fields
PARQUET-765 - Upgrade Avro to 1.8.1
PARQUET-783 - H2SeekableInputStream does not close its underlying FSDataInputStream, leading to connection leaks
PARQUET-786 - parquet-tools README incorrectly has 'java jar' instead of 'java -jar'
PARQUET-791 - Predicate pushing down on missing columns should work on UserDefinedPredicate too
PARQUET-1005 - Fix DumpCommand parsing to allow column projection
PARQUET-1028 - [JAVA] When reading old Spark-generated files with INT96, stats are reported as valid when they aren't
PARQUET-1065 - Deprecate type-defined sort ordering for INT96 type
PARQUET-1077 - [MR] Switch to long key ids in KEYs file
PARQUET-1141 - IDs are dropped in metadata conversion
PARQUET-1152 - Parquet-thrift doesn't compile with Thrift 0.9.3
PARQUET-1153 - Parquet-thrift doesn't compile with Thrift 0.10.0
PARQUET-1156 - dev/merge_parquet_pr.py problems
PARQUET-1185 - TestBinary#testBinary unit test fails after PARQUET-1141
PARQUET-1191 - Type.hashCode() takes originalType into account but Type.equals() does not
PARQUET-1208 - Occasional endless loop in unit test
PARQUET-1217 - Incorrect handling of missing values in Statistics
PARQUET-1246 - Ignore float/double statistics in case of NaN
PARQUET-1258 - Update scm developer connection to github

New Feature

PARQUET-1025 - Support new min-max statistics in parquet-mr

Improvement

PARQUET-220 - Unnecessary warning in ParquetRecordReader.initialize
PARQUET-321 - Set the HDFS padding default to 8MB
PARQUET-386 - Printing out the statistics of metadata in parquet-tools
PARQUET-423 - Make writing Avro to Parquet less noisy
PARQUET-755 - create parquet-arrow module with schema converter
PARQUET-777 - Add new Parquet CLI tools
PARQUET-787 - Add a size limit for heap allocations when reading
PARQUET-801 - Allow UserDefinedPredicates in DictionaryFilter
PARQUET-852 - Slowly ramp up sizes of byte[] in ByteBasedBitPackingEncoder
PARQUET-884 - Add support for Decimal datatype to Parquet-Pig record reader
PARQUET-969 - Decimal datatype support for parquet-tools output
PARQUET-990 - More detailed error messages in footer parsing
PARQUET-1024 - allow for case insensitive parquet-xxx prefix in PR title
PARQUET-1026 - allow unsigned binary stats when min == max
PARQUET-1115 - Warn users when misusing parquet-tools merge
PARQUET-1135 - upgrade thrift and protobuf dependencies
PARQUET-1142 - Avoid leaking Hadoop API to downstream libraries
PARQUET-1149 - Upgrade Avro dependancy to 1.8.2
PARQUET-1170 - Logical-type-based toString for proper representeation in tools/logs
PARQUET-1183 - AvroParquetWriter needs OutputFile based Builder
PARQUET-1197 - Log rat failures
PARQUET-1198 - Bump java source and target to java8
PARQUET-1215 - Add accessor for footer after a file is closed
PARQUET-1263 - ParquetReader's builder should use Configuration from the InputFile

Task

PARQUET-768 - Add Uwe L. Korn to KEYS
PARQUET-1189 - Release Parquet Java 1.10

Version 1.9.0

Bug

PARQUET-182 - FilteredRecordReader skips rows it shouldn't for schema with optional columns
PARQUET-212 - Implement nested type read rules in parquet-thrift
PARQUET-241 - ParquetInputFormat.getFooters() should return in the same order as what listStatus() returns
PARQUET-305 - Logger instantiated for package org.apache.parquet may be GC-ed
PARQUET-335 - Avro object model should not require MAP_KEY_VALUE
PARQUET-340 - totalMemoryPool is truncated to 32 bits
PARQUET-346 - ThriftSchemaConverter throws for unknown struct or union type
PARQUET-349 - VersionParser does not handle versions like "parquet-mr 1.6.0rc4"
PARQUET-352 - Add tags to "created by" metadata in the file footer
PARQUET-353 - Compressors not getting recycled while writing parquet files, causing memory leak
PARQUET-360 - parquet-cat json dump is broken for maps
PARQUET-363 - Cannot construct empty MessageType for ReadContext.requestedSchema
PARQUET-367 - "parquet-cat -j" doesn't show all records
PARQUET-372 - Parquet stats can have awkwardly large values
PARQUET-373 - MemoryManager tests are flaky
PARQUET-379 - PrimitiveType.union erases original type
PARQUET-380 - Cascading and scrooge builds fail when using thrift 0.9.0
PARQUET-385 - PrimitiveType.union accepts fixed_len_byte_array fields with different lengths when strict mode is on
PARQUET-387 - TwoLevelListWriter does not handle null values in array
PARQUET-389 - Filter predicates should work with missing columns
PARQUET-395 - System.out is used as logger in org.apache.parquet.Log
PARQUET-396 - The builder for AvroParquetReader loses the record type
PARQUET-400 - Error reading some files after PARQUET-77 bytebuffer read path
PARQUET-409 - InternalParquetRecordWriter doesn't use min/max row counts
PARQUET-410 - Fix subprocess hang in merge_parquet_pr.py
PARQUET-413 - Test failures for Java 8
PARQUET-415 - ByteBufferBackedBinary serialization is broken
PARQUET-422 - Fix a potential bug in MessageTypeParser where we ignore and overwrite the initial value of a method parameter
PARQUET-425 - Fix the bug when predicate contains columns not specified in prejection, to prevent filtering out data improperly
PARQUET-426 - Throw Exception when predicate contains columns not specified in prejection, to prevent filtering out data improperly
PARQUET-430 - Change to use Locale parameterized version of String.toUpperCase()/toLowerCase
PARQUET-431 - Make ParquetOutputFormat.memoryManager volatile
PARQUET-495 - Fix mismatches in Types class comments
PARQUET-509 - Incorrect number of args passed to string.format calls
PARQUET-511 - Integer overflow on counting values in column
PARQUET-528 - Fix flush() for RecordConsumer and implementations
PARQUET-529 - Avoid evoking job.toString() in ParquetLoader
PARQUET-540 - Cascading3 module doesn't build when using thrift 0.9.0
PARQUET-544 - ParquetWriter.close() throws NullPointerException on second call, improper implementation of Closeable contract
PARQUET-560 - Incorrect synchronization in SnappyCompressor
PARQUET-569 - ParquetMetadataConverter offset filter is broken
PARQUET-571 - Fix potential leak in ParquetFileReader.close()
PARQUET-580 - Potentially unnecessary creation of large int[] in IntList for columns that aren't used
PARQUET-581 - Min/max row count for page size check are conflated in some places
PARQUET-584 - show proper command usage when there's no arguments
PARQUET-612 - Add compression to FileEncodingIT tests
PARQUET-623 - DeltaByteArrayReader has incorrect skip behaviour
PARQUET-642 - Improve performance of ByteBuffer based read / write paths
PARQUET-645 - DictionaryFilter incorrectly handles null
PARQUET-651 - Parquet-avro fails to decode array of record with a single field name "element" correctly
PARQUET-660 - Writing Protobuf messages with extensions results in an error or data corruption.
PARQUET-663 - Link are Broken in README.md
PARQUET-674 - Add an abstraction to get the length of a stream
PARQUET-685 - Deprecated ParquetInputSplit constructor passes parameters in the wrong order.
PARQUET-726 - TestMemoryManager consistently fails
PARQUET-743 - DictionaryFilters can re-use StreamBytesInput when compressed

Improvement

PARQUET-77 - Improvements in ByteBuffer read path
PARQUET-99 - Large rows cause unnecessary OOM exceptions
PARQUET-146 - make Parquet compile with java 7 instead of java 6
PARQUET-318 - Remove unnecessary objectmapper from ParquetMetadata
PARQUET-327 - Show statistics in the dump output
PARQUET-341 - Improve write performance with wide schema sparse data
PARQUET-343 - Caching nulls on group node to improve write performance on wide schema sparse data
PARQUET-358 - Add support for temporal logical types to AVRO/Parquet conversion
PARQUET-361 - Add prerelease logic to semantic versions
PARQUET-384 - Add Dictionary Based Filtering to Filter2 API
PARQUET-386 - Printing out the statistics of metadata in parquet-tools
PARQUET-397 - Pig Predicate Pushdown using Filter2 API
PARQUET-421 - Fix mismatch of javadoc names and method parameters in module encoding, column, and hadoop
PARQUET-427 - Push predicates into the whole read path
PARQUET-432 - Complete a todo for method ColumnDescriptor.compareTo()
PARQUET-460 - Parquet files concat tool
PARQUET-480 - Update for Cascading 3.0
PARQUET-484 - Warn when Decimal is stored as INT64 while could be stored as INT32
PARQUET-543 - Remove BoundedInt encodings
PARQUET-585 - Slowly ramp up sizes of int[]s in IntList to keep sizes small when data sets are small
PARQUET-654 - Make record-level filtering optional
PARQUET-668 - Provide option to disable auto crop feature in DumpCommand output
PARQUET-727 - Ensure correct version of thrift is used
PARQUET-740 - Introduce editorconfig

New Feature

PARQUET-225 - INT64 support for Delta Encoding
PARQUET-382 - Add a way to append encoded blocks in ParquetFileWriter
PARQUET-429 - Enables predicates collecting their referred columns
PARQUET-548 - Add Java metadata for PageEncodingStats
PARQUET-669 - Allow reading file footers from input streams when writing metadata files

Task

PARQUET-392 - Release Parquet-mr 1.9.0
PARQUET-404 - Replace git@gh.neting.cc.apache for HTTPS URL on dev/README.md to avoid permission issues
PARQUET-696 - Move travis download from google code (defunct) to github

Test

PARQUET-355 - Create Integration tests to validate statistics
PARQUET-378 - Add thoroughly parquet test encodings

Version 1.8.1

Bug

PARQUET-331 - Merge script doesn't surface stderr from failed sub processes
PARQUET-336 - ArrayIndexOutOfBounds in checkDeltaByteArrayProblem
PARQUET-337 - binary fields inside map/set/list are not handled in parquet-scrooge
PARQUET-338 - Readme references wrong format of pull request title

Improvement

PARQUET-279 - Check empty struct in the CompatibilityChecker util

Task

PARQUET-339 - Add Alex Levenson to KEYS file

Version 1.8.0

Bug

PARQUET-151 - Null Pointer exception in parquet.hadoop.ParquetFileWriter.mergeFooters
PARQUET-152 - Encoding issue with fixed length byte arrays
PARQUET-164 - Warn when parquet memory manager kicks in
PARQUET-199 - Add a callback when the MemoryManager adjusts row group size
PARQUET-201 - Column with OriginalType INT_8 failed at filtering
PARQUET-227 - Parquet thrift can write unions that have 0 or more than 1 set value
PARQUET-246 - ArrayIndexOutOfBoundsException with Parquet write version v2
PARQUET-251 - Binary column statistics error when reuse byte[] among rows
PARQUET-252 - parquet scrooge support should support nested container type
PARQUET-254 - Wrong exception message for unsupported INT96 type
PARQUET-269 - Restore scrooge-maven-plugin to 3.17.0 or greater
PARQUET-284 - Should use ConcurrentHashMap instead of HashMap in ParquetMetadataConverter
PARQUET-285 - Implement nested types write rules in parquet-avro
PARQUET-287 - Projecting unions in thrift causes TExceptions in deserializatoin
PARQUET-296 - Set master branch version back to 1.8.0-SNAPSHOT
PARQUET-297 - created_by in file meta data doesn't contain parquet library version
PARQUET-314 - Fix broken equals implementation(s)
PARQUET-316 - Run.sh is broken in parquet-benchmarks
PARQUET-317 - writeMetaDataFile crashes when a relative root Path is used
PARQUET-320 - Restore semver checks
PARQUET-324 - row count incorrect if data file has more than 2^31 rows
PARQUET-325 - Do not target row group sizes if padding is set to 0
PARQUET-329 - ThriftReadSupport#THRIFT_COLUMN_FILTER_KEY was removed (incompatible change)

Improvement

PARQUET-175 - Allow setting of a custom protobuf class when reading parquet file using parquet-protobuf.
PARQUET-223 - Add Map and List builiders
PARQUET-245 - Travis CI runs tests even if build fails
PARQUET-248 - Simplify ParquetWriters's constructors
PARQUET-253 - AvroSchemaConverter has confusing Javadoc
PARQUET-259 - Support Travis CI in parquet-cpp
PARQUET-264 - Update README docs for graduation
PARQUET-266 - Add support for lists of primitives to Pig schema converter
PARQUET-272 - Updates docs decscription to match data model
PARQUET-274 - Updates URLs to link against the apache user instead of Parquet on github
PARQUET-276 - Updates CONTRIBUTING file with new repo info
PARQUET-286 - Avro object model should use Utf8
PARQUET-288 - Add dictionary support to Avro converters
PARQUET-289 - Allow object models to extend the ParquetReader builders
PARQUET-290 - Add Avro data model to the reader builder
PARQUET-306 - Improve alignment between row groups and HDFS blocks
PARQUET-308 - Add accessor to ParquetWriter to get current data size
PARQUET-309 - Remove unnecessary compile dependency on parquet-generator
PARQUET-321 - Set the HDFS padding default to 8MB
PARQUET-327 - Show statistics in the dump output

New Feature

PARQUET-229 - Make an alternate, stricter thrift column projection API
PARQUET-243 - Add avro-reflect support

Task

PARQUET-262 - When 1.7.0 is released, restore semver plugin config
PARQUET-292 - Release Parquet 1.8.0

Version 1.7.0

PARQUET-23 - Rename to org.apache.

Version 1.6.0

Bug

PARQUET-3 - tool to merge pull requests based on Spark
PARQUET-4 - Use LRU caching for footers in ParquetInputFormat.
PARQUET-8 - [parquet-scrooge] mvn eclipse:eclipse fails on parquet-scrooge
PARQUET-9 - InternalParquetRecordReader will not read multiple blocks when filtering
PARQUET-18 - Cannot read dictionary-encoded pages with all null values
PARQUET-19 - NPE when an empty file is included in a Hive query that uses CombineHiveInputFormat
PARQUET-21 - Fix reference to 'github-apache' in dev docs
PARQUET-56 - Added an accessor for the Long column type in example Group
PARQUET-62 - DictionaryValuesWriter dictionaries are corrupted by user changes.
PARQUET-63 - Fixed-length columns cannot be dictionary encoded.
PARQUET-66 - InternalParquetRecordWriter int overflow causes unnecessary memory check warning
PARQUET-69 - Add committer doc and REVIEWERS files
PARQUET-70 - PARQUET #36: Pig Schema Storage to UDFContext
PARQUET-75 - String decode using 'new String' is slow
PARQUET-80 - upgrade semver plugin version to 0.9.27
PARQUET-82 - ColumnChunkPageWriteStore assumes pages are smaller than Integer.MAX_VALUE
PARQUET-88 - Fix pre-version enforcement.
PARQUET-94 - ParquetScroogeScheme constructor ignores klass argument
PARQUET-96 - parquet.example.data.Group is missing some methods
PARQUET-97 - ProtoParquetReader builder factory method not static
PARQUET-101 - Exception when reading data with parquet.task.side.metadata=false
PARQUET-104 - Parquet writes empty Rowgroup at the end of the file
PARQUET-106 - Relax InputSplit Protections
PARQUET-107 - Add option to disable summary metadata aggregation after MR jobs
PARQUET-114 - Sample NanoTime class serializes and deserializes Timestamp incorrectly
PARQUET-122 - make parquet.task.side.metadata=true by default
PARQUET-124 - parquet.hadoop.ParquetOutputCommitter.commitJob() throws parquet.io.ParquetEncodingException
PARQUET-132 - AvroParquetInputFormat should use a parameterized type
PARQUET-135 - Input location is not getting set for the getStatistics in ParquetLoader when using two different loaders within a Pig script.
PARQUET-136 - NPE thrown in StatisticsFilter when all values in a string/binary column trunk are null
PARQUET-142 - parquet-tools doesn't filter _SUCCESS file
PARQUET-145 - InternalParquetRecordReader.close() should not throw an exception if initialization has failed
PARQUET-150 - Merge script requires ':' in PR names
PARQUET-157 - Divide by zero in logging code
PARQUET-159 - paquet-hadoop tests fail to compile
PARQUET-162 - ParquetThrift should throw when unrecognized columns are passed to the column projection API
PARQUET-168 - Wrong command line option description in parquet-tools
PARQUET-173 - StatisticsFilter doesn't handle And properly
PARQUET-174 - Fix Java6 compatibility
PARQUET-176 - Parquet fails to parse schema contains '\r'
PARQUET-180 - Parquet-thrift compile issue with 0.9.2.
PARQUET-184 - Add release scripts and documentation
PARQUET-186 - Poor performance in SnappyCodec because of string concat in tight loop
PARQUET-187 - parquet-scrooge doesn't compile under 2.11
PARQUET-188 - Parquet writes columns out of order (compared to the schema)
PARQUET-189 - Support building parquet with thrift 0.9.0
PARQUET-196 - parquet-tools command to get rowcount & size
PARQUET-197 - parquet-cascading and the mapred API does not create metadata file
PARQUET-202 - Typo in the connection info in the pom prevents publishing an RC
PARQUET-207 - ParquetInputSplit end calculation bug
PARQUET-208 - revert PARQUET-197
PARQUET-214 - Avro: Regression caused by schema handling
PARQUET-215 - Parquet Thrift should discard records with unrecognized union members
PARQUET-216 - Decrease the default page size to 64k
PARQUET-217 - Memory Manager's min allocation heuristic is not valid for schemas with many columns
PARQUET-232 - minor compilation issue
PARQUET-234 - Restore ParquetInputSplit methods from 1.5.0
PARQUET-235 - Fix compatibility of parquet.metadata with 1.5.0
PARQUET-236 - Check parquet-scrooge compatibility
PARQUET-237 - Check ParquetWriter constructor compatibility with 1.5.0
PARQUET-239 - Make AvroParquetReader#builder() static
PARQUET-242 - AvroReadSupport.setAvroDataSupplier is broken

Improvement

PARQUET-2 - Adding Type Persuasion for Primitive Types
PARQUET-25 - Pushdown predicates only work with hardcoded arguments
PARQUET-52 - Improve the encoding fall back mechanism for Parquet 2.0
PARQUET-57 - Make dev commit script easier to use
PARQUET-61 - Avoid fixing protocol events when there is not required field missing
PARQUET-74 - Use thread local decoder cache in Binary toStringUsingUTF8()
PARQUET-79 - Add thrift streaming API to read metadata
PARQUET-84 - Add an option to read the rowgroup metadata on the task side.
PARQUET-87 - Better and unified API for projection pushdown on cascading scheme
PARQUET-89 - All Parquet CI tests should be run against hadoop-2
PARQUET-92 - Parallel Footer Read Control
PARQUET-105 - Refactor and Document Parquet Tools
PARQUET-108 - Parquet Memory Management in Java
PARQUET-115 - Pass a filter object to user defined predicate in filter2 api
PARQUET-116 - Pass a filter object to user defined predicate in filter2 api
PARQUET-117 - implement the new page format for Parquet 2.0
PARQUET-119 - add data_encodings to ColumnMetaData to enable dictionary based predicate push down
PARQUET-121 - Allow Parquet to build with Java 8
PARQUET-128 - Optimize the parquet RecordReader implementation when: A. filterpredicate is pushed down , B. filterpredicate is pushed down on a flat schema
PARQUET-133 - Upgrade snappy-java to 1.1.1.6
PARQUET-134 - Enhance ParquetWriter with file creation flag
PARQUET-140 - Allow clients to control the GenericData object that is used to read Avro records
PARQUET-141 - improve parquet scrooge integration
PARQUET-160 - Simplify CapacityByteArrayOutputStream
PARQUET-165 - A benchmark module for Parquet would be nice
PARQUET-177 - MemoryManager ensure minimum Column Chunk size
PARQUET-181 - Scrooge Write Support
PARQUET-191 - Avro schema conversion incorrectly converts maps with nullable values.
PARQUET-192 - Avro maps drop null values
PARQUET-193 - Avro: Implement read compatibility rules for nested types
PARQUET-203 - Consolidate PathFilter for hidden files
PARQUET-204 - Directory support for parquet-schema
PARQUET-210 - JSON output for parquet-cat

New Feature

PARQUET-22 - Parquet #13: Backport of HIVE-6938
PARQUET-49 - Create a new filter API that supports filtering groups of records based on their statistics
PARQUET-64 - Add new logical types to parquet-column
PARQUET-123 - Add dictionary support to AvroIndexedRecordReader
PARQUET-198 - parquet-cascading Add Parquet Avro Scheme

Task

PARQUET-50 - Remove items from semver blacklist
PARQUET-139 - Avoid reading file footers in parquet-avro InputFormat
PARQUET-190 - Fix an inconsistent Javadoc comment of ReadSupport.prepareForRead
PARQUET-230 - Add build instructions to the README

Version 1.5.0

ISSUE 399: Fixed resetting stats after writePage bug, unit testing of readFooter
ISSUE 397: Fixed issue with column pruning when using requested schema
ISSUE 389: Added padding for requested columns not found in file schema
ISSUE 392: Value stats fixes
ISSUE 338: Added statistics to Parquet pages and rowGroups
ISSUE 351: Fix bug #350, fixed length argument out of order.
ISSUE 378: configure semver to enforce semantic versioning
ISSUE 355: Add support for DECIMAL type annotation.
ISSUE 336: protobuf dependency version changed from 2.4.1 to 2.5.0
ISSUE 337: issue #324, move ParquetStringInspector to org.apache.hadoop.hive.serde...

Version 1.4.3

ISSUE 381: fix metadata concurency problem

Version 1.4.2

ISSUE 359: Expose values in SimpleRecord
ISSUE 335: issue #290, hive map conversion to parquet schema
ISSUE 365: generate splits by min max size, and align to HDFS block when possible
ISSUE 353: Fix bug: optional enum field causing ScroogeSchemaConverter to fail
ISSUE 362: Fix output bug during parquet-dump command
ISSUE 366: do not call schema converter to generate projected schema when projection is not set
ISSUE 367: make ParquetFileWriter throw IOException in invalid state case
ISSUE 352: Parquet thrift storer
ISSUE 349: fix header bug

Version 1.4.1

ISSUE 344: select * from parquet hive table containing map columns runs into exception. Issue #341.
ISSUE 347: set reading length in ThriftBytesWriteSupport to avoid potential OOM cau...
ISSUE 346: stop using strings and b64 for compressed input splits
ISSUE 345: set cascading version to 2.5.3
ISSUE 342: compress kv pairs in ParquetInputSplits

Version 1.4.0

ISSUE 333: Compress schemas in split
ISSUE 329: fix filesystem resolution
ISSUE 320: Spelling fix
ISSUE 319: oauth based authentication; fix grep change
ISSUE 310: Merge parquet tools
ISSUE 314: Fix avro schema conv for arrays of optional type for #312.
ISSUE 311: Avro null default values bug
ISSUE 316: Update poms to use thrift.exectuable property.
ISSUE 285: [CASCADING] Provide the sink implementation for ParquetTupleScheme
ISSUE 264: Native Protocol Buffer support
ISSUE 293: Int96 support
ISSUE 313: Add hadoop Configuration to Avro and Thrift writers (#295).
ISSUE 262: Scrooge schema converter and projection pushdown in Scrooge
ISSUE 297: Ports HIVE-5783 to the parquet-hive module
ISSUE 303: Avro read schema aliases
ISSUE 299: Fill in default values for new fields in the Avro read schema
ISSUE 298: Bugfix reorder thrift fields causing writting nulls
ISSUE 289: first use current thread's classloader to load a class, if current threa...
ISSUE 292: Added ParquetWriter() that takes an instance of Hadoop's Configuration.
ISSUE 282: Avro default read schema
ISSUE 280: style: junit.framework to org.junit
ISSUE 270: Make ParquetInputSplit extend FileSplit

Version 1.3.2

ISSUE 271: fix bug: last enum index throws DecodingSchemaMismatchException
ISSUE 268: fixes #265: add semver validation checks to non-bundle builds
ISSUE 269: Bumps parquet-jackson parent version
ISSUE 260: Shade jackson only once for all parquet modules

Version 1.3.1

ISSUE 267: handler only handle ignored field, exception during will be thrown as Sk...
ISSUE 266: upgrade parquet-mr to elephant-bird 4.4

Version 1.3.0

ISSUE 258: Optimize scan
ISSUE 259: add delta length byte arrays and delta byte arrays encodings
ISSUE 249: make summary files read in parallel; improve memory footprint of metadata; avoid unnecessary seek
ISSUE 257: Create parquet-hadoop-bundle which will eventually replace parquet-hive-bundle
ISSUE 253: Delta Binary Packing for Int
ISSUE 254: Add writer version flag to parquet and make initial changes for supported parquet 2.0 encodings
ISSUE 256: Resolves issue #251 by doing additional checks if Hive returns "Unknown" as a version
ISSUE 252: refactor error handler for BufferedProtocolReadToWrite to be non-static

Version 1.2.11

ISSUE 250: pretty_print_json_for_compatibility_checker
ISSUE 243: add parquet cascading integration documentation
ISSUE 248: More Hadoop 2 compatibility fixes

Version 1.2.10

ISSUE 247: fix bug: when field index is greater than zero
ISSUE 244: Feature/error handler
ISSUE 187: Plumb OriginalType
ISSUE 245: integrate parquet format 2.0

Version 1.2.9

ISSUE 242: upgrade elephant-bird version to 4.3
ISSUE 240: fix loader cache
ISSUE 233: use latest stable release of cascading: 2.5.1
ISSUE 241: Update reference to 0.10 in Hive012Binding javadoc
ISSUE 239: Fix hive map and array inspectors with null containers
ISSUE 234: optimize chunk scan; fix compressed size
ISSUE 237: Handle codec not found
ISSUE 238: fix pom version caused by bad merge
ISSUE 235: Not write pig meta data only when pig is not avaliable
ISSUE 227: Breaks parquet-hive up into several submodules, creating infrastructure ...
ISSUE 229: add changelog tool
ISSUE 236: Make cascading a provided dependency

Version 1.2.8

ISSUE 228: enable globing files for parquetTupleScheme, refactor unit tests and rem...
ISSUE 224: Changing read and write methods in ParquetInputSplit so that they can de...

Version 1.2.8

ISSUE 228: enable globing files for parquetTupleScheme, refactor unit tests and rem...
ISSUE 224: Changing read and write methods in ParquetInputSplit so that they can de...

Version 1.2.7

ISSUE 223: refactor encoded values changes and test that resetDictionary works
ISSUE 222: fix bug: set raw data size to 0 after reset

Version 1.2.6

ISSUE 221: make pig, hadoop and log4j jars provided
ISSUE 220: parquet-hive should ship and uber jar
ISSUE 213: group parquet-format version in one property
ISSUE 215: Fix Binary.equals().
ISSUE 210: ParquetWriter ignores enable dictionary and validating flags.
ISSUE 202: Fix requested schema when recreating splits in hive
ISSUE 208: Improve dic fall back
ISSUE 207: Fix offset
ISSUE 206: Create a "Powered by" page

Version 1.2.5

ISSUE 204: ParquetLoader.inputFormatCache as WeakHashMap
ISSUE 203: add null check for EnumWriteProtocol
ISSUE 205: use cascading 2.2.0
ISSUE 199: simplify TupleWriteSupport constructor
ISSUE 164: Dictionary changes
ISSUE 196: Fixes to the Hive SerDe
ISSUE 197: RLE decoder reading past the end of the stream
ISSUE 188: Added ability to define arbitrary predicate functions
ISSUE 194: refactor serde to remove some unecessary boxing and include dictionary awareness
ISSUE 190: NPE in DictionaryValuesWriter.

Version 1.2.4

ISSUE 191: Add compatibility checker for ThriftStruct to check for backward compatibility of two thrift structs

Version 1.2.3

ISSUE 186: add parquet-pig-bundle
ISSUE 184: Update ParquetReader to take Configuration as a constructor argument.
ISSUE 183: Disable the time read counter check in DeprecatedInputFormatTest.
ISSUE 182: Fix a maven warning about a missing version number.
ISSUE 181: FIXED_LEN_BYTE_ARRAY support
ISSUE 180: Support writing Avro records with maps with Utf8 keys
ISSUE 179: Added Or/Not logical filters for column predicates
ISSUE 172: Add sink support for parquet.cascading.ParquetTBaseScheme
ISSUE 169: Support avro records with empty maps and arrays
ISSUE 162: Avro schema with empty arrays and maps

Version 1.2.2

ISSUE 175: fix problem with projection pushdown in parquetloader
ISSUE 174: improve readability by renaming variables
ISSUE 173: make numbers in log messages easy to read in InternalParquetRecordWriter
ISSUE 171: add unit test for parquet-scrooge
ISSUE 165: distinguish recoverable exception in BufferedProtocolReadToWrite
ISSUE 166: support projection when required fields in thrift class are not projected

Version 1.2.1

ISSUE 167: fix oom error dues to bad estimation

Version 1.2.0

ISSUE 154: improve thrift error message
ISSUE 161: support schema evolution
ISSUE 160: Resource leak in parquet.hadoop.ParquetFileReader.readFooter(Configurati...
ISSUE 163: remove debugging code from hot path
ISSUE 155: Manual pushdown for thrift read support
ISSUE 159: Counter for mapred
ISSUE 156: Fix site
ISSUE 153: Fix projection required field

Version 1.1.1

ISSUE 150: add thrift validation on read

Version 1.1.0

ISSUE 149: changing default block size to 128mb
ISSUE 146: Fix and add unit tests for Hive nested types
ISSUE 145: add getStatistics method to parquetloader
ISSUE 144: Map key fields should allow other types than strings
ISSUE 143: Fix empty encoding col metadata
ISSUE 142: Fix total size row group
ISSUE 141: add parquet counters for benchmark
ISSUE 140: Implemented partial schema for GroupReadSupport
ISSUE 138: fix bug of wrong column metadata size
ISSUE 137: ParquetMetadataConverter bug
ISSUE 133: Update plugin versions for maven aether migration - fixes #125
ISSUE 130: Schema validation should not validate the root element's name
ISSUE 127: Adding dictionary encoding for non string types.. #99
ISSUE 125: Unable to build
ISSUE 124: Fix Short and Byte types in Hive SerDe.
ISSUE 123: Fix Snappy compressor in parquet-hadoop.
ISSUE 120: Fix RLE bug with partial literal groups at end of stream.
ISSUE 118: Refactor column reader
ISSUE 115: Map key fields should allow other types than strings
ISSUE 103: Map key fields should allow other types than strings
ISSUE 99: Dictionary encoding for non string types (float double int long boolean)
ISSUE 47: Add tests for parquet-scrooge and parquet-cascading

Version 1.0.1

ISSUE 126: Unit tests for parquet cascading
ISSUE 121: fix wrong RecordConverter for ParquetTBaseScheme
ISSUE 119: fix compatibility with thrift remove unused dependency

Files

CHANGES.md

Latest commit

History

CHANGES.md

File metadata and controls

Parquet

Version 1.11.0

Bug

New Feature

Improvement

Test

Wish

Task

Version 1.10.1

Bug

Version 1.10.0

Bug

New Feature

Improvement

Task

Version 1.9.0

Bug

Improvement

New Feature

Task

Test

Version 1.8.1

Bug

Improvement

Task

Version 1.8.0

Bug

Improvement

New Feature

Task

Version 1.7.0

Version 1.6.0

Bug

Improvement

New Feature

Task

Version 1.5.0

Version 1.4.3

Version 1.4.2

Version 1.4.1

Version 1.4.0

Version 1.3.2

Version 1.3.1

Version 1.3.0

Version 1.2.11

Version 1.2.10

Version 1.2.9

Version 1.2.8

Version 1.2.8

Version 1.2.7

Version 1.2.6

Version 1.2.5

Version 1.2.4

Version 1.2.3

Version 1.2.2

Version 1.2.1

Version 1.2.0

Version 1.1.1

Version 1.1.0

Version 1.0.1

Version 1.0.0