Before coalescing small files, the reader must estimate an initial size so that it can allocate a HostMemoryBuffer to hold the HEADER, STRIPES, and FOOTER. In a test on 5000 non-partitioned files totaling 1.3G, this initial estimate was consistently larger than the true size:
stderr:21/07/16 09:08:38 INFO MultiFileOrcPartitionReader: ORC Coalescing reading estimates the initTotalSize: 63929078, and the true size: 63160978
stderr:21/07/16 09:08:38 INFO MultiFileOrcPartitionReader: ORC Coalescing reading estimates the initTotalSize: 63535693, and the true size: 62768848
stderr:21/07/16 09:08:38 INFO MultiFileOrcPartitionReader: ORC Coalescing reading estimates the initTotalSize: 63159114, and the true size: 62391504
stderr:21/07/16 09:08:38 INFO MultiFileOrcPartitionReader: ORC Coalescing reading estimates the initTotalSize: 63055310, and the true size: 62287543
stderr:21/07/16 09:08:38 INFO MultiFileOrcPartitionReader: ORC Coalescing reading estimates the initTotalSize: 64388818, and the true size: 63620556
stderr:21/07/16 09:08:39 INFO MultiFileOrcPartitionReader: ORC Coalescing reading estimates the initTotalSize: 63271377, and the true size: 62502777
stderr:21/07/16 09:08:39 INFO MultiFileOrcPartitionReader: ORC Coalescing reading estimates the initTotalSize: 63699389, and the true size: 62931986
stderr:21/07/16 09:08:39 INFO MultiFileOrcPartitionReader: ORC Coalescing reading estimates the initTotalSize: 63398089, and the true size: 62630814
stderr:21/07/16 09:08:41 INFO MultiFileOrcPartitionReader: ORC Coalescing reading estimates the initTotalSize: 62961599, and the true size: 62193810
stderr:21/07/16 09:08:41 INFO MultiFileOrcPartitionReader: ORC Coalescing reading estimates the initTotalSize: 62870696, and the true size: 62103385
stderr:21/07/16 09:08:42 INFO MultiFileOrcPartitionReader: ORC Coalescing reading estimates the initTotalSize: 62784883, and the true size: 62017554
stderr:21/07/16 09:08:42 INFO MultiFileOrcPartitionReader: ORC Coalescing reading estimates the initTotalSize: 62698196, and the true size: 61932040
stderr:21/07/16 09:08:43 INFO MultiFileOrcPartitionReader: ORC Coalescing reading estimates the initTotalSize: 62590045, and the true size: 61822125
stderr:21/07/16 09:08:43 INFO MultiFileOrcPartitionReader: ORC Coalescing reading estimates the initTotalSize: 62491433, and the true size: 61724033
stderr:21/07/16 09:08:43 INFO MultiFileOrcPartitionReader: ORC Coalescing reading estimates the initTotalSize: 62394913, and the true size: 61629064
stderr:21/07/16 09:08:44 INFO MultiFileOrcPartitionReader: ORC Coalescing reading estimates the initTotalSize: 62287476, and the true size: 61521093
stderr:21/07/16 09:08:44 INFO MultiFileOrcPartitionReader: ORC Coalescing reading estimates the initTotalSize: 62168654, and the true size: 61402519
stderr:21/07/16 09:08:45 INFO MultiFileOrcPartitionReader: ORC Coalescing reading estimates the initTotalSize: 62029026, and the true size: 61262705
stderr:21/07/16 09:08:45 INFO MultiFileOrcPartitionReader: ORC Coalescing reading estimates the initTotalSize: 61874813, and the true size: 61108187
stderr:21/07/16 09:08:46 INFO MultiFileOrcPartitionReader: ORC Coalescing reading estimates the initTotalSize: 61643721, and the true size: 60877037
stderr:21/07/16 09:08:46 INFO MultiFileOrcPartitionReader: ORC Coalescing reading estimates the initTotalSize: 45697529, and the true size: 45092366
The overestimate (initial size minus true size) is about 750K for roughly 62M of data, or about 1.2% of the true size.
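To check the overestimate on other runs, the log lines can be parsed and the difference computed per buffer. The following is a minimal Python sketch, not part of the plugin: the `overestimates` helper and `LOG_PATTERN` are hypothetical names, and the regular expression assumes the exact log message format shown above.

```python
import re

# Hypothetical helper: matches the two sizes in the log message format shown above.
LOG_PATTERN = re.compile(r"initTotalSize: (\d+), and the true size: (\d+)")


def overestimates(log_lines):
    """Return (estimated, actual, excess_bytes, excess_ratio) for each matching line."""
    results = []
    for line in log_lines:
        match = LOG_PATTERN.search(line)
        if match is None:
            continue  # skip lines that are not coalescing-size reports
        estimated = int(match.group(1))
        actual = int(match.group(2))
        excess = estimated - actual
        results.append((estimated, actual, excess, excess / actual))
    return results


# Two lines copied from the log above (the first and the last buffer).
sample = [
    "stderr:21/07/16 09:08:38 INFO MultiFileOrcPartitionReader: ORC Coalescing "
    "reading estimates the initTotalSize: 63929078, and the true size: 63160978",
    "stderr:21/07/16 09:08:46 INFO MultiFileOrcPartitionReader: ORC Coalescing "
    "reading estimates the initTotalSize: 45697529, and the true size: 45092366",
]

for estimated, actual, excess, ratio in overestimates(sample):
    print(f"estimated={estimated} actual={actual} excess={excess} ({ratio:.2%})")
```

For the two sample lines the excess is 768100 and 605163 bytes respectively, which matches the roughly 750K figure for the full-size buffers.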
For context, ORC has supported COALESCING reading since #2909.