Reduce memory decoding #968
Commits on Aug 7, 2017
Decrease memory consumption for whole image single tile decoding.
We can use the same buffer for the tile decoding and the final image, which avoids the intermediate buffer previously used to transfer data between the two.

Effect on the decoding of MAPA (9944 x 13498 x 3 components of size byte): peak memory drops from 4.5 GB to 2.7 GB.

Now:

n5: 2699708767 (heap allocation functions) malloc/new/new[], --alloc-fns, etc.
 n1: 1610689344 0x4E77E07: opj_aligned_malloc (opj_malloc.c:61) <-- final image
  n1: 1610689344 0x4E7195B: opj_alloc_tile_component_data (tcd.c:676)
   n1: 1610689344 0x4E722D2: opj_tcd_init_decode_tile (tcd.c:816)
    n1: 1610689344 0x4E4BCF1: opj_j2k_read_tile_header (j2k.c:8597)
     n1: 1610689344 0x4E4C742: opj_j2k_decode_tiles (j2k.c:10324)
      n1: 1610689344 0x4E4E20E: opj_j2k_decode (j2k.c:7826)
       n1: 1610689344 0x4E52E42: opj_jp2_decode (jp2.c:1564)
        n0: 1610689344 0x40369E: main (opj_decompress.c:1459)
 n1: 815554560 0x4E72231: opj_tcd_init_decode_tile (tcd.c:1217) <-- working memory for code blocks: 9944*13498/64/64*8192*3
  n1: 815554560 0x4E4BCF1: opj_j2k_read_tile_header (j2k.c:8597)
   n1: 815554560 0x4E4C742: opj_j2k_decode_tiles (j2k.c:10324)
    n1: 815554560 0x4E4E20E: opj_j2k_decode (j2k.c:7826)
     n1: 815554560 0x4E52E42: opj_jp2_decode (jp2.c:1564)
      n0: 815554560 0x40369E: main (opj_decompress.c:1459)
 n1: 219758391 0x4E4C0BF: opj_j2k_read_tile_header (j2k.c:4661) <-- ingestion of code stream
  n1: 219758391 0x4E4C742: opj_j2k_decode_tiles (j2k.c:10324)
   n1: 219758391 0x4E4E20E: opj_j2k_decode (j2k.c:7826)
    n1: 219758391 0x4E52E42: opj_jp2_decode (jp2.c:1564)
     n0: 219758391 0x40369E: main (opj_decompress.c:1459)
 n1: 39822000 0x4E7224F: opj_tcd_init_decode_tile (tcd.c:1224) <-- OPJ_J2K_DEFAULT_NB_SEGS*sizeof(opj_tcd_seg_t) per codeblock
  n1: 39822000 0x4E4BCF1: opj_j2k_read_tile_header (j2k.c:8597)
   n1: 39822000 0x4E4C742: opj_j2k_decode_tiles (j2k.c:10324)
    n1: 39822000 0x4E4E20E: opj_j2k_decode (j2k.c:7826)
     n1: 39822000 0x4E52E42: opj_jp2_decode (jp2.c:1564)
      n0: 39822000 0x40369E: main (opj_decompress.c:1459)
 n0: 13884472 in 49 places, all below massif's threshold (1.00%)

Before:

n5: 4493329848 (heap allocation functions) malloc/new/new[], --alloc-fns, etc.
 n2: 1610709160 0x4E77C87: opj_aligned_malloc (opj_malloc.c:61)
  n1: 1610689344 0x4E717DB: opj_alloc_tile_component_data (tcd.c:676)
   n1: 1610689344 0x4E72152: opj_tcd_init_decode_tile (tcd.c:816)
    n1: 1610689344 0x4E4BCF1: opj_j2k_read_tile_header (j2k.c:8597)
     n1: 1610689344 0x4E4C64A: opj_j2k_decode_tiles (j2k.c:10318)
      n1: 1610689344 0x4E4E08E: opj_j2k_decode (j2k.c:7826)
       n1: 1610689344 0x4E52CC2: opj_jp2_decode (jp2.c:1564)
        n0: 1610689344 0x40369E: main (opj_decompress.c:1459)
  n0: 19816 in 2 places, all below massif's threshold (1.00%)
 n1: 1610689344 0x4E43F36: opj_j2k_update_image_data.isra.7 (j2k.c:8743)
  n1: 1610689344 0x4E4C5C1: opj_j2k_decode_tiles (j2k.c:10358)
   n1: 1610689344 0x4E4E08E: opj_j2k_decode (j2k.c:7826)
    n1: 1610689344 0x4E52CC2: opj_jp2_decode (jp2.c:1564)
     n0: 1610689344 0x40369E: main (opj_decompress.c:1459)
 n1: 815554560 0x4E720B1: opj_tcd_init_decode_tile (tcd.c:1217)
  n1: 815554560 0x4E4BCF1: opj_j2k_read_tile_header (j2k.c:8597)
   n1: 815554560 0x4E4C64A: opj_j2k_decode_tiles (j2k.c:10318)
    n1: 815554560 0x4E4E08E: opj_j2k_decode (j2k.c:7826)
     n1: 815554560 0x4E52CC2: opj_jp2_decode (jp2.c:1564)
      n0: 815554560 0x40369E: main (opj_decompress.c:1459)
 n1: 402672336 0x4E4C545: opj_j2k_decode_tiles (j2k.c:10336)
  n1: 402672336 0x4E4E08E: opj_j2k_decode (j2k.c:7826)
   n1: 402672336 0x4E52CC2: opj_jp2_decode (jp2.c:1564)
    n0: 402672336 0x40369E: main (opj_decompress.c:1459)
 n0: 53704448 in 58 places, all below massif's threshold (1.00%)
Commit 793edc3
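The buffer sizes in the two profiles above can be checked with simple arithmetic. The sketch below assumes 4 bytes per sample for the final image (OpenJPEG stores component samples as 32-bit integers) and 1 byte per sample for the 402672336-byte intermediate buffer that disappears, which matches the "components of size byte" description. The helper names `image_bytes` and `sum_bytes` are made up for this illustration and are not part of the library.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bytes needed to hold every sample of an image at a given sample width.
 * Hypothetical helper, for illustration only. */
static uint64_t image_bytes(uint64_t width, uint64_t height,
                            uint64_t components, uint64_t bytes_per_sample)
{
    assert(bytes_per_sample > 0);
    return width * height * components * bytes_per_sample;
}

/* Sum of the top-level entries of a massif snapshot. */
static uint64_t sum_bytes(const uint64_t *entries, size_t count)
{
    uint64_t total = 0;
    size_t i;
    for (i = 0; i < count; ++i) {
        total += entries[i];
    }
    return total;
}
```

With the MAPA dimensions, `image_bytes(9944, 13498, 3, 4)` gives the 1610689344 bytes reported for the final image, and `image_bytes(9944, 13498, 3, 1)` gives the 402672336 bytes seen only in the "Before" profile.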
Fix crash on Windows due to b7594c0
b7594c0 may use the opj_tcd_tilecomp_t->data buffer allocated by opj_alloc_tile_component_data() as image->comps[].data. Because opj_alloc_tile_component_data() uses opj_aligned_malloc(), we must use opj_aligned_malloc()/opj_aligned_free() everywhere we allocate or free image->comps[].data. Note: this may affect compatibility if user code allocates or frees image->comps[].data itself.
Commit 61fb5dd
Add opj_image_data_alloc() / opj_image_data_free()
As bin/common/color.c used to directly call malloc()/free(), we need to export functions dedicated to allocating/freeing image component data.
Commit f58aab9
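The pairing requirement behind the two commits above can be sketched as follows. This is not the OpenJPEG implementation: `sketch_image_data_alloc`, `sketch_image_data_free`, `sketch_is_aligned` and the 16-byte alignment are assumptions made for illustration. The point it shows is that on Windows a block obtained from `_aligned_malloc()` must be released with `_aligned_free()`, never with plain `free()`, so allocation and release should go through one matched pair of functions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#if defined(_WIN32)
#include <malloc.h>
#endif

#define SKETCH_ALIGNMENT 16u /* assumed alignment, chosen for illustration */

/* Returns nonzero when ptr sits on a SKETCH_ALIGNMENT boundary. */
static int sketch_is_aligned(const void *ptr)
{
    return ((uintptr_t)ptr % SKETCH_ALIGNMENT) == 0;
}

/* Hypothetical stand-in for an exported "allocate image data" function. */
static void *sketch_image_data_alloc(size_t size)
{
    if (size == 0 || size > SIZE_MAX - (SKETCH_ALIGNMENT - 1u)) {
        return NULL;
    }
#if defined(_WIN32)
    return _aligned_malloc(size, SKETCH_ALIGNMENT);
#else
    /* C11 aligned_alloc: round the size up to a multiple of the alignment. */
    size_t rounded = (size + (SKETCH_ALIGNMENT - 1u))
                     & ~(size_t)(SKETCH_ALIGNMENT - 1u);
    return aligned_alloc(SKETCH_ALIGNMENT, rounded);
#endif
}

/* Hypothetical stand-in for the matching "free image data" function. */
static void sketch_image_data_free(void *ptr)
{
    /* A misaligned pointer cannot have come from the allocator above. */
    assert(ptr == NULL || sketch_is_aligned(ptr));
#if defined(_WIN32)
    _aligned_free(ptr); /* plain free() here would be invalid */
#else
    free(ptr);
#endif
}
```

Application code such as a color-conversion routine would then call only this pair for component buffers, and never mix it with direct malloc()/free() calls.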
Commit 0c1fc05
Commit 434ace4
Commit 373520d
Decoding: do not allocate memory for the codestream of each codeblock
Currently we allocate at least 8192 bytes for each codeblock, and copy the relevant parts of the codestream into that per-codeblock buffer as we decode packets. As the whole codestream for the tile is ingested in memory and kept alive during the decoding, we can point directly to it instead of copying. To do that, we need an intermediate concept, a 'chunk' of code-stream segment, because a segment may be made of data located at different places in the code-stream when quality layers are used.

With that change, peak memory when decoding MAPA_005.jp2 goes down from the 2.7 GB of the previous improvement to 1.9 GB.

New profile:

n4: 1885648469 (heap allocation functions) malloc/new/new[], --alloc-fns, etc.
 n1: 1610689344 0x4E78287: opj_aligned_malloc (opj_malloc.c:61)
  n1: 1610689344 0x4E71D7B: opj_alloc_tile_component_data (tcd.c:676)
   n1: 1610689344 0x4E7272C: opj_tcd_init_decode_tile (tcd.c:816)
    n1: 1610689344 0x4E4BDD9: opj_j2k_read_tile_header (j2k.c:8618)
     n1: 1610689344 0x4E4C8A2: opj_j2k_decode_tiles (j2k.c:10349)
      n1: 1610689344 0x4E4E36E: opj_j2k_decode (j2k.c:7847)
       n1: 1610689344 0x4E52FA2: opj_jp2_decode (jp2.c:1564)
        n0: 1610689344 0x40374E: main (opj_decompress.c:1459)
 n1: 219232541 0x4E4BBF0: opj_j2k_read_tile_header (j2k.c:4685)
  n1: 219232541 0x4E4C8A2: opj_j2k_decode_tiles (j2k.c:10349)
   n1: 219232541 0x4E4E36E: opj_j2k_decode (j2k.c:7847)
    n1: 219232541 0x4E52FA2: opj_jp2_decode (jp2.c:1564)
     n0: 219232541 0x40374E: main (opj_decompress.c:1459)
 n1: 39822000 0x4E727A9: opj_tcd_init_decode_tile (tcd.c:1219)
  n1: 39822000 0x4E4BDD9: opj_j2k_read_tile_header (j2k.c:8618)
   n1: 39822000 0x4E4C8A2: opj_j2k_decode_tiles (j2k.c:10349)
    n1: 39822000 0x4E4E36E: opj_j2k_decode (j2k.c:7847)
     n1: 39822000 0x4E52FA2: opj_jp2_decode (jp2.c:1564)
      n0: 39822000 0x40374E: main (opj_decompress.c:1459)
 n0: 15904584 in 52 places, all below massif's threshold (1.00%)
Commit ca34d13
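The 'chunk' idea from the commit message above can be illustrated with a small data-structure sketch. All names here (`sketch_chunk_t`, `sketch_segment_t`, `SKETCH_MAX_CHUNKS` and the helper functions) are hypothetical and chosen for this example. The fixed-size chunk array and the gather step are simplifying assumptions, not a description of how the library stores or consumes chunks. What the sketch shows is that each chunk is only a pointer and a length into the tile's codestream buffer, so that buffer must outlive the segment.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_MAX_CHUNKS 8 /* assumed fixed capacity, to keep the sketch short */

/* One contiguous piece of a segment; points into the codestream, no copy. */
typedef struct {
    const uint8_t *data;
    size_t len;
} sketch_chunk_t;

/* A segment made of pieces that may be scattered across the codestream. */
typedef struct {
    sketch_chunk_t chunks[SKETCH_MAX_CHUNKS];
    size_t count;
} sketch_segment_t;

/* Records one more piece of the segment. Returns 1 on success, 0 if full. */
static int sketch_segment_add_chunk(sketch_segment_t *seg,
                                    const uint8_t *data, size_t len)
{
    assert(seg != NULL && data != NULL);
    if (seg->count >= SKETCH_MAX_CHUNKS) {
        return 0;
    }
    seg->chunks[seg->count].data = data;
    seg->chunks[seg->count].len = len;
    seg->count++;
    return 1;
}

/* Total number of bytes referenced by the segment. */
static size_t sketch_segment_length(const sketch_segment_t *seg)
{
    size_t total = 0, i;
    for (i = 0; i < seg->count; ++i) {
        total += seg->chunks[i].len;
    }
    return total;
}

/* Copies the pieces into one contiguous buffer, for a consumer that needs
 * contiguous input. Returns bytes written, or 0 if out is too small. */
static size_t sketch_segment_gather(const sketch_segment_t *seg,
                                    uint8_t *out, size_t capacity)
{
    size_t offset = 0, i;
    if (sketch_segment_length(seg) > capacity) {
        return 0;
    }
    for (i = 0; i < seg->count; ++i) {
        memcpy(out + offset, seg->chunks[i].data, seg->chunks[i].len);
        offset += seg->chunks[i].len;
    }
    return offset;
}
```

In this sketch a segment costs one pointer and one length per chunk, instead of a private copy of the data.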
Slight improvement in management of code block chunks
Instead of having the chunk array at the segment level, we can move it down to the codeblock itself, since segments are filled in sequential order. This limits the number of memory allocations and slightly decreases memory usage.

On MAPA_005.jp2:

n4: 1871312549 (heap allocation functions) malloc/new/new[], --alloc-fns, etc.
 n1: 1610689344 0x4E781E7: opj_aligned_malloc (opj_malloc.c:61)
  n1: 1610689344 0x4E71D1B: opj_alloc_tile_component_data (tcd.c:676)
   n1: 1610689344 0x4E726CF: opj_tcd_init_decode_tile (tcd.c:816)
    n1: 1610689344 0x4E4BE39: opj_j2k_read_tile_header (j2k.c:8617)
     n1: 1610689344 0x4E4C902: opj_j2k_decode_tiles (j2k.c:10348)
      n1: 1610689344 0x4E4E3CE: opj_j2k_decode (j2k.c:7846)
       n1: 1610689344 0x4E53002: opj_jp2_decode (jp2.c:1564)
        n0: 1610689344 0x40374E: main (opj_decompress.c:1459)
 n1: 219232541 0x4E4BC50: opj_j2k_read_tile_header (j2k.c:4683)
  n1: 219232541 0x4E4C902: opj_j2k_decode_tiles (j2k.c:10348)
   n1: 219232541 0x4E4E3CE: opj_j2k_decode (j2k.c:7846)
    n1: 219232541 0x4E53002: opj_jp2_decode (jp2.c:1564)
     n0: 219232541 0x40374E: main (opj_decompress.c:1459)
 n1: 23893200 0x4E72735: opj_tcd_init_decode_tile (tcd.c:1225)
  n1: 23893200 0x4E4BE39: opj_j2k_read_tile_header (j2k.c:8617)
   n1: 23893200 0x4E4C902: opj_j2k_decode_tiles (j2k.c:10348)
    n1: 23893200 0x4E4E3CE: opj_j2k_decode (j2k.c:7846)
     n1: 23893200 0x4E53002: opj_jp2_decode (jp2.c:1564)
      n0: 23893200 0x40374E: main (opj_decompress.c:1459)
 n0: 17497464 in 52 places, all below massif's threshold (1.00%)
Commit 9211469
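The per-codeblock effect of this last change can be estimated from the profiles alone. The sketch below rests on two assumptions: that the 815554560 bytes of working memory in the earlier profiles were exactly 8192 bytes per codeblock, and that the opj_tcd_init_decode_tile allocation shown in the last two profiles scales with the same codeblock count. The helper names are made up for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Number of codeblocks implied by a total that is a fixed size per block. */
static uint64_t codeblock_count(uint64_t total_bytes, uint64_t bytes_per_block)
{
    assert(bytes_per_block > 0);
    return total_bytes / bytes_per_block;
}

/* Average bytes spent per codeblock for a given allocation total. */
static uint64_t bytes_per_codeblock(uint64_t total_bytes, uint64_t blocks)
{
    assert(blocks > 0);
    return total_bytes / blocks;
}
```

Under those assumptions, `codeblock_count(815554560, 8192)` yields 99555 codeblocks, and the allocation drops from `bytes_per_codeblock(39822000, 99555)`, that is 400 bytes, to `bytes_per_codeblock(23893200, 99555)`, that is 240 bytes per codeblock.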