Frequent HLL bitstream_unpack crashes #84

Open

steven-kirk opened this issue Nov 6, 2019 · 1 comment

steven-kirk commented Nov 6, 2019

We’ve been seeing nearly daily crashes from a PostgreSQL 9.6 application that depends
heavily on the HLL extension (v2.10.2). All of these crashes occur inside the HLL
bitstream_unpack function. Usually they come from an INSERT VALUES statement, but
occasionally they come from an hll_cardinality call in a query. The HLL cells are
always built with expressions like: hll_empty(17,5,-1,1) || hll_hash_bigint(879826435) || ...

I think I’ve identified the root cause, but I’d like someone who is familiar with the code
in the HLL library to confirm my hypothesis:

In bitstream_unpack, a full quadword of data is pulled out of the bitstream through the
brc_curp pointer. Usually this is harmless: even when brc_curp is less than 8 bytes from
the end of the bitstream data, so that the quadword read runs past the end of the actual
bitstream, the subsequent bit reordering, shifting, and masking mean the stray bytes have
no effect on the results. However, when the end of the bitstream falls very close to the
end of an OS page, the quadword read reaches into the next OS page, and if that page is
not mapped in the process, the read fails with SIGSEGV.
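
To make the failure mode concrete, here is a minimal sketch of a bounds-aware read. The function and parameter names here are mine, not the extension's actual code, and this is not necessarily how the attached patch does it:

```c
#include <stdint.h>
#include <string.h>

/*
 * Illustrative sketch only.  The risky pattern is roughly:
 *
 *     uint64_t qw = *(uint64_t const *) curp;   // may read past the buffer
 *
 * If fewer than 8 bytes remain before the end of the bitstream and the
 * buffer ends near a page boundary, that dereference can touch an
 * unmapped page.  Copying only the bytes that actually remain and
 * zero-filling the rest yields the same decoded value, because the
 * caller's reordering, shifting, and masking discard everything beyond
 * the bits it asked for.
 */
static uint64_t
safe_quadword_read(uint8_t const *curp, uint8_t const *endp)
{
    uint64_t qw = 0;
    size_t   navail = (size_t) (endp - curp);

    if (navail > sizeof(qw))
        navail = sizeof(qw);        /* clamp to at most 8 bytes */
    memcpy(&qw, curp, navail);      /* never reads past endp */
    return qw;
}
```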

Assuming I’ve correctly identified the problem, the attached patch should fix the issue.

Regards,
Steve Kirk

hll_2_10_2_patch.txt

BTW, the crash stacks all look roughly like:

```
#0  bitstream_unpack (brcp=brcp@entry=0x7ffc69814290) at hll.c:263
#1  0x0000147cbfcbe8ac in sparse_unpack (i_size=<optimized out>, i_bitp=0x7ffc69814327 "", i_nfilled=10403, i_log2nregs=<optimized out>, i_width=5, i_regp=0x7ffc69814320 "\002") at hll.c:359
#2  multiset_unpack (o_msp=o_msp@entry=0x7ffc698142f0, i_bitp=i_bitp@entry=0x147a16bf903c "\023\221\177", i_size=<optimized out>, o_encoded_type=o_encoded_type@entry=0x0) at hll.c:1188
#3  0x0000147cbfcc0823 in hll_union (fcinfo=<optimized out>) at hll.c:2042
#4  0x0000000000617132 in ExecMakeFunctionResultNoSets (fcache=0x147cbb3eb948, econtext=0x147a1d6cd530, isNull=0x147a1d6ce09d "", isDone=<optimized out>) at execQual.c:2041
#5  0x000000000061cace in ExecTargetList (tupdesc=<optimized out>, isDone=0x0, itemIsDone=0x147a1d6cffa0, isnull=0x147a1d6ce090 "", values=0x147a1d6ce000, econtext=0x147a1d6cd530, targetlist=0x147cbb3ecc28) at execQual.c:5423
#6  ExecProject (projInfo=<optimized out>, isDone=isDone@entry=0x0) at execQual.c:5647
#7  0x000000000062f10b in ExecOnConflictUpdate (returning=<optimized out>, canSetTag=1 '\001', estate=0x147a1d6c8038, excludedSlot=0x147a1d6c9bc0, planSlot=0x147a1d6c9bc0, conflictTid=0x7ffc69854520, resultRelInfo=0x147a1d6c81c8, mtstate=0x147a1d6c82d8) at nodeModifyTable.c:1234
#8  ExecInsert (canSetTag=1 '\001', estate=0x147a1d6c8038, onconflict=ONCONFLICT_UPDATE, arbiterIndexes=0x147a1654fd28, planSlot=0x147a1d6c9bc0, slot=0x147a1d6c9bc0, mtstate=0x147a1d6c82d8) at nodeModifyTable.c:410
#9  ExecModifyTable (node=node@entry=0x147a1d6c82d8) at nodeModifyTable.c:1512
#10 0x00000000006162a8 in ExecProcNode (node=node@entry=0x147a1d6c82d8) at execProcnode.c:396
#11 0x0000000000612727 in ExecutePlan (dest=0x147a1654d888, direction=<optimized out>, numberTuples=0, sendTuples=<optimized out>, operation=CMD_INSERT, use_parallel_mode=<optimized out>, planstate=0x147a1d6c82d8, estate=0x147a1d6c8038) at execMain.c:1567
#12 standard_ExecutorRun (queryDesc=0x147cbb3c6438, direction=<optimized out>, count=0) at execMain.c:339
```

steven-kirk (Author) commented

After this customer switched from HLLs of size hll_empty(17,5,-1,1), which can reach roughly 80 KB in memory, to HLLs of the default size hll_empty(11,5,-1,1), whose maximum in-memory size is about 1.2 KB and therefore smaller than an OS memory page, they stopped seeing frequent crashes.
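
(For reference, the sizes follow directly from the parameters: the dense representation holds 2^log2m registers of regwidth bits, so log2m=17, regwidth=5 gives 2^17 × 5 = 655,360 bits = 80 KB, while log2m=11 gives 2^11 × 5 = 10,240 bits = 1,280 bytes, i.e. the ~1.2 KB above, comfortably below a typical 4 KB page.)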
