Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SEGV hs_scan with HS_FLAG_SOM_LEFTMOST #350

Closed
noboruma opened this issue Mar 9, 2022 · 9 comments
Closed

SEGV hs_scan with HS_FLAG_SOM_LEFTMOST #350

noboruma opened this issue Mar 9, 2022 · 9 comments

Comments

@noboruma
Copy link

noboruma commented Mar 9, 2022

Hello there,

I was experimenting with scratch buffer allocation and noticed that one specific pattern would cause a crash. The code I used is as follow, I have 2 similar patterns and only allocate the scratch buffer for pattern1 but not pattern2:

#include <hs/hs_compile.h>
#include <stdio.h>
#include <string.h>
#include <hs/hs.h>

static int eventHandler(unsigned int id, unsigned long long from,
                        unsigned long long to, unsigned int flags, void *ctx) {
    printf("Match for pattern \"%s\" at offset %llu\n", (char *)ctx, to);
    return 0;
}

int main(int argc, char *argv[]) {
    const char *pattern1 = "^(ftps?|https?).*?\\?+$";
    const char *pattern2 = "^(ftps?|https?)://.*$";

    hs_compile_error_t *compile_err;
    hs_database_t *database1;
    hs_compile(pattern1, HS_FLAG_SOM_LEFTMOST, HS_MODE_BLOCK, NULL, &database1,
                   &compile_err);

    hs_database_t *database2;
    hs_compile(pattern2, HS_FLAG_SOM_LEFTMOST, HS_MODE_BLOCK, NULL, &database2,
                   &compile_err);

    const char *inputData = "https://foo.bar/";
    hs_scratch_t *scratch = NULL;
    hs_alloc_scratch(database1, &scratch);
    hs_scan(database1, inputData, strlen(inputData), 0, scratch, eventHandler,
                (void*)pattern1);
    hs_scan(database2, inputData, strlen(inputData), 0, scratch, eventHandler,
                (void*)pattern2);

    return 0;
}

Expected results: the second hs_scan should return an error, since the scratch buffer was not allocated properly for that pattern
Actual results: the second hs_scan is SEGV, here is the stack trace:

#0  0x00007ffff7c3b5cd in memset (__len=32, __ch=0, __dest=0x0)
    at /usr/include/x86_64-linux-gnu/bits/string_fortified.h:71
#1  fatbit_clear (bits=0x0) at ./src/util/fatbit.h:60
#2  recordAnchoredLiteralMatch (end=8, anch_id=0, scratch=0x7fffcc00f240, t=<optimized out>)
    at ./src/rose/program_runtime.c:101
#3  roseRunProgram (t=<optimized out>, scratch=<optimized out>, programOffset=<optimized out>, som=0,
    end=<optimized out>, prog_flags=<optimized out>) at ./src/rose/program_runtime.c:2278
#4  0x00007ffff7c22fd0 in roseAnchoredCallback (start=start@entry=0, end=<optimized out>,
    id=id@entry=1888, ctx=ctx@entry=0x7fffcc00f240) at ./src/rose/match.c:215
#5  0x00007ffff7bd5a04 in doComplexReport (cached_accept_id=<synthetic pointer>,
    cached_accept_state=<synthetic pointer>, eod=0 '\000', loc=8, s=21, m=<optimized out>,
    ctxt=0x7fffcc00f240, cb=0x7ffff7c22f70 <roseAnchoredCallback>) at ./src/nfa/mcclellan.c:74
#6  mcclellanExec8_i (mode=CALLBACK_OUTPUT, c_final=0x0, single=0 '\000', ctxt=0x7fffcc00f240,
    cb=0x7ffff7c22f70 <roseAnchoredCallback>, offAdj=0, len=8, buf=0x7fffd4014580 "https://foo.bar/",
    state=<synthetic pointer>, m=<optimized out>) at ./src/nfa/mcclellan.c:502
#7  nfaExecMcClellan8_Bi (single=0 '\000', context=0x7fffcc00f240,
    length=8, buffer=0x7fffd4014580 "https://foo.bar/",
    offset=0, n=<optimized out>) at ./src/nfa/mcclellan.c:922
#8  nfaExecMcClellan8_B (n=<optimized out>, offset=0, buffer=0x7fffd4014580 "https://foo.bar/", length=8,
    cb=0x7ffff7c22f70 <roseAnchoredCallback>, context=0x7fffcc00f240) at ./src/nfa/mcclellan.c:945
#9  0x00007ffff7c0dbe1 in runAnchoredTableBlock (scratch=0x7fffcc00f240, atable=<optimized out>,
    t=0x555557cfde00) at ./src/rose/block.c:63
#10 roseBlockAnchored (scratch=0x7fffcc00f240, t=0x555557cfde00) at ./src/rose/block.c:212
#11 roseBlockExec (t=<optimized out>, scratch=<optimized out>) at ./src/rose/block.c:395
#12 0x00007ffff7b44f4f in rawBlockExec (scratch=0x7fffcc00f240, rose=0x555557cfde00)
    at ./src/runtime.c:188
#13 hs_scan (db=<optimized out>, data=<optimized out>, length=15, flags=<optimized out>,
    scratch=0x7fffcc00f240, onEvent=<optimized out>, userCtx=<optimized out>) at ./src/runtime.c:419

Note: there is no crash if I remove the HS_FLAG_SOM_LEFTMOST flag.
Workaround: reallocating the scratch buffer with the second pattern solves the issue.

It sounds to me that some sanity checks are missing inside hs_scan.

alessandrod added a commit to deepfence/hyperscan that referenced this issue Mar 13, 2022
@hongyang7
Copy link
Contributor

Thanks for reporting this. This is indeed a historical issue left. We'll consider to add suitable quick check for scratch validity later.

fatchanghao pushed a commit that referenced this issue Oct 27, 2022
@hongyang7
Copy link
Contributor

Please refer to latest develop branch.
Commit id: 7c65cfc

fatchanghao pushed a commit that referenced this issue Feb 15, 2023
@jlucovsky
Copy link

This document recommends the allocation of one scratch space per scanning context.

Suricata relies on this performance technique. The commit for this change has caused this Suricata issue. I investigated and discovered that the validity check comparing the scratch and database crc values was detecting a mismatch since the scratch area is per-Suricata thread context but a per-database scratch area is expected.

Here's the performance tip from the documentation link:
Tip
A scratch space can be allocated so that it can be used with any one of a number of databases. Each concurrent scan operation (such as a thread) needs its own scratch space.

The hs_alloc_scratch() function can accept an existing scratch space and “grow” it to support scanning with another pattern database. This means that instead of allocating one scratch space for every database used by an application, one can call hs_alloc_scratch() with a pointer to the same hs_scratch_t and it will be sized appropriately for use with any of the given databases.

@hongyang7
Copy link
Contributor

hongyang7 commented Mar 5, 2023

@jlucovsky
Yes you're right. We already notice this issue in Snort integration and Rspamd.
A quick roll back patch for this particular issue will be available soon.
Sorry for any inconvenience.

@jlucovsky
Copy link

@jlucovsky Yes you're right. We already notice this issue in Snort integration and Rspamd. A quick roll back patch for this particular issue will be available soon. Sorry for any inconvenience.

Thanks for addressing the issue so quickly!

fatchanghao pushed a commit that referenced this issue Mar 6, 2023
Roll back fix for github issue #350

About Scratch Usage:
For compile time, scratch space is strongly recommended to be
allocated immediately after database generation.
For runtime, besides using scratch for corresponding database,
Hyperscan also allows user to use larger scratch space allocated
for another database.
When multiple concurrent threads need to use the same databases
and a new scratch space is required, cloning the largest one is
always safe. This is realized based on API hs_scratch_size() and
hs_clone_scratch().
Behaviors beyond above are discouraged and results are undefined.
fatchanghao pushed a commit that referenced this issue Mar 6, 2023
Roll back fix for github issue #350

About Scratch Usage:
For compile time, scratch space is strongly recommended to be
allocated immediately after database generation.
For runtime, besides using scratch for corresponding database,
Hyperscan also allows user to use larger scratch space allocated
for another database.
When multiple concurrent threads need to use the same databases
and a new scratch space is required, cloning the largest one is
always safe. This is realized based on API hs_scratch_size() and
hs_clone_scratch().
Behaviors beyond above are discouraged and results are undefined.
@hongyang7
Copy link
Contributor

@jlucovsky Please refre to latest master. fix id: 0bf86a7
Any questions, let us know.

@jlucovsky
Copy link

Thanks. I've verified proper operation on the Suricata LTS and current (master) releases.

Thanks!

@jlucovsky
Copy link

@hongyang7 Will there be a new release with the update? Many of our users only used "released" software. Thus, could the change be in 5.4.2?

@hongyang7
Copy link
Contributor

@jlucovsky Please check for latest update.

markos pushed a commit to VectorCamp/vectorscan that referenced this issue Sep 5, 2023
markos pushed a commit to VectorCamp/vectorscan that referenced this issue Sep 5, 2023
Roll back fix for github issue intel#350

About Scratch Usage:
For compile time, scratch space is strongly recommended to be
allocated immediately after database generation.
For runtime, besides using scratch for corresponding database,
Hyperscan also allows user to use larger scratch space allocated
for another database.
When multiple concurrent threads need to use the same databases
and a new scratch space is required, cloning the largest one is
always safe. This is realized based on API hs_scratch_size() and
hs_clone_scratch().
Behaviors beyond above are discouraged and results are undefined.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants