Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CH_CALLBACK_SKIP_PATTERN causes infinite loop in ch_scan() with patterns of empty matches #360

Closed
liweitianux opened this issue Jul 13, 2022 · 3 comments

Comments

@liweitianux
Copy link

Hi, I recently noticed an infinite loop in our program and found the issue to be related with CH_CALLBACK_SKIP_PATTERN.

When there are patterns that may match empty strings, the CH_CALLBACK_SKIP_PATTERN return value from the callback would cause an infinite loop in ch_scan(), specifically in catchupPcre(). I've reproduced the issue with Hyperscan 5.1.1 and 5.4.0.

Below is a test program:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#include <hs/ch.h>


static int
handler(unsigned int id,
	unsigned long long from, unsigned long long to,
	unsigned int flags, unsigned int size,
	const ch_capture_t *captured, void *ctx)
{
	(void)flags;
	(void)size;
	(void)captured;
	(void)ctx;

	printf("id=%u, from=%llu, to=%llu\n", id, from, to);

	// XXX!
	// With a regex matching an *empty* string, CH_CALLBACK_SKIP_PATTERN
	// causes an infinite loop in catchupPcre().
	// However, using CH_CALLBACK_CONTINUE is okay.
	return CH_CALLBACK_SKIP_PATTERN;
}


int
main(void)
{
	const char *input = "foo 0123 0 bar 39483 n34jfhlqekrcoi3q4";

	// test regex matching an *empty* string !
	const char *regexes[] = { "[a-z]+", "[0-9]", "0|" };
	unsigned int flags[] = { CH_FLAG_CASELESS, CH_FLAG_CASELESS, CH_FLAG_CASELESS };
	unsigned int ids[] = { 1, 2, 3 };
	unsigned int nelem = 3;

	ch_database_t *db = NULL;
	ch_scratch_t *scratch = NULL;
	ch_compile_error_t *cerr = NULL;
	ch_error_t err;

	err = ch_compile_multi(regexes, flags, ids, nelem,
			CH_MODE_NOGROUPS, NULL /* platform */,
			&db, &cerr);
	if (err != CH_SUCCESS) {
		printf("ch_compile_multi() failed\n");
		exit(1);
	}

	err = ch_alloc_scratch(db, &scratch);
	if (err != CH_SUCCESS) {
		printf("ch_alloc_scratch() failed\n");
		exit(1);
	}

	err = ch_scan(db, input, strlen(input), 0 /* flags */,
			scratch, handler, NULL /* onError */, NULL);
	printf("err = %d\n", (int)err);

	return 0;
}

And here is the output log:

id=3, from=0, to=0
id=1, from=0, to=3
id=2, from=4, to=5
id=3, from=38, to=38
id=3, from=38, to=38
id=3, from=38, to=38
id=3, from=38, to=38
id=3, from=38, to=38
id=3, from=38, to=38
id=3, from=38, to=38
id=3, from=38, to=38
id=3, from=38, to=38
id=3, from=38, to=38
id=3, from=38, to=38
id=3, from=38, to=38
id=3, from=38, to=38
id=3, from=38, to=38
id=3, from=38, to=38
...

So it just keeps repeating the last id=3, from=38, to=38 ...

If return CH_CALLBACK_CONTINUE from the callback, ch_scan() works correctly.

Please have a look and hope the above test program helps.

Thank you.

@liweitianux
Copy link
Author

Had a closer look at the catchupPcre() code, and I suspect the following conditional might be the cause:

https://github.com/intel/hyperscan/blob/master/chimera/ch_runtime.c#L338-L348

338        if (hyctx->length >= pd->scanStart &&
339            !(pattern->flags & CHIMERA_PATTERN_FLAG_SINGLEMATCH)) {
340            DEBUG_PRINTF("get a new match item\n");
341            int ret = scanPcre(hyctx, len, start, top_id);
342
343            if (ret == CH_CALLBACK_TERMINATE) {
344                DEBUG_PRINTF("user callback told us to terminate scanning\n");
345                return CH_SCAN_TERMINATED;
346            } else if (ret == CH_CALLBACK_SKIP_PATTERN) {
347                DEBUG_PRINTF("user callback told us to skip this pattern\n");
348                pd->scanStart = hyctx->length;

It looks to me the conditional at line 338 should be hyctx->length > pd->scanStart instead of >=.

@hongyang7
Copy link
Contributor

Had a closer look at the catchupPcre() code, and I suspect the following conditional might be the cause:

https://github.com/intel/hyperscan/blob/master/chimera/ch_runtime.c#L338-L348

338        if (hyctx->length >= pd->scanStart &&
339            !(pattern->flags & CHIMERA_PATTERN_FLAG_SINGLEMATCH)) {
340            DEBUG_PRINTF("get a new match item\n");
341            int ret = scanPcre(hyctx, len, start, top_id);
342
343            if (ret == CH_CALLBACK_TERMINATE) {
344                DEBUG_PRINTF("user callback told us to terminate scanning\n");
345                return CH_SCAN_TERMINATED;
346            } else if (ret == CH_CALLBACK_SKIP_PATTERN) {
347                DEBUG_PRINTF("user callback told us to skip this pattern\n");
348                pd->scanStart = hyctx->length;

It looks to me the conditional at line 338 should be hyctx->length > pd->scanStart instead of >=.

This change simply avoids infinite loop for CH_CALLBACK_SKIP_PATTERN flag in your case, but might result in losing the last match id=3, from=38, to=38 for CH_CALLBACK_CONTINUE flag.

We'll come with a fix soon.

fatchanghao pushed a commit that referenced this issue Aug 1, 2022
markos pushed a commit to VectorCamp/vectorscan that referenced this issue Aug 29, 2022
@hongyang7
Copy link
Contributor

Please refer to latest develop branch.
Commit id: 10b5225

fatchanghao pushed a commit that referenced this issue Feb 15, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants