-
Notifications
You must be signed in to change notification settings - Fork 125
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Plans for future 3.0.0 branch #100
Comments
In the end I want to clean up my two pet peeves: 1) replace tabs with spaces and 2) move C files to subdirectory src. |
@pombredanne Philippe, do you have any comments, ideas? Is there anything which is now annoying and can be fixed. |
So I never rebased it because I was not really happy with it as is, but this patch could be made nicer if based against a pyahocorasick with structure outlined below So my request would be that the C module is moved to _ahocorasick and then everything re-exported from ahocorasick so that "porcelain" like that patch can be implemented in Python. |
@WojciechMula sorry for the late reply!... here are a few ideas, in no specific order:
|
@frankier re
Your patch is in Python... so I am not sure I get the requirement. Can you elaborate? |
So the idea is to make it so Python and non Python stuff can be imported from ahocorasick together. This is typically done by naming the C module |
TBH I would be happy to remove plain-python module. It hasn't been upgraded for ages and seems alternatives are more mature. |
@WojciechMula re:
I am fine too. What you could do it move it to tests or a gist and explain this was used as an early prototype or not do anything at all and just remove it. It is in the history in anycase. |
Some notes:
I guess there are two ways I think of this:
As an example of some of the things I store see https://github.com/nexB/scancode-toolkit/blob/d1e725d3603a8f96c25f7e3f7595c68999b92a67/src/licensedcode/match_aho.py#L52
Frankly I would drop entirely wildcard matching as this is both a rare feature of such an automaton and I am not sure anyone is using this. I could see the value to have a regex engine leveraging the automaton to speed up matching (say more or less like esmre or may be parts of hyperscan) but I cannot how I would use the wildcard matching |
I can give this a shot. Glob-style pattern matching should be possible if you slightly adjust the stack-based iterator in |
@melsabagh-kw this looks like a great idea! I love it. Note that AFAIK, a glob can be converted to a regex (at least Python can translate https://docs.python.org/3/library/fnmatch.html#fnmatch.translate ) but this is a much simpler syntax than full regex and therefore better IMHO as you pointed out! |
I submitted a PR at #171 |
I renamed this to target a future v3.0. I am working on a v2.0 today and this is not as ambitious as what we have here. |
Few things I consider for the next, incompatible version (in random order):
STORE_ANY
,STORE_INTS
,STORE_LENGTH
). UsingSTORE_{INT,LENGTH}
makes sense at C-language level, but in Python isn't good. I mean, even if we save bare integers underneath, they have to be converted back to Python object. Removing this might also improve memory usage. To be honest,STORE_LENGTH
is stupid, I suspect it was added because "I could do it".glob
module, i.e. always the '?' marks "any character". It will require a bit of parsing, but the current solution is silly and unintuitive.The text was updated successfully, but these errors were encountered: