Finds a short regex to match strings from one list and none from another, inspired by Peter Norvig's.
This Meta-Regex Golf solution differs from Norvig's in a number of ways:
- Randomness element when generating regex components, allowing components to be of any size
- Based on the weighted set cover problem rather than vanilla set cover
- Uses simulated annealing to improve upon greedy solution
From the root directory, run, with python 3:
python main.py <STRINGS TO MATCH> <STRINGS TO NOT MATCH>
e.g.
python main.py examples/aristotle_quotes.txt examples/gold_digger_lyrics.txt
Settings like probability distributions and time spent on searching can be found in:
metareg/settings.py
Matching US presidential winners against losers:
ay.|po|r.e$|bu|a.a|ed|i.l|di|i.*o|oo|j|a.t|ma|e.a
Matching IMDB's top 100 films against it's bottom 100 films:
yc|D.+g|Ve|A.a|R.i|do|ot$|M$|g$|·|df|aut|Ev|a.h|mu|a.c|7|L.+f+|ble|Ci|lie|^h|Mat|Dep|po|ist| D.*i|ile|gb|av|oy|or*k|ode|or |Fu|p ?.i|F.g|r. M|su|G.a|es?t|Mem
Matching Shakespeare quotes against Aristotle quotes:
ad b|Be| .ew|h as|â|Mai|av.n| .ge|rg|ou. d|Br|Lo.e |hs|ch*,|^I |doo|-l|w n*o| da|u |m s|, I|Le|bon| pr.ve|y;|ar.$| .tr|rue|T.ey|, T|fat|gu|^To|n.le|!|eart|w\.|ull|e wor|my|oy|bor|f no*t|foo|ub|^U| ste|cky|a.fu|fal|gel|Sc|th b|e.pt|I c|nt.y
And, of course, Shakespeare quotes against Golddigger lyrics:
!$| re|\.| de