Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Dehyphen improvements #387

Closed
bsless opened this issue Feb 21, 2022 · 7 comments
Closed

Dehyphen improvements #387

bsless opened this issue Feb 21, 2022 · 7 comments
Assignees
Labels
needs analysis I need to think about this!

Comments

@bsless
Copy link

bsless commented Feb 21, 2022

the dehyphen docstring mentions there's probably a faster solution.
The equivalent solution in JVM regex would be: (str/replace s #"(?!\w)-(?=\w)" " ")
Javascript also support lookahead/back, but with different syntax https://developer.mozilla.org/en-US/docs/Web/JavaScript/Guide/Regular_Expressions/Assertions
Rough benchmarks show this regex is 2x faster on the JVM

@seancorfield seancorfield self-assigned this Feb 21, 2022
@seancorfield seancorfield added the needs analysis I need to think about this! label Feb 21, 2022
@seancorfield
Copy link
Owner

I looked at that doc page and the syntax looks the same as for the JVM regex -- what am I missing?

@seancorfield
Copy link
Owner

Heh, somewhat to my surprised, I changed dephyphen to not loop, with the intent that I could write some failing tests first, that the new regex would handle correctly instead.

So here's dehyphen:

(defn- dehyphen
  "Replace _embedded_ hyphens with spaces.

  Hyphens at the start or end of a string should not be touched."
  [s]
  (str/replace s #"(\w)-(\w)" "$1 $2"))

and here are the tests:

(deftest sql-kw-test
  (is (= "FETCH NEXT" (sut/sql-kw :fetch-next)))
  (is (= "WHAT IS THIS" (sut/sql-kw :what-is-this)))
  (is (= "FEE FIE FOE FUM" (sut/sql-kw :fee-fie-foe-fum)))
  (is (= "-WHAT THE-" (sut/sql-kw :-what-the-)))
  (is (= "fetch_next" (sut/sql-kw :'fetch-next)))
  (is (= "what_is_this" (sut/sql-kw :'what-is-this)))
  (is (= "fee_fie_foe_fum" (sut/sql-kw :'fee-fie-foe-fum)))
  (is (= "_what_the_" (sut/sql-kw :'-what-the-))))

and... they all pass straight away... so the loop was never required apparently.

@bsless
Copy link
Author

bsless commented Feb 22, 2022

This is weird, why isn't it working as expected?

(clojure.string/replace "a-b-c-d" #"(\w)-(\w)" "$1 $2")
"a b-c d"

EDIT: I get it, it's an edge case where the word is one character long

@seancorfield
Copy link
Owner

Yeah, took me a while to realize that's what the issue was:

dev=> (#'sql/dehyphen :a-b-c)
":a b-c"
dev=> (#'sql/dehyphen :a-bx-c)
":a bx c"

@seancorfield seancorfield reopened this Feb 23, 2022
@seancorfield
Copy link
Owner

FWIW, the regex you provided causes these failures (which the previous loop/regex did not):

FAIL in (sql-kw-test) (sql_test.cljc:872)
expected: (= "-X" (sut/sql-kw :-x))
  actual: (not (= "-X" " X"))

FAIL in (sql-kw-test) (sql_test.cljc:874)
expected: (= "-X-" (sut/sql-kw :-x-))
  actual: (not (= "-X-" " X-"))

FAIL in (sql-kw-test) (sql_test.cljc:881)
expected: (= "-WHAT THE-" (sut/sql-kw :-what-the-))
  actual: (not (= "-WHAT THE-" " WHAT THE-"))

@seancorfield
Copy link
Owner

This variant seems to work: (str/replace s #"(\w)-(?=\w)" "$1 ")

seancorfield added a commit that referenced this issue Feb 23, 2022
@seancorfield
Copy link
Owner

I'm not going to cut a new release just for this fix unless someone trips over the single character edge case.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
needs analysis I need to think about this!
Projects
None yet
Development

No branches or pull requests

2 participants