Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP] Implemented split #66 #80

Merged
merged 8 commits into from
Jun 8, 2019
Merged

[WIP] Implemented split #66 #80

merged 8 commits into from
Jun 8, 2019

Conversation

SamMousa
Copy link
Contributor

@SamMousa SamMousa commented Jun 3, 2019

The implementation takes care to not use string comparison functions to prevent needless copying of data..
This does make it a little bit harder to read, so might not be worth it.

Todo:

  • splitStream
  • splitCallable

Please check the implementation in #66 as well, it reads a lot easier but comes with the cost of having to do a substr call for every character in the data instead of only when emitting a value.

@nikic
Copy link
Owner

nikic commented Jun 3, 2019

I'm not sure I understand the motivation for the manually implemented comparison, which is going to be very slow. Can't we use a strpos loop instead? Something along these lines...

$offset = 0;
while (false !== $offset = strpos($str, $deliminter, $offset)) {
    // ...
    $offset += strlen($delimiter);
}

@SamMousa
Copy link
Contributor Author

SamMousa commented Jun 3, 2019

I wanted to prevent substring copying; didn't think of using strpos.
It is definitely a better solution that will also create more readable code.

Side note: why would the manual comparison be slow? Because it's implemented in PHP?

@nikic
Copy link
Owner

nikic commented Jun 3, 2019

Side note: why would the manual comparison be slow? Because it's implemented in PHP?

For two reasons:

  • Implementing it in PHP carries a lot of overhead here beyond the obvious, because the C implementation can use SIMD instructions. If the delimiter is a single character, then it is possible to check many (say 16) characters of the string for it at the same time.
  • For long strings and long delimiters, a naive search has quadratic time complexity. The internal implementation will use a linear time matching algorithm if it is heuristically profitable.

src/iter.php Outdated Show resolved Hide resolved
src/iter.php Outdated Show resolved Hide resolved
src/iter.php Outdated Show resolved Hide resolved
SamMousa and others added 5 commits June 6, 2019 10:05
Co-Authored-By: Nikita Popov <nikita.ppv@googlemail.com>
Co-Authored-By: Nikita Popov <nikita.ppv@googlemail.com>
Co-Authored-By: Nikita Popov <nikita.ppv@googlemail.com>
@nikic nikic merged commit c99acb2 into nikic:master Jun 8, 2019
@nikic
Copy link
Owner

nikic commented Jun 8, 2019

This one looks good, so I went ahead and merged it. We can handle other variants in a followup...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants