Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add regex match iterator #2109

Merged
merged 5 commits into from
Aug 2, 2017
Merged

Conversation

autodidaddict
Copy link
Contributor

@autodidaddict autodidaddict commented Jul 30, 2017

Add regex match iterator Issue 2093, RFC 044

@SeanTAllen SeanTAllen added the changelog - added Automatically add "Added" CHANGELOG entry on merge label Jul 30, 2017
@autodidaddict autodidaddict changed the title Adding match iterator functionality from issue #2093, RFC 044 Add regex match iterator Jul 30, 2017
@@ -0,0 +1,38 @@
class MatchIterator is Iterator[Match]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

per the pony file naming conventions this file shoudl be called match_interator.pony

@@ -61,6 +63,43 @@ class iso _TestEq is UnitTest
h.assert_true(r == "1234", """ \d+ =~ "1234" """)
h.assert_true(r != "asdf", """ \d+ !~ "asdf" """)

class iso _TestMatchIter is UnitTest
Copy link
Member

@SeanTAllen SeanTAllen Jul 30, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this should be _TestMatchIterator or _TestMatches

"""
let m = _regex(_subject, _offset)?
_offset = m.end_pos() + 1
m
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you add a newline at the end of the file?

@SeanTAllen
Copy link
Member

This looks pretty good, couple small changes.

false
end

fun ref next() : Match? =>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Style-wise, the : here should not have a preceding space, and the ? should.

When in doubt on style, check the style guide as well as surrounding code.

let m = _regex(_subject, _offset)?
_offset = m.end_pos() + 1
m

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This little symbol on GitHub is indicating that this file is missing an empty line at the end of it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see the little symbol, and there is a line at the end of the file.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's odd - it may be indicating that there is extra whitespace on the line at the end of the file?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It means "there's no newline at the end of the file".

fun matches(subject: String): MatchIterator =>
"""
Creates a match iterator from the regular expression that will iterate
over the supplied subject returning matches
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you add a period at the end of this sentence?

@jemc
Copy link
Member

jemc commented Jul 31, 2017

I'm a little bit concerned that the implementation will execute every match twice in a for loop, where both has_next and next are run.

However, that could be optimized by somebody who cared enough about the performance boost to do it, so I'm not necessarily averse to leaving it as-is for now.

@plietar
Copy link
Contributor

plietar commented Jul 31, 2017

How about removing the has_next method from Iterator, and changing for loop desugaring to rely only on the partialness of the next method to quit the loop ?

It would avoid this sort of problem, where checking if there is a next element can be expensive. There are even cases where it is impossible to check if there's a next value without consuming it.

The workaround is kind of awkward and requires caching the value, such as in the FileLines iterator. I also ran into this a while ago when writing a pony wrapper around a Rust iterator.

I've just had a go at implementing it in the compiler and stdlib. All uses of has_next can be replaced quite easily. In many cases it even simplifies both the implementation of the iterator and the code using it. I'm happy to write an RFC for that.

@Praetonus
Copy link
Member

@plietar The problem with that is that every iterator-based loop would have to exit with an exception, which is very expensive.

I think caching would be the best solution here if performance turns out to be a problem.

@jemc
Copy link
Member

jemc commented Jul 31, 2017

@plietar - note that in cases where it's impossible to write a has_next, I have implemented it as statically true in the past - it's a bit of a code smell, but it still works in for loops, and is effectively the same as if you had removed it from Iterator as you suggest - it also has the same downsides that @Praetonus mentioned, though, and it could potentially be confusing to a user who was expecting to rely on has_next being true, but that's something that could perhaps be fixed with documentation.

@SeanTAllen
Copy link
Member

SeanTAllen commented Aug 2, 2017

This is sync approved with a tiny caveat.

@autodidaddict can you address @jemc's couple of small formatting issues and then we can merge this?

@autodidaddict
Copy link
Contributor Author

autodidaddict commented Aug 2, 2017

I already addressed the formatting issues @SeanTAllen. It's in the "style guidelines" commit.

@SeanTAllen
Copy link
Member

I added the missing '.' that was required for this to be merged.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
changelog - added Automatically add "Added" CHANGELOG entry on merge
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants