Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Misc #354

Open
wants to merge 3 commits into
base: main
Choose a base branch
from
Open

Misc #354

wants to merge 3 commits into from

Conversation

Nahor
Copy link
Contributor

@Nahor Nahor commented Mar 4, 2024

The first two (refactor) are simplification of the code.

On the last one (fix), regarding the default SourceCode implementations:

  • it fixes a panic if a 0-length span is at the end of the source code
  • line_count was totally wrong (based on doc + field name). Since it was consistently broken, one can imagine some code out there relying on that bug. So squinting a certain way, one could see this fix as a breaking change.
  • \r was handled as a newline, but is not supported in other places (Graphical/Narrated). So this too, depending on how one squints, it could be a breaking change or not.
  • Otherwise, I rewrote the code so that it is more easily understandable (opinions may vary 😝)

@zkat zkat added the breaking A semver-major breaking change label Mar 7, 2024
@Nahor Nahor mentioned this pull request Mar 19, 2024
Copy link
Contributor

@Benjamin-L Benjamin-L left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Aside from the unsafe blocks, this looks reasonable to me

Line {
line_number: context_data.line() + line_number + 1,
offset: context_data.span().offset()
+ unsafe { line.as_ptr().offset_from(context.as_ptr()) as usize },
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe this unsafe block is sound, but it would be good to have a // SAFETY: comment explaining the reasoning. I had to do a bit of digging to convince myself that it's not possible to have context.len() > isize::MAX.

Alternatively, you could avoid the offset_from call entirely by keeping a mutable offset variable and doing offset += line.len() or similar inside the map closure. That's a little ugly, but I'd personally prefer it to unsafe.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's the exact opposite of what I wanted to do. The whole reason I went with offset_from is to avoid using an offset variable, because that variable is a possible source of errors if not computed right (and indeed, such problem in the previous code is why I started to look at this piece of code in the first place).

Also, from a semantic point of view, the "offset from" meaning is what we are looking for ("the offset of line from the beginning of the data buffer"). A + line.len() is only indirectly related to that info. It's only correct if, and only if, the various lines follow each other exactly, without gaps nor overlaps. In particular, it requires the use of split_inclusive instead of the more common split1, creating, IMO, an unnecessary dependency between the split and the offset computation.

And yes it's safe per design:

The theoretical upper bound on object and array size is the maximum isize value. This ensures that isize can be used to calculate differences between pointers into an object or array and can address every byte within an object along with one byte past the end.

Footnotes

  1. I would have used split except that it causes some other issues in some corner cases (I can't remember if it was when the data terminates with a newline or when the data was empty)

- Fix a panic when a 0 length span covers the end of a document
- Fix incorrect `line_count`
- Add new unit tests and complete existing ones
- Improve readability
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
breaking A semver-major breaking change
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants