Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support milliseconds in cacheDuration parsing #586

Merged
merged 4 commits into from
Aug 14, 2021

Conversation

urbanautomaton
Copy link
Contributor

@urbanautomaton urbanautomaton commented May 10, 2021

Since the introduction of cacheDuration parsing in 25cbddd we've been seeing metadata parsing failures for one of our IdPs, whose cacheDuration value is set to "PT6H0M0.000S"

This seems like a valid ISO8601 duration, however the regexp being used for parsing doesn't allow for the possibility of milliseconds.

This PR adds an optional non-captured group to the regexp to permit them. Since the captured string value is already being converted using .to_f, this should work without further changes.

To achieve minimal test coverage I've added a seconds string to one of the existing metadata examples with a zero value; this reproduces the failure and confirms the fix without needing to add a specific test case.

Feedback

I'm new to this project so absolutely any feedback would be welcome. In particular, is the approach taken to test coverage sufficient, or would you prefer to add an explicit test case? This seemed simpler since the duration parsing isn't directly tested, but I'm happy to introduce unit tests there if that's preferred.

@urbanautomaton
Copy link
Contributor Author

Somewhat cheekily I've pushed another commit that reformats the regexp for readability. I was restricted in what I could do since this gem supports ruby 1.8.7, which doesn't have named capture groups. I think the split lines and comments make it a lot easier to understand and work with, but I'm more than happy to get rid of that commit if it's not preferred.

I've ensured the build passes with ruby 2.6.6, but I can't install ruby 1.8.7 on my mac any more. Is there any CI in place for this gem that tests a range of versions?

@urbanautomaton urbanautomaton force-pushed the sc-duration-regexp branch 3 times, most recently from 00638e2 to 725dab8 Compare May 10, 2021 21:13
|
(\d+)W # 8: Weeks
)
$)x
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what is this x?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's the "ignore whitespace" regexp flag. It's what allows me to split the regexp over multiple lines, and also insert comments. You can see it here documented for 2.6, but it's supported back to 1.8.x:

https://ruby-doc.org/core-2.6.4/Regexp.html#class-Regexp-label-Options

@pitbulk
Copy link
Collaborator

pitbulk commented May 10, 2021

We use travis. but it seems the travis.org was frozen, so we need to move to travis.com
https://travis-ci.org/github/onelogin/ruby-saml/builds

@urbanautomaton
Copy link
Contributor Author

urbanautomaton commented Aug 8, 2021

Edit: this can broadly be ignored, see next comment.


Thanks for the notes, @johnnyshields! All of your points are good ones, but I'm a little wary of expanding the scope of this PR to attempt full coverage of the ISO8601 duration spec, given the indirect test coverage we've got to work with. Since two independent PRs have been raised for troubles with milliseconds specifically, I'd probably favour merging this to begin with and looking at full coverage as a follow-up.

I did have a quick look for libraries that could do this whole task for us, but unfortunately the most commonly-used one (outside of ActiveSupport), ruby-duration, requires ruby 1.9, while ruby-saml still supports 1.8.7. They do point to the last version that supported earlier rubies, though, so we could potentially do another conditional dependency in the gemspec.

@pitbulk what do you think? Are you open to pulling in an extra dependency for this task, or would you prefer to continue updating this gem's implementation?

@urbanautomaton
Copy link
Contributor Author

In fact, after a bit of spec poking I think we actually weren't so far from having full spec coverage, so I've added a couple of commits to do that; we shouldn't need another dependency.

I've also rebased to pick up the new github actions build.

@pitbulk
Copy link
Collaborator

pitbulk commented Aug 12, 2021

Can we add more unitest to cover the different scenarios? maybe you can carry them from the gem dependency you found

@urbanautomaton urbanautomaton force-pushed the sc-duration-regexp branch 2 times, most recently from 412d664 to ac85010 Compare August 13, 2021 09:34
Simon Coffey added 2 commits August 13, 2021 10:37
This adds some more comprehensive unit testing for the .parse_duration
helper, since we'd like to modify it for better coverage of the ISO8601
duration spec.

Since the helper both parses the duration and adds it to the current
time (by default) before returning it, I've used the unix epoch as a
reference point instead and formatted the result as an ISO timestamp
before asserting, to hopefully make any errors clearer.

I've not gone for full coverage, but have tried to test the main edge
cases. In subsequent commits I'll refactor the parsing and introduce
support for fractional values.
The ISO 8601 parsing regexp is moderately complex, and uses numbered
backreferences to extract data, which is prone to breakage if new
capture groups are introduced (either inadvertently or deliberately).
This also means every regexp group we don't want to capture must be
anonymised using (?: ...).

In modern rubies we could use named capture groups to clarify things;
however since these are only supported since ruby 1.9 and ruby-saml
supports 1.8.7, that's not an option.

However, we can still use /x to introduce whitespace and source code
comments, which this commit does, splitting each meaningful capture
group onto its own annotated line.

Doing this, it also became clear that some of the groups weren't
necessary, e.g. the two groups surrounding each part of the ( YMDHMS
format | week format) disjunction, so I've reduced the nesting somewhat,
too.
@urbanautomaton
Copy link
Contributor Author

@pitbulk Sure, yep, I've:

  1. Added unit tests in an initial commit to cover the pre-existing behaviour
  2. Rewritten the regexp for clarity second
  3. Then added full H/M/S decimal support in a single commit

The tests pass at each stage.

I didn't borrow the examples from the other gem in the end, because our method applies the duration to a timestamp as well as computing it whereas theirs produces a duration object, but I think I've gotten reasonable coverage of the major cases we want to support. See what you think!

"P1Y1M1DT1H1M1S" => "1971-02-02T01:01:01.000Z",

# Negative duration
"-P1Y1M1DT1H1M1S" => "1968-11-29T22:58:59.000Z",
Copy link
Collaborator

@johnnyshields johnnyshields Aug 13, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are negative values supported for each unit, e.g. P1Y-1M == 11 months?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not certain that negative durations are part of the spec, as it mentions them only once, to say:

2.1.6
duration
non-negative quantity attributed to a time interval

From conversations in the moment.js repo (1, 2) it appears that this is an area where implementations vary, some supporting prefix signs, some supporting infix.

For the purposes of this PR, I'm just trying to cover the existing code, which supports a prefix sign alone.

Copy link
Collaborator

@pitbulk pitbulk Aug 13, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some info about the xsd:duration type: http://www.datypic.com/sc/xsd/t-xsd_duration.html
It seems infix should not be valid.

The numbers may be any unsigned integer, with the exception of the number of seconds, which may be an unsigned decimal number. So P15.5Y seems invalid

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, cool, thanks for the extra info!

As a result of the earlier conversations I've permitted decimal hours and minutes; would you like me to back that change out, since the xsd:duration spec is tighter than the ISO one?

Copy link
Collaborator

@pitbulk pitbulk Aug 13, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes please, and sorry for providing late this info.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No worries, glad to have it resolved! I've updated the branch and referred to the xsd:duration docs in the commit message.

Simon Coffey added 2 commits August 13, 2021 11:41
Since the introduction of cacheDuration parsing in 25cbddd we've been
seeing parsing failures for one of our IdPs, whose cacheDuration value
is set to

    cacheDuration="PT6H0M0.000S"

This is a valid ISO8601 duration, however the regexp being used for
parsing doesn't provide for the possibility of milliseconds.

The ISO8601 spec[1] supports decimal values for at least some of a
duration's components (S4.4.3.2):

  > If necessary for a particular application, the lowest order
  > components may have a decimal fraction.

It's not entirely plain which components this might refer to, however
the xsd:duration spec[2] is plainer:

  > The numbers may be any unsigned integer, with the exception of the
  > number of seconds, which may be an unsigned decimal number.

This commit therefore adds fractional seconds support to the duration
parsing. To ensure that we actually respect these values, I've updated
our parsed value coercion to produce floats for all components. This is
arguably unnecessary for the non-float components (since the regexp will
only accept integers), but it allows us to reduce the code with no ill
effects.

The ISO spec permits arbitrary decimal precision, and supports both
commas and dots as the decimal separator, so since the xsd:duration
schema doesn't mention either concern, I've followed the ISO spec here.

[1] https://www.loc.gov/standards/datetime/iso-tc154-wg5_n0038_iso_wd_8601-1_2016-02-16.pdf
[2] http://www.datypic.com/sc/xsd/t-xsd_duration.html
We can condense our code a bit by handling the sign at the point where
we coerce the duration parts from strings to numbers.
@johnnyshields
Copy link
Collaborator

Let's merge?

@pitbulk pitbulk merged commit 018d079 into SAML-Toolkits:master Aug 14, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants