Wildcards in purl? #84

Open
copernico opened this issue May 7, 2020 · 12 comments

@copernico

In certain applications it would make sense to specify certain parts of purls as intervals or wildcards.

Examples:

  • pkg:npm/foobar@12.3.*
  • pkg:npm/foobar@12.3.[0..11]

Is this covered in the current specification? Will it be? Or is it out of scope (e.g., because consumers of purl can still decide to parse strings containing such wildcard expressions according to application-specific semantics)?

Thanks, keep up the good work!

@stevespringett
Member

The spec does not provide this capability. There are currently no plans to support this as there have been no proposals that work across all ecosystems. Related to #66

@copernico
Author

copernico commented May 15, 2020

@stevespringett Thanks for your reply. In our project, we're settling on using a super-set of purl, where version intervals and wildcards are allowed, but those "wildcard" expressions are ultimately expanded to "constant" (standard) purls.
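
A minimal sketch of that expansion step, assuming the bracketed `[lo..hi]` interval syntax from the examples at the top of this thread. The function name `expand_interval_purl` is hypothetical and nothing here is part of the purl spec. A `*` wildcard is not handled, since expanding it would require a list of published versions.

```python
import re

# Hypothetical interval syntax from this thread: "[lo..hi]" in the version part.
_INTERVAL = re.compile(r"\[(\d+)\.\.(\d+)\]")


def expand_interval_purl(purl: str) -> list[str]:
    """Expand one "[lo..hi]" interval in the version into constant purls."""
    base, sep, version = purl.rpartition("@")
    if not sep:
        return [purl]  # no version component, nothing to expand
    match = _INTERVAL.search(version)
    if match is None:
        return [purl]  # already a constant (standard) purl
    lo, hi = int(match.group(1)), int(match.group(2))
    prefix, suffix = version[:match.start()], version[match.end():]
    # Both bounds are treated as inclusive (an assumption).
    return [f"{base}@{prefix}{n}{suffix}" for n in range(lo, hi + 1)]


for concrete in expand_interval_purl("pkg:npm/foobar@12.3.[0..2]"):
    print(concrete)
```

Each expanded string is an ordinary purl, so downstream tools never see the extended syntax.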

A more general question is: how uncommon are scenarios where one needs to do something similar to what we are doing? If that is a common need, perhaps some alignment would be desirable before everyone diverges and adopts a different solution.

@stevespringett
Member

There is certainly a need to have ranges and wildcards. Lots of orgs and various projects have a need for it. However, I have not seen any proposal set forth that would:

  1. Provide the ability to specify wildcards.
  2. Address the impact it would have on location.

For example, if a wildcard or range is specified, how does that impact WHERE the package is located? Case in point, some packages start out being published in one package repo, and move to being published to a different package repo. A good example of this is https://github.com/everit-org/json-schema/ where the project started publishing artifacts to Maven Central, but moved to Jitpack for increased security and transparency in the resulting artifact.

Since PURL has a default package repository for most PURL types, a wildcard or range would assume the location remains constant, when in fact it may not.

Proposals welcome.

@pombredanne
Member

@copernico @stevespringett here are some thoughts on the topic
(re-posted from ossf/wg-vulnerability-disclosures#28 (comment) )

Version ranges (or more generally version constraints and that would include any wildcards or globs) are not an easy thing as there cannot be a universal definition since there is no universal way to represent and compare versions across all package types.

To add them to purl, we could either have a partly universal/partly type-specific approach or be entirely package type-specific.

a partly universal/partly type-specific approach

  • Define a base universal notation (universal as used for any package type / ecosystem)
  • Delegate the semantics of version comparisons (which are essential to a proper definition of an ordered range of versions) to each package type with a possible sensible default (such as semver)
  • Have a conversion/mapping that would translate the universal notation to the package type-specific one

an entirely package type-specific approach

  • Accept that version ranges are entirely package type-specific and do not attempt to define some universal notation and mappings
  • And always use the package type notations/version comparison.

IMHO the package type-specific approach is simpler and self-explanatory: it does not define any new notation or convention, since it reuses the package type's own, and above all it is always correct. A universal notation was attempted in CPE, and I am not sure it has worked well. It would always be a compromise of sorts with some dark corners. The semantic differences between Python, Debian, RPM, Ruby, npm and semver are often minor, but I think the details can matter a lot.

In either case, the other question is where to put this version range. I do not think it belongs in the version attribute. Instead it should be its own attribute and would therefore be best as a standard qualifier with a name TBD, such as version_range or version_constraints.

Some ideas of what the semantics could be:

  • if a version is provided then that's the one version this purl is for.
  • if no version is provided and a version range qualifier is provided, then the purl is for that range
  • if no version and no version range qualifier is provided, then the purl is for any version
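
These three rules can be sketched as a small classifier. This is only an illustration: `version_range` is a hypothetical qualifier name (the name is still TBD above), `purl_scope` is a made-up helper, and the parsing is deliberately naive rather than a full purl parser.

```python
from urllib.parse import parse_qs


def purl_scope(purl: str) -> str:
    """Return 'exact', 'range' or 'any' following the three rules above."""
    rest, _, _subpath = purl.partition("#")   # drop an optional subpath
    rest, _, query = rest.partition("?")      # split off the qualifiers
    _name, sep, version = rest.partition("@")
    qualifiers = parse_qs(query)
    if sep and version:
        return "exact"   # an explicit version always wins
    if "version_range" in qualifiers:
        return "range"   # hypothetical qualifier name
    return "any"         # neither version nor range given


print(purl_scope("pkg:npm/foobar@12.3.1"))
print(purl_scope("pkg:npm/foobar?version_range=%3E%3D12.3.0"))
print(purl_scope("pkg:npm/foobar"))
```

Keeping the range in a qualifier means a purl with a plain version is interpreted exactly as it is today.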

@copernico
Author

Hi @pombredanne , thanks for this nice summary of the two options; I have to admit I do not have a clear idea of the semantic differences of versioning among the different package types. Would you have a few examples to clarify the cases where they are so different that deserve special treatment?

@joshbressers

I'm in the process of turning advisories into json using the OSSF wg-vulnerability-disclosures schema as the starting point
https://github.com/ossf/wg-vulnerability-disclosures/blob/main/src/schema/vulnerability.schema.json

The current version semantics in that schema are nightmarish so I wanted to see what purl would look like. I've found myself in a place related to this issue.

I can list every vulnerable version of my application. It's a very long list. While I don't really care, since the target audience of my json is computers, there will be some who complain about this. Obviously a wildcard could shorten this list substantially.

My current hangup is how to deal with the arrow of time.

I can list all vulnerable versions. So, for example, something like

pkg:generic/exampleco/example@1.0.0
pkg:generic/exampleco/example@1.1.0
pkg:generic/exampleco/example@2.0.0

I can list the fixed versions

pkg:generic/exampleco/example@1.0.1
pkg:generic/exampleco/example@1.1.1
pkg:generic/exampleco/example@2.0.1

But now I am not sure how to deal with the future. How does one explain that versions >= 1.0.1 and < 1.1.0 are not vulnerable to something?
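
One possible way to encode this, sketched under the assumption that versions are plain dotted integers: store each affected span as a half-open interval from the first vulnerable version to its fix, so later releases need no new entries. The `AFFECTED` table and the `is_affected` helper are hypothetical and reuse the example versions above.

```python
def parse(version: str) -> tuple[int, ...]:
    """Turn "1.0.1" into (1, 0, 1); assumes purely numeric dotted versions."""
    return tuple(int(part) for part in version.split("."))


# (first vulnerable version, first fixed version) pairs from the lists above.
AFFECTED = [
    ("1.0.0", "1.0.1"),
    ("1.1.0", "1.1.1"),
    ("2.0.0", "2.0.1"),
]


def is_affected(version: str) -> bool:
    """True if the version falls in any half-open interval [introduced, fixed)."""
    v = parse(version)
    return any(parse(lo) <= v < parse(hi) for lo, hi in AFFECTED)


print(is_affected("1.0.0"))  # a listed vulnerable version
print(is_affected("1.0.5"))  # a later release that was never listed
```

A version such as 1.0.5 is then reported as not affected without having been enumerated anywhere.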

@pombredanne
Member

@joshbressers @copernico more thoughts on version ranges and/or wildcards... I would likely prefer ranges over wildcards, though both could be combined in a single syntax... that is, if there were to be a single syntax!

Here are a few thoughts that I first collected in aboutcode-org/vulnerablecode#140 (comment), as we are also working on a concrete implementation (which I think is a good thing, as it allows for concrete experimentation).

There is no (mostly) universal syntax for version ranges and there is no (mostly) universal way to compare two versions.

@copernico re:

Would you have a few examples to clarify the cases where they are so different that deserve special treatment?

Several package types define their own syntax and semantics. For instance:

Yet, version ranges are useful and used in dependency specs and for vulnerabilities. In the latter case they can help relate a future, not-yet-known package version to a known vulnerability impacting it if that package version is within that range. (And a reminder that Package URLs are not only for vulnerabilities.)

Since there is no universal syntax and algorithm for version comparison we could either:

  1. define a (mostly) universal syntax for version ranges and package-type-specific ways to normalize to that syntax.
  2. OR use each package-type-specific syntax and not normalize it

Regardless of a generic or type-specific syntax, the semantics of version comparisons would need to be at least package type-specific to be correct, and in some cases they may need to be specific to a package instance, e.g. there could be a collection of algorithms to choose from, with type defaults and possibly an override at the package level (though a package-level version scheme feels rather contrived).
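
To illustrate why the comparison rule matters, here is a small sketch with three simplified orderings. The key functions are illustrative assumptions, not the real algorithms of any ecosystem; `semver_like_key` only captures the rule that a pre-release sorts before its release and ignores the rest of semver precedence.

```python
def numeric_key(version: str) -> tuple[int, ...]:
    """Order dotted versions as tuples of integers (no pre-release support)."""
    return tuple(int(part) for part in version.split("."))


def semver_like_key(version: str):
    """Simplified semver-style key: a pre-release sorts before its release."""
    core, _, prerelease = version.partition("-")
    release = tuple(int(part) for part in core.split("."))
    # (0, tag) for a pre-release sorts before (1, "") for the final release.
    return (release, (0, prerelease) if prerelease else (1, ""))


versions = ["1.10.0", "1.9.0", "1.2.0"]
print(sorted(versions))                   # plain string order
print(sorted(versions, key=numeric_key))  # numeric order

pair = ["1.0.0", "1.0.0-alpha"]
print(sorted(pair))                       # plain string order
print(sorted(pair, key=semver_like_key))  # pre-release first
```

The same range expression, such as "<1.9.0", selects different versions depending on which ordering is applied.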

Mixing version and version ranges in a single field is going to be a source of confusion IMHO, therefore I would much prefer to avoid overloading the version with a ranges syntax: rather we should craft some new qualifier for that as suggested by @mprpic in #66 (comment)

@david-a-wheeler

Mixing version and version ranges in a single field is going to be a source of confusion IMHO, therefore I would much prefer to avoid overloading the version with a ranges syntax: rather we should craft some new qualifier for that as suggested by @mprpic in #66 (comment)

A new qualifier would be fine. That might be even better, because then it would be easier to replace the version syntax with a different one (just use a new keyword). I just want to be able to represent sets of versions.

@pombredanne
Member

pombredanne commented May 3, 2021

After a bit of experimentation with @sbs2001 with version ranges there are a few things that became clear to me:

  1. it is possible to have a simple and mostly universal syntax for version ranges
  2. let's not overload the version component of a purl but instead add this as a purl qualifier. It could also be specified for use anywhere outside of package URLs.
  3. there is no such thing as a universal way to sort versions. There is a limited number of ways to sort them (far fewer than there are package types), but there is still no single universal way to compare two versions of a package.

Therefore a universal syntax for version ranges cannot make full sense without an indication of the version "scheme" that is used to compare the versions in the range.

See also aboutcode-org/vulnerablecode#119 (comment)

Here is a proposed approach as a mini spec for such version specifiers and version ranges:

Universal version specifiers syntax:
<scheme>:<range>,<range>, ...

For instance:
semver:1.2.3,>=2.0.0

With these operators and syntactic elements:

  • Each range is declared this way:

    • "=": Version equality operator. Implied if not present: means the version should be exactly that as in "=1.2.3"
    • "!=": Version exclusion operator. Means the versions range should exclude a version "!=1.2.3"
    • "<=", ">=": Inclusive range operator such as "<=1.2.3" which means all versions less than or equal to "1.2.3"
    • "<", ">": Exclusive range operator such as "<1.2.3" which means all versions less than "1.2.3"
  • Multiple ranges can form a larger version ranges specifier separated by a comma
    such as in ">=1.2.3,<2.0.0,!=1.2.4" which means all versions greater than or equal to "1.2.3" but less than "2.0.0" and not "1.2.4".

  • Spaces are not significant and are removed in the canonical form: "!=1.2.3" and "! = 1.2.3" are equivalent.

  • The ordering of multiple ranges in a specifier is not significant. The canonical ordering is TBD.

  • A version cannot contain version syntax characters (><=!,): these need to be quoted using the URL quoting rules.

The comparison of two versions as "less than" or "greater than" is entirely defined by the scheme and is scheme-specific. A version scheme covers both a certain version ranges syntax and how two versions are compared. We can define how a given scheme syntax can be transformed to this new universal syntax. Schemes are related to Package URL types in the sense that each Package URL type is related to one version scheme, but multiple types can reuse the same scheme, such as semver.
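
A rough sketch of how this mini spec could be parsed and evaluated. Several things are assumptions: all ranges in a specifier are combined with AND, which fits the ">=1.2.3,<2.0.0,!=1.2.4" example but not necessarily "semver:1.2.3,>=2.0.0"; the `SCHEME_KEYS` registry is hypothetical; and the "semver" key is a simplified numeric comparison. This is not the univers API.

```python
import operator
import re

OPERATORS = {
    "=": operator.eq, "!=": operator.ne,
    "<=": operator.le, ">=": operator.ge,
    "<": operator.lt, ">": operator.gt,
}
# Two-character operators are listed first so "<=" is not read as "<".
_CONSTRAINT = re.compile(r"^(!=|<=|>=|=|<|>)?(.+)$")


def numeric_key(version: str) -> tuple[int, ...]:
    """Simplified stand-in for a scheme-specific comparison."""
    return tuple(int(part) for part in version.split("."))


# Hypothetical registry: scheme code -> sort-key function.
SCHEME_KEYS = {"semver": numeric_key}


def parse_specifier(spec: str):
    """Split "<scheme>:<range>,<range>" into (scheme, [(op, version), ...])."""
    canonical = "".join(spec.split())  # spaces are not significant
    scheme, _, ranges = canonical.partition(":")
    constraints = []
    for item in ranges.split(","):
        if not item:
            continue
        op, version = _CONSTRAINT.match(item).groups()
        constraints.append((op or "=", version))  # "=" is implied
    return scheme, constraints


def satisfies(version: str, spec: str) -> bool:
    """Check a version against every range in the specifier (AND semantics)."""
    scheme, constraints = parse_specifier(spec)
    key = SCHEME_KEYS[scheme]
    return all(OPERATORS[op](key(version), key(bound)) for op, bound in constraints)


print(parse_specifier("semver: >= 1.2.3, <2.0.0, ! = 1.2.4"))
print(satisfies("1.5.0", "semver:>=1.2.3,<2.0.0,!=1.2.4"))
```

Swapping the key function per scheme is what keeps the syntax universal while the comparison stays scheme-specific.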

Some known schemes and their codes are:

Note that Apache Maven and NuGet are following more or less a math intervals syntax https://en.wikipedia.org/wiki/Interval_(mathematics)

See also: aboutcode-org/vulnerablecode#119 for many pointers to existing libraries

Also @sbs2001 did put together a Python library to experiment with handling these versions and deal with scheme-specific version range syntax conversion, version sorting and comparison in https://github.com/nexB/univers and https://pypi.org/project/univers/ as well as checking if a version is in a range: https://github.com/nexB/univers/blob/63bd5aec16ec95b5b811ede638ac225f3ab1f6c6/src/univers/version_range.py#L25

@coderpatros
Contributor

Each range is declared this way:

* "=": Version equality operator. Implied if not present: means the version should be exactly that as in "=1.2.3"
* "!=": Version exclusion operator. Means the versions range should exclude a version "!=1.2.3"
* "<=", ">=": Inclusive range operator such as "<=1.2.3" which means all versions less than or equal to "1.2.3"
* "<", ">": Exclusive range operator such as "<1.2.3" which means all versions less than  "1.2.3"

I think the encoded values for these will make version ranges hard to grok for humans.

Maybe something like eq for =, lte for <=, gte for >=, lt for <, and gt for >.

Would need to double check, but pretty sure (, ) and ! don't require encoding.

So the example semver:>=1.2.3,<2.0.0,!=1.2.4, or semver:%3E%3D1.2.3,%3C2.0.0,%21%3D1.2.4, could perhaps be represented as semver:gte(1.2.3),lt(2.0.0),!eq(1.2.4).

Note: I might be mistaken on encoding requirements. Validation of that by someone else would be appreciated.
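
As a quick check, the two notations can be run through Python's `urllib.parse.quote`. Under RFC 3986, "!", "(", ")", "," and "=" are sub-delims that are syntactically allowed in a query component, while "<" and ">" are not allowed anywhere unencoded. Whether purl's own encoding rules are stricter for qualifier values is an assumption to verify against the purl spec; the `RFC3986_ALLOWED` set below is only the subset relevant to these examples.

```python
from urllib.parse import quote

# Sub-delims from RFC 3986 that appear in the notations discussed here.
# Assumption: purl itself may still require some of these to be encoded.
RFC3986_ALLOWED = "!=(),"

symbolic = ">=1.2.3,<2.0.0,!=1.2.4"
named = "gte(1.2.3),lt(2.0.0),!eq(1.2.4)"

# Encode everything except the comma separator.
strict_symbolic = quote(symbolic, safe=",")
# Encode only what RFC 3986 does not allow in a query component.
lenient_symbolic = quote(symbolic, safe=RFC3986_ALLOWED)
lenient_named = quote(named, safe=RFC3986_ALLOWED)

print(strict_symbolic)   # %3E%3D1.2.3,%3C2.0.0,%21%3D1.2.4
print(lenient_symbolic)  # %3E=1.2.3,%3C2.0.0,!=1.2.4
print(lenient_named)     # gte(1.2.3),lt(2.0.0),!eq(1.2.4)
```

Only the named-operator form survives unchanged, which is the readability argument made above.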

@coderpatros
Contributor

A version cannot contain version syntax characters (><=!,): these need to be quoted using the URL quoting rules.

Just re-read this part. Pretty sure you can't use < or > unencoded in URLs.

@pombredanne
Member

@copernico Please see #139 for a first shot as an approach to resolve this issue
