Version range #66
Comments
If nothing else, capturing the various ecosystem version sorting is really key. Then you could at least use purl ids as upper and lower bounds and still evaluate what's in between. |
@iamwillbar there is a need alright for a common way to express version ranges... but I wonder if this is possible, because there is no universal way to express versions. Semver comes close (but cannot handle some epochs or Debian "or" AFAIK) @brianf @iamwillbar if you were to provide some unified specification for version ranges what would it look like? |
@pombredanne I suppose nothing is preventing purl users from specifying the versioning scheme in a qualifier, e.g.
If we were to use the Python one, and use this security release as an example, the purls for the released versions could be:
Mandatory qualifiers are not a thing in the spec however, so this would solely depend on maintainers of said projects to use them. |
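A minimal sketch of the qualifier idea above, for illustration only. The qualifier name `versioning_scheme`, the package name, and the helper `purl_with_scheme` are all hypothetical; nothing like this is defined in the purl spec, and these are not the examples from the original comment.

```python
from urllib.parse import urlencode


def purl_with_scheme(pkg_type, name, version, scheme):
    """Build a purl string carrying a hypothetical versioning-scheme qualifier."""
    # "versioning_scheme" is an assumed qualifier name, not part of the purl spec.
    qualifiers = urlencode({"versioning_scheme": scheme})
    return f"pkg:{pkg_type}/{name}@{version}?{qualifiers}"


# A consumer could then pick the comparison rules from the qualifier,
# provided the package maintainers actually set it.
example = purl_with_scheme("pypi", "example-package", "1.2.3", "pep440")
```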
Sure thing, nothing prevents this, and we could even make it part of the spec too. That said, in the context of vulnerability reporting, is there really a need for, value in, and correctness to using a version range? I started wondering about this when @sbs2001 mentioned it in https://gitter.im/aboutcode-org/vulnerablecode?at=5f70857f5a56b467a5f2a835 At a point in time I can state that a list of concrete and discrete versions (not a range but a list) are subject to a certain vulnerability, and that there is a list of concrete and discrete versions (not a range but a list) in which that vulnerability has been patched/resolved/fixed. This could be a list of Package URLs or a list of versions, not ranges. Anything that uses a range or some wildcard is either potentially incorrect or misleading or both, which to me makes the value of a range low, or dangerous, or both. And this is likely even more so when looking at distro packages such as RPM or Debian packages, which add patch numbers to the upstream version scheme with a release/build number and/or an epoch: the affected versions would rarely resolve to a proper range, but would always be correct when using a list. Does this make some sense? |
@mprpic this thread has a good argument from @copernico for the need of version ranges https://gitter.im/aboutcode-org/vulnerablecode?at=5f7231fa6e85e0058c5f4aaf |
Now here are some aesthetic considerations: A purl
Hum 😒 |
Ugh, I figured it would make purl version ranges unreadable. If ranges will be included, I don't see a way to eliminate the aesthetic problems it creates. The only thing I can think of is maybe to break down the version clauses into individual purl qualifiers rather than a single version_range string. That would likely make the purl readable without as much encoding. |
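To make the readability concern concrete, here is a small sketch comparing the two layouts. The range expression `>=1.2.3,<2.0.0`, the package name, and the per-clause qualifier names `version_gte` and `version_lt` are assumptions made up for this illustration; only `version_range` is a name mentioned in the thread.

```python
from urllib.parse import quote


def single_qualifier(base_purl, range_expr):
    """Put the whole range in one qualifier; comparators must be percent-encoded."""
    return f"{base_purl}?version_range={quote(range_expr, safe='')}"


def split_qualifiers(base_purl, lower, upper):
    """Spread the clauses over hypothetical per-bound qualifiers instead."""
    return (
        f"{base_purl}?version_gte={quote(lower, safe='')}"
        f"&version_lt={quote(upper, safe='')}"
    )


encoded = single_qualifier("pkg:npm/example", ">=1.2.3,<2.0.0")
readable = split_qualifiers("pkg:npm/example", "1.2.3", "2.0.0")
```

The first form hides every comparator behind `%3E`, `%3D`, `%3C` and `%2C`, while the second keeps plain version strings at the cost of fixing the set of comparators in the qualifier names.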
Hey folks, looks like this issue has gone stale, but I'd love to restart the conversation. @pombredanne's suggestion seems entirely reasonable, even if the URL encoding of the characters makes it less human readable. |
@jhutchings1 the issue has not gone stale at all ... it is just that we are making practical experiments with a version range separately in another repo! There is a draft spec there (it would need to be extracted and brought here). ATM the draft starts to have some legs ... but I need to play with an actual working implementation to validate that this can work at scale. |
@jhutchings1 actually I separated a draft spec in a clean branch here aboutcode-org/univers#11 |
/me takes deep breath i'm gonna break a personal and OSS best practice rule and spread unsupported FUD. Sorry. I'm only doing it because i see concrete progress being made here, and i think not saying something may be more harmful. i've come to believe that version ranges are, in general, harmful. i do have an alternative that i've been working on for a while - it's not public because it's unfinished, but the relevant bits are plausibly finished enough for Totally understood that my unspecified, general concern should not block actual progress, though. |
Maybe playing MisterObvious, but IMHO the issue is not version range or version list: the root cause of our headaches is version computability (operators ==, <, >, <=, >=), and while it is more or less OK for Semver or CalVer, except for some wildcards and attributes corner cases, it is indeed a tough one. When the NVD introduced the new way of defining ranges with
I'd suggest extreme care: the NVD people have been working on software inventory for 20 years, are not stupid, and yet kind of failed (at least for the use cases we now have). There has been a very long discussion on the topic in the upcoming CVE JSON Schema development, now in v5.0.0 release candidate 5. I can't find the exact discussion reference, but as of today on the CVE / CPE side the outcomes are there: |
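The computability point can be illustrated with a short sketch. The two key functions below are deliberately simplified assumptions: `numeric_key` only handles dotted integers, and `epoch_key` only adds a Debian-style `epoch:` prefix on top of that (no revisions, no tilde ordering, no pre-release tags).

```python
def numeric_key(version):
    """Order dotted numeric versions as integer tuples."""
    return tuple(int(part) for part in version.split("."))


def epoch_key(version):
    """Order versions with an optional 'epoch:' prefix; a missing epoch counts as 0."""
    epoch, _, rest = version.rpartition(":")
    return (int(epoch or 0), numeric_key(rest))


# Plain string comparison gets the order wrong as soon as a component has two digits.
string_order_says_older = "1.10.0" < "1.9.0"
numeric_order_says_newer = numeric_key("1.10.0") > numeric_key("1.9.0")

# With epochs, a "smaller looking" version can be the newer one.
epoch_wins = epoch_key("1:0.9") > epoch_key("2.0")
```

Every operator (==, <, >, <=, >=) has to be defined per versioning scheme in this way before a range in any notation can be evaluated.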
@jbmaillet re:
yes it is! and a range in any notation demands to be informed by how two versions are compared. In the experimental spec at aboutcode-org/univers#11 for a compact range notation and in the WIP companion working implementation at https://github.com/nexB/univers/tree/main/src/univers by @sbs2001
The WIP spec has extensive research on the topic when used for vulnerable ranges, including the NVD approach, but also when used for package dependency ranges. The NVD The draft "vers" spec tries to address this with a slightly different goal: to have a compact yet obvious notation for version ranges. At this stage it makes sense that I move the draft at aboutcode-org/univers#11 to a PR here as the two are closely tied! :) |
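For readers who have not opened the draft, the notation looks roughly like `vers:npm/1.2.3|>=2.0.0|<5.0.0`: a `vers:` prefix, a versioning scheme, then constraints separated by pipes. The parser below is a simplified sketch written for this thread, not the univers implementation; the function name `parse_vers` is hypothetical, it skips percent-decoding and the `*` wildcard, and rejecting empty constraints is a choice of this sketch rather than a statement about the spec.

```python
# Two-character comparators come first so "<=" is not mistaken for "<".
COMPARATORS = ("<=", ">=", "!=", "<", ">", "=")


def parse_vers(vers):
    """Split a vers string into (scheme, [(comparator, version), ...])."""
    scheme_part, sep, constraints_part = vers.partition("/")
    if not sep or not scheme_part.startswith("vers:"):
        raise ValueError(f"not a vers string: {vers!r}")
    scheme = scheme_part[len("vers:"):]

    constraints = []
    for raw in constraints_part.split("|"):
        raw = raw.strip()
        if not raw:
            raise ValueError("empty constraint")
        for comparator in COMPARATORS:
            if raw.startswith(comparator):
                constraints.append((comparator, raw[len(comparator):]))
                break
        else:
            # A bare version is treated as an equality constraint.
            constraints.append(("=", raw))
    return scheme, constraints


scheme, constraints = parse_vers("vers:npm/1.2.3|>=2.0.0|<5.0.0")
```

Parsing is the easy half: evaluating the constraints still needs the scheme-specific comparison discussed above.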
@jbmaillet and everyone here ... See #139 .... comments are badly needed. |
Indeed. But,
i'd say "less," at least for semver, where the tendency is to construct ranges with bounds on versions that may not yet exist - and even if they do, versions may come to exist after the publication of the range. But, i see this comment CVEProject/cve-schema#87 (comment), particularly:
and suspect that if you're embracing ranges while having accepted this, then the additional things i could add will be of marginal value, which is OK. /me bows out |
I'd agree that version ranges as a mechanism for choosing dependencies are generally bad. (Hence why things like LATEST and RELEASE were deprecated in Maven 3, years ago.) However for this spec, we still need a way to express ranges, e.g. this vulnerability applies to versions x to y. IOW ranges are required to be expressive generally, but using them to declare dependencies is a bridge too far... but I don't think that's the point of the spec here. |
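The "versions x to y" case can be sketched as a simple containment check. The parameter names mirror the NVD-style bounds (start including, end excluding) mentioned elsewhere in this thread; the function names and the dotted-integer ordering are assumptions of this sketch and would have to be replaced by the real rules of each ecosystem.

```python
def version_key(version):
    """Simplified ordering: dotted integers only."""
    return tuple(int(part) for part in version.split("."))


def is_affected(version, start_including=None, end_excluding=None):
    """Return True if version lies in [start_including, end_excluding)."""
    key = version_key(version)
    if start_including is not None and key < version_key(start_including):
        return False
    if end_excluding is not None and key >= version_key(end_excluding):
        return False
    return True


# "This vulnerability applies to versions 1.2.0 up to, but not including, 2.0.0."
affected = is_affected("1.10.0", start_including="1.2.0", end_excluding="2.0.0")
fixed = is_affected("2.0.0", start_including="1.2.0", end_excluding="2.0.0")
```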
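@sdboyer Hey! It has been a while.... great of you to drop by! Let me ping you on twitter. I am pombr there. re: "i've come to believe that version ranges are, in general, harmful."
I agree ++. I am intrigued by what your alternative could be! Eventually ranges are all leaky and make false promises at some level and in practice, only full enumerations might be correct yet ... they do exist in the wild and capturing the wild beasts is what I want somehow. @brianf re: "However for this spec, we still need a way to express ranges, eg this vulnerability applies to versions x to y. IOW ranges are required to be expressive generally,"
Exactly. re: "but using them to declare dependencies is a bridge too far...but I don't think that's the point of the spec here."
The spec does not take a stand on how ranges would be used and it could be used to depict vulnerable or dependent ranges. |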
Ranges in Maven are very rarely used. Going way back to the start of Maven
2 it was understood that it was an anti-pattern with limited use cases and
that ultimately was memorialized with the Maven 3 changes I referenced
above. Build reproducibility was the key reason to discourage ranges back
in the day.
…On Tue, Nov 30, 2021 at 1:32 PM Philippe Ombredanne wrote:
I have a question though wrt. Maven: how common would you say using ranges is?
https://maven.apache.org/pom.html#Dependency_Version_Requirement_Specification
|
@brianf re: "Ranges in Maven are very rarely used."
Thanks... this confirms my impression. As an aside, it's funny that dependency ranges have been mostly abandoned by Maven, yet are fairly prevalent in Python, npm and Ruby package manifests, commonly accompanied by an extra full enumeration of pinned versions a.k.a. a lockfile. |
Maven takes the stance that you should be locking by default, with tooling
to make updates when you want/need to. Other systems take the opposite
approach which is why you see the prevalence of lockfiles to achieve the
same thing.
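The reproducibility concern behind both approaches can be shown in a few lines. This is a sketch under simplifying assumptions: versions are dotted integers, the range is a half-open interval, and `resolve` is a hypothetical stand-in for what a real resolver does.

```python
def version_key(version):
    """Simplified ordering: dotted integers only."""
    return tuple(int(part) for part in version.split("."))


def resolve(available, lower, upper):
    """Pick the highest available version in [lower, upper), or None."""
    candidates = [
        v for v in available
        if version_key(lower) <= version_key(v) < version_key(upper)
    ]
    return max(candidates, key=version_key) if candidates else None


# The same manifest range resolves differently once a new release appears...
monday = resolve(["1.0.0", "1.1.0"], "1.0.0", "2.0.0")
friday = resolve(["1.0.0", "1.1.0", "1.2.0"], "1.0.0", "2.0.0")

# ...so a lockfile records the concrete answer and reuses it until updated on purpose.
lockfile = {"example-package": monday}
```

Pinning by default and range-plus-lockfile both end up with an enumerated version; they differ in which one is the source of truth.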
|
(This is a long rant, but you can jump to the conclusion.) To give a bit of context to my comments, and to introduce myself: I work in the IoT/embedded field, for automotive systems, on cybersecurity (plus a bit of OSS licensing). That's Linux, Android, AUTOSAR, FreeRTOS as environments, and a SLOC count that is 95% C/C++, the rest being Java or Kotlin. Everything is built from sources, either from archives + good old autotools or from plain git repositories (think a la AOSP or buildroot or Yocto). The source code is often a fork from upstream, at least for the Linux kernels (the SoC vendors fork the kernel, and we fork it again for our own customization, or for some CVE or plain bug backports because upgrading is extremely painful in such a context). Last time I checked, the Linux kernel on an LTS branch such as those we use has 13000+ Kconfig options: in a typical industrial product, only 20% of the code is actually compiled (and hence only roughly 20% of the CVE are relevant, for example). An Android source tree (which is more than AOSP, because AOSP does not come with a kernel for your SoC CPU nor bootloader nor hypervisor etc) + our added OSS and our proprietary code is about 100GB before build, with about 800 git repositories (AOSP for Android 12, without kernel nor added customization, consists of 1079 git repositories as of today). And it is only part of a system/product, and of course we have several systems/products. As a result, a complete system/product will have about 1000 CVE to track (yes, a thousand), 95% of which are false positives (code not compiled per build configuration, fix backported, but mostly poorly documented CVE, more on this later) in hundreds of repositories. Plus our suppliers' private advisories etc. In this context, I use CVE (and other sources), and so I'm stuck with CPE for now, ready to jump to SWID when they are used by the NVD.
I do not (yet) use purl, nor SPDX, both for lack of need and for lack of spare time (also, we have our own internal tooling and processes in place). But I consider any effort in software inventory, version computation/matching, dependency tracking, SBOM as important for my security assessments and OSS licensing compliance, and I try to follow the development in these areas. So BTW: thank you all for your work. This being said:
And this is not done because it is too much of an analysis workload... so this workload is transferred to auditors/analysts like me. Considering that the kernel is by far my biggest volume and flow of continuously incoming CVE, that most kernel maintainers don't care about CVE (some of them even making it a personal matter), that this situation has always been so (even when versions were - partially - listed in the NVD before the UpToExcluding etc syntax), and that the kernel organization or Linux Foundation is not and does not want to be a CNA, a full enumeration of versions will never, ever, work. Version range is the "least worst" option. Google with Android is even worse: they put all their CVE in a unique CPE
CONCLUSION: Don't get me wrong: a full version list could work in theory, it would be suitable and great, but it does not match the industrial reality. And it's in great part a question of people and organizations, not a question of specification. So computable version ranges are hard, but they are a MUST. You can enumerate versions for a libfoobar that has a new CVE once per month or quarter, but this does not matter if you do not address code bases such as the kernel, with as of today more than 500 CVE on its 4.14 branch and close to 2500 CVE in all its history, or Android (close to 3800 CVE in all its history), with new ones coming every week: these are (some) of the hard cases I deal with daily. I imagine there are others in other ecosystems/industries: I see hundreds of CVE on Windows/Oracle/Citrix/IT products or Jenkins/Atlassian/tooling passing every week. Sorry for this long rant. At least, if you never worked in the embedded field, now you know why "the S in IoT is for Security". ;-) |
This all makes sense, and I agree that saying this is a mess is an understatement! I am rather familiar with contexts similar to yours and this brings a question (and possibly something we could craft into some project): assuming that you can efficiently determine and trace the subset of kernel code that you use in a given build, what is the minimum you would need to be able to sort CVEs there? Would knowing the fixing commit (and therefore a fixing patch) be enough as a first pass to determine if the built code subset contains the fixable code? I feel that you are likely solving an important problem and that there may be a way to pull and pool energies to fix this together (probably elsewhere, not in purl proper) (side note: I have somewhat efficiently used |
@pombredanne , in my experience, on the kernel which is my hard case, there are:
These 2 sets of course overlap. CVE documentation is most of the time terrible, but you can still cross-leverage it to help the situation. First the build configuration: it is very easy and build agnostic to generate a compilation database using for example a tool such as Bear (don't get blocked on the Clang aspect: it works as well with regular or cross gcc) (same for the CMake things: it works fine with good old GNU make or totally alien build systems such as Android with its ninja/soong). The only limitation is C/C++. Note that there are tools similar to Bear for other languages / ecosystems. It is also very easy to get a list of files implicated in a CVE, if they are mentioned in the CVE description as is often the case for the kernel, just by using good old regexp. Then you cross both sources of information and voila: you know if a file was compiled or not, and hence if the CVE is relevant or not. 80% of kernel CVE are automatically sorted out as false positives. Then for the fixes and backports: the kernel does not mandate mentioning a CVE Id in a commit, so this is unusable[*]. But there is official kernel documentation for a backport formalism in the git commit message. It is easy to spot CVE references which include a full git sha1, again using regexp. So you get the git sha1 of the fix(es), you explore your git history searching for either the sha1 "as is" or the "sha1 as a backport" and voila, 20-50% of CVE false positives automatically sorted out. There are a few corner cases in both passes, but you get the idea. Also note that this can be used on other pieces of software, either similarly highly configurable, or that use the same backport formalism (for example GNOME GLib). The information still missing is: when / in which version was the bug first introduced? Some people do it, without giving the full details. PS:
Some colleagues and I used such a strace strategy circa 2011 for OSS and commercial licensing compliance, which was our concern at the time, with good results. But there are other and better technologies now, and we would do it differently (see above for Bear as an example). *: It makes sense not to mandate a CVE Id in a commit. For example you might not have a CVE Id yet if you are not a CNA (which the kernel should be!) and still would want to do the fix. My metric figures show that since 2002, CVE Ids have never been mentioned in more than 25% of the cases, peaking for CVE-2013-NNNN. In August, it was around 15% for CVE-2021-NNNN. |
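The two passes described above can be sketched as follows. This is an illustration under stated assumptions, not the tooling used by the commenter: the compilation database is the already-loaded content of a `compile_commands.json` (a list of entries with `directory` and `file` keys, as produced by tools like Bear), commit messages are supplied by the caller as a mapping from sha1 to message, and the regular expressions and function names are hypothetical simplifications.

```python
import os
import re

# Assumed pattern for C/C++ source paths mentioned in a CVE description.
SOURCE_PATH = re.compile(r"\b[\w./-]+\.(?:c|h|cc|cpp)\b")
# A full 40-character git sha1 quoted in a commit message (backport reference).
SHA1 = re.compile(r"\b[0-9a-f]{40}\b")


def compiled_files(compile_db):
    """Return the normalized set of files that were actually compiled."""
    return {
        os.path.normpath(os.path.join(entry["directory"], entry["file"]))
        for entry in compile_db
    }


def cve_is_relevant(description, compile_db):
    """True/False if the description names files; None when it names none."""
    mentioned = [os.path.normpath(p) for p in SOURCE_PATH.findall(description)]
    if not mentioned:
        return None  # nothing to match on: needs a human analyst
    compiled = compiled_files(compile_db)
    return any(c.endswith("/" + m) or c == m for c in compiled for m in mentioned)


def fix_is_present(fix_sha, commits):
    """Look for the fix as-is, or referenced by a backport commit message."""
    return any(
        sha == fix_sha or fix_sha in SHA1.findall(message)
        for sha, message in commits.items()
    )
```

A CVE whose files were never compiled, or whose fixing sha1 shows up in the local history, can then be set aside automatically; everything else still goes to manual review.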
@pombredanne What is the current status of https://github.com/package-url/purl-spec/blob/version-range-spec/VERSION-RANGE-SPEC.rst? Ready for use? 🤔 |
@tschmidtb51 I am pretty satisfied with it at this stage. Unless there are objections I will likely merge it this week. |
@pombredanne In general, I like the approach. I flagged some details, where I think the spec should be improved for clarity and the benefit of simplicity (e.g. prohibit consecutive pipes and empty |
@pombredanne status? |
Ping ^ |
This is now merged in https://github.com/package-url/purl-spec/blob/master/VERSION-RANGE-SPEC.rst and I am closing. Thank you all! and please come with issues and PR to fix anything that does not work out. The vers thing is already in use in CycloneDX ECMA-424 https://tc54.org/cyclonedx/ and OASIS CSAF https://oasis-open.github.io/csaf-documentation/ Yeah! 🎉 |
@pombredanne I might have missed something obvious, but how is These comments https://github.com/package-url/purl-spec/pull/139/files#r811475168 in the PR are touching on this point as well but I don't see anything mentioned in the final specs. |
Is there desire for PURL to support version ranges or is that out of scope? For example, to describe vulnerable versions of a package.