Store version ranges #140
Context: Security advisories conventionally describe patched and vulnerable packages using version ranges. For example, the Ruby advisory database does this: https://github.com/rubysec/ruby-advisory-db/blob/6efbdb053cbe41e55f435ddee25a1562fb73f3f2/gems/actionpack/CVE-2012-1099.yml#L23 When an advisory provides such version ranges, we ideally want to convert each range into a discrete set of versions, i.e. "resolve the range". We do this by
The actual problem: in some cases there is no API to carry out step 1, i.e. version ranges can't be resolved into discrete versions. We want some way to store such data, similar to how we store discrete versions. E.g. https://security.gentoo.org/glsa/202009-18 |
Maybe it is not necessary to require that version ranges be resolved a priori (that is, resolved to generate a finite set of existing artifacts); they could instead be used to determine a posteriori whether a given artifact is in the range. The advantage of the latter method is that you would not need the API to determine what exists. One way could be to allow only the "last segment" of the version to vary; for example: 2.2 and up --> 2.2, 2.3, 2.4, ... (but not 3.0). Would this make sense? It's a compromise: it still requires enumerating a few intervals to achieve the correct semantics, but at least it is simple, hopefully easy to grasp, and flexible enough to cover the large majority of cases. Your thoughts? |
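As a rough illustration of the a posteriori check described in this comment, here is a minimal sketch. It assumes purely numeric, dot-separated versions, and the helper names `parse` and `in_last_segment_range` are made up for this example; they are not part of VulnerableCode.

```python
from typing import Tuple


def parse(version: str) -> Tuple[int, ...]:
    """Split a dotted numeric version such as "2.3" into a tuple of ints."""
    return tuple(int(part) for part in version.split("."))


def in_last_segment_range(candidate: str, lower: str) -> bool:
    """Return True if candidate is >= lower and keeps every segment of
    lower except the last one, so "2.2 and up" never matches 3.0."""
    low = parse(lower)
    cand = parse(candidate)
    prefix = low[:-1]
    # The candidate must start with the same leading segments as the bound.
    if cand[: len(prefix)] != prefix:
        return False
    # Tuples compare segment by segment, left to right.
    return cand >= low
```

No list of existing releases is needed: the check only looks at the candidate version and the stored bound. Versions with letters or other non-numeric segments would need a package-type-specific parser.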
Ranges with a single bound are problematic (they ignore backports). A more formal way of Unrelated to this: the primary motivation to store ranges at the moment is to avoid losing data from places like https://security.gentoo.org/glsa and then to have something to fall back on when some specific version is not present in our data. |
@sbs2001 I think the pessimistic operator (which I ignored, thanks for the pointer) describes exactly the semantics I had in mind. Re: backports: I am not sure I understand what you mean; in any case I would stay away from using wallclock time (as in released-before, or released-after) when determining if a version is "after" or "before" another: I guess what matters are version identifiers: for sure 3.1 is after 3.0; but is 2.1 after 3.0? maybe, or maybe not, we need a separate interval expression for 2.x |
@sbs2001 you wrote:
Do you mind digging this up and pasting the chat log in a comment here? |
For reference there is a related Package URL PR by @david-a-wheeler at package-url/purl-spec#93 that I need to reply to and tickets at package-url/purl-spec#66 and package-url/purl-spec#84 |
Here are some thoughts and background rehashing for reference:
Context
There is no (mostly) universal syntax for version ranges and no (mostly) universal way to compare two versions. Each package type may define its own syntax and semantics. For instance:
Problem
Version ranges are useful because they can help to map a future, not-yet-known package version to a known vulnerability impacting it if that package version is within that range.
Some solution elements
There are a few things to consider:
As for the problem at hand here:
So in recap IMHO we should ideally:
|
@pombredanne - thanks for the excellent list of examples! That at least gives us specific examples to compare. |
@david-a-wheeler frankly I wished everyone would use semver ;) |
Here are some relevant messages regarding that matter. You can search gitter using this and read the whole thing there :) . This is borderline cryptic without the context.
|
My thinking on this has morphed a bit since my initial comments in package-url/purl-spec#84 My use case revolves around semver (and only semver) and it's still painful. The example: introduced in version 7.0.0 and 6.5.2 and fixed in version 7.1.1 and 6.8.12 is VERY hard to capture in a way that is easy to understand or parse. I now plan to create a service focused specifically on PURL IDs. The PURL API will be a way to get a listing of all product names, release versions, date of release, and other metadata I find useful along the way. The vulnerability data will list only vulnerable PURL IDs. If the version isn't listed, it's not affected. The vulnerable IDs field will be quite large in some cases, but that's OK because this data is for machines not humans. I envision the workflow to look something like
Or
As I work on this problem I have no doubt my thoughts will change again. |
You are not really solving the problem.
I was on the same page as you until I had a chat with @pombredanne regarding using "chronology" as a basis for version comparison:
And also
would need some comparator function. This approach would work if you are working with just some sane versioning scheme, like semver. |
@joshbressers I am curious, how would you obtain |
Hah, I figured chronological would be a less confusing way to describe this all, I was clearly wrong :) Let's ignore that word. What I really want is a list of releases in order. I wrongly assumed dates could do that. Here is an example. My version list, in the order it was released, looks like this
I know there is a vulnerability in
So we end up with something that looks like this
Now I can figure out the closest fix pretty easily. |
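A minimal sketch of the "releases in order" idea, under stated assumptions: the release list and the vulnerable set below are hypothetical placeholder data (the original example data did not survive in this thread), and `closest_fix` is a made-up helper name.

```python
from typing import List, Optional, Set


def closest_fix(releases: List[str], vulnerable: Set[str], current: str) -> Optional[str]:
    """Return the first release at or after `current`, in release order,
    that is not listed as vulnerable; None if no such release exists.

    `releases` must be ordered by release sequence, not by version number.
    Raises ValueError if `current` is not a known release.
    """
    start = releases.index(current)
    for version in releases[start:]:
        if version not in vulnerable:
            return version
    return None


# Hypothetical data, for illustration only.
releases = ["1.0", "1.1", "1.2", "2.0", "2.1"]
vulnerable = {"1.1", "1.2", "2.0"}
print(closest_fix(releases, vulnerable, "1.1"))
```

Because the list position carries the ordering, no version comparator is needed. The trade-off is that the full release list must be known and kept up to date.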
I am building this service for the products I work on. I need machine readable data, and I get to control everything that's happening. It's a very different problem than the general community. |
neat! if you think there is some bits and data that could be useful feel free to reach out! |
Thanks @pombredanne! Everything I do will end up public on github, I'll certainly be looking for honest feedback :) |
same here! 👍 |
And a few extra references to versions specs in the wild:
|
Very interesting discussion; I have not gone through all the different version specification schemes and I am somewhat familiar with only a small subset of them, but I suspect (should I say, hope) they are all variants of a general scheme x.y.z.j.k.h where versions can be arranged in a tree with a depth (figure from : https://link.springer.com/article/10.1007/s10664-020-09830-x) Can you show an example of versioning scheme that does not fit this (possibly naïve) generalization? |
There are exceptions. Sentimental versioning lists some examples. In TeX and METAFONT (two tools widely used in mathematics), new versions add a new digit approaching an irrational number. The version numbers of TeX approach π (the current version is 3.14159265) and the version numbers of METAFONT approach e. Perhaps more importantly, projects occasionally CHANGE their version number schemes. This was made famous by "Bill Gates counts to 10". The Windows version numbers are (overly simplified) 1, 2, 3, 3.1, 3.11, 95, 98, NT, 2000, XP, Vista, 7, 8, 10. The solution used by the packaging formats rpm (for Red Hat, Fedora, CentOS, etc.) and deb (Debian, Ubuntu, etc.) is to add "epoch numbers", integers that notionally precede the "normal" version number. See the Fedora docs on this and the Debian docs on this. Typically an epoch, if included, is written as the epoch number, a colon, then the "normal" version number. One quirk: in rpm, an empty epoch is lower than anything with a given epoch number, while in Debian an "empty" epoch is considered 0. I think we need to at least support epoch numbers, because otherwise there's no way to handle people who change version number schemes, and that is the standard way to do it. |
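To make the epoch convention concrete, here is a small sketch that splits the `epoch:version` form and builds a sort key from it. It assumes the Debian-style rule mentioned above (a missing epoch counts as 0) and purely numeric dotted versions; real deb and rpm comparison rules are considerably more involved. The function names are hypothetical.

```python
from typing import Tuple


def split_epoch(version: str) -> Tuple[int, str]:
    """Split "epoch:version" into (epoch, version).

    A missing epoch is treated as 0 (Debian-style assumption)."""
    epoch, sep, rest = version.partition(":")
    if sep and epoch.isdigit():
        return int(epoch), rest
    return 0, version


def epoch_key(version: str) -> Tuple[int, Tuple[int, ...]]:
    """Sort key: the epoch first, then the numeric dotted segments."""
    epoch, rest = split_epoch(version)
    return epoch, tuple(int(part) for part in rest.split("."))


# The epoch dominates: "1:1.0" sorts after "9.9" even though 1.0 < 9.9.
print(sorted(["9.9", "1:1.0", "2.0"], key=epoch_key))
```

This is how an epoch lets a project restart or change its numbering scheme while keeping newer releases ordered after older ones.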
@copernico you wrote:
I think that your generalization works as it stands. Even if Debian and RPM packages use epochs, as pointed out by @david-a-wheeler, the epoch would still fit in a tree view of the versions world as the first optional segment. IMHO the variations are in how you would create that tree, which requires comparing versions, and the things that do change are whether:
I cannot fathom a (mostly) universal way to organize the tree by comparing versions reliably with a single algorithm that is not package-type specific (reliably being the difference between stating that a version is not vulnerable vs. vulnerable, e.g. raising a false negative). The closest that comes to mind would be @AMDmi3's awesome https://github.com/repology/libversion, which has a great doc highlighting the complexity of trying to get things right at scale https://github.com/repology/libversion/blob/master/doc/ALGORITHM.md and which supports most everything, including distro versions. And also @orsinium's https://github.com/dephell/dephell_specifier with support for Python PEP-440, Semver, Ruby, npm and Maven |
@pombredanne and I recently had a discussion regarding how to handle version ranges of packages in the context of vulnerablecode, and we decided to run a little experiment. We would store concrete relationships between packages (these include the version) and vulnerabilities the same way we are already doing. Now coming to the new things: we would have another table like:
Eg value of Now if a user asks for vulnerability status of some version of npm package For resolving ranges we would be using #140 (comment) 's 2.1 point . The universal syntax would be more or less a stripped down version of PEP 440 (this is just an experiment). Periodically we would also fetch all versions of packages contained in @pombredanne correct me if I misunderstood you anywhere :) |
@sbs2001 this makes ++ sense. To recap and reformulate my understanding this would mean:
And your approach boils down to going from the most to the least specific:
|
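The "most specific first" lookup discussed in the two comments above could be sketched roughly as follows. This is an illustration only: the in-memory dictionaries stand in for the database tables, the half-open `[lower, upper)` range convention and numeric dotted versions are assumptions, and all names are hypothetical.

```python
from typing import Dict, List, Optional, Set, Tuple

# (vulnerability id, lower bound or None, upper bound or None)
Range = Tuple[str, Optional[str], Optional[str]]


def parse(version: str) -> Tuple[int, ...]:
    """Parse a purely numeric dotted version (simplifying assumption)."""
    return tuple(int(part) for part in version.split("."))


def vulnerabilities_for(
    package: str,
    version: str,
    concrete: Dict[Tuple[str, str], Set[str]],
    ranges: Dict[str, List[Range]],
) -> Tuple[Set[str], str]:
    """Look up concrete package-version relations first; fall back to
    stored ranges only when the exact version is unknown."""
    exact = concrete.get((package, version))
    if exact is not None:
        return set(exact), "concrete"
    v = parse(version)
    matched: Set[str] = set()
    for vuln_id, lower, upper in ranges.get(package, []):
        if lower is not None and v < parse(lower):
            continue  # below the range
        if upper is not None and v >= parse(upper):
            continue  # at or above the exclusive upper bound
        matched.add(vuln_id)
    return matched, "range"
```

Returning the source of the answer ("concrete" or "range") lets a caller treat range-derived results as less certain than stored relationships.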
Repasting here the design for version ranges from #119 (comment) and updating it at the same time:
Version ranges specifier
A version ranges specifier is a string with this syntax:
For example
Notes and caveats:
Some of the known schemes and their codes are:
Implementation
https://github.com/nexB/univers by @sbs2001 implements this spec.
Usage in VulnerableCode
Here is the design we discussed to put version ranges to use here. One problem is that the package version ranges a vulnerability applies to may become misleading after they have been published unless they are updated. For example, openstack/ossa@777e7b7#r51222097 was last updated in 2014 and does not apply to "All versions", but really only to package versions known at the time this advisory was published. A related problem is unbounded version ranges, or the lack of version ranges altogether, where an advisory tells when a vulnerability is fixed but not when it appeared, such as https://github.com/mozilla/foundation-security-advisories/blob/master/announce/2016/mfsa2016-14.md The difficulty is that we do not want to miss reporting any version that is vulnerable (a dangerous false negative), yet we do not want to pollute the reporting with package versions that are not certain to be vulnerable (false positives). As a solution, the proposed design tries to handle these two cases:
For instance, with openstack/ossa@777e7b7#r51222097, which was last updated ~7 years ago, the confidence that it applies to a package version released in 2021 should be fairly low. Also, since the confidence and the version ranges specifier are stored, they can be refined and curated by hand in the future. Therefore, in addition to concrete relationships between package versions and a vulnerability, we also want to store a version range with these specifics:
The confidence values that will be returned with this query should be based on a few factors such as:
When storing ranges, unbounded ranges are a possible source of problems as they may resolve incorrectly to versions that are NOT affected by a vulnerability. To cope with this we should be able to query and find all PackageRelatedVulnerability and Vulnerability that are open, e.g. that are missing a lower bound, an upper bound, or have no bound at all, to use as an input for reaching out to upstream data sources or package projects, to create a wall of shame, or as an input to curation and review. We also need to revert the changes in #436 and ensure that we effectively store all the concrete relationships as defined here. |
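The query for open ranges mentioned above might look something like this sketch. The `StoredRange` record is a hypothetical stand-in for whatever model ends up holding the range; it is not the actual VulnerableCode schema.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class StoredRange:
    """Hypothetical record for a stored range; None means no bound."""
    vulnerability_id: str
    lower: Optional[str]
    upper: Optional[str]


def missing_bounds(stored: StoredRange) -> List[str]:
    """Name the bounds a stored range lacks: "lower", "upper" or both."""
    missing = []
    if stored.lower is None:
        missing.append("lower")
    if stored.upper is None:
        missing.append("upper")
    return missing


def open_ranges(ranges: List[StoredRange]) -> List[StoredRange]:
    """Select the ranges that need curation or upstream follow-up."""
    return [stored for stored in ranges if missing_bounds(stored)]
```

The resulting list could feed a curation queue or a report to upstream projects, as suggested in the comment.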
Here is an example with real data:
Based on this:
Tomorrow:
The day after tomorrow:
So:
|
Add release information to review tags.
FYI, this is an interesting related ticket: CVEProject/cve-schema#87 |
In particular this comment I posted is of relevance here: |
Thanks for raising this, I am closing this now and I will let Philippe merge the purl vers PR package-url/purl-spec#139 now we have something that mostly works for the version range. |
To support vulnerabilities that impact or fix a package version range, we would need to store that data.
In addition to that, we would also store the relationship between vuln and package for every known packages at the time we create or update a vuln.
This is follow up of #119