Store version ranges #140

Closed
pombredanne opened this issue Jan 14, 2020 · 31 comments

Comments

@pombredanne
Collaborator

pombredanne commented Jan 14, 2020

To support vulnerabilities that impact or fix a package version range, we would need to store that data.
In addition to that, we would also store the relationship between vuln and package for every known package at the time we create or update a vuln.
This is a follow-up to #119

@pombredanne
Collaborator Author

From dupe 248:

#65 and #84 don't provide concrete versions, hence we need to store version ranges. For that, we must first improve the data models.

@sbs2001
Collaborator

sbs2001 commented Oct 7, 2020

Context

Security advisories conventionally describe patched/vulnerable packages using a version range. For example, the Ruby advisory database provides this: https://github.com/rubysec/ruby-advisory-db/blob/6efbdb053cbe41e55f435ddee25a1562fb73f3f2/gems/actionpack/CVE-2012-1099.yml#L23

When an advisory provides such version ranges, we ideally want to convert the version range into a discrete set of versions, i.e. "resolve the range".

We do this by

  1. Obtaining all the versions of the given package released to date by calling some API. In this case the endpoint would be https://rubygems.org/api/v1/versions/actionpack.json .

  2. Classifying the versions into two groups: (a) versions that satisfy the range and (b) versions that do not satisfy any range. Then, using some logic, the packages in group (a) are marked as either vulnerable or patched, while the packages in group (b) get the opposite vulnerability status.
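
The two steps above can be sketched as follows. This is only an illustration under stated assumptions, not the project's code: the version list is hard-coded rather than fetched from the endpoint, the range is made up, `parse` and `classify` are hypothetical helpers, and versions are assumed to be plain dotted integers.

```python
def parse(version):
    """Turn '3.2.1' into (3, 2, 1). Assumes purely numeric dotted versions."""
    return tuple(int(part) for part in version.split("."))


def classify(all_versions, in_range):
    """Split versions into group a (satisfies the range) and group b (does not)."""
    a_group = [v for v in all_versions if in_range(parse(v))]
    b_group = [v for v in all_versions if not in_range(parse(v))]
    return a_group, b_group


# Stand-in for the list an API call would return (hypothetical data).
released = ["3.0.10", "3.0.12", "3.1.0", "3.1.4", "3.2.1", "3.2.2"]

# Hypothetical "patched" range: every version >= 3.1.4.
a_group, b_group = classify(released, lambda v: v >= (3, 1, 4))
print(a_group)  # ['3.1.4', '3.2.1', '3.2.2'] -> patched
print(b_group)  # ['3.0.10', '3.0.12', '3.1.0'] -> opposite status, i.e. vulnerable
```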

The actual problem :

In some cases there is no API to carry out step 1, i.e. version ranges can't be resolved into discrete versions. We want some way to store such data, in a manner similar to what we do when we have discrete versions. E.g. https://security.gentoo.org/glsa/202009-18

@copernico

copernico commented Oct 7, 2020

Maybe it is not necessary to require that version ranges be resolved a priori (that is, resolved to generate a finite set of existing artifacts); they could instead be used to determine a posteriori whether or not a given artifact is in the range. The advantage of the latter method is that you would not need the API to determine what exists.
In practice though, one needs to enforce some limits on how to interpret intervals that are open-ended (for example, version 2.2 and up).

One way could be to allow only the "last segment" of the version to vary; for example:

2.2 and up --> 2.2, 2.3, 2.4,....... (but not 3.0)
2.2.1 and up --> 2.2.1, 2.2.2, 2.2.3 (but not 2.3.x)
3.0 and up --> any 3.x (but not 4.x, 5.x etc)
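
The three examples above can be expressed as a small check. This is a sketch under the assumption that versions are plain dotted integers; `last_segment_match` is a hypothetical name, not an existing API.

```python
def parse(version):
    """Turn '2.2.1' into (2, 2, 1). Assumes purely numeric dotted versions."""
    return tuple(int(part) for part in version.split("."))


def last_segment_match(candidate, lower_bound):
    """True if candidate is at or above lower_bound while every segment of
    lower_bound except the last one stays fixed."""
    low = parse(lower_bound)
    cand = parse(candidate)
    prefix = low[:-1]  # segments that are not allowed to vary
    return cand[:len(prefix)] == prefix and cand >= low


print(last_segment_match("2.4", "2.2"))      # True
print(last_segment_match("3.0", "2.2"))      # False
print(last_segment_match("2.2.3", "2.2.1"))  # True
print(last_segment_match("2.3.0", "2.2.1"))  # False
print(last_segment_match("3.7", "3.0"))      # True
print(last_segment_match("4.0", "3.0"))      # False
```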

Would this make sense? It's a compromise, this still requires enumerating a few intervals to achieve the correct semantics, but at least it is simple, hopefully easy to grasp, and quite flexible to cover the large majority of cases.

Your thoughts?

@sbs2001
Collaborator

sbs2001 commented Oct 7, 2020

@copernico

Ranges with a single bound are problematic (they ignore backports). A more formal way of allowing only the last segment to vary is to use the pessimistic operator https://thoughtbot.com/blog/rubys-pessimistic-operator . There's tooling for that already.

Unrelated to this :
I actually proposed something similar which was to avoid resolving ranges entirely and store them directly in DB (you can find the convo with @haikoschol and @pombredanne in the chat if you scroll wayyyy up). We rejected the proposal primarily because assuming versioning format is a bad idea. My current opinion is to use ranges as a last resort/fallback.

Primary motivation to store ranges ATM is avoid losing data from places like https://security.gentoo.org/glsa and then have something to fallback on when some specific version is not present in our data.

@copernico

@sbs2001 I think the pessimistic operator (which I did not know about, thanks for the pointer) describes exactly the semantics I had in mind. Re: backports: I am not sure I understand what you mean; in any case I would stay away from using wall-clock time (as in released-before, or released-after) when determining if a version is "after" or "before" another. I guess what matters are version identifiers: for sure 3.1 is after 3.0; but is 2.1 after 3.0? Maybe, or maybe not; we need a separate interval expression for 2.x.
Basically, if a fix of, say, version 3.1.1 is then backported to, say, 2.1.14, what we would have to do is to also specify 2.1.14 and subsequent as fixed, in a separate (extended-)purl.

@pombredanne
Collaborator Author

@sbs2001 you wrote:

I actually proposed something similar which was to avoid resolving ranges entirely and store them directly in DB (you can find the convo with @haikoschol and @pombredanne in the chat if you scroll wayyyy up). We rejected the proposal primarily because assuming versioning format is a bad idea. My current opinion is to use ranges as a last resort/fallback.

Do you mind digging this up and pasting the chat log in a comment here?

@pombredanne
Collaborator Author

pombredanne commented Oct 13, 2020

For reference there is a related Package URL PR by @david-a-wheeler at package-url/purl-spec#93 that I need to reply to and tickets at package-url/purl-spec#66 and package-url/purl-spec#84
And also ossf/wg-vulnerability-disclosures#28 (comment)

@pombredanne
Collaborator Author

pombredanne commented Oct 13, 2020

Here are some thoughts and background rehashing for reference:

Context

There is no (mostly) universal syntax for version ranges and there is no (mostly) universal ways to compare two versions. Each package type may define their own syntax and semantics. For instance:

Problem

Version ranges are useful because they can help to map a future, not-yet-known package version to a known vulnerability impacting it if that package version is within that range.

Some solution elements

There are a few things to consider:

  1. in all cases storing version ranges when available in a vulnerability is a useful data point
  2. since there is no universal syntax and algorithm for version comparison we could either:
    2.1 define a mostly universal syntax for ranges (as part of Package URLs) and package-specific way to normalize to that syntax.
    2.2 use a package-specific syntax and do not normalize it
  3. regardless of the syntax selected, the semantics of version comparison would need to be package type-specific at best to be correct, and in some cases they may be specific to a package instance, e.g. there would be a collection of algorithms to choose from, with type defaults and possibly an override at the package level (though a package-level version scheme feels rather contrived)

As for the problem at hand here:

  • We want to store the versions range
  • We want to store concrete real relationships between a package version and a vulnerability. This matters because storing a version range only (e.g. potential relationship) may need to be overridden. Also it allows us to navigate the graph of relationships between packages and vulnerabilities efficiently. This means that each time we add a new vulnerability we need to resolve a possible range to a list of concrete known package versions.
  • Yet at the same time, we do not know yet about future versions (or may not be up to date about all the known versions of a package just now) so we would also need a way to resolve any version that would not be found as stored in the DB against version ranges.

So in recap IMHO we should ideally:

  1. store version range
  2. store concrete relationships
  3. resolve ranges on create or on access or on query either using a list of known versions or without that list

@david-a-wheeler

@pombredanne - thanks for the excellent list of examples! That at least gives us specific examples to compare.

@pombredanne
Collaborator Author

@david-a-wheeler frankly I wish everyone would use semver ;)

@sbs2001
Collaborator

sbs2001 commented Oct 13, 2020

@pombredanne

Do you mind digging this up and pasting the chat log in a comment here?

Here are some relevant messages regarding that matter. You can search gitter using this and read the whole thing there :) . This is borderline cryptic without the context.

Shivam Sandbhor @sbs2001 Jan 31 08:33

This will need us to map the vulnerability to packages with the 3 relationships (affected version range, unaffected version range, fixed version range). We will just query the package, and check if the version we have matches any of the ranges in the relationships.

@haikoschol sorry for the vague diagram. Basically we will be having 3 tables.

1st table with 2 columns which would be 'id' and 'package name'. This table will have just the package name not the version , so for eg ffmpeg will get to occupy only 1 row in this table, even though they have like 100 different versions . Let's call the id of package as 'pid'

2nd table will have 2 columns: 1st for id and 2nd for the vulnerability name for eg 'CVE-XXXXXXXXX'. Let's call the id of vulnerability as 'vid'

This is the relations table. The 3rd table will have 6 columns, explained below:
1. A foreign key holding the 'pid' of the corresponding package
2. A foreign key holding the 'vid' of the corresponding vulnerability
3. Unaffected version range of the package with id 'pid' for vulnerability 'vid'
4. Affected version range for 'pid' and 'vid'
5. Patched version range for 'pid' and 'vid'
6. Id of the row
Side Note: We could break this table into smaller parts
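
For readers following along, the three-table proposal can be sketched with an in-memory SQLite database. The table and column names and the range strings are illustrative assumptions, not the schema that was adopted.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- One row per package name, regardless of how many versions exist.
    CREATE TABLE package (pid INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
    CREATE TABLE vulnerability (vid INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
    -- Relations table: two foreign keys plus three range strings.
    CREATE TABLE relation (
        id INTEGER PRIMARY KEY,
        pid INTEGER NOT NULL REFERENCES package(pid),
        vid INTEGER NOT NULL REFERENCES vulnerability(vid),
        unaffected_range TEXT,
        affected_range TEXT,
        patched_range TEXT
    );
    """
)
conn.execute("INSERT INTO package (pid, name) VALUES (1, 'ffmpeg')")
conn.execute("INSERT INTO vulnerability (vid, name) VALUES (1, 'CVE-XXXXXXXXX')")
# The range values below are made up for the example.
conn.execute(
    "INSERT INTO relation (pid, vid, affected_range, patched_range) VALUES (?, ?, ?, ?)",
    (1, 1, "<4.2", ">=4.2"),
)

# Fetch every range stored for a package name; checking whether a specific
# version falls inside a range still has to happen outside the database.
rows = conn.execute(
    """
    SELECT v.name, r.affected_range, r.patched_range
    FROM relation AS r
    JOIN package AS p ON p.pid = r.pid
    JOIN vulnerability AS v ON v.vid = r.vid
    WHERE p.name = ?
    """,
    ("ffmpeg",),
).fetchall()
print(rows)  # [('CVE-XXXXXXXXX', '<4.2', '>=4.2')]
```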

Philippe Ombredanne @pombredanne Jan 31 00:51
. We will just query the package, and check if the version we have matches any of the range in the relationships.

that could work too, but storing ranges only means that there is no concrete relationships to actual package records?

Instead I would suggest effectively to store all the versions that are known to be impacted
AND eventually fetch all the new released versions as they are released... and while doing that do the work once to determine if they are impacted or not based on the version ranges?
Haiko Schol @haikoschol Feb 01 02:48
@sbs2001 regarding your proposed data model using version ranges: why not get rid of the relations table as well and just have a foreign key on Package together with the three version ranges in Vulnerability?
one difference between the current model and yours (with or without the relations table) is that part of the data filtering work needs to be done outside of the database. the python code gets all vulnerabilities that affect any version of a given package and has to check for each one of them whether the version the user is interested in falls in the "affected" version range

Haiko Schol @haikoschol Feb 01 02:54
for one package the list of all vulnerabilities that affect any version is probably pretty short. but we want to receive a list of packages (package URLs actually) that constitutes all dependencies for a given project.
another issue is, and i think this is what Philippe was referring to, that assuming every package we deal with uses a sane versioning scheme on which the concept of "ranges" can be applied is quite optimistic
afaik some package managers put no restrictions on the format of versions. so it's possible that we get "versions" like "bob", "jane", "alice", etc.

Haiko Schol @haikoschol Feb 01 02:59
or a project changes their versioning scheme at some point. firefox is an example of that

@joshbressers

My thinking on this has morphed a bit since my initial comments in package-url/purl-spec#84

My use case revolves around semver (and only semver) and it's still painful. The example: introduced in version 7.0.0 and 6.5.2 and fixed in version 7.1.1 and 6.8.12 is VERY hard to capture in a way that is easy to understand or parse.

I now plan to create a service focused specifically on PURL IDs. The PURL API will be a way to get a listing of all product names, release versions, date of release, and other metadata I find useful along the way.

The vulnerability data will list only vulnerable PURL IDs. If the version isn't listed, it's not affected. The vulnerable IDs field will be quite large in some cases, but that's OK because this data is for machines not humans.

I envision the workflow to look something like

  1. give me a list of all products
  2. give me a list of all versions released for product foo
  3. Extract the PURL IDs I care about
  4. Give me a list of all vulnerabilities affecting the following PURL IDs

Or

  1. Give me a list of all PURL IDs affected by this vulnerability
  2. Get a chronological list of releases
  3. Walk the list to find the version closest to mine not affected by the vulnerability in question

As I work on this problem I have no doubt my thoughts will change again.

@sbs2001
Collaborator

sbs2001 commented Oct 13, 2020

@joshbressers

You are not really solving the problem.

Get a chronological list of releases

I was on the same page as you until I had a chat with @pombredanne regarding using "chronology" as a basis for version comparison:

Shivam Sandbhor
@sbs2001
Sep 29 11:59
@pombredanne FWIW if you need something universal to compare versions , I think we are looking at the wrong thing to compare for.

Instead of comparing the version numbers. It makes more sense to me to compare using the release date of whatever we are comparing. IMHO this should be easy to implement too since we have most of code to fetch package metadata at multiple places in *code projects


Philippe Ombredanne
@pombredanne
Sep 29 14:28
but what about non-linear histories? say foo 1.0 and foo 2.0 have both the same vulnerability. It is patched first in foo 1.1 and then later in foo 2.2 (which comes after foo 2.1) ?

Shivam Sandbhor
@sbs2001
Sep 29 14:33
I don't get the problem here. In this case 1.1 and 2.2 will have release date greater than the vulnerable packages.

Philippe Ombredanne
@pombredanne
Sep 29 14:34
yes but 1.1 is not a fix for 2.0

Shivam Sandbhor
@sbs2001
Sep 29 14:38
right. This fails for providing the closest fix.

And also

Walk the list to find the version closest to mine not affected by the vulnerability in question

would need some comparator function. This approach would work if you are working with just some sane versioning scheme, like semver.

@sbs2001
Collaborator

sbs2001 commented Oct 13, 2020

@joshbressers I am curious, how would you obtain only vulnerable PURL IDs ? AFAIK almost all security advisories compress those discrete set of packages into version ranges and let the consumer interpret/resolve the ranges.

@joshbressers

Hah, I figured chronological would be a less confusing way to describe this all, I was clearly wrong :)

Let's ignore that word. What I really want is a list of releases in order. I wrongly assumed dates could do that

Here is an example

My version list in the order it was released looks like this

version = [
  '1.0.0',
  '1.0.1',
  '1.1.0',
  '2.0.0',
  '1.1.1',
  '2.1.0',
  '1.1.2',
  '2.1.1',
  '1.1.3'
]

I know there is a vulnerability in

vulnerable_versions = [
  '1.0.1',
  '1.1.0',
  '2.0.0',
  '1.1.1',
  '1.1.2'
]

So we end up with something that looks like this

version = [
  '1.0.0',
  '1.0.1', # vulnerable
  '1.1.0', # vulnerable
  '2.0.0', # vulnerable
  '1.1.1', # vulnerable
  '2.1.0',
  '1.1.2', # vulnerable
  '2.1.1',
  '1.1.3'
]

Now I can figure out the closest fix pretty easily.
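
One possible reading of "closest fix" can be sketched as below. The rule of staying within the same major series is an assumption added for the illustration (it is not stated in the comment), and `closest_fix` is a hypothetical helper.

```python
def closest_fix(releases, vulnerable, mine):
    """Return the first release after `mine`, in release order, that shares its
    major version and is not listed as vulnerable. Returns None if there is none."""
    major = mine.split(".")[0]
    start = releases.index(mine)
    for candidate in releases[start + 1:]:
        if candidate.split(".")[0] == major and candidate not in vulnerable:
            return candidate
    return None


# Release-ordered list and vulnerable set taken from the example above.
releases = ["1.0.0", "1.0.1", "1.1.0", "2.0.0", "1.1.1",
            "2.1.0", "1.1.2", "2.1.1", "1.1.3"]
vulnerable = {"1.0.1", "1.1.0", "2.0.0", "1.1.1", "1.1.2"}

print(closest_fix(releases, vulnerable, "1.1.1"))  # 1.1.3
print(closest_fix(releases, vulnerable, "2.0.0"))  # 2.1.0
```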

@joshbressers

@joshbressers I am curious, how would you obtain only vulnerable PURL IDs ? AFAIK almost all security advisories compress those discrete set of packages into version ranges and let the consumer interpret/resolve the ranges.

I am building this service for the products I work on. I need machine readable data, and I get to control everything that's happening. It's a very different problem than the general community.

@pombredanne
Collaborator Author

@joshbressers

I am building this service for the products I work on.

neat! if you think there is some bits and data that could be useful feel free to reach out!

@joshbressers

@joshbressers

I am building this service for the products I work on.

neat! if you think there is some bits and data that could be useful feel free to reach out!

Thanks @pombredanne!

Everything I do will end up public on github, I'll certainly be looking for honest feedback :)

@pombredanne
Collaborator Author

I'll certainly be looking for honest feedback :)

same here! 👍

@pombredanne
Collaborator Author

pombredanne commented Oct 13, 2020

And a few extra references to versions specs in the wild:

@copernico

copernico commented Oct 13, 2020

Very interesting discussion; I have not gone through all the different version specification schemes and I am somewhat familiar with only a small subset of them, but I suspect (should I say, hope) they are all variants of a general scheme x.y.z.j.k.h where versions can be arranged in a tree with a depth d that is a small integer (typically 3 or 4), as depicted in this figure (for a subtree corresponding to a given X.Y major.minor release series)
[Figure: releases]

(figure from : https://link.springer.com/article/10.1007/s10664-020-09830-x)

Can you show an example of versioning scheme that does not fit this (possibly naïve) generalization?

@david-a-wheeler

There are exceptions. Sentimental versioning lists some examples.

In TeX and METAFONT (two tools widely used in mathematics), new versions add a new digit approaching an irrational number. The version numbers of TeX approach π (the current version is 3.14159265) and the version numbers of METAFONT approach e.

Perhaps more importantly, projects occasionally CHANGE their version number schemes. This is made famous by Bill Gates counts to 10. The Windows version numbers are (overly simplified) as 1, 2, 3, 3.1, 3.11, 95, 98, NT, 2000, XP, Vista, 7, 8, 10.

The solution used by the packaging formats rpm (for Red Hat, Fedora, CentOS, etc.) and deb (Debian, Ubuntu, etc.) is to add "epoch numbers", integers that notionally precede the "normal" version number. See the Fedora docs on this and the Debian docs on this. Typically an epoch, if included, is written as the epoch number, a colon, then the "normal" version number. One quirk: in rpm, an empty epoch is lower than anything with a given epoch number, while in Debian an "empty" epoch is considered 0.

I think we need to at least support epoch numbers, because otherwise there's no way to handle people who change version number schemes, and that is the standard way to do it.
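
To make the epoch idea concrete, here is a sketch of the Debian-style rule only, where a missing epoch counts as 0. It is a simplification: the part after the colon is assumed to be dotted integers, which real Debian and rpm versions are not limited to, and `split_epoch` and `sort_key` are hypothetical names.

```python
def split_epoch(version):
    """Split '1:2.0' into (1, '2.0'). A missing epoch is treated as 0."""
    epoch, sep, rest = version.partition(":")
    if not sep:
        return 0, version
    return int(epoch), rest


def sort_key(version):
    """Order by epoch first, then by the numeric segments of the version."""
    epoch, rest = split_epoch(version)
    return epoch, tuple(int(part) for part in rest.split("."))


versions = ["2.0", "1:1.0", "0:3.1", "1.5"]
print(sorted(versions, key=sort_key))  # ['1.5', '2.0', '0:3.1', '1:1.0']
```

Note how `1:1.0` sorts after `2.0`: bumping the epoch lets a project restart its numbering without breaking the ordering.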

@pombredanne
Collaborator Author

@copernico you wrote:

Can you show an example of versioning scheme that does not fit this (possibly naïve) generalization?

I think that your generalization works as it stands. Even though Debian and RPM packages use epochs, as pointed out by @david-a-wheeler, the epoch would still fit in a tree view of the versions world as the first optional segment.

IMHO the variations are in how you would create that tree, which requires comparing versions, and the things that do change are whether:

  • each version segment allows strings vs. numbers
  • each version segment is treated as a number or as a string
  • if leading zeroes in a string or numeric segment are significant or not
  • and then there are suffixes (rc1, alpha, pre, SNAPSHOT) that a certain package type may treat differently.
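
A tiny illustration of why these choices matter: the same three version strings sort differently depending on whether segments are compared as strings or as numbers, and leading zeroes vanish in the numeric view. The two key functions are hypothetical and deliberately naive.

```python
versions = ["1.9", "1.10", "1.09"]


def as_strings(version):
    """Compare each segment lexicographically."""
    return tuple(version.split("."))


def as_numbers(version):
    """Compare each segment numerically; leading zeroes are not significant."""
    return tuple(int(part) for part in version.split("."))


print(sorted(versions, key=as_strings))  # ['1.09', '1.10', '1.9']
print(sorted(versions, key=as_numbers))  # ['1.9', '1.09', '1.10']
print(as_numbers("1.9") == as_numbers("1.09"))  # True
print(as_strings("1.9") == as_strings("1.09"))  # False
```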

I cannot think of a (mostly) universal way to organize the tree by comparing the versions reliably (reliably being the difference between stating that a version is not vulnerable vs. vulnerable, e.g. raising a false negative) with a single algorithm that is not package-type specific.

The closest that comes to mind would be @AMDmi3's awesome https://github.com/repology/libversion , which has a great doc highlighting the complexity of trying to get things right at scale ( https://github.com/repology/libversion/blob/master/doc/ALGORITHM.md ) and which supports almost everything, including distro versions.

And also @orsinium https://github.com/dephell/dephell_specifier with support for Python PEP-440, Semver, Ruby, npm and Maven

@sbs2001
Collaborator

sbs2001 commented Oct 19, 2020

I and @pombredanne recently had a discussion regarding how to handle version ranges of packages in the context of vulnerablecode. And we decided to run a little experiment.

We would store concrete relationships between packages(these include version) and vulnerabilities the same way we are already doing.

Now coming to new things :

We would have another table like :

class VulnerablePackageRanges:
    vulnerability: foreign key to the vulnerability
    package: a string with the package URL without the version
    version_range: a string containing the version ranges for which the given package is vulnerable to the vulnerability

Eg value of package could be pkg:npm/foo .

Now if a user asks for vulnerability status of some version of npm package foo , there would be 2 cases :
1. We already have data about the package and its specific version. In this case we return what we know.
2. We don't have any data about the requested package in the concrete packages. In this case we would check whether there exists a range expression for the said package (with the same name and type, omitting the version). If yes, we resolve the range and determine the vulnerability status of the package. Else we return empty-handed.
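
The two cases above could look roughly like this. Everything here is a stand-in: the dictionaries replace database tables, the identifiers and ranges are invented, ranges are simplified to a single inclusive lower and exclusive upper bound, and versions are assumed to be dotted integers.

```python
def parse(version):
    """Turn '1.2.0' into (1, 2, 0). Assumes purely numeric dotted versions."""
    return tuple(int(part) for part in version.split("."))


# Case 1 data: concrete relationships, keyed by (package URL without version, version).
CONCRETE = {("pkg:npm/foo", "1.0.0"): ["VULN-1"]}

# Case 2 data: ranges keyed by package URL without version. Each entry is
# (vulnerability, inclusive lower bound, exclusive upper bound).
RANGES = {"pkg:npm/foo": [("VULN-1", "1.0.0", "1.2.0")]}


def lookup(package, version):
    """Return (vulnerabilities, source), trying concrete data before ranges."""
    known = CONCRETE.get((package, version))
    if known is not None:
        return known, "concrete"
    ranges = RANGES.get(package)
    if ranges is None:
        return [], "unknown"  # no data at all: empty-handed
    matches = [
        vuln for vuln, low, high in ranges
        if parse(low) <= parse(version) < parse(high)
    ]
    return matches, "range"


print(lookup("pkg:npm/foo", "1.0.0"))  # (['VULN-1'], 'concrete')
print(lookup("pkg:npm/foo", "1.1.5"))  # (['VULN-1'], 'range')
print(lookup("pkg:npm/foo", "2.0.0"))  # ([], 'range')
print(lookup("pkg:npm/bar", "1.0.0"))  # ([], 'unknown')
```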

For resolving ranges we would be using #140 (comment) 's 2.1 point . The universal syntax would be more or less a stripped down version of PEP 440 (this is just an experiment).

Periodically we would also fetch all versions of packages contained in VulnerablePackageRanges and resolve them using the already present ranges.

@pombredanne correct me if I misunderstood you anywhere :)

@pombredanne
Collaborator Author

@sbs2001 this makes ++ sense. To recap and reformulate my understanding this would mean:

  1. we experiment with using a mostly universal syntax for version ranges based on the well specified https://www.python.org/dev/peps/pep-0440/#version-specifiers . This is used to store a version range as a single string
  2. the actual comparison procedure of two versions (and the check if a version falls within a range) would be:
    2.1 specific to a package type (e.g. npm, pypi, etc) ...
    2.2 ... with a default if a package does not have it (likely based on dephell or repology ways)
    2.3 ... and the ability for a single package type/ns/name to override this
    ... though these would be refinements post experiments

And your approach boils down to going from the most to the least specific:

  • first search for a concrete and explicit relationship between a package version and a vulnerability
  • else, do a version range check between the package version and package/vulnerability/version range if any
  • else ... we later could also navigate the package graph for inferences, say we know that pkg:deb/foo@1.2 and pkg:rpm/foo@1.2 have the same source code and extend the search to other related package type/names

@pombredanne
Collaborator Author

Repasting here the design for version ranges from #119 (comment) and updating it at the same time:

Version ranges specifier

A version ranges specifier is a string with this syntax:
<scheme>:<range>,<range>

  • For example:
    semver:1.2.3,>=2.0.0

  • The <scheme> (such as semver, debian, etc.) determines how to interpret a version range, how two versions compare as lesser or greater, and whether a version is within a range.

  • The <scheme> is followed by one or more <range> separated by a comma.

  • Each <range> is declared this way:

    • "=": Version equality operator. Implied if not present and means that a version must be equal to this value as in "=1.2.3"
    • "!=": Version exclusion operator. Means version should be excluded "!=1.2.3"
    • "<=", ">=": Inclusive range operator such as "<=1.2.3" which means all versions less than or equal to "1.2.3"
    • "<", ">": Exclusive range operator such as "<1.2.3" which means all versions less than "1.2.3"

For example >=1.2.3,<2.0.0 means all versions greater than or equal to 1.2.3 but less than 2.0.0

  • Within a range the syntax of a version such as 1.2.3 is defined by the scheme
  • Spaces are not significant and are removed in the canonical form: "!=1.2.3" and "! = 1.2.3" are equivalent.
  • Version ranges specifiers are case-insensitive and lowercased in their canonical form.
  • The ordering of multiple <range>s in a specifier is not significant. The canonical ordering is TBD.
  • A range cannot contain operator characters (><=!,*). If required (which should be rare in practice) they need to be quoted using the URL quoting rules.
  • Equality = and exclusion != is based on the exact test of two lower-cased version strings and is not scheme-specific.
  • The <scheme> determines:
    • how two versions are compared as greater than or lesser.
    • how its version range specifiers syntax can be reduced to the simplified range specifiers syntax defined here.
  • The special "star range" of <scheme>:* means that any version would match this range. A star range can only be used alone and no other range can be added. It should be used sparingly as unbounded ranges are rare and typically problematic.
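
A parser for this syntax might start out as below. It is a sketch of the rules listed above and not the univers implementation: `parse_specifier` is a hypothetical name, it only splits and canonicalizes the string, it does no scheme-specific comparison, and it does not handle URL-quoted operator characters.

```python
import re

# Longer operators come first so ">=" is not read as ">" followed by "=1.2.3".
_RANGE = re.compile(r"^(>=|<=|!=|=|<|>)?(.+)$")
_OPERATOR_CHARS = set("><=!,*")


def parse_specifier(spec):
    """Parse '<scheme>:<range>,<range>' into (scheme, [(operator, version), ...])."""
    # Canonical form: spaces are not significant, and the string is lowercased.
    canonical = "".join(spec.split()).lower()
    scheme, sep, ranges = canonical.partition(":")
    if not sep or not scheme or not ranges:
        raise ValueError(f"not a version ranges specifier: {spec!r}")
    if ranges == "*":
        # The star range must be used alone.
        return scheme, [("*", "")]
    parsed = []
    for item in ranges.split(","):
        match = _RANGE.match(item)
        if match is None:
            raise ValueError(f"empty range in {spec!r}")
        operator, version = match.groups()
        if _OPERATOR_CHARS & set(version):
            raise ValueError(f"unquoted operator character in {item!r}")
        # A missing operator implies equality.
        parsed.append((operator or "=", version))
    return scheme, parsed


print(parse_specifier("semver:1.2.3,>=2.0.0"))
# ('semver', [('=', '1.2.3'), ('>=', '2.0.0')])
print(parse_specifier("SemVer: ! = 1.2.3"))
# ('semver', [('!=', '1.2.3')])
print(parse_specifier("debian:*"))
# ('debian', [('*', '')])
```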

Notes and caveats:

  • Comparing versions from two different schemes is unspecified (and typically does not make sense, even though there may be some obvious similarities between the semver version of an npm package and the debian version of its Debian packaging).
  • Schemes are related to Package URL types in the sense that each Package URL type is related to one version scheme, but multiple types can reuse the same scheme (such as semver).

Some of the known schemes and their codes are:

Implementation

https://github.com/nexB/univers by @sbs2001 implements this spec


Usage in VulnerableCode

Here is the design we discussed to put version ranges to use here.

One problem is that the package version ranges a vulnerability applies to may be misleading after they have been published unless they are updated. For instance, openstack/ossa@777e7b7#r51222097 was last updated in 2014 and does not apply to "All versions", but really only to the package versions known at the time this advisory was published.

A related problem is unbounded version ranges, or the lack of version ranges altogether, where an advisory tells when a vulnerability is fixed but not when it appeared, such as https://github.com/mozilla/foundation-security-advisories/blob/master/announce/2016/mfsa2016-14.md

The difficulty is that we do not want to miss reporting any version that is vulnerable (a dangerous false negative) yet we do not want to pollute the reporting with package versions that are not certain to be vulnerable (false positive).

As a solution, the proposed design tries to handle these two cases:

  • by storing concrete vulnerability-package relationships when we are confident this relationship exists
  • by storing a version ranges specifier in a vulnerability-package relationship and being able to query if a package version satisfies it and compute a confidence value in these cases (e.g. signaling a possible false positive and avoiding false negative).

For instance with openstack/ossa@777e7b7#r51222097, which was last updated ~7 years ago, the confidence that it applies to a package version released in 2021 should be fairly low.

Also since confidence and version ranges specifier are stored they can also be refined and curated by hand in the future.

Therefore, in addition to concrete relationships between package versions and a vulnerability we want to store also a version range with these specifics:

  1. A version ranges specifier is a string as defined above. Stored in PackageRelatedVulnerability

  2. If all the versions in the range are exactly pinned/concrete version, then we would not store a range. Instead we store only the concrete relationships.

  3. When a vulnerability is created or updated, we consider its date of creation or last update and we:

  • update its stored version range (string) as needed (TBD: deal with overrides in the future)
  • update the concrete relationships with package versions (including possibly updating, creating and deleting relationships.)
  • This should be based on a best effort of the set of known package versions that existed as released only up to the date of creation or last update of the vulnerability.
  • For some corner cases, we need a special version range with the value * which means that all versions of a package are impacted. This should be used rarely as in most cases this can be instead an open range with no upper or lower bound. When we have such a range or unbounded ranges, we should limit the creation of concrete vulnerability-package relationships to some fixed number of versions (TBD, possibly one year back and up to 5 versions back) to avoid creating unverified relationships.

  4. When a new package version becomes known independently of a vulnerability update or creation, we do not update or create new relationships

  5. There is a new notion of "confidence" that we should store at the vulnerability-package relationship level. This should be maximal by default and could be overridden manually.

  6. When querying for the vulnerabilities of a package version, we return two sets of relationships:

  • the relationships stored in the DB with the stored confidence, typically high confidence.
  • a query of potential relationships based on checking if the requested package version is within a vulnerability-package (PackageRelatedVulnerability)-stored version range.

The confidence values that will be returned with this query should be based on a few factors such as:

  • "decay"/discount based on how old the vulnerability range was last updated
  • and/or the time passed between the vulnerability disclosure/update and the date of the package release
  • and/or whether the version range is "closed", e.g. whether it has a lower bound, an upper bound or no bound.

When storing ranges, unbounded ranges are a possible source of problems as they may resolve incorrectly to versions that are NOT affected by a vulnerability. To cope with this we should be able to query and find all PackageRelatedVulnerability and Vulnerability records that are open, i.e. that are missing a lower bound or an upper bound, or that have no bound at all. These can be used as an input for reaching out to upstream data sources or package projects, to create a wall of shame, or as an input to curation and review.
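
Finding such open ranges could be as simple as inspecting the operators of each stored specifier. The sketch below assumes ranges are already parsed into (operator, version) pairs; `bound_status` and the sample data are hypothetical.

```python
def bound_status(ranges):
    """Classify a list of (operator, version) pairs by the bounds it declares."""
    operators = {operator for operator, _ in ranges}
    if operators == {"="}:
        return "pinned"  # only exact versions, nothing to resolve
    has_lower = bool(operators & {">", ">="})
    has_upper = bool(operators & {"<", "<="})
    if has_lower and has_upper:
        return "closed"
    if has_lower:
        return "no upper bound"
    if has_upper:
        return "no lower bound"
    return "no bound"


# Made-up stored specifiers, keyed by package URL without version.
stored = {
    "pkg:pypi/django": [(">=", "1.2"), ("<", "2.0")],
    "pkg:npm/foo": [("<", "1.4.0")],
    "pkg:gem/bar": [("*", "")],
}

# Keep only the entries that deserve review or upstream follow-up.
needs_review = {
    package: bound_status(ranges)
    for package, ranges in stored.items()
    if bound_status(ranges) not in ("closed", "pinned")
}
print(needs_review)
# {'pkg:npm/foo': 'no lower bound', 'pkg:gem/bar': 'no bound'}
```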

We also need to revert the changes in #436 and ensure that we effectively store all the concrete relationships as defined here.

@pombredanne
Collaborator Author

@sbs2001 @Hritik14 I hope I captured today's chat correctly ^

@pombredanne
Collaborator Author

Here is an example with real data:

  1. Today, CVE-2021-foo is published and it affects the django package and these version ranges:
  • django <1
  • django >1.2 to <2
  • django >2.3 to <3
  • django >3.1 to <4

Based on this:

  • I can conclude that the 1.3, 2.4, and 3.2 versions that exist today are vulnerable, and I would create a hard relationship for them since they are the versions that exist at the time of the advisory publication.
  • I also store a version ranges spec for this vulnerability/package
  • other versions 2.2, 1.1, 3.0 are not marked as anything (and not even stored) since they are not impacted at all and outside of the range

Tomorrow:

  • There are new package releases of 1.4, 2.3, 3.3 and 3.4: they are potentially vulnerable as they are within the stored ranges spec. Yet I will NOT store a new concrete relationship (yet). Instead a query will catch them because they are part of the range, but this is a potential issue, not a verified one.
  • Based on the days since the vulnerability was last updated we can apply some confidence "decay" based on time passed. Say for instance, I define that 5 years is the time for a vulnerability to decay entirely; then after a year, the confidence that this vulnerability applies to a version that matches its version ranges spec but that was not yet released when the vulnerability was last updated would be 4/5th, i.e. 80% as opposed to 100%. The specifics of this are secondary and can be designed later.
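
The decay arithmetic from the last bullet, written out as a sketch. A linear decay is only one option (the comment leaves the specifics open), and the function name, the 365-day year, and the dates are assumptions.

```python
from datetime import date


def decayed_confidence(last_updated, today, full_decay_years=5, base=100.0):
    """Linearly reduce `base` to zero over `full_decay_years` (365-day years)."""
    elapsed_years = (today - last_updated).days / 365
    remaining = max(0.0, 1.0 - elapsed_years / full_decay_years)
    return base * remaining


# One year after the last update: 4/5 of the confidence remains.
print(round(decayed_confidence(date(2021, 1, 1), date(2022, 1, 1))))  # 80
# More than five years later the range no longer carries any confidence.
print(round(decayed_confidence(date(2014, 1, 1), date(2021, 1, 1))))  # 0
```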

The day after tomorrow:

  • there is an update on the advisory: there are now fixes available in 1.5, 2.5 and 3.5. When we get the data we are:
    • updating both the stored version ranges spec AND the concrete relationships
    • we can update these versions 1.4, 3.2, 3.3, 3.4 as vulnerable with a concrete relationship, e.g. reporting this now as a verified issue.

So:

  1. package releases done after a vulnerability publication/update and that satisfy the original vulnerable ranges are NOT triggering a relationship update. They are though queried and reported.
  2. A fix published later that updates the vulnerable ranges triggers a concrete relationship update

@pombredanne
Collaborator Author

FYI, this is an interesting related ticket: CVEProject/cve-schema#87

@pombredanne
Collaborator Author

pombredanne commented Aug 24, 2021

In particular this comment I posted is of relevance here:

@TG1999
Contributor

TG1999 commented Jan 17, 2023

Thanks for raising this. I am closing this now, and I will let Philippe merge the purl vers PR package-url/purl-spec#139 now that we have something that mostly works for version ranges.

@TG1999 TG1999 closed this as completed Jan 17, 2023