-
Notifications
You must be signed in to change notification settings - Fork 203
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add improver pipeline to flag ghost packages #644 #917 #1395 #1533
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here are a few nits for your review.
Signed-off-by: Keshav Priyadarshi <git@keshav.space>
Signed-off-by: Keshav Priyadarshi <git@keshav.space>
Signed-off-by: Keshav Priyadarshi <git@keshav.space>
Signed-off-by: Keshav Priyadarshi <git@keshav.space>
Signed-off-by: Keshav Priyadarshi <git@keshav.space>
Signed-off-by: Keshav Priyadarshi <git@keshav.space>
Signed-off-by: Keshav Priyadarshi <git@keshav.space>
- Package can have malicious, ghost, yanked, valid and unknown status Signed-off-by: Keshav Priyadarshi <git@keshav.space>
Signed-off-by: Keshav Priyadarshi <git@keshav.space>
Signed-off-by: Keshav Priyadarshi <git@keshav.space>
Signed-off-by: Keshav Priyadarshi <git@keshav.space>
Signed-off-by: Keshav Priyadarshi <git@keshav.space>
Signed-off-by: Keshav Priyadarshi <git@keshav.space>
e093a58
to
b848747
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks... here a few nits for your review... I wonder if there is a real need to convert versions with univers for this use case.
Also:
- Thanks for adding logging!
- Please add a CHANGELOG entry for logging and pipeline introduction
version_class = RANGE_CLASS_BY_SCHEMES[purl.type].version_class | ||
|
||
try: | ||
return {version_class(v.value) for v in versions(str(purl))} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To check for a ghost, why do we need to convert things to a version class?
Could we just check the if the version string exists?
Otherwise we are eventually ghosting all the packages with a version we cannot parse
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To check for a ghost, why do we need to convert things to a version class?
Could we just check the if the version string exists?
Simple string comparison may lead to incorrectly flagging legitimate versions as ghost packages:
"1.2.0" != "1.02.0"
"v1.2.3" != "1.2.3"
"1.0" != "1.0.0"
Otherwise we are eventually ghosting all the packages with a version we cannot parse
If we cannot parse the version for a given base_purl, we skip those and don’t mark them as ghosts.
Signed-off-by: Keshav Priyadarshi <git@keshav.space> Co-authored-by: Philippe Ombredanne <pombredanne@nexb.com>
Signed-off-by: Keshav Priyadarshi <git@keshav.space>
Signed-off-by: Keshav Priyadarshi <git@keshav.space>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good to go!
Would this also address situations where the upstream index is wrong, such as #915 ? |
@armijnhemel at present, this will flag the ghost packages from the following ecosystems: |
I am not sure you understood my question: what if upstream is wrong (as is the case in Alpine see #915 )? Are you always assuming that the index is correct? |
@armijnhemel Yes, this pipeline assumes that package index is right. We should have seperate pipeline for case where index is wrong. We will need some generalized way to conclude that package index is reporting wrong version. |
So far I have only seen it once (in Alpine, see #915) so perhaps it could just be added as an exception without building a separate pipeline but I guess it might happen more often. |
@armijnhemel The general approach to curation is going to be:
|
This PR introduces the aboutcode pipeline for defining and running improvements on data (see #1395). It also adds a new pipeline to flag ghost packages. Ghost packages are those packages that are not present in the upstream package manager index.