
This issue was moved to a discussion.

You can continue the conversation there. Go to discussion →

Automated security and update routine before every release #8815

Closed
hpvd opened this issue Dec 3, 2020 · 28 comments
Labels
area/security type/enhancement The enhancements for the existing features or docs. e.g. reduce memory usage of the delayed messages

Comments

@hpvd

hpvd commented Dec 3, 2020

Is your enhancement request related to a problem? Please describe.
To get the most out of every release in terms of security, performance and "bug-freeness", it may be a good idea to make sensible dependency updates a routine step before every release.

Describe the solution you'd like

What would help (if not already in use):

  1. Enabling GitHub's alerts for vulnerable dependencies for Pulsar, see https://docs.github.com/en/free-pro-team@latest/github/managing-security-vulnerabilities/about-alerts-for-vulnerable-dependencies

-> If possible, a bot should automatically open an issue to fix these findings / update the dependencies as soon as fixes are available.

  2. Since possibly not all vulnerabilities are reported/found, it may also be an idea to maintain a dynamic/automated table of dependencies:

-> Before every release, one should look at this table and update all (or most) dependencies to their latest version (or note why this is not possible at the time, e.g. incompatible changes).
-> Of course one could automate opening update issues as well, but this may result in too many intermediate steps between releases.
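
A minimal sketch of how such a dependency table could be generated, assuming the dependencies are declared directly in a Maven `pom.xml`. The sample POM, its coordinates, and the function names `list_dependencies` and `format_table` are hypothetical and only serve as illustration; versions inherited from a parent POM or `dependencyManagement` are not resolved here.

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed-down POM used only for illustration.
SAMPLE_POM = """<project xmlns="http://maven.apache.org/POM/4.0.0">
  <modelVersion>4.0.0</modelVersion>
  <groupId>org.example</groupId>
  <artifactId>demo</artifactId>
  <version>1.0.0</version>
  <dependencies>
    <dependency>
      <groupId>com.example</groupId>
      <artifactId>lib-b</artifactId>
    </dependency>
    <dependency>
      <groupId>com.example</groupId>
      <artifactId>lib-a</artifactId>
      <version>1.2.3</version>
    </dependency>
  </dependencies>
</project>
"""

# Maven POM files use this XML namespace.
NS = {"m": "http://maven.apache.org/POM/4.0.0"}


def list_dependencies(pom_text):
    """Return sorted (groupId, artifactId, version) tuples declared directly in a POM."""
    root = ET.fromstring(pom_text)
    rows = []
    for dep in root.findall("m:dependencies/m:dependency", NS):
        group = dep.findtext("m:groupId", default="", namespaces=NS)
        artifact = dep.findtext("m:artifactId", default="", namespaces=NS)
        # A missing <version> usually means the version is managed elsewhere.
        version = dep.findtext("m:version", default="(managed)", namespaces=NS)
        rows.append((group, artifact, version))
    return sorted(rows)


def format_table(rows):
    """Render the rows as a simple pipe-separated text table."""
    lines = ["groupId | artifactId | version"]
    lines += [" | ".join(row) for row in rows]
    return "\n".join(lines)
```

Calling `print(format_table(list_dependencies(SAMPLE_POM)))` prints one line per declared dependency; a "latest version" column would need an additional lookup that is not shown here.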

@hpvd hpvd added the type/enhancement The enhancements for the existing features or docs. e.g. reduce memory usage of the delayed messages label Dec 3, 2020
@hpvd hpvd changed the title Automated security and update routine before release Automated security and update routine before every release Dec 3, 2020
@hpvd
Author

hpvd commented Dec 4, 2020

here you can find a blog post with the announcement of the availability of automatic code scanning for security
https://github.blog/2020-09-30-code-scanning-is-now-available/

@sijie
Member

sijie commented Dec 8, 2020

@hpvd thank you for reporting this. We will consider it in our future releases.

@hpvd
Author

hpvd commented Dec 9, 2020

A new GitHub feature that may also lead to some kind of "security routine" when merging pull requests was presented at GitHub Universe 2020: "Dependency Review".
From the announcement:

Dependency review
Today, dependency graph helps you understand your dependencies, and security alerts notify you of newly discovered vulnerabilities in your dependencies. But what if you could receive these alerts before introducing vulnerable code through new or updated dependencies?
Dependency review helps reviewers and contributors understand dependency changes and their security impact at every pull request.

https://github.blog/2020-12-08-new-from-universe-2020-dark-mode-github-sponsors-for-companies-and-more/
also
https://docs.github.com/en/free-pro-team@latest/github/collaborating-with-issues-and-pull-requests/reviewing-dependency-changes-in-a-pull-request

@hpvd
Author

hpvd commented Dec 9, 2020

These points could possibly be classified as "low-hanging fruit" in the field of security (at least if they work as expected and do not introduce too many false-positive findings...)

@hpvd
Author

hpvd commented Dec 9, 2020

as a last point on this topic: it may also be interesting to give GitHub's "super linter" a try and let it check the whole project on every release or on every pull request via a GitHub Action...
see https://github.com/github/super-linter/

@fmiguelez
Contributor

We use the dependency-check-maven Maven plugin to automate CVE checks of the dependencies in use against an up-to-date database as part of the build process. It is pretty straightforward.

@codelipenghui
Contributor

Cool @fmiguelez! Would you please push a PR to enable this great plugin? Also, this should be checked in CI to avoid introducing known CVE issues.
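
A minimal sketch of what such a CI check could look like, independent of any particular scanner. The finding structure, the identifiers, the scores and the default threshold of 7.0 (the lower bound of the CVSS v3 "High" range) are assumptions made for this example; a real scanner report would first have to be mapped into this shape.

```python
# Hypothetical findings for illustration; identifiers and scores are made up.
FINDINGS = [
    {"id": "EXAMPLE-0001", "dependency": "lib-a-1.2.3.jar", "cvss": 9.8},
    {"id": "EXAMPLE-0002", "dependency": "lib-b-2.0.0.jar", "cvss": 5.3},
    {"id": "EXAMPLE-0003", "dependency": "lib-c-0.9.1.jar", "cvss": 7.5},
]


def blocking_findings(findings, threshold=7.0):
    """Return the findings whose CVSS score is at or above the threshold."""
    return [f for f in findings if f["cvss"] >= threshold]


def gate(findings, threshold=7.0):
    """Print blocking findings and return an exit code (0 = pass, 1 = fail)."""
    blockers = blocking_findings(findings, threshold)
    for f in blockers:
        print(f"{f['id']} in {f['dependency']} (CVSS {f['cvss']})")
    return 1 if blockers else 0
```

A CI step could pass the return value of `gate(...)` to `sys.exit` so that the build fails whenever a blocking finding is present.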

@alexku7

alexku7 commented Mar 4, 2021

Hello guys

We are trying to certify Pulsar against a few security standards.
We scanned the Pulsar 2.7.0 image with WhiteSource.
Unfortunately, 167 high-risk CVEs were discovered in 55 outdated libraries that were marked as high-risk vulnerable.

This complicates "a bit" our effort to certify Pulsar for a highly secured production environment 😞

On the other hand, there is this open issue about automated security scanning.

Is there any chance to move this issue forward, or at least to upgrade the outdated high-risk libraries?
It could significantly boost the adoption of Pulsar in many security-regulated environments.

@hpvd
Author

hpvd commented Mar 4, 2021

many thanks @alexku7 for describing your findings and view in detail, including the concrete consequences.
Imho this is not only an obstacle for a "highly secured production environment" but for a considerable share of possible production uses.
As I tried to describe in the issue and its comments, it's not only about security but also about performance and "bug-freeness", both of which potentially save lots of time in analyzing, allocating and solving problems that may already have been fixed by others...
If this is taken care of as a routine in every release, it should be, after an initial bigger step, a manageable amount of work and a smart way to catch some of the "low-hanging fruit"...

@hpvd
Author

hpvd commented Mar 4, 2021

-> Could there be better advertising for Pulsar's awesome quality than being used directly by people and companies working in highly secured fields?? :-)

@frankjkelly
Contributor

Yeah, these code / dependency / image scanners are pretty harsh, but several of our own customers want security reports for all dependent software, so any effort to minimize these issues in Pulsar, especially in a maintenance release (e.g. 2.6.4), could be extremely valuable. And if there's a documented process for mitigating them in a PR, then even someone like me could probably do it, as it's in our own interest, and I'd be happy to deliver value to the broader community :-)

@hpvd
Author

hpvd commented Mar 4, 2021

Of course we have also seen the major work in the fields of security and code quality in the past months
(probably going live in v2.8), like

  • enabling spotbugs in many components,
  • working on E2E encryption,
  • fixing things resulting in flaky tests
  • etc...

-> this is pretty awesome, and important.
Beyond that, this issue is about the routines and automation that make it possible to get the most out of all the work put into Pulsar.

@hpvd
Author

hpvd commented Mar 4, 2021

@alexku7 I would be happy to see the statistics from scanning the upcoming v2.8 with the same tool (WhiteSource)!

@alexku7

alexku7 commented Mar 4, 2021

@alexku7 I would be happy to see the statistics from scanning the upcoming v2.8 with the same tool (WhiteSource)!

Sure :) no problem
I posted the exported report for 2.7.0 in the Slack channel:
https://apache-pulsar.slack.com/archives/C5Z4T36F7/p1614882939234100

@lhotari
Member

lhotari commented Jun 7, 2021

There's now #10855 to add a scheduled OWASP Dependency Check to scan library vulnerabilities once per day.

@frankjkelly
Contributor

@lhotari this is great news! Thanks so much!

@hpvd
Author

hpvd commented Jun 7, 2021

awesome ;-)

@lhotari
Member

lhotari commented Jun 8, 2021

The results of the scheduled OWASP Dependency Check scans can be found here:
https://github.com/apache/pulsar/actions/workflows/ci-owasp-dependency-check.yaml

@hpvd
Author

hpvd commented Nov 16, 2021

just another topic for optimizing code quality and security further:
Use Automatic Fuzzing to find bugs (e.g. as part of CI / via github action) #12789

-> with the latest options for integrating it into the CI process, this is now relatively easy to use yet powerful

@hpvd
Author

hpvd commented Feb 1, 2022

just learned about GitHub's dependency graph.
When looking into it for pulsar, there are

dependency graph for pulsar: https://github.com/apache/pulsar/network/dependencies

@hpvd
Author

hpvd commented Feb 1, 2022

just to have a first impression without having to leave this issue:

| Dependencies defined in | Number of dependencies |
| --- | --- |
| pom.xml | 170 |
| tests/pom.xml | 1 |
| docker/pom.xml | 1 |
| pulsar-io/pom.xml | 1 |
| testmocks/pom.xml | 7 |
| buildtools/pom.xml | 20 |
| pulsar-sql/pom.xml | 19 |
| distribution/pom.xml | 1 |
| pulsar-proxy/pom.xml | 23 |
| …/pulsar/pom.xml | 5 |
| pulsar-broker/pom.xml | 49 |
| pulsar-client/pom.xml | 24 |
| pulsar-common/pom.xml | 33 |
| …/website/package.json | 16 |
| jclouds-shaded/pom.xml | 3 |
| managed-ledger/pom.xml | 14 |
| ... | ... |

@hpvd
Author

hpvd commented Feb 2, 2022

With this high number of dependencies of all kinds and different ages,
the main question that is bothering me:

=> Is it enough (or at least the best thing we could do at this time)
if only the dependencies with already well-known/reported security issues are identified and updated?
like addressed in #13972 (which is great of course!!)

-> a) Or is there a big risk of sacrificing security, performance and bug-freeness that we haven't seen yet
(see goal of this issue #8815 (comment))
resulting from some of the other dependencies (with no security risks reported yet)
for which updates are also already available (sometimes for a long time)?

-> b) How can we be sure that every dependency introduced several years ago is still in use / really needed in today's Pulsar?

@hpvd
Author

hpvd commented Feb 2, 2022

just to show that the numbers are constantly growing (yes, this is not a statistic ;-) it only serves to convey the feeling...)
from yesterday to today: one more dependency was introduced

| Dependencies defined in | Number of dependencies on 01 Feb 2022 | Number of dependencies on 02 Feb 2022 |
| --- | --- | --- |
| pom.xml | 170 | 171 |
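
To see which dependency was added rather than only the count, two snapshots of the declared coordinates could be compared. A minimal sketch, assuming each snapshot is a list of `groupId:artifactId` strings; the coordinates and the function name `diff_snapshots` are hypothetical.

```python
# Hypothetical snapshots of declared dependency coordinates.
SNAPSHOT_DAY_1 = ["com.example:lib-a", "com.example:lib-b"]
SNAPSHOT_DAY_2 = ["com.example:lib-a", "com.example:lib-b", "com.example:lib-c"]


def diff_snapshots(before, after):
    """Return (added, removed) coordinates between two snapshots, each sorted."""
    before_set, after_set = set(before), set(after)
    added = sorted(after_set - before_set)
    removed = sorted(before_set - after_set)
    return added, removed
```

Here `diff_snapshots(SNAPSHOT_DAY_1, SNAPSHOT_DAY_2)` reports `com.example:lib-c` as added and nothing as removed.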

@lhotari
Member

lhotari commented Feb 2, 2022

With this high number of dependencies of all kinds and different ages, the main question that is bothering me:

=> Is it enough (or at least the best thing we could do at this time) if only the dependencies with already well-known/reported security issues are identified and updated? like addressed in #13972 (which is great of course!!)

-> a) Or is there a big risk of sacrificing security, performance and bug-freeness that we haven't seen yet (see goal of this issue #8815 (comment)) resulting from some of the other dependencies (with no security risks reported yet) for which updates are also already available (sometimes for a long time)?

-> b) How can we be sure that every dependency introduced several years ago is still in use / really needed in today's Pulsar?

Very good questions.

@nicoloboschi and @dlg99 from DataStax have been contributing many changes to address vulnerable library versions. DataStax has bought a license for Sonatype IQ Server and also scans Apache Pulsar frequently.

Another aspect of software supply chain security is build reproducibility: are the built artifacts actually built from the source code they claim to be built from? For Java projects, there's more information at https://reproducible-builds.org/docs/jvm/ and https://github.com/jvm-repo-rebuild/reproducible-central . It would be good to get Apache Pulsar into the Reproducible Builds program. Reproducible Builds have been discussed a few times.
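
The core check behind build reproducibility can be sketched as a digest comparison between a published artifact and an independent rebuild from the same source. This is a simplified illustration, not the procedure used by the projects linked above; the function names are hypothetical and the artifacts are passed in as raw bytes.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of the given bytes as a hex string."""
    return hashlib.sha256(data).hexdigest()


def same_artifact(published: bytes, rebuilt: bytes) -> bool:
    """True if the rebuilt artifact is byte-for-byte identical to the published one."""
    return sha256_hex(published) == sha256_hex(rebuilt)
```

In practice the two inputs would be read from files, for example with `pathlib.Path(...).read_bytes()`; any difference in the digests means the build is not reproducible as is.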

@hpvd Since the mailing list is the main channel for making major decisions in Apache projects, it would be useful to bring up your improvement suggestions to the Apache Pulsar community. dev@pulsar.apache.org would be a good list to have this discussion. Mailing list details are at https://pulsar.apache.org/en/contact/ .

@hpvd
Author

hpvd commented Feb 4, 2022

many thanks for your answer, additional details and advice! Will bring some points to the list within the next weeks...

btw: does anybody look at Pulsar with a tool like JArchitect to keep a good overview of the dependencies?
sounds interesting/helpful to me:

dependency graphs etc
https://www.jarchitect.com/JArchitectv2020

JArchitect comes with several facilities that allow the efficient dependency management. In seconds you can know which part of the code will be impacted if you refactor a class, you can be advised if a layer dependency violation has been accidentally created, you can pinpoint precisely which part of the code relies on a particular tier component, you can list methods that can be reached from a given method etc…

edit: deactivated active link
edit2: there seems to be a trial: The trial license is fully featured, but time limited (14-day free trial.)

@hpvd
Author

hpvd commented Aug 11, 2022

another interesting topic in this field of automatic security scanning:
Automatic Scan for CWEs (additional to CVEs) #17069

@hpvd
Author

hpvd commented Nov 4, 2022

just to visualize/summarize the current state:
our current procedure/routine seems to miss 35 fixable vulnerabilities (CVEs) when releasing the latest version, 2.10.2

okay, a (very) few fewer if

  • not all were publicly known on release day last week
  • we do not want to fix all (why??)
  • or can't fix all immediately because of really major changes in how dependencies work, which take some more time to adapt to...

for details see #18348

@tisonkun
Member

Moved to the open-ended discussion forum.

I suggest you send patches directly; the maintainers will be glad to review them. Continuing to file requests helps little: open-source software grows through contributions.

@apache apache locked and limited conversation to collaborators Dec 28, 2022
@tisonkun tisonkun converted this issue into discussion #19093 Dec 28, 2022
