Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Collect Mozilla #393

Merged
merged 17 commits into from
Feb 8, 2022
Merged

Collect Mozilla #393

merged 17 commits into from
Feb 8, 2022

Conversation

Hritik14
Copy link
Collaborator

@Hritik14 Hritik14 commented Mar 20, 2021

Fixes #78. Most of the work is done. Here's the checklist

  • Parsing for mozilla data source
  • Working mozilla importer
  • Migrate to GitDataSource
  • Write test cases
  • More verbose comments
  • Update dependencies

Currently, it requires two extra dependencies: PyGithub and markdown
I would eliminate the PyGithub dependency next but I think markdown has
to stay.
Final dependencies would be comitted later.

I would also really like advices on

  • The FIXMEs present in current code.
  • Should we mark all the pks <= resolved pkg as vulnerable as mozilla doesn't ship that information

This is how it looks for now
image

Signed-off-by: Hritik Vijay hritikxx8@gmail.com

@Hritik14 Hritik14 changed the title [WIP] Collect Mozilla Collect Mozilla Mar 21, 2021
Copy link
Collaborator

@sbs2001 sbs2001 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Hritik14 I've done a light review for now, will do another review once changes are made.

vulnerabilities/importers/mozilla.py Outdated Show resolved Hide resolved
vulnerabilities/importers/mozilla.py Outdated Show resolved Hide resolved
vulnerabilities/importers/mozilla.py Outdated Show resolved Hide resolved
vulnerabilities/importers/mozilla.py Outdated Show resolved Hide resolved
vulnerabilities/importers/mozilla.py Outdated Show resolved Hide resolved
vulnerabilities/importer_yielder.py Outdated Show resolved Hide resolved
vulnerabilities/importers/mozilla.py Outdated Show resolved Hide resolved
@Hritik14
Copy link
Collaborator Author

@sbs2001 any update ?

@sbs2001
Copy link
Collaborator

sbs2001 commented Mar 25, 2021

@Hritik14 are you sure, you pushed the changes ?

@Hritik14
Copy link
Collaborator Author

@sbs2001 ah, not yet. I actually replied on the requested changes. I'm not sure if you get notifications for those replies. Do let me know and I'll leave a comment next time as well.

@Hritik14
Copy link
Collaborator Author

Hritik14 commented Mar 25, 2021

@sbs2001 I've refactored as suggested and also added a helper functions to split markdown and the front matter. I'd add it to the istio importer after this PR is merged.

@Hritik14
Copy link
Collaborator Author

rebased

Copy link
Collaborator

@sbs2001 sbs2001 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @Hritik14 I've left some comments inline for your consideration. This looks good btw .

vulnerabilities/importers/mozilla.py Outdated Show resolved Hide resolved
vulnerabilities/importers/mozilla.py Outdated Show resolved Hide resolved
vulnerabilities/importers/mozilla.py Outdated Show resolved Hide resolved
# FIXME: Needs improvement
# Should we add 'bugs' section in references too?
# Should we add 'impact'/severity of CVE in references too?
# If yes, then fix alpine_linux importer as well
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's wrong at alpine linux ?

Copy link
Collaborator Author

@Hritik14 Hritik14 Mar 30, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we add the severity of the CVE itself ?
Severity is only available in references, so maybe we should consider adding the CVE Mitre as a reference (as done in this PR) more uniformly.
In Alpine, the CVE is not added as a reference but only as id. It won't really be required there as no severity is mentioned in the advisory for alpine, I wrote it just to maintain uniformaty.
This is what I was thinking
https://github.com/nexB/vulnerablecode/blob/a95a0e9c21ba9a2689dffe46c6c55fe7b8dacf78/vulnerabilities/importers/mozilla.py#L150

vulnerabilities/importers/mozilla.py Outdated Show resolved Hide resolved
vulnerabilities/importers/mozilla.py Outdated Show resolved Hide resolved
p = ""
if h3tag:
for tag in h3tag.next_siblings:
if tag.name:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just curious, could you point me to an example where this is not true ?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a precautionary measure. I'm only collecting if a p tag follows an h3 which seems more or less like a standard. There is no strict rule about how to format the markdown file, so I figured we should get the correct data if at all.

if tag.name:
if tag.name != "p":
break
p += tag.get_text()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would that collect the strong tag too ? Like the one at https://github.com/mozilla/foundation-security-advisories/blob/master/announce/2016/mfsa2016-14.md ? Not a big deal but we should avoid collecting those tags.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No.
For the linked document it returns

Security researcher Holger Fuhrmannek reported that a malicious\nGraphite....

Only consecutive p tags are collected and get_text() makes sure we're getting only textual representation of the content present in p tags.

Hritik14 added a commit to Hritik14/vulnerablecode that referenced this pull request Mar 30, 2021
Refactors based on requested review
aboutcode-org#393 (review)

Signed-off-by: Hritik Vijay <hritikxx8@gmail.com>
@Hritik14
Copy link
Collaborator Author

Hritik14 commented Apr 1, 2021

@sbs2001 any update ?

Hritik14 added a commit to Hritik14/vulnerablecode that referenced this pull request Apr 1, 2021
Refactors based on requested review
aboutcode-org#393 (review)

Signed-off-by: Hritik Vijay <hritikxx8@gmail.com>
Hritik14 added a commit to Hritik14/vulnerablecode that referenced this pull request Apr 1, 2021
Refactors based on requested review
aboutcode-org#393 (review)

Signed-off-by: Hritik Vijay <hritikxx8@gmail.com>
Hritik14 added 12 commits April 2, 2021 14:53
This is based on aboutcode-org#78. Most of the work is done. Here's the checklist
- [x] Parsing for mozilla data source
- [x] Working mozilla importer
- [ ] Migrate to GitDataSource
- [ ] Solve TODOs
- [ ] Write test cases

Currently, it requires two extra dependencies: PyGithub and markdown
I would eliminate the PyGithub dependency next but I think markdown has
to stay.
Final dependences would be comitted later.

Signed-off-by: Hritik Vijay <hritikxx8@gmail.com>
[x] Migrate to GitDataSource
[x] Update dependencies
[x] More verbose comments

Signed-off-by: Hritik Vijay <hritikxx8@gmail.com>
Signed-off-by: Hritik Vijay <hritikxx8@gmail.com>
Signed-off-by: Hritik Vijay <hritikxx8@gmail.com>
Signed-off-by: Hritik Vijay <hritikxx8@gmail.com>
Signed-off-by: Hritik Vijay <hritikxx8@gmail.com>
Use batch_advisories for now. It has it's own problems and there's aboutcode-org#338 for that.
The generator thing won't do much, since we are importing like 10-20 MBs of data.

The codebase already has overuse of methods starting with _ , I'd say avoid them.
They don't help with readability nor are they trivial in this case

_parse_md and _parse_yml:
The name is misleading. The function is parsing + enriching the data.
converted to get_advisories_from_yml and get_advisories_from_md

Remove `"branch": None` in importer_yielder

Group imports

Signed-off-by: Hritik Vijay <hritikxx8@gmail.com>
Signed-off-by: Hritik Vijay <hritikxx8@gmail.com>
spaces are important, otherwise it would fail to produce a valid
yaml front matter in case of
https://raw.githubusercontent.com/mozilla/foundation-security-advisories/master/announce/2012/mfsa2012-85.md
Anyway, line shouldn't be altered in a splitter.

Signed-off-by: Hritik Vijay <hritikxx8@gmail.com>
Mozilla website uses rsplit to extract the name and version so it should
be better in any case.
https://github.com/mozilla/bedrock/blob/765a60450235d810cf941676e4a29f012a9eaaba/bedrock/security/models.py#L29
Based on discussion here
mozilla/foundation-security-advisories#76

Signed-off-by: Hritik Vijay <hritikxx8@gmail.com>
Refactors based on requested review
aboutcode-org#393 (review)

Signed-off-by: Hritik Vijay <hritikxx8@gmail.com>
Signed-off-by: Hritik Vijay <hritikxx8@gmail.com>
Hritik14 added 2 commits April 2, 2021 14:59
I would really write a test case for this someday too

Signed-off-by: Hritik Vijay <hritikxx8@gmail.com>
Signed-off-by: Hritik Vijay <hritikxx8@gmail.com>
Copy link
Member

@pombredanne pombredanne left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!
see a few comments inline for you consideration.

Also, can you avoid having a test zip in vulnerabilities/tests/test_data/mozilla.zip ? text is fine instead.
This will simplify the tests and help review too.

mdlines = []
splitter = mdlines

for index, line in enumerate(lines.split("\n")):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about this instead?

import saneyaml

# normalize line endings just in case:
text = text.replace("\r\n", "\n")
front_matter, _, body = text.rpartition("\n---\n")
front_matter = saneyaml.load(front_matter)

For instance:

>>> import saneyaml
>>> text = """---
... title: ISTIO-SECURITY-2019-001
... subtitle: 安全公告
... description: 错误的权限控制。
... cve: [CVE-2019-12243]
... publishdate: 2019-05-28
... keywords: [CVE]
... skip_seealso: true
... aliases:
...     - /zh/blog/2019/cve-2019-12243
...     - /zh/news/2019/cve-2019-12243
... ---
... 
... {{< security_bulletin
...         cves="CVE-2019-12243"
...         cvss="8.9"
...         vector="CVSS:3.0/AV:A/AC:L/PR:N/UI:N/S:C/C:H/I:H/A:N/E:H/RL:O/RC:C"
...         releases="1.1 to 1.1.6" >}}
... 
... ## 内容{#context}
... """
>>> text = text.replace("\r\n", "\n")
>>> front_matter, _, body = text.rpartition("\n---\n")
>>> front_matter = saneyaml.load(front_matter)
>>> print(front_matter)
{'title': 'ISTIO-SECURITY-2019-001', 'subtitle': '安全公告', 'description': '错误的权限控制。', 'cve': ['CVE-2019-12243'], 'publishda
te': '2019-05-28', 'keywords': ['CVE'], 'skip_seealso': True, 'aliases': ['/zh/blog/2019/cve-2019-12243', '/zh/news/2019/cve-2019-122
43']}

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have not been using saneyaml anywhere in the project. If that is preferred, I can add it to requirements and we can start using that but I don't see any specific reason to do so here.
Regarding rpartition. We cannot have that either as a markdown is allowed to have a "\n---"\n.
Consider this

>>> text=""""---
... announced: February 4, 2014
... fixed_in:
... - Firefox 27
... - Firefox ESR 24.3
... - Thunderbird 24.3
... - Seamonkey 2.24
... impact: High
... reporter: Cody Crews
... title: Clone protected content with XBL scopes
... ---
... 
... <h3>Description</h3>
... 
... Text
... ---
... other text
... """
>>> text = text.replace("\r\n", "\n")
>>> front_matter, _, body = text.rpartition("\n---\n")
>>> 
>>> front_matter
'"---\nannounced: February 4, 2014\nfixed_in:\n- Firefox 27\n- Firefox ESR 24.3\n- Thunderbird 24.3\n- Seamonkey 2.24\nimpact: High\nreporter: Cody Crews\ntitle: Clone protected content with XBL scopes\n---\n\n<h3>Description</h3>\n\nText'
>>> 
>>> body
'other text\n'
>>> 

The official validator by mozilla parses it something like this
https://github.com/mozilla/foundation-security-advisories/blob/d43d09d204ab5da014e83b7d1743df289cefee92/check_advisories.py#L183-L208

Further, as it is a helper, we cannot have it mozilla specific. All it should do is to split the front matter and markdown.
I could have something like this

import yaml

text="""---
announced: February 4, 2014
fixed_in:
- Firefox 27
- Firefox ESR 24.3
- Thunderbird 24.3
- Seamonkey 2.24
impact: High
reporter: Cody Crews
title: Clone protected content with XBL scopes
---

<h3>Description</h3>
---
other text
"""
# normalize line endings just in case:
text = text.replace("\r\n", "\n")
linezero,_, text = text.partition("---\n")
if not linezero: # nothing before first ---
    front_matter,_, body = text.partition("---")
    front_matter = yaml.safe_load(front_matter)
else:
    front_matter = ""
    body = linezero + "---\n" + text
print(front_matter)
print(body)

which prints

{'announced': 'February 4, 2014', 'fixed_in': ['Firefox 27', 'Firefox ESR 24.3', 'Thunderbird 24.3', 'Seamonkey 2.24'], 'impact': 'High', 'reporter': 'Cody Crews', 'title': 'Clone protected content with XBL scopes'}


<h3>Description</h3>
---
other text

but doesn't look any better to me.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but doesn't look any better to me.

It feels a bit more readable may be?

The official validator by mozilla parses it something like this
https://github.com/mozilla/foundation-security-advisories/blob/d43d09d204ab5da014e83b7d1743df289cefee92/check_advisories.py#L183-L208

We could very much reuse it as-is as well. This is under an MPL license so we would have to do it in an orderly fashion with proper license tracking and code separation though.
If we do not copy it, we cannot reuse code from it though.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As it is required by multiple importers, let's move this to it's own PR. Opened #443

vulnerabilities/importers/mozilla.py Show resolved Hide resolved
vulnerabilities/importers/mozilla.py Show resolved Hide resolved
vulnerabilities/importers/mozilla.py Show resolved Hide resolved
vulnerabilities/importers/mozilla.py Show resolved Hide resolved
vulnerabilities/importers/mozilla.py Show resolved Hide resolved
Advisory(
summary=description,
vulnerability_id="",
impacted_package_urls=[],
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How can you get no impacted packages and fixed packages?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Upstream doesn't provide with a list of impacted package. We might consider all packages until this package as impacted but I'm not sure about this. There are many other importers that do this too. Eg: https://github.com/nexB/vulnerablecode/blob/f254b0d4ac54b70c648055a7e8eda16c05dce0f9/vulnerabilities/importers/alpine_linux.py#L194

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good point. We need a ticket though, as this may need to be interpreted as "all versions before this version are vulnerable" and it warrant some research and discussions.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ouch, is this going to be redundant now, just like the suse_backport importer. We now have a improved notion of fixed/resolved/patched package now. Due to #436

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sbs2001 That is scary. I've opened #449 to discuss this further.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would like to merge this sooner than later. ... See also #449 (comment)

vulnerabilities/importers/mozilla.py Show resolved Hide resolved
vulnerabilities/importers/mozilla.py Show resolved Hide resolved
vulnerabilities/importers/mozilla.py Show resolved Hide resolved
@Hritik14
Copy link
Collaborator Author

Hritik14 commented Apr 5, 2021

@pombredanne Thank you for the through review. Please check unresolved conversations and I'll push the updated code.

@pombredanne
Copy link
Member

@Hritik14 done :) Thank you for having looked at all these in details!

Hritik14 added a commit to Hritik14/vulnerablecode that referenced this pull request Apr 15, 2021
Provide a helper for uniform cve search in importers.
Based on aboutcode-org#393 (comment)

Signed-off-by: Hritik Vijay <hritikxx8@gmail.com>
Hritik14 added a commit to Hritik14/vulnerablecode that referenced this pull request Apr 19, 2021
Provide a helper for uniform cve search in importers.
Based on aboutcode-org#393 (comment)

Signed-off-by: Hritik Vijay <hritikxx8@gmail.com>
@sbs2001
Copy link
Collaborator

sbs2001 commented Apr 26, 2021

@pombredanne re zip files: see #442 (comment) .

@Hritik14 IMHO mocking the whole GItDataSource's logic of collecting changes is redundant, since there are thorough tests of that already. There is some mock db things which IMHO is also not required . Instead you could just test the to_advisories and see whether correct Advisory objects are returned. It's fast (no need to setup DB and such) and easy to implement.

Something like https://github.com/nexB/vulnerablecode/blob/main/vulnerabilities/tests/test_retiredotnet.py . There are tests which do check what ends up in db, but I think those should be limited and ideally should write small amounts of data.

@pombredanne
Copy link
Member

@Hritik14 do you mind to rebase?

@pombredanne pombredanne added this to the v30.0 milestone Feb 2, 2022
@Hritik14 Hritik14 force-pushed the collect_mozilla branch 2 times, most recently from 6d372dc to d192798 Compare February 8, 2022 10:22
Copy link
Member

@pombredanne pombredanne left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's merge this first and fix it later.
Please enter an issue to track that

@pombredanne pombredanne dismissed sbs2001’s stale review February 8, 2022 10:33

We are merging first and will fix issues afterward.

Signed-off-by: Hritik Vijay <hritikxx8@gmail.com>
Signed-off-by: Hritik Vijay <hritikxx8@gmail.com>
@Hritik14 Hritik14 merged commit 60a1906 into aboutcode-org:main Feb 8, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Collect Mozilla
3 participants