Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Shadowserver 2022-01 update #2143

Closed
wants to merge 82 commits into from
Closed

Shadowserver 2022-01 update #2143

wants to merge 82 commits into from

Conversation

elsif2
Copy link
Collaborator

@elsif2 elsif2 commented Jan 25, 2022

This commit is to update the Shadowserver bots to support all current report types.

Test Results

sudo -u intelmq INTELMQ_SKIP_INTERNET=1 python3 -m unittest
----------------------------------------------------------------------
Ran 1457 tests in 74.351s

OK (skipped=246)

@sebix
Copy link
Member

sebix commented Jan 26, 2022

I'm not sure if all of this PR was created on purpose. It contains PR#2134 and 6991597 (not as commit, but as a separate change?), but also tons of other changes in the shadowserver parser which I don't understand and some updates to the shadowserver collector, which need to be reviewed. Many updates to the tests, also new test which I appreciate highly! All in all you changes 142 files in just one commit. Can you please split that enormous commit into smaller ones? E.g. the shadowserver collector, update to the parser and test fixes, new tests or so.

@codecov-commenter
Copy link

codecov-commenter commented Feb 2, 2022

Codecov Report

Merging #2143 (31d0595) into develop (3b15e09) will increase coverage by 0.13%.
The diff coverage is 97.86%.

@@             Coverage Diff             @@
##           develop    #2143      +/-   ##
===========================================
+ Coverage    76.30%   76.43%   +0.13%     
===========================================
  Files          441      456      +15     
  Lines        23649    23937     +288     
  Branches      3739     3454     -285     
===========================================
+ Hits         18046    18297     +251     
- Misses        4864     4894      +30     
- Partials       739      746       +7     
Impacted Files Coverage Δ
intelmq/bots/parsers/shadowserver/_config.py 97.14% <ø> (-1.61%) ⬇️
intelmq/bots/parsers/shadowserver/parser.py 84.21% <ø> (-9.76%) ⬇️
.../tests/bots/parsers/shadowserver/test_blocklist.py 100.00% <ø> (ø)
...rsers/shadowserver/test_event4_honeypot_darknet.py 100.00% <ø> (ø)
...ers/shadowserver/test_event4_honeypot_http_scan.py 100.00% <ø> (ø)
...ots/parsers/shadowserver/test_event4_ip_spoofer.py 90.47% <ø> (ø)
...ers/shadowserver/test_event4_microsoft_sinkhole.py 100.00% <ø> (ø)
...hadowserver/test_event4_microsoft_sinkhole_http.py 100.00% <ø> (ø)
.../bots/parsers/shadowserver/test_event4_sinkhole.py 100.00% <ø> (ø)
.../parsers/shadowserver/test_event4_sinkhole_http.py 100.00% <ø> (ø)
... and 78 more

Copy link
Member

@sebix sebix left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

First of all, thank you for this PR, mus have been a lot of work!

There lots of changes which I am unable to review properly. Someone with access to the data and the API needs to cover these parts.
Maybe @monoidic or @waldbauer-certat ?

There are a few changes which should be documented in the NEWS file:

  • change of the identifier of the blocklist, honeypot feeds, ip spoofer, MS sinkhole, ..... That will cause troubles for the users and should therefore be documented very well, e.g. with a table
    • And I don't see a good reason to change "open" and "vulnerable" to "scan". "Scan" describes they way how the information was retrieved, but not what the event is about. The IoCs do not describe, that the address do scanning, but that they are vulnerable.
  • change of extra.naics -> extra.source.naics

From the formal side a summary of the changes is necessary to communicate to the users what they need to adapt in their workflows.

@elsif2
Copy link
Collaborator Author

elsif2 commented Feb 16, 2022

I made a number of changes to address your comments. Please let me know if further changes are required.

@aaronkaplan
Copy link
Member

I have been trying to go through this massive PR and tbh I got interrupted frequently due to work. It's a bit ... huge.
@elsif2 could we maybe arrange for a call and you walk me through this a bit?
I would really like to finally merge this.

document the old parameter `country` and its status
warn if used
adapt the test
@elsif2
Copy link
Collaborator Author

elsif2 commented Jul 6, 2022

Please don't forget that this PR's code is not compatible with current IntelMQ configurations (and I see no apparent reason for these changes) so users will need to adapt here, interrupting the automatic upgrade procedure. To prevent this, I propose to add an upgrade method to intelmq/lib/upgrades.py to automatically change the configuration, reducing not only the frustration on the users side, but also to minimize the errors when doing so. And an entry to the changelog is missing.

Are the incompatible configurations the changes to just the classification.identifier or does it include changes to the classification.type also?

@sebix
Copy link
Member

sebix commented Jul 7, 2022

Please don't forget that this PR's code is not compatible with current IntelMQ configurations (and I see no apparent reason for these changes) so users will need to adapt here, interrupting the automatic upgrade procedure. To prevent this, I propose to add an upgrade method to intelmq/lib/upgrades.py to automatically change the configuration, reducing not only the frustration on the users side, but also to minimize the errors when doing so. And an entry to the changelog is missing.

Are the incompatible configurations the changes to just the classification.identifier or does it include changes to the classification.type also?

You are mixing two topics here. One is the runtime configuration of IntelMQ. You changed the first column of mapping, which is the feed name, an (optional) parameter of the shadowserver parser. This can be handled by an upgrade function so that the users do not need to adapt this parameter by hand.

The other topic is a bigger issue. You have changed various mapped values, which means the resulting, parsed data is then different to what it was before. Until now, we have been very hesitant to do this, as it causes lots of work for the users. They need to check and adapt all processing steps after the shadowserver bots. Not only the bot's configurations (e.g. filtering, rules, notifications, ...), but also any further processing scripts outside the IntelMQ Core (and many users have a lot of them) and including changes to their databases to have consistent values for statistics etc. That's a huge effort for everyone. To at least aid in this last step, we have provided UPDATE-statements in the past. Just have a look a the NEWS file, you'll find some examples there.

@elsif2
Copy link
Collaborator Author

elsif2 commented Jul 7, 2022

The other topic is a bigger issue. You have changed various mapped values, which means the resulting, parsed data is then different to what it was before. Until now, we have been very hesitant to do this, as it causes lots of work for the users. They need to check and adapt all processing steps after the shadowserver bots. Not only the bot's configurations (e.g. filtering, rules, notifications, ...), but also any further processing scripts outside the IntelMQ Core (and many users have a lot of them) and including changes to their databases to have consistent values for statistics etc. That's a huge effort for everyone. To at least aid in this last step, we have provided UPDATE-statements in the past. Just have a look a the NEWS file, you'll find some examples there.

I am working on an revision that will set the classification.identifier to match the current develop branch.

Once that is done, the only substantial change to the mapped values impacts the configurations where extra.naics, extra.sector, and extra.sic were used instead of extra.source.naics, extra.source.sector, and extra.source.sic. This change should make it easier for users as the fields are consistent across all report types.

I suppose another option would be to configure both the non-source and source versions for those three fields. Please let me know your preference.

@elsif2
Copy link
Collaborator Author

elsif2 commented Jul 8, 2022

I was not sure what release version to use in intelmq/lib/upgrades.py - currently set for 3.1.0.

I think the only outstanding issue is how to handle the source.naics/sic/sector changes.

@elsif2
Copy link
Collaborator Author

elsif2 commented Jul 12, 2022

I completed the compatibility updates. Here is a diff showing the changes for each feed:
diff.txt

I believe this PR is ready to be merged. Please let me know if you have any suggestions or concerns.

@sebix sebix assigned elsif2 and unassigned aaronkaplan Jul 26, 2022
@sebix
Copy link
Member

sebix commented Jul 26, 2022

According to @aaronkaplan, @elsif2 himself is responsible for this PR.

Update to existing test cases to match current report types.

New tests for added report types.

pycodestyle fixes

add testdata licenses

pycodestyle fix

Added reports parameter

Suggested changes to the parser

Proposed details for the release

Test script updates for suggested changes

Test input updates

Realign columns

Update compromised_website.csv

Update scan_adb.csv

Update scan_adb.csv

Update scan_ftp.csv

Update scan_ipp.csv

Update scan_snmp.csv

Realign columns

Remove duplicates

Changed malware.name to extra.infection

Updated SPDX-FileCopyrightText

shadowserver api: document and warn on old parameter

document the old parameter `country` and its status
warn if used
adapt the test

DOC: fix NEWS entry of PR#2143

Added the sector field to scan_amqp, scan_cwmp, and scan_vnc.

Copyright and raw field updates

Added the sector field to scan_amqp, scan_cwmp, and scan_vnc.

Copyright updates

Added phish_url and scan_modbus reports.

Update source.url and source.fqdn for phish_url and malware_url reports.  Update classification.taxonomy and classification.type for scan_modbus report.

* additional field type validation changes
* added count, bytes, duration, avg_pps, and max_pps fields to event_honeypot_ddos_amp
* added 'protocol.application': 'https' to scan_ssl, scan_ssl_freak, and scan_ssl_poodle
* added 'extra.tag' to scan_* and device_id

Replaced scan_modbus with scan_ics

Addeed event4_honeypot_ddos, event4_honeypot_ddos_target, scan_dvr_dhcpdiscover, and scan_socks.

Tests for event4_honeypot_ddos.

Tests for event4_honeypot_ddos_target.

Tests for scan_dvr_dhcpdiscover.

Tests for scan_socks.

Rename file

Rename file

update:scan_mdns, scan_smb, and special; add:scan_ddos_middle_box

cleanup renamed license files

updated scan_mdns test files

updated scan_smb test files

updated special test files

add scan_ddos_middlebox test files

add scan_ddos_middlebox test

updated schema

Updated scan_smb tests

Updated scan_ntp tests

Updated scan_snmp tests

New scan_docker test

New scan_kubernetes test

New scan_mysql test

Updated report schema for June 2022

Added scan_epmd test

Revert "Added scan_epmd test"

This reverts commit 01edea1.

Revert: Fix for recover_line method as commited in #2192

Added scan_couchdb

Test case for scan_couchdb

Added scan6_rpd

Added/updated README with maintainer details

Restored feed names and classification.identifiers to minimize upgrade impact.

Merge repair

pycodestyle repairs

codespell fixes

license compliance fixes

pycodestyle fixes

Feed configuration updates for compatibility with the original.

Added scan_postgres test

Added additional IPv6 aliases

Fix for recover_line method as commited in #2192
@elsif2
Copy link
Collaborator Author

elsif2 commented Jul 26, 2022

According to @aaronkaplan, @elsif2 himself is responsible for this PR.

I am not authorized to merge a pull request. I created a clean branch for this PR as shadowserver-202207. Please advise on how you would like me to proceed.

@sebix
Copy link
Member

sebix commented Aug 2, 2022

According to @aaronkaplan, @elsif2 himself is responsible for this PR.

I am not authorized to merge a pull request.

I'm caught between two stools, as I don't know the details of your agreement with @aaronkaplan. Can you please check with him?

I'd like to do a release of IntelMQ (3.1.0), and the shadowserver updates would be the major feature of the new release.

@elsif2
Copy link
Collaborator Author

elsif2 commented Aug 3, 2022

Replaced by #2227

@elsif2 elsif2 closed this Aug 3, 2022
@elsif2 elsif2 deleted the elsif-2022-01 branch August 3, 2022 15:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
component: bots feature Indicates new feature requests or new features
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants