ENH: Added personal access token support to GitHub Collector
As GitHub has marked basic authentication as deprecated, we now
implement the personal access token authentication method.

Fixes #1549

Signed-off-by: Sebastian Waldbauer <waldbauer@cert.at>
waldbauer-certat committed Jul 7, 2022
1 parent 79cae29 commit 7b2dc7a
Showing 6 changed files with 20 additions and 34 deletions.
1 change: 1 addition & 0 deletions CHANGELOG.md
@@ -42,6 +42,7 @@ CHANGELOG
- Add support for unverified SSL/STARTTLS connections (PR#2055 by Sebastian Wagner).
- Fix exception handling for aborted IMAP connections (PR#2187 by Sebastian Wagner).
- `intelmq.bots.collectors.blueliv`: Fix Blueliv collector requirements (PR#2161 by Gethvi).
- `intelmq.bots.collectors.github_api._collector_github_api`: Add personal access token support (PR#2145 by Sebastian Waldbauer, fixes #1549).

#### Parsers
- `intelmq.bots.parsers.alienvault.parser_otx`: Save CVE data in `extra.cve` instead of `extra.CVE` due to the field name restriction on lower-case characters (PR#2059 by Sebastian Wagner).
10 changes: 8 additions & 2 deletions NEWS.md
@@ -39,6 +39,12 @@ The parameter `timeout` has been merged into `redis_cache_ttl`.
### Postgres databases


### Bots

#### GitHub Collector
GitHub removed basic (username/password) authentication in favor of personal access tokens, so the GitHub Collector now uses a personal access token for authentication. See [GitHub Documentation: Generate a personal access token](https://docs.github.com/en/authentication/keeping-your-account-and-data-secure/creating-a-personal-access-token).
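Upgrading an existing configuration amounts to dropping the two basic-auth parameters and adding the token. A minimal sketch, assuming the collector's parameters are held in a plain dict; `migrate_github_collector_params` is a hypothetical helper (not part of IntelMQ) and `ghp_example` is a placeholder token.

```python
from typing import Any, Dict

# Parameters removed by this change (hypothetical helper below drops them).
REMOVED_KEYS = ('basic_auth_username', 'basic_auth_password')


def migrate_github_collector_params(params: Dict[str, Any], token: str) -> Dict[str, Any]:
    """Return a copy of the parameters that uses token authentication.

    Hypothetical helper: the input dict is left untouched.
    """
    migrated = {key: value for key, value in params.items() if key not in REMOVED_KEYS}
    migrated['personal_access_token'] = token
    return migrated


old = {
    'basic_auth_username': 'USERNAME',
    'basic_auth_password': 'PASSWORD',
    'repository': 'StrangerealIntel/DailyIOC',
    'regex': '.*.json',
}
new = migrate_github_collector_params(old, 'ghp_example')
print(sorted(new))  # ['personal_access_token', 'regex', 'repository']
```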


3.0.2 Maintenance release (2021-09-10)
--------------------------------------
Two performance issues were fixed. One affected all collectors which processed high volumes of data and the other issue affected some bots which used threading.
@@ -105,7 +111,7 @@ and the XMPP bots were deprecated in 391d625.
#### Sieve expert
The Sieve expert bot has had major updates to its syntax. Breaking new changes:
* the removal of the `:notcontains` operator, which can be replaced using the newly added
expression negation, e.g., `! foo :contains ['.mx', '.zz']` rather than `foo :notcontains ['.mx', '.zz']`.
* changed operators for comparisons against lists of values, e.g., `source.ip :in ['127.0.0.5', '192.168.1.2']` rather than `source.ip == ['127.0.0.5', '192.168.1.2']`
The "old" syntax with `==` on lists is no longer valid and raises an error.

@@ -284,7 +290,7 @@ CentOS 7 (with EPEL) provides both Python 3.4 and Python 3.6. If IntelMQ was ins
type and reloads them afterwards. Removes any external dependencies (such as curl or wget).
This is a replacement for shell scripts such as `update-tor-nodes`, `update-asn-data`,
`update-geoip-data`, `update-rfiprisk-data`.

Usage:
```
intelmq.bots.experts.asn_lookup.expert --update-database
3 changes: 1 addition & 2 deletions docs/user/bots.rst
@@ -378,8 +378,7 @@ Github API
**Configuration Parameters**

* **Feed parameters** (see above)
* `basic_auth_username:` GitHub account username (optional)
* `basic_auth_password:` GitHub account password (optional)
* `personal_access_token:` GitHub account personal access token (optional; see `GitHub changelog: Deprecating password authentication <https://developer.github.com/changes/2020-02-14-deprecating-password-auth/#removal>`_)
* `repository:` GitHub target repository (`<USER>/<REPOSITORY>`)
* `regex:` Valid regular expression of target files within the repository (defaults to `.*.json`)
* `extra_fields:` Comma-separated list of extra fields from `GitHub contents API <https://developer.github.com/v3/repos/contents/>`_.
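The `repository` and `extra_fields` values have simple textual formats. The sketch below shows one way to check them before calling the API; both helpers and the exact validation rule are assumptions for illustration, not the collector's actual code. The commit's tests do treat `author/` as an invalid repository.

```python
import re
from typing import List

# Assumed rule for `<USER>/<REPOSITORY>`: exactly one slash,
# with a non-empty, whitespace-free part on each side.
REPOSITORY_PATTERN = re.compile(r'[^/\s]+/[^/\s]+')


def is_valid_repository(value: str) -> bool:
    """Hypothetical check for the documented repository format."""
    return REPOSITORY_PATTERN.fullmatch(value) is not None


def parse_extra_fields(value: str) -> List[str]:
    """Split a comma-separated `extra_fields` value, dropping whitespace and empty items."""
    return [field.strip() for field in value.split(',') if field.strip()]


print(is_valid_repository('StrangerealIntel/DailyIOC'))  # True
print(is_valid_repository('author/'))                    # False
print(parse_extra_fields('size, sha'))                   # ['size', 'sha']
```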
27 changes: 6 additions & 21 deletions intelmq/bots/collectors/github_api/_collector_github_api.py
@@ -1,20 +1,16 @@
-# SPDX-FileCopyrightText: 2021 Sebastian Waldbauer
+# SPDX-FileCopyrightText: 2022 Sebastian Waldbauer
 #
 # SPDX-License-Identifier: AGPL-3.0-or-later

 # -*- coding: utf-8 -*-
 """
 GITHUB API Collector bot
 """
-import base64
+from typing import Optional

+import requests
 from intelmq.lib.bot import CollectorBot

-try:
-    import requests
-except ImportError:
-    requests = None
-
 static_params = {
     'headers': {
         'Accept': 'application/vnd.github.v3.text-match+json'
@@ -23,16 +19,12 @@


 class GithubAPICollectorBot(CollectorBot):
-    basic_auth_username = None
-    basic_auth_password = None
+    personal_access_token: Optional[str] = None

     def init(self):
-        if requests is None:
-            raise ValueError('Could not import requests. Please install it.')
-
         self.__user_headers = static_params['headers']
-        if self.basic_auth_username is not None and self.basic_auth_password is not None:
-            self.__user_headers.update(self.__produce_auth_header(self.basic_auth_username, self.basic_auth_password))
+        if self.personal_access_token:
+            self.__user_headers.update({'Authorization': self.personal_access_token})
         else:
             self.logger.warning('Using unauthenticated API access, means the request limit is at 60 per hour.')

@@ -55,10 +47,3 @@ def github_api(self, api_path: str, **kwargs) -> dict:
             return response.json()
         except requests.RequestException:
             raise ValueError(f"Unknown repository {api_path!r}.")
-
-    @staticmethod
-    def __produce_auth_header(username: str, password: str) -> dict:
-        encoded_auth_bytes = base64.b64encode(bytes(f'{username}:{password}', encoding='utf-8'))
-        return {
-            'Authorization': 'Basic {}'.format(encoded_auth_bytes.decode('utf-8'))
-        }
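In the new `init()`, `self.__user_headers` is bound to the module-level `static_params['headers']` dict itself, so `update()` writes the token into that shared dict. The sketch below builds a fresh header dict on every call instead. It also sends the token with the `token` scheme, one of the forms shown in GitHub's REST API documentation (`Bearer` is the other), whereas the committed code passes the raw token. `build_headers` and `STATIC_HEADERS` are illustrative names, and this is a sketch rather than the committed implementation.

```python
from typing import Dict, Optional

# Mirrors static_params['headers'] from the collector module.
STATIC_HEADERS = {'Accept': 'application/vnd.github.v3.text-match+json'}


def build_headers(personal_access_token: Optional[str]) -> Dict[str, str]:
    """Return GitHub API request headers without mutating STATIC_HEADERS."""
    headers = dict(STATIC_HEADERS)  # shallow copy is enough: values are plain strings
    if personal_access_token:
        # Scheme prefix as shown in GitHub's REST API documentation.
        headers['Authorization'] = f'token {personal_access_token}'
    return headers


authenticated = build_headers('ghp_example')  # placeholder token
anonymous = build_headers(None)
print(authenticated['Authorization'])      # token ghp_example
print('Authorization' in anonymous)        # False
print('Authorization' in STATIC_HEADERS)   # False
```

Returning a new dict keeps each bot instance's credentials local to that instance, and an unset token simply yields the unauthenticated headers.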
3 changes: 1 addition & 2 deletions intelmq/etc/feeds.yaml
@@ -1922,8 +1922,7 @@ providers:
collector:
module: intelmq.bots.collectors.github_api.collector_github_contents_api
parameters:
basic_auth_username: USERNAME
basic_auth_password: PASSWORD
personal_access_token: https://docs.github.com/en/authentication/keeping-your-account-and-data-secure/creating-a-personal-access-token
repository: StrangerealIntel/DailyIOC
regex: .*.json
parser:
10 changes: 3 additions & 7 deletions intelmq/tests/bots/collectors/github_api/test_collector.py
@@ -35,8 +35,7 @@
SHOULD_PASS_WITH_TXT_FILES_AND_EXTRA_FIELD_SIZE_TEST = {
'CONFIG': {
'name': 'Github feed',
'basic_auth_username': 'dummy_user',
'basic_auth_password': 'dummy_password',
'personal_access_token': 'super_special_access_token',
'repository': 'author/repository',
'extra_fields': 'size, sha',
'regex': '.*.txt'
@@ -59,8 +58,7 @@
SHOULD_FAIL_BECAUSE_REPOSITORY_IS_NOT_VALID_CONFIG = {
'CONFIG': {
'name': 'Github feed',
'basic_auth_username': 'dummy_user',
'basic_auth_password': 'dummy_password',
'personal_access_token': 'super_special_access_token',
'repository': 'author/',
'extra_fields': 'size',
'regex': '.*.txt'
@@ -70,8 +68,7 @@
SHOULD_FAIL_WITH_BAD_CREDENTIALS = {
'CONFIG': {
'name': 'Github feed',
'basic_auth_username': 'dummy_user',
'basic_auth_password': 'bad_dummy_password',
'personal_access_token': 'faulty_access_token',
'repository': 'author/repo',
'regex': '.*.txt'
}
@@ -95,7 +92,6 @@ def print_requests_get_parameters(url, *args, **kwargs):
main_mock = MagicMock(content=EXAMPLE_CONTENT_STR)
return main_mock


class TestGithubContentsAPICollectorBot(test.BotTestCase, TestCase):
"""
A TestCase for GithubContentsAPICollectorBot.
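The tests above replace HTTP calls with `MagicMock` objects (see `print_requests_get_parameters`). A similar check can confirm that the configured token actually reaches the outgoing request. This sketch uses the standard library's `urllib.request` instead of `requests` so it runs without extra packages; `fetch_json` is a hypothetical stand-in for the collector's `github_api` method, and the `token` scheme prefix follows GitHub's REST documentation rather than the committed code.

```python
import json
import urllib.request
from typing import Callable, Optional
from unittest import mock


def fetch_json(url: str, token: Optional[str],
               urlopen: Callable = urllib.request.urlopen) -> dict:
    """Hypothetical fetcher: GET `url`, sending the token if one is configured."""
    request = urllib.request.Request(url)
    if token:
        request.add_header('Authorization', f'token {token}')
    with urlopen(request) as response:
        return json.loads(response.read().decode('utf-8'))


# Replace the network call with a mock so no request leaves the machine.
fake_response = mock.MagicMock()
fake_response.read.return_value = b'{"name": "example.json"}'
fake_urlopen = mock.MagicMock()
fake_urlopen.return_value.__enter__.return_value = fake_response

result = fetch_json('https://api.github.com/repos/author/repo/contents',
                    'super_special_access_token', urlopen=fake_urlopen)
sent_request = fake_urlopen.call_args[0][0]  # the Request passed to urlopen
print(result)                                    # {'name': 'example.json'}
print(sent_request.get_header('Authorization'))  # token super_special_access_token
```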