Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature/secureli-435: Implement custom PII scan #496

Merged
Show file tree
Hide file tree
Changes from 11 commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
4 changes: 4 additions & 0 deletions CONTRIBUTING.md
Original file line number Diff line number Diff line change
Expand Up @@ -237,6 +237,10 @@ Services do not leverage 3rd Party or External Dependencies directly, not even t
Unit tests of Services are done with mock services and abstractions.
### Scanning Services
Scanning is largely done by way of pre-commit hooks as configured in `.pre-commit-config.yaml`. However, we do also implement our own custom scans in separate Scanner Services, e.g., the PII scan. The results of these multiple scans are then merged together at the Action layer for output.
## Abstractions
Abstractions are mini-services that are meant to encapsulate a 3rd-party dependency. This allows us to mock interfaces we don’t own easier than leveraging duck typing and magic mocks while testing our own logic. This also allows us to swap out the underlying dependency with less risk of disruption to our entire application.
Expand Down
18 changes: 18 additions & 0 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -80,6 +80,24 @@ All you need to do is run:

Running `secureli init` will allow seCureLI to detect the languages in your repo, install pre-commit, install all the appropriate pre-commit hooks for your local repo, run a scan for secrets in your local repo, and update the installed hooks.

## Scan

To manually trigger a scan, run:

```bash
% secureli scan
```

This will run through all hooks and the PII scan, unless a `--specific-test` option is used. The default is to scan staged files only. To scan all files instead, use the `--mode all-files` option.

### PII Scan

seCureLI utilizes its own PII scan, rather than using an existing pre-commit hook. To exclude a line from being flagged by the PII scanner, you can use a `disable-pii-scan` marker in a comment to disable the scan for that line.

```
test_var = "some dummy data I don't want scanned" # disable-pii-scan
```

# Upgrade

## Upgrading seCureLI via Homebrew
Expand Down
2 changes: 1 addition & 1 deletion pyproject.toml
Original file line number Diff line number Diff line change
Expand Up @@ -6,7 +6,7 @@ build-backend = "poetry.core.masonry.api"
name = "secureli"
version = "0.31.0"
description = "Secure Project Manager"
authors = ["Caleb Tonn <caleb.tonn@slalom.com>"]
authors = ["Caleb Tonn <caleb.tonn@slalom.com>"] # disable-pii-scan
license = "Apache-2.0"
readme = "README.md"

Expand Down
2 changes: 1 addition & 1 deletion scripts/secureli-deployment.sh
Original file line number Diff line number Diff line change
Expand Up @@ -13,7 +13,7 @@ else
echo "Pulling down the most recent published secureli release v${secureliVersion}"
gh release download v$secureliVersion
export secureliSha256=$(sha256sum ./secureli-${secureliVersion}.tar.gz | awk '{print $1}')
git config --global user.email "secureli-automation@slalom.com"
git config --global user.email "secureli-automation@slalom.com" # disable-pii-scan
git config --global user.name "Secureli Automation"
python ./scripts/get-secureli-dependencies.py
cd homebrew-secureli
Expand Down
24 changes: 12 additions & 12 deletions secureli/actions/action.py
Original file line number Diff line number Diff line change
Expand Up @@ -9,7 +9,7 @@
from secureli.repositories import secureli_config
from secureli.repositories.repo_settings import SecureliRepository, TelemetrySettings
from secureli.modules.language_analyzer import language_analyzer, language_support
from secureli.modules.core.core_services.scanner import ScannerService
from secureli.modules.core.core_services.scanner import HooksScannerService
from secureli.modules.core.core_services.updater import UpdaterService

from secureli.modules.shared.utilities import format_sentence_list
Expand All @@ -27,15 +27,15 @@ def __init__(
echo: EchoAbstraction,
language_analyzer: language_analyzer.LanguageAnalyzerService,
language_support: language_support.LanguageSupportService,
scanner: ScannerService,
hooks_scanner: HooksScannerService,
Copy link
Contributor Author

@kathleen-hogan-slalom kathleen-hogan-slalom Mar 22, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to leave context from a prior conversation during standup, I decided to rename this service so that it's more explicitly clear what it's used for, in contrast to any custom scans that are implemented (e.g., the PII scan). This scanner solely scans using external pre-commit hooks. Calling it a generic ScannerService didn't quite capture that nuance IMO.

I'd also like to rename this file in a follow-up PR to match this change. I didn't do this as part of this PR because it wouldn't be as clear which in-file changes I made.

secureli_config: secureli_config.SecureliConfigRepository,
settings: SecureliRepository,
updater: UpdaterService,
):
self.echo = echo
self.language_analyzer = language_analyzer
self.language_support = language_support
self.scanner = scanner
self.hooks_scanner = hooks_scanner
self.secureli_config = secureli_config
self.settings = settings
self.updater = updater
Expand Down Expand Up @@ -76,10 +76,8 @@ def verify_install(
outcome=update_config.outcome,
)

pre_commit_config_location = (
self.action_deps.scanner.pre_commit.get_preferred_pre_commit_config_path(
folder_path
)
pre_commit_config_location = self.action_deps.hooks_scanner.pre_commit.get_preferred_pre_commit_config_path(
folder_path
)
if not pre_commit_config_location.exists():
update_result: VerifyResult = (
Expand Down Expand Up @@ -289,7 +287,7 @@ def _run_post_install_scan(
)
self.action_deps.echo.print(f"running {secret_test_id}.")

scan_result = self.action_deps.scanner.scan_repo(
scan_result = self.action_deps.hooks_scanner.scan_repo(
folder_path,
ScanMode.ALL_FILES,
specific_test=secret_test_id,
Expand Down Expand Up @@ -426,8 +424,10 @@ def _update_secureli_pre_commit_config_location(
)
if response:
try:
new_file_path = self.action_deps.scanner.pre_commit.migrate_config_file(
folder_path
new_file_path = (
self.action_deps.hooks_scanner.pre_commit.migrate_config_file(
folder_path
)
)
return VerifyResult(
outcome=VerifyOutcome.UPDATE_SUCCEEDED, file_path=new_file_path
Expand All @@ -436,8 +436,8 @@ def _update_secureli_pre_commit_config_location(
return VerifyResult(outcome=VerifyOutcome.UPDATE_FAILED)
else:
self.action_deps.echo.warning(".pre-commit-config.yaml migration declined")
deprecated_location = self.action_deps.scanner.get_pre_commit_config_path(
folder_path
deprecated_location = (
self.action_deps.hooks_scanner.get_pre_commit_config_path(folder_path)
)
return VerifyResult(
outcome=VerifyOutcome.UPDATE_CANCELED, file_path=deprecated_location
Expand Down
31 changes: 23 additions & 8 deletions secureli/actions/scan.py
Original file line number Diff line number Diff line change
Expand Up @@ -14,8 +14,9 @@
from secureli.modules.shared.models.publish_results import PublishResultsOption
from secureli.modules.shared.models.result import Result
from secureli.modules.observability.observability_services.logging import LoggingService
from secureli.modules.core.core_services.scanner import ScannerService
from secureli.modules.shared.models.scan import ScanMode
from secureli.modules.core.core_services.scanner import HooksScannerService
from secureli.modules.pii_scanner.pii_scanner import PiiScannerService
from secureli.modules.shared.models.scan import ScanMode, ScanResult
from secureli.settings import Settings
from secureli.modules.shared import utilities

Expand All @@ -36,11 +37,13 @@ def __init__(
action_deps: action.ActionDependencies,
echo: EchoAbstraction,
logging: LoggingService,
scanner: ScannerService,
hooks_scanner: HooksScannerService,
pii_scanner: PiiScannerService,
git_repo: GitRepo,
):
super().__init__(action_deps)
self.scanner = scanner
self.hooks_scanner = hooks_scanner
self.pii_scanner = pii_scanner
self.echo = echo
self.logging = logging
self.git_repo = git_repo
Expand All @@ -52,10 +55,12 @@ def _check_secureli_hook_updates(self, folder_path: Path) -> VerifyResult:
:param folder_path: The folder path containing the .secureli/ folder
"""

self.action_deps.echo.print("Checking for pre-commit hook updates...")
pre_commit_config = self.scanner.pre_commit.get_pre_commit_config(folder_path)
self.action_deps.echo.info("Checking for pre-commit hook updates...")
pre_commit_config = self.hooks_scanner.pre_commit.get_pre_commit_config(
folder_path
)

repos_to_update = self.scanner.pre_commit.check_for_hook_updates(
repos_to_update = self.hooks_scanner.pre_commit.check_for_hook_updates(
pre_commit_config
)

Expand Down Expand Up @@ -156,10 +161,20 @@ def scan_repo(
if verify_result.outcome in self.halting_outcomes:
return

scan_result = self.scanner.scan_repo(
# Execute PII scan (unless `specific_test` is provided, in which case it will be for a hook below)
pii_scan_result: ScanResult | None = None
if not specific_test:
pii_scan_result = self.pii_scanner.scan_repo(
folder_path, scan_mode, files=files
)

# Execute hooks
hooks_scan_result = self.hooks_scanner.scan_repo(
folder_path, scan_mode, specific_test, files=files
)

scan_result = utilities.merge_scan_results([pii_scan_result, hooks_scan_result])

details = scan_result.output or "Unknown output during scan"
self.echo.print(details)

Expand Down
18 changes: 13 additions & 5 deletions secureli/container.py
Original file line number Diff line number Diff line change
Expand Up @@ -15,8 +15,9 @@
from secureli.modules.shared.resources import read_resource
from secureli.modules import language_analyzer
from secureli.modules.observability.observability_services.logging import LoggingService
from secureli.modules.core.core_services.scanner import ScannerService
from secureli.modules.core.core_services.scanner import HooksScannerService
from secureli.modules.core.core_services.updater import UpdaterService
from secureli.modules.pii_scanner.pii_scanner import PiiScannerService
from secureli.modules.secureli_ignore import SecureliIgnoreService
from secureli.settings import Settings

Expand Down Expand Up @@ -130,11 +131,17 @@ class Container(containers.DeclarativeContainer):
)

"""The service that scans the repository using pre-commit configuration"""
scanner_service = providers.Factory(
ScannerService,
hooks_scanner_service = providers.Factory(
HooksScannerService,
pre_commit=pre_commit_abstraction,
)

"""The service that scans the repository for potential PII"""
pii_scanner_service = providers.Factory(
PiiScannerService,
repo_files=repo_files_repository,
)

updater_service = providers.Factory(
UpdaterService,
pre_commit=pre_commit_abstraction,
Expand All @@ -148,7 +155,7 @@ class Container(containers.DeclarativeContainer):
echo=echo,
language_analyzer=language_analyzer_service,
language_support=language_support_service,
scanner=scanner_service,
hooks_scanner=hooks_scanner_service,
secureli_config=secureli_config_repository,
settings=settings_repository,
updater=updater_service,
Expand All @@ -175,7 +182,8 @@ class Container(containers.DeclarativeContainer):
action_deps=action_deps,
echo=echo,
logging=logging_service,
scanner=scanner_service,
hooks_scanner=hooks_scanner_service,
pii_scanner=pii_scanner_service,
git_repo=git_repo,
)

Expand Down
24 changes: 10 additions & 14 deletions secureli/modules/core/core_services/scanner.py
Original file line number Diff line number Diff line change
Expand Up @@ -2,11 +2,15 @@
from typing import Optional
from pathlib import Path

import pydantic
import re

from secureli.modules.shared.abstractions.pre_commit import PreCommitAbstraction
from secureli.modules.shared.models.scan import ScanFailure, ScanMode, ScanResult
from secureli.modules.shared.models.scan import (
ScanFailure,
ScanMode,
ScanOutput,
ScanResult,
)
kathleen-hogan-slalom marked this conversation as resolved.
Show resolved Hide resolved
from secureli.repositories.repo_settings import PreCommitSettings


Expand All @@ -18,15 +22,7 @@ class OutputParseErrors(str, Enum):
REPO_NOT_FOUND = "repo-not-found"


class ScanOuput(pydantic.BaseModel):
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I moved this to the models folder along with ScanFailure, ScanMode, and ScanResult. It felt more appropriate there.

"""
Represents the parsed output from a scan
"""

failures: list[ScanFailure]


class ScannerService:
class HooksScannerService:
"""
Scans the repo according to the repo's seCureLI config
"""
Expand Down Expand Up @@ -63,13 +59,13 @@ def scan_repo(
failures=parsed_output.failures,
)

def _parse_scan_ouput(self, folder_path: Path, output: str = "") -> ScanOuput:
def _parse_scan_ouput(self, folder_path: Path, output: str = "") -> ScanOutput:
"""
Parses the output from a scan and returns a list of Failure objects representing any
hook rule failures during a scan.
:param folder_path: folder containing .secureli folder, usually repository root
:param output: Raw output from a scan.
:return: ScanOuput object representing a list of hook rule Failure objects.
:return: ScanOutput object representing a list of hook rule Failure objects.
"""
failures = []
failure_indexes = []
Expand Down Expand Up @@ -102,7 +98,7 @@ def _parse_scan_ouput(self, folder_path: Path, output: str = "") -> ScanOuput:
for file in files:
failures.append(ScanFailure(id=id, file=file, repo=repo))

return ScanOuput(failures=failures)
return ScanOutput(failures=failures)

def _get_single_failure_output(
self, failure_start: int, output_by_line: list[str]
Expand Down
Loading
Loading