Skip to content

Commit

Permalink
Feature/secureli-435: Implement custom PII scan (#496)
Browse files Browse the repository at this point in the history
secureli-435

This PR implements a new, custom PII scan. This scan occurs separately
from our existing scans which utilize pre-commit hooks external to the
secureli codebase.


## Changes
* Creates new PII Scanner and adds the scan to the existing Scan Action

## Testing
* Unit tests updated
* Screenshot of manual test below:
<img width="685" alt="Screenshot 2024-03-22 at 9 49 55 AM"
src="https://github.com/slalombuild/secureli/assets/56691584/da3cef63-fb16-4062-9697-80be73b79d0c">


## Clean Code Checklist
- [x] Meets acceptance criteria for issue
- [x] New logic is covered with automated tests
- [x] Appropriate exception handling added
- [ ] Thoughtful logging included
- [x] Documentation is updated
- [ ] Follow-up work is documented in TODOs
- [ ] TODOs have a ticket associated with them
- [x] No commented-out code included


<!--
Github-flavored markdown reference:
https://docs.github.com/en/get-started/writing-on-github
-->

---------

Co-authored-by: Kathleen Hogan <kathleenhogan@thessagroup.com>
  • Loading branch information
kathleen-hogan-slalom and Kathleen Hogan authored Mar 22, 2024
1 parent fbd983d commit 9c43ac7
Show file tree
Hide file tree
Showing 20 changed files with 701 additions and 122 deletions.
4 changes: 4 additions & 0 deletions CONTRIBUTING.md
Original file line number Diff line number Diff line change
Expand Up @@ -237,6 +237,10 @@ Services do not leverage 3rd Party or External Dependencies directly, not even t
Unit tests of Services are done with mock services and abstractions.
### Scanning Services
Scanning is largely done by way of pre-commit hooks as configured in `.pre-commit-config.yaml`. However, we do also implement our own custom scans in separate Scanner Services, e.g., the PII scan. The results of these multiple scans are then merged together at the Action layer for output.
## Abstractions
Abstractions are mini-services that are meant to encapsulate a 3rd-party dependency. This allows us to mock interfaces we don’t own easier than leveraging duck typing and magic mocks while testing our own logic. This also allows us to swap out the underlying dependency with less risk of disruption to our entire application.
Expand Down
18 changes: 18 additions & 0 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -80,6 +80,24 @@ All you need to do is run:
Running `secureli init` will allow seCureLI to detect the languages in your repo, install pre-commit, install all the appropriate pre-commit hooks for your local repo, run a scan for secrets in your local repo, and update the installed hooks.
## Scan
To manually trigger a scan, run:
```bash
% secureli scan
```
This will run through all hooks and custom scans, unless a `--specific-test` option is used. The default is to scan staged files only. To scan all files instead, use the `--mode all-files` option.
### PII Scan
seCureLI utilizes its own PII scan, rather than using an existing pre-commit hook. To exclude a line from being flagged by the PII scanner, you can use a `disable-pii-scan` marker in a comment to disable the scan for that line.
```
test_var = "some dummy data I don't want scanned" # disable-pii-scan
```
# Upgrade
## Upgrading seCureLI via Homebrew
Expand Down
2 changes: 1 addition & 1 deletion pyproject.toml
Original file line number Diff line number Diff line change
Expand Up @@ -6,7 +6,7 @@ build-backend = "poetry.core.masonry.api"
name = "secureli"
version = "0.31.0"
description = "Secure Project Manager"
authors = ["Caleb Tonn <caleb.tonn@slalom.com>"]
authors = ["Caleb Tonn <caleb.tonn@slalom.com>"] # disable-pii-scan
license = "Apache-2.0"
readme = "README.md"

Expand Down
2 changes: 1 addition & 1 deletion scripts/secureli-deployment.sh
Original file line number Diff line number Diff line change
Expand Up @@ -13,7 +13,7 @@ else
echo "Pulling down the most recent published secureli release v${secureliVersion}"
gh release download v$secureliVersion
export secureliSha256=$(sha256sum ./secureli-${secureliVersion}.tar.gz | awk '{print $1}')
git config --global user.email "secureli-automation@slalom.com"
git config --global user.email "secureli-automation@slalom.com" # disable-pii-scan
git config --global user.name "Secureli Automation"
python ./scripts/get-secureli-dependencies.py
cd homebrew-secureli
Expand Down
24 changes: 12 additions & 12 deletions secureli/actions/action.py
Original file line number Diff line number Diff line change
Expand Up @@ -9,7 +9,7 @@
from secureli.repositories import secureli_config
from secureli.repositories.repo_settings import SecureliRepository, TelemetrySettings
from secureli.modules.language_analyzer import language_analyzer, language_support
from secureli.modules.core.core_services.scanner import ScannerService
from secureli.modules.core.core_services.scanner import HooksScannerService
from secureli.modules.core.core_services.updater import UpdaterService

from secureli.modules.shared.utilities import format_sentence_list
Expand All @@ -27,15 +27,15 @@ def __init__(
echo: EchoAbstraction,
language_analyzer: language_analyzer.LanguageAnalyzerService,
language_support: language_support.LanguageSupportService,
scanner: ScannerService,
hooks_scanner: HooksScannerService,
secureli_config: secureli_config.SecureliConfigRepository,
settings: SecureliRepository,
updater: UpdaterService,
):
self.echo = echo
self.language_analyzer = language_analyzer
self.language_support = language_support
self.scanner = scanner
self.hooks_scanner = hooks_scanner
self.secureli_config = secureli_config
self.settings = settings
self.updater = updater
Expand Down Expand Up @@ -76,10 +76,8 @@ def verify_install(
outcome=update_config.outcome,
)

pre_commit_config_location = (
self.action_deps.scanner.pre_commit.get_preferred_pre_commit_config_path(
folder_path
)
pre_commit_config_location = self.action_deps.hooks_scanner.pre_commit.get_preferred_pre_commit_config_path(
folder_path
)
if not pre_commit_config_location.exists():
update_result: VerifyResult = (
Expand Down Expand Up @@ -289,7 +287,7 @@ def _run_post_install_scan(
)
self.action_deps.echo.print(f"running {secret_test_id}.")

scan_result = self.action_deps.scanner.scan_repo(
scan_result = self.action_deps.hooks_scanner.scan_repo(
folder_path,
ScanMode.ALL_FILES,
specific_test=secret_test_id,
Expand Down Expand Up @@ -426,8 +424,10 @@ def _update_secureli_pre_commit_config_location(
)
if response:
try:
new_file_path = self.action_deps.scanner.pre_commit.migrate_config_file(
folder_path
new_file_path = (
self.action_deps.hooks_scanner.pre_commit.migrate_config_file(
folder_path
)
)
return VerifyResult(
outcome=VerifyOutcome.UPDATE_SUCCEEDED, file_path=new_file_path
Expand All @@ -436,8 +436,8 @@ def _update_secureli_pre_commit_config_location(
return VerifyResult(outcome=VerifyOutcome.UPDATE_FAILED)
else:
self.action_deps.echo.warning(".pre-commit-config.yaml migration declined")
deprecated_location = self.action_deps.scanner.get_pre_commit_config_path(
folder_path
deprecated_location = (
self.action_deps.hooks_scanner.get_pre_commit_config_path(folder_path)
)
return VerifyResult(
outcome=VerifyOutcome.UPDATE_CANCELED, file_path=deprecated_location
Expand Down
31 changes: 23 additions & 8 deletions secureli/actions/scan.py
Original file line number Diff line number Diff line change
Expand Up @@ -14,8 +14,9 @@
from secureli.modules.shared.models.publish_results import PublishResultsOption
from secureli.modules.shared.models.result import Result
from secureli.modules.observability.observability_services.logging import LoggingService
from secureli.modules.core.core_services.scanner import ScannerService
from secureli.modules.shared.models.scan import ScanMode
from secureli.modules.core.core_services.scanner import HooksScannerService
from secureli.modules.pii_scanner.pii_scanner import PiiScannerService
from secureli.modules.shared.models.scan import ScanMode, ScanResult
from secureli.settings import Settings
from secureli.modules.shared import utilities

Expand All @@ -36,11 +37,13 @@ def __init__(
action_deps: action.ActionDependencies,
echo: EchoAbstraction,
logging: LoggingService,
scanner: ScannerService,
hooks_scanner: HooksScannerService,
pii_scanner: PiiScannerService,
git_repo: GitRepo,
):
super().__init__(action_deps)
self.scanner = scanner
self.hooks_scanner = hooks_scanner
self.pii_scanner = pii_scanner
self.echo = echo
self.logging = logging
self.git_repo = git_repo
Expand All @@ -52,10 +55,12 @@ def _check_secureli_hook_updates(self, folder_path: Path) -> VerifyResult:
:param folder_path: The folder path containing the .secureli/ folder
"""

self.action_deps.echo.print("Checking for pre-commit hook updates...")
pre_commit_config = self.scanner.pre_commit.get_pre_commit_config(folder_path)
self.action_deps.echo.info("Checking for pre-commit hook updates...")
pre_commit_config = self.hooks_scanner.pre_commit.get_pre_commit_config(
folder_path
)

repos_to_update = self.scanner.pre_commit.check_for_hook_updates(
repos_to_update = self.hooks_scanner.pre_commit.check_for_hook_updates(
pre_commit_config
)

Expand Down Expand Up @@ -156,10 +161,20 @@ def scan_repo(
if verify_result.outcome in self.halting_outcomes:
return

scan_result = self.scanner.scan_repo(
# Execute PII scan (unless `specific_test` is provided, in which case it will be for a hook below)
pii_scan_result: ScanResult | None = None
if not specific_test:
pii_scan_result = self.pii_scanner.scan_repo(
folder_path, scan_mode, files=files
)

# Execute hooks
hooks_scan_result = self.hooks_scanner.scan_repo(
folder_path, scan_mode, specific_test, files=files
)

scan_result = utilities.merge_scan_results([pii_scan_result, hooks_scan_result])

details = scan_result.output or "Unknown output during scan"
self.echo.print(details)

Expand Down
19 changes: 14 additions & 5 deletions secureli/container.py
Original file line number Diff line number Diff line change
Expand Up @@ -15,8 +15,9 @@
from secureli.modules.shared.resources import read_resource
from secureli.modules import language_analyzer
from secureli.modules.observability.observability_services.logging import LoggingService
from secureli.modules.core.core_services.scanner import ScannerService
from secureli.modules.core.core_services.scanner import HooksScannerService
from secureli.modules.core.core_services.updater import UpdaterService
from secureli.modules.pii_scanner.pii_scanner import PiiScannerService
from secureli.modules.secureli_ignore import SecureliIgnoreService
from secureli.settings import Settings

Expand Down Expand Up @@ -130,11 +131,18 @@ class Container(containers.DeclarativeContainer):
)

"""The service that scans the repository using pre-commit configuration"""
scanner_service = providers.Factory(
ScannerService,
hooks_scanner_service = providers.Factory(
HooksScannerService,
pre_commit=pre_commit_abstraction,
)

"""The service that scans the repository for potential PII"""
pii_scanner_service = providers.Factory(
PiiScannerService,
repo_files=repo_files_repository,
echo=echo,
)

updater_service = providers.Factory(
UpdaterService,
pre_commit=pre_commit_abstraction,
Expand All @@ -148,7 +156,7 @@ class Container(containers.DeclarativeContainer):
echo=echo,
language_analyzer=language_analyzer_service,
language_support=language_support_service,
scanner=scanner_service,
hooks_scanner=hooks_scanner_service,
secureli_config=secureli_config_repository,
settings=settings_repository,
updater=updater_service,
Expand All @@ -175,7 +183,8 @@ class Container(containers.DeclarativeContainer):
action_deps=action_deps,
echo=echo,
logging=logging_service,
scanner=scanner_service,
hooks_scanner=hooks_scanner_service,
pii_scanner=pii_scanner_service,
git_repo=git_repo,
)

Expand Down
29 changes: 10 additions & 19 deletions secureli/modules/core/core_services/scanner.py
Original file line number Diff line number Diff line change
Expand Up @@ -2,11 +2,10 @@
from typing import Optional
from pathlib import Path

import pydantic
import re

import secureli.modules.shared.models.scan as scan
from secureli.modules.shared.abstractions.pre_commit import PreCommitAbstraction
from secureli.modules.shared.models.scan import ScanFailure, ScanMode, ScanResult
from secureli.repositories.repo_settings import PreCommitSettings


Expand All @@ -18,15 +17,7 @@ class OutputParseErrors(str, Enum):
REPO_NOT_FOUND = "repo-not-found"


class ScanOuput(pydantic.BaseModel):
"""
Represents the parsed output from a scan
"""

failures: list[ScanFailure]


class ScannerService:
class HooksScannerService:
"""
Scans the repo according to the repo's seCureLI config
"""
Expand All @@ -37,10 +28,10 @@ def __init__(self, pre_commit: PreCommitAbstraction):
def scan_repo(
self,
folder_path: Path,
scan_mode: ScanMode,
scan_mode: scan.ScanMode,
specific_test: Optional[str] = None,
files: Optional[str] = None,
) -> ScanResult:
) -> scan.ScanResult:
"""
Scans the repo according to the repo's seCureLI config
:param scan_mode: Whether to scan the staged files (i.e., the files about to be
Expand All @@ -49,27 +40,27 @@ def scan_repo(
If None, run all hooks.
:return: A ScanResult object containing whether we succeeded and any error
"""
all_files = True if scan_mode == ScanMode.ALL_FILES else False
all_files = True if scan_mode == scan.ScanMode.ALL_FILES else False
execute_result = self.pre_commit.execute_hooks(
folder_path, all_files, hook_id=specific_test, files=files
)
parsed_output = self._parse_scan_ouput(
folder_path, output=execute_result.output
)

return ScanResult(
return scan.ScanResult(
successful=execute_result.successful,
output=execute_result.output,
failures=parsed_output.failures,
)

def _parse_scan_ouput(self, folder_path: Path, output: str = "") -> ScanOuput:
def _parse_scan_ouput(self, folder_path: Path, output: str = "") -> scan.ScanOutput:
"""
Parses the output from a scan and returns a list of Failure objects representing any
hook rule failures during a scan.
:param folder_path: folder containing .secureli folder, usually repository root
:param output: Raw output from a scan.
:return: ScanOuput object representing a list of hook rule Failure objects.
:return: ScanOutput object representing a list of hook rule Failure objects.
"""
failures = []
failure_indexes = []
Expand Down Expand Up @@ -100,9 +91,9 @@ def _parse_scan_ouput(self, folder_path: Path, output: str = "") -> ScanOuput:
files = self._find_file_names(failure_output_list=failure_output_list)

for file in files:
failures.append(ScanFailure(id=id, file=file, repo=repo))
failures.append(scan.ScanFailure(id=id, file=file, repo=repo))

return ScanOuput(failures=failures)
return scan.ScanOutput(failures=failures)

def _get_single_failure_output(
self, failure_start: int, output_by_line: list[str]
Expand Down
Loading

0 comments on commit 9c43ac7

Please sign in to comment.