Fix ggshield secret scan trying to scan too large documents #631

agateau-gg · 2023-07-12T15:26:03Z

Description

`Scannable.is_longer_than()` currently compares the size passed to it to the decoded size, but what we actually want to compare it to is the UTF-8 encoded size.
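A minimal sketch of why the two sizes differ. It assumes the "decoded size" is the length of the decoded string; `TextScannable`, `decoded_size()` and `utf8_encoded_size()` are hypothetical names, not ggshield's real classes. Any non-ASCII character makes the UTF-8 byte count exceed the character count, so a character-based check can let through a document that is too large once encoded.

```python
from dataclasses import dataclass


@dataclass
class TextScannable:
    """Hypothetical stand-in for a Scannable holding already-decoded text."""

    content: str

    def decoded_size(self) -> int:
        # What the old check effectively compared against: a character count
        return len(self.content)

    def utf8_encoded_size(self) -> int:
        # What the check should use: the byte count once encoded as UTF-8
        return len(self.content.encode("utf-8"))

    def is_longer_than(self, max_utf8_encoded_size: int) -> bool:
        return self.utf8_encoded_size() > max_utf8_encoded_size


doc = TextScannable("\u00e9" * 600)  # "é" is 1 character but 2 UTF-8 bytes
print(doc.decoded_size())        # 600
print(doc.utf8_encoded_size())   # 1200
print(doc.is_longer_than(1000))  # True, even though 600 < 1000
```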

To implement this without reading the content multiple times, `Scannable` implementations now store the UTF-8 encoded size, and the `Scannable` helper methods (`_is_file_longer_than()` and `_decode_bytes()`) return the UTF-8 encoded size in addition to their current return value.
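A minimal sketch of the "decode once, keep the size" idea, assuming the encoding is already known rather than detected. `decode_bytes()` and `BytesScannable` are hypothetical illustrations, not ggshield's `_decode_bytes()` or any real `Scannable` subclass; they only show how one decode pass can fill both the content and the cached UTF-8 size.

```python
from typing import Optional, Tuple


def decode_bytes(raw: bytes, encoding: str) -> Tuple[str, int]:
    """Hypothetical helper: decode `raw` and also return its UTF-8 encoded size."""
    # errors="replace" turns undecodable bytes into U+FFFD instead of raising
    text = raw.decode(encoding, errors="replace")
    return text, len(text.encode("utf-8"))


class BytesScannable:
    """Hypothetical Scannable-like wrapper that decodes its bytes lazily."""

    def __init__(self, raw: bytes, encoding: str) -> None:
        self._raw = raw
        self._encoding = encoding
        self._content: Optional[str] = None
        self._utf8_encoded_size: Optional[int] = None

    def _decode(self) -> None:
        # A single decode pass fills both cached fields
        if self._content is None:
            self._content, self._utf8_encoded_size = decode_bytes(
                self._raw, self._encoding
            )

    @property
    def content(self) -> str:
        self._decode()
        assert self._content is not None
        return self._content

    def is_longer_than(self, max_utf8_encoded_size: int) -> bool:
        self._decode()
        assert self._utf8_encoded_size is not None
        return self._utf8_encoded_size > max_utf8_encoded_size


raw = "caf\u00e9".encode("utf-16")  # BOM + 4 UTF-16 code units = 10 bytes
doc = BytesScannable(raw, "utf-16")
print(len(raw))               # 10
print(doc.is_longer_than(4))  # True: "café" is 5 bytes in UTF-8
```

Note that the raw byte count (10) says nothing reliable about the UTF-8 size (5) when the source encoding is not UTF-8.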

When the content they work on is already UTF-8, the helpers try to avoid reading it until necessary.
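A minimal sketch of that shortcut: for content already in UTF-8 (or ASCII, a subset of it), the byte size of the file already is the UTF-8 encoded size, so seeking to the end answers the question without decoding. This assumes the encoding has already been determined and that UTF-8 input is valid UTF-8; `is_file_longer_than()` is a hypothetical function, not ggshield's `_is_file_longer_than()`.

```python
import codecs
import io
from typing import BinaryIO, Optional, Tuple

# Codec names (as normalized by codecs.lookup) whose bytes are already UTF-8
UTF8_COMPATIBLE = {"utf-8", "ascii"}


def is_file_longer_than(
    fp: BinaryIO, max_utf8_encoded_size: int, encoding: str
) -> Tuple[bool, Optional[str], int]:
    """Hypothetical helper returning (too_long, content_if_read, utf8_size)."""
    if codecs.lookup(encoding).name in UTF8_COMPATIBLE:
        # Shortcut: the byte size is the UTF-8 size (assuming valid UTF-8),
        # so seek to the end instead of reading and decoding the content.
        byte_size = fp.seek(0, io.SEEK_END)
        fp.seek(0, io.SEEK_SET)
        return byte_size > max_utf8_encoded_size, None, byte_size

    # Other encodings: the content must be decoded and re-measured as UTF-8
    fp.seek(0, io.SEEK_SET)
    content = fp.read().decode(encoding, errors="replace")
    utf8_size = len(content.encode("utf-8"))
    return utf8_size > max_utf8_encoded_size, content, utf8_size


print(is_file_longer_than(io.BytesIO(b"hello"), 4, "UTF8"))
# (True, None, 5): answered without reading the content
print(is_file_longer_than(io.BytesIO(b"\xe9t\xe9"), 4, "latin-1"))
# (True, 'été', 5): 3 bytes on disk, 5 bytes once re-encoded as UTF-8
```

Returning the decoded content when it had to be read lets a caller cache it, so the file is not read a second time for scanning.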

This PR also integrates the latest py-gitguardian changes, which are also related to this issue.

Issue: #561

codecov-commenter · 2023-07-12T15:39:13Z

Codecov Report

Merging #631 (db208b9) into main (e631154) will increase coverage by 0.05%.
The diff coverage is 91.89%.

@@            Coverage Diff             @@
##             main     #631      +/-   ##
==========================================
+ Coverage   93.98%   94.03%   +0.05%     
==========================================
  Files         103      103              
  Lines        4804     4812       +8     
==========================================
+ Hits         4515     4525      +10     
+ Misses        289      287       -2

Flag	Coverage Δ
unittests	`94.03% <91.89%> (+0.05%)`	⬆️

Flags with carried forward coverage won't be shown.

Impacted Files	Coverage Δ
ggshield/secret/docker.py	`90.90% <50.00%> (+1.07%)`	⬆️
ggshield/scan/file.py	`94.33% <100.00%> (-0.11%)`	⬇️
ggshield/scan/scannable.py	`97.84% <100.00%> (+0.25%)`	⬆️

Walz

LGTM

agateau-gg force-pushed the agateau/fix-maximum-size-check branch from d822636 to db208b9 on July 12, 2023 15:36

fix: update py-gitguardian to get the 'replacement character' fix

676c741
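The commit message does not say what the py-gitguardian "replacement character" fix changed, so this is only background on why such characters matter for size checks: U+FFFD REPLACEMENT CHARACTER takes 3 bytes in UTF-8, so any step that substitutes it (for example, Python decoding invalid bytes with `errors="replace"`) makes the UTF-8 size larger than the original byte count. A size check therefore has to measure the text after such substitutions. `utf8_size_after_replacement()` is a hypothetical helper.

```python
def utf8_size_after_replacement(raw: bytes) -> int:
    """Hypothetical helper: UTF-8 size of `raw` once decoded with U+FFFD substitution."""
    # Undecodable bytes are replaced by U+FFFD, which is 3 bytes in UTF-8
    text = raw.decode("utf-8", errors="replace")
    return len(text.encode("utf-8"))


print(len("\ufffd".encode("utf-8")))               # 3
print(utf8_size_after_replacement(b"abcdef"))      # 6
print(utf8_size_after_replacement(b"abc\xffdef"))  # 9, not 7
```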

agateau-gg linked an issue Jul 12, 2023 that may be closed by this pull request

ggshield and pygitguardian disagree on the way to check the maximum document size #561

Closed

agateau-gg requested a review from Walz July 12, 2023 16:33

Walz approved these changes Jul 13, 2023

Walz merged commit 32d6583 into main Jul 13, 2023

Walz deleted the agateau/fix-maximum-size-check branch July 13, 2023 09:42
