Fix ggshield secret scan trying to scan too large documents #631

Merged: 2 commits merged on Jul 13, 2023

@agateau-gg commented Jul 12, 2023

Description

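`Scannable.is_longer_than()` currently compares the size passed to it with the *decoded* size of the content (its length in characters), but what we actually want to compare it with is the *UTF-8 encoded size*. The two differ as soon as the content contains non-ASCII characters, so a document could pass ggshield's check and still be too large for py-gitguardian.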
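As a quick illustration, here is a minimal Python sketch of the two checks. The function names and the 10-byte limit are hypothetical and only stand in for `is_longer_than()` and the maximum document size; the point is that `len()` of a decoded `str` counts characters, while the limit is about UTF-8 bytes.

```python
# Hypothetical limit (in bytes), standing in for the maximum document size.
MAX_DOCUMENT_SIZE = 10


def is_longer_than_decoded(content: str, size: int) -> bool:
    """Old-style check (sketch): compares against the decoded length, in characters."""
    return len(content) > size


def is_longer_than_utf8(content: str, size: int) -> bool:
    """Fixed check (sketch): compares against the UTF-8 encoded size, in bytes."""
    return len(content.encode("utf-8")) > size


# "€" is a single character but takes 3 bytes in UTF-8.
text = "€€€€"
print(len(text))                  # 4 characters
print(len(text.encode("utf-8")))  # 12 bytes
print(is_longer_than_decoded(text, MAX_DOCUMENT_SIZE))  # False: the old check lets it through
print(is_longer_than_utf8(text, MAX_DOCUMENT_SIZE))     # True: too large once encoded
```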

To implement this without reading the content multiple times, `Scannable` implementations now store the UTF-8 encoded size, and the `Scannable` helper methods `_is_file_longer_than()` and `_decode_bytes()` return the UTF-8 encoded size in addition to their current return value.
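The sketch below shows the general shape of this approach, assuming a decoding helper that returns the decoded text together with its UTF-8 encoded size, and a scannable that computes the size once and reuses it. `decode_bytes()` and `InMemoryScannable` are hypothetical names; ggshield's actual `_decode_bytes()` and `Scannable` classes may have different signatures.

```python
from typing import Tuple


def decode_bytes(raw: bytes, encoding: str = "utf-8") -> Tuple[str, int]:
    """Decode `raw` and return (text, UTF-8 encoded size of text).

    Hypothetical helper modeled on the description of `_decode_bytes()`.
    """
    text = raw.decode(encoding)
    if encoding.lower().replace("_", "-") in ("utf-8", "utf8"):
        # The input already is UTF-8, so its byte count is the UTF-8 encoded size.
        utf8_encoded_size = len(raw)
    else:
        # Otherwise re-encode once to measure the size the limit applies to.
        utf8_encoded_size = len(text.encode("utf-8"))
    return text, utf8_encoded_size


class InMemoryScannable:
    """Hypothetical minimal scannable that stores the UTF-8 encoded size."""

    def __init__(self, raw: bytes, encoding: str = "utf-8") -> None:
        # Decode and measure once; later size checks reuse the stored value.
        self.content, self._utf8_encoded_size = decode_bytes(raw, encoding)

    def is_longer_than(self, max_utf8_encoded_size: int) -> bool:
        return self._utf8_encoded_size > max_utf8_encoded_size


# Usage: 4 characters stored as UTF-16 (10 bytes) are 12 bytes once UTF-8 encoded.
doc = InMemoryScannable("€€€€".encode("utf-16"), encoding="utf-16")
print(len(doc.content))        # 4
print(doc.is_longer_than(10))  # True
```

Storing the size at construction time means repeated `is_longer_than()` calls never re-encode the content.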

When the content is already UTF-8, its size in bytes already is its UTF-8 encoded size, so the helper methods try to avoid reading it until necessary.
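For files, the idea can be sketched as follows, assuming the file's encoding is already known (how ggshield detects it is not covered here). `is_file_longer_than()` is a hypothetical stand-in for `_is_file_longer_than()`: for a UTF-8 file it answers from the file size alone and returns no content, and only for other encodings does it read and decode the file.

```python
import os
import tempfile
from typing import Optional, Tuple


def is_file_longer_than(
    path: str, max_utf8_encoded_size: int, encoding: str
) -> Tuple[bool, Optional[str], int]:
    """Return (is_longer, content or None, UTF-8 encoded size).

    Hypothetical sketch: assumes `encoding` is already known for the file.
    """
    if encoding.lower().replace("_", "-") in ("utf-8", "utf8"):
        # UTF-8 file: the size on disk is the UTF-8 encoded size, so the
        # content does not need to be read yet.
        size = os.path.getsize(path)
        return size > max_utf8_encoded_size, None, size

    # Any other encoding: read and decode the file, then measure its UTF-8 size.
    with open(path, "rb") as fp:
        content = fp.read().decode(encoding)
    size = len(content.encode("utf-8"))
    return size > max_utf8_encoded_size, content, size


# Usage: the same 4-character text, stored as UTF-8 and as UTF-16.
with tempfile.TemporaryDirectory() as tmp_dir:
    utf8_path = os.path.join(tmp_dir, "utf8.txt")
    utf16_path = os.path.join(tmp_dir, "utf16.txt")
    with open(utf8_path, "wb") as fp:
        fp.write("€€€€".encode("utf-8"))  # 12 bytes on disk
    with open(utf16_path, "wb") as fp:
        fp.write("€€€€".encode("utf-16"))  # 10 bytes on disk (BOM + 4 * 2)

    utf8_result = is_file_longer_than(utf8_path, 10, "utf-8")
    utf16_result = is_file_longer_than(utf16_path, 10, "utf-16")

print(utf8_result)                       # (True, None, 12): answered from the file size alone
print(utf16_result[0], utf16_result[2])  # True 12: only 10 bytes on disk, but 12 once UTF-8 encoded
```

The UTF-16 case shows why the on-disk size alone is not enough for non-UTF-8 files: it would have let a document through that is too large once encoded.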

This PR also integrates the latest py-gitguardian changes, which relate to the same issue.

Issue: #561

@agateau-gg force-pushed the agateau/fix-maximum-size-check branch from d822636 to db208b9 on July 12, 2023 15:36
codecov-commenter commented Jul 12, 2023

Codecov Report

Merging #631 (db208b9) into main (e631154) will increase coverage by 0.05%.
The diff coverage is 91.89%.

@@            Coverage Diff             @@
##             main     #631      +/-   ##
==========================================
+ Coverage   93.98%   94.03%   +0.05%     
==========================================
  Files         103      103              
  Lines        4804     4812       +8     
==========================================
+ Hits         4515     4525      +10     
+ Misses        289      287       -2     

| Flag | Coverage Δ |
|---|---|
| unittests | 94.03% <91.89%> (+0.05%) ⬆️ |

| Impacted Files | Coverage Δ |
|---|---|
| ggshield/secret/docker.py | 90.90% <50.00%> (+1.07%) ⬆️ |
| ggshield/scan/file.py | 94.33% <100.00%> (-0.11%) ⬇️ |
| ggshield/scan/scannable.py | 97.84% <100.00%> (+0.25%) ⬆️ |

@Walz left a comment

LGTM

@Walz merged commit 32d6583 into main on Jul 13, 2023
@Walz deleted the agateau/fix-maximum-size-check branch on July 13, 2023 09:42

Successfully merging this pull request may close these issues.

ggshield and pygitguardian disagree on the way to check the maximum document size