Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

support multi module project #1100

Conversation

Emptyless
Copy link

support multi module project by using the build system to locate the go.mod if present. Related to #501 #622

@codecov-commenter
Copy link

Codecov Report

Attention: 9 lines in your changes are missing coverage. Please review.

Comparison is base (1285eb7) 73.93% compared to head (8bf5661) 73.73%.
Report is 1 commits behind head on master.

Files Patch % Lines
analyzer.go 43.75% 7 Missing and 2 partials ⚠️

❗ Your organization needs to install the Codecov GitHub app to enable full functionality.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #1100      +/-   ##
==========================================
- Coverage   73.93%   73.73%   -0.21%     
==========================================
  Files          50       50              
  Lines        2947     2958      +11     
==========================================
+ Hits         2179     2181       +2     
- Misses        694      701       +7     
- Partials       74       76       +2     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

analyzer.go Outdated
packageFiles = append(packageFiles, path.Join(pkgPath, filename))
var goModFile string
goModDir := abspath[0 : len(abspath)-len(pkgPath)] // get root dir
if modPkgs, err := packages.Load(&packages.Config{Mode: packages.NeedModule, Dir: abspath}, abspath); err == nil && len(modPkgs) == 1 {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this only loading the module files, or does parse all the package as well? Please could you double check? I want to avoid to introduce performance issues?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this 1? What if there are more than 1 module files?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll double check on performance but my assumption was with only NeedModule is that it skips the remaining parsing , the check is for 1 given that it should only find a single go.mod for the given source file - or there are go files with different package identifiers in the same directory (which is not valid).

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added benchmarks, see below

analyzer.go Show resolved Hide resolved
analyzer.go Outdated
filePath := path.Join(abspath, filename)
relPath, err := filepath.Rel(goModDir, filePath)
if err != nil {
return []*packages.Package{}, fmt.Errorf("get relative path between %q and %q: %w", goModDir, filePath, err)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will fail the whole scanning in case the relative path of one file cannot be looked-up. This is impractical in very large projects. I would log the error, and just skip the file if it cannot find the relative path.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in latest commit

@ccojocar
Copy link
Member

Please try to add some tests as well in the analyzer_test.go for a multi-module project.

@Emptyless
Copy link
Author

Please try to add some tests as well in the analyzer_test.go for a multi-module project.

will add them tonight or tomorrow

@Emptyless
Copy link
Author

Executed benchmarks with and without the multi module scanning disabled (based on env var). See benchmarks in analyzer_test.go:

Results:

BenchmarkWithMultiModuleDisabled
BenchmarkWithMultiModuleDisabled/_1
BenchmarkWithMultiModuleDisabled/_1-10 	       7	 145905167 ns/op
BenchmarkWithMultiModuleDisabled/_10
BenchmarkWithMultiModuleDisabled/_10-10         	       7	 172454298 ns/op
BenchmarkWithMultiModuleDisabled/_100
BenchmarkWithMultiModuleDisabled/_100-10        	       3	 523482389 ns/op
BenchmarkWithMultiModuleDisabled/_1000
BenchmarkWithMultiModuleDisabled/_1000-10       	       1	6146521500 ns/op
BenchmarkWithMultiModuleDisabled/_5000
BenchmarkWithMultiModuleDisabled/_5000-10       	       1	12735920208 ns/op
BenchmarkWithMultiModuleEnabled
BenchmarkWithMultiModuleEnabled/_1
BenchmarkWithMultiModuleEnabled/_1-10           	       7	 160708732 ns/op
BenchmarkWithMultiModuleEnabled/_10
BenchmarkWithMultiModuleEnabled/_10-10          	       6	 183600472 ns/op
BenchmarkWithMultiModuleEnabled/_100
BenchmarkWithMultiModuleEnabled/_100-10         	       2	 612680938 ns/op
BenchmarkWithMultiModuleEnabled/_1000
BenchmarkWithMultiModuleEnabled/_1000-10        	       1	5649391834 ns/op
BenchmarkWithMultiModuleEnabled/_5000
BenchmarkWithMultiModuleEnabled/_5000-10        	       1	13005952500 ns/op

For 5000 file there is about a 2% performance decrease, for 1 file about 10%.

I interpret this, correct me if I'm wrong, that the time it takes to load the module doesnt scale with the amount of files in the package but the actual addition of the call does have a minor impact on projects with few files (< 10). 10% sounds like a lot but we are talking about a total evaluation time of 0,25ms vs 0,28ms.

Copy link
Member

@ccojocar ccojocar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The NeedModule is already into the configuration. See

Mode: LoadMode,

There is no need to load it separately. The loader will go back into the folder looking for it. You can easily verify this by lunching a scan inside of a package without a go.mod file.

For instance just take gosec source code, and run a scan inside of ./cmd/gosec folder. You should see that the module is correctly located by the loader and the scan is successful.

For instance, I have the following multi module project:

.
├── go.work
├── module1
│   ├── go.mod
│   └── main.go
├── module2
│   ├── go.mod
│   └── main.go
└── module3
    ├── go.mod
    └── main.go

4 directories, 7 files

and if I lunch a scan in the workspace folder, everything works fine:

[gosec] 2024/02/16 11:35:55 Including rules: default
[gosec] 2024/02/16 11:35:55 Excluding rules: default
[gosec] 2024/02/16 11:35:55 Import directory: /Users/gcojocar/projects/go-multi-modules/module1
[gosec] 2024/02/16 11:35:55 Import directory: /Users/gcojocar/projects/go-multi-modules/module2
[gosec] 2024/02/16 11:35:55 Import directory: /Users/gcojocar/projects/go-multi-modules/module3
[gosec] 2024/02/16 11:35:56 Checking package: main
[gosec] 2024/02/16 11:35:56 Checking file: /Users/gcojocar/projects/go-multi-modules/module1/main.go
[gosec] 2024/02/16 11:35:56 Checking package: main
[gosec] 2024/02/16 11:35:56 Checking file: /Users/gcojocar/projects/go-multi-modules/module3/main.go
[gosec] 2024/02/16 11:35:56 Checking package: main
[gosec] 2024/02/16 11:35:56 Checking file: /Users/gcojocar/projects/go-multi-modules/module2/main.go
Results:


Summary:
  Gosec  : dev
  Files  : 3
  Lines  : 21
  Nosec  : 0
  Issues : 0

The same if I run it inside of a module:

[gosec] 2024/02/16 11:37:15 Including rules: default
[gosec] 2024/02/16 11:37:15 Excluding rules: default
[gosec] 2024/02/16 11:37:15 Import directory: /Users/gcojocar/projects/go-multi-modules/module1
[gosec] 2024/02/16 11:37:15 Checking package: main
[gosec] 2024/02/16 11:37:15 Checking file: /Users/gcojocar/projects/go-multi-modules/module1/main.go
Results:


Summary:
  Gosec  : dev
  Files  : 1
  Lines  : 7
  Nosec  : 0
  Issues : 0

I am still confused about what you are try to solve here. Please could you provide an archive with a project and where exactly you are trying to run the scan?

packageFiles = append(packageFiles, path.Join(pkgPath, filename))
var goModFile string
goModDir := abspath[0 : len(abspath)-len(pkgPath)] // get root dir
if os.Getenv("DISABLE_MULTI_MODULE_MODE") != "true" {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doesn't the multi-module project requires a workspace file? https://go.dev/ref/mod#workspaces

The switch from single module to multi module project is controlled by GOWORK environment variable and this should be already supported inside of Load function.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is possible to use the replace directive in a go.mod. See: https://go.dev/ref/mod#go-mod-file-replace

go.work was introduced later in go1.18 to simplify multi-module projects but is not mandated iirc

var goModFile string
goModDir := abspath[0 : len(abspath)-len(pkgPath)] // get root dir
if os.Getenv("DISABLE_MULTI_MODULE_MODE") != "true" {
if modPkgs, err := packages.Load(&packages.Config{Mode: packages.NeedModule, Dir: abspath}, abspath); err == nil && len(modPkgs) == 1 {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if you run the scan in the workspace folder? There are multiple modules, each module with multiple packages. As I understand here, there will be loaded just one of this modules? Is the right module of the package?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have added a message that if run in a go.work environment, the message will be logged and it will fall back to default behaviour. Since then we rely on the go.work the conf.Dir has to be reset while it still has the lock. This will support go.work behaviour even when not alle modules are used in the go.work file.

package main
func bar(){
}`)
err := pkg1.Build()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't look like a multi-module project. You are missing the go.work file. See the Workspaces for details https://go.dev/ref/mod#workspaces.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the test was intended to mock a multi-module project without a go.work file, this relies on the published version of a module rather than local filesystem version. This could be the case if within a project it is decided not to commit the go.work version such that CI picks up on versioning issues.

@Emptyless
Copy link
Author

The NeedModule is already into the configuration. See

Mode: LoadMode,

There is no need to load it separately. The loader will go back into the folder looking for it. You can easily verify this by lunching a scan inside of a package without a go.mod file.

the problem is that the package files are not referenced relative to the go.mod so an additional pass is needed in order to load them correctly. The reason your example works is that never becomes an issue but it does with nested go.mod files.

.
├── go.mod
├── manual.go
└── some
   ├── go.mod
   ├── generated.go
   └── nested
       ├── main.go
       └── go.mod

3 directories, 6 files

This will cause errors as the files of nested are loaded as ./some/nested/main.go for package nested instead of main.go. This will cause the packages tool to fail and report (e.g. for some random imported package):

Golang errors in file: [gosec/testmultimodule/some/nested/main.go]:

  > [line 4 : column 2] - could not import github.com/joho/godotenv (invalid package name: "")

I've attached the directory structure above if you want to reproduce this behavior, just put it in the root of the gosec project
testmultimodule.zip

it works with gosec -stdout -verbose=text ./..., it doesnt work with DISABLE_MULTI_MODULE_MODE=true gosec -stdout -verbose=text ./...

@ccojocar
Copy link
Member

the problem is that the package files are not referenced relative to the go.mod so an additional pass is needed in order to load them correctly. The reason your example works is that never becomes an issue but it does with nested go.mod files.

This doesn't look like a correct multi-module project in Go. Please have a look at Workspaces in the documentation https://go.dev/ref/mod#workspaces.

If you set up correctly the workspace folder, it should work fine:

.
├── go.mod
├── go.work
├── manual.go
└── some
    ├── generated.go
    ├── go.mod
    └── nested
        ├── go.mod
        ├── go.sum
        └── main.go

3 directories, 8 file

go.work file:

go 1.22.0

use (
        .
        ./some
        ./some/nested
)

And the scan is successful:

[gosec] 2024/02/16 14:24:21 Including rules: default
[gosec] 2024/02/16 14:24:21 Excluding rules: default
[gosec] 2024/02/16 14:24:21 Import directory: /Users/gcojocar/Downloads/testmultimodule/some
[gosec] 2024/02/16 14:24:21 Import directory: /Users/gcojocar/Downloads/testmultimodule/some/nested
[gosec] 2024/02/16 14:24:21 Import directory: /Users/gcojocar/Downloads/testmultimodule
[gosec] 2024/02/16 14:24:21 Checking package: some
[gosec] 2024/02/16 14:24:21 Checking file: /Users/gcojocar/Downloads/testmultimodule/some/generated.go
[gosec] 2024/02/16 14:24:21 Checking package: testmultimodule
[gosec] 2024/02/16 14:24:21 Checking file: /Users/gcojocar/Downloads/testmultimodule/manual.go
[gosec] 2024/02/16 14:24:22 Checking package: main
[gosec] 2024/02/16 14:24:22 Checking file: /Users/gcojocar/Downloads/testmultimodule/some/nested/main.go
Results:


Summary:
  Gosec  : dev
  Files  : 3
  Lines  : 15
  Nosec  : 0
  Issues : 0

It seems that these changes proposed here are not required. Thank you

@Emptyless
Copy link
Author

Emptyless commented Feb 16, 2024

That workflow will fail in a CI environment which is an example of where we run gosec. See golang/go#62179 on why the go.work file should generally not be committed. I'd prefer to support both flows, local (where e.g. a go.work file is present) and vcs (without go.work)

@ccojocar
Copy link
Member

If you don't want to check it in, you can either generate it on the fly or scan each module folder separately.

Thanks for contributing it but at this time I don't plan to merge it because it doesn't seem canonical to me and brings a performance penalty since most of the projects doesn't need this feature.

@ccojocar ccojocar closed this Feb 16, 2024
@Emptyless
Copy link
Author

Emptyless commented Feb 16, 2024

thanks for your time and review but I think it is shortsighted to close based on a performance penalty of 0,03ms and imho pretty canonical lines and to force a project structure which is generally recommended against in the Go repository. I could make it opt-in behaviour based and invert the env flag if you are truly worried about this.

There have been two issues already where you tried to enforce the same directory layout which was also downvoted by the community (see e.g. #501 (comment)).

edit: replaced strong wording

@ccojocar
Copy link
Member

ccojocar commented Feb 16, 2024

It doesn't seem to be consensus about this in the Go community. When there will be a clear solution, gosec will definitely adopt it.

If you scan a monorepo, it makes perfect sense to check it in. See the discussion in golang/go#53502 (comment).

@Emptyless
Copy link
Author

I've replaced the strong wording, a better wording would be "generally recommended against".

It doesn't seem to be consensus about this in the Go community. When there will be a clear solution, gosec will definitely adopt it.

If you scan a monorepo, it makes perfect sense to check it in. See the discussion in golang/go#53502 (comment).

I think there is a misunderstanding here, in a monorepo is it mentioned that this is fine; later in the thread even mentioned that this should definitely be checked in (golang/go#53502 (comment)). But this is an exception to the general recommendation, the full quote (you linked) is:

go.work files should generally not be checked into VCS, as their purpose is to develop multiple modules at the same time, and each module tends to have its own VCS repository. However, there are certain scenarios where it can make sense to check in go.work files, such as a monorepo holding multiple modules which depend on each other.

would you want gosec to only be able to scan the exceptions to the general recommendation rather than both situations?

@Emptyless
Copy link
Author

Given that there is now consensus in the Go community ("It is generally inadvisable to commit go.work files into version control systems", see golang/website@80af5f5), please reconsider the PR

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants