Improve support for file loading in large workspaces #1087

tsandall · 2018-11-26T16:31:42Z

It should be possible to easily run tools like opa run or opa eval or opa test in workspaces that contain many files, e.g., mono repos. Currently tools like opa run load all JSON and .rego files out of the provided directories, recursively. In large workspaces like mono repos this can be quite expensive.

One option would be introduce support for an .opaignore file that users could configure to control file loading. The .opaignore file could include both blacklist and whitelist directives. For example:

*
!policies

This would ignore all files and then explicitly load files under the path policies.

To implement this we could update the loader package to look for the .opaignore file and then alter loading behaviour accordingly.

The text was updated successfully, but these errors were encountered:

tsandall · 2019-01-08T01:42:38Z

/cc @teq0 if you have further thoughts on how to implement this, feel free to post on Slack or here.

teq0 · 2019-01-11T11:46:29Z

Not sure about the whitelist part, but in the loader there's already the concept of filters, but at the moment filters appear to implemented one at a time, even though there's an exclude() function that takes an array of filters and applies them in order. I can't find any usage of the exclude() function though.

It seems like the simplest approach would be to substitute exclude() into all the places in loader.go that call a single filter directly, and to prime the array of filters with an opaignore() filter as the first filter, which runs each file path against the .opaignore values.

The thing I don't have a good handle on is how to inject a fake .opaignore for testing purposes.

And I'm not sure of the use case for the whitelist. To include specific files in folders that are blacklisted?

tsandall · 2019-01-11T17:37:27Z

Not sure about the whitelist part, but in the loader there's already the concept of filters, but at the moment filters appear to implemented one at a time, even though there's an exclude() function that takes an array of filters and applies them in order. I can't find any usage of the exclude() function though.

exclude is cruft, you can remove it. When I originally added filters I was planning to pass a set of them to the loader (hence exclude) but simplified to one because callers can just implement that logic themselves (and decide whether they want the filters to AND or OR.) I'd rather leave that behavior up to the caller.

It seems like the simplest approach would be to substitute exclude() into all the places in loader.go that call a single filter directly, and to prime the array of filters with an opaignore() filter as the first filter, which runs each file path against the .opaignore values.

The code in the loader basically runs allRec for each path received by the loader. allRec then runs recursively for each path to be loaded. allRec calls the filter to check if a path should be processed. My thinking was that the .opaignore could be implemented by another filter (which could just wrap/call the user-supplied filter if the answer was indeterminate.)

The thing I don't have a good handle on is how to inject a fake .opaignore for testing purposes.

See the loader_test.go file, there are a bunch of integration tests there that set up a temporary filesystem (and do cleanup properly). You could add another test case to exercise this.

And I'm not sure of the use case for the whitelist. To include specific files in folders that are blacklisted?

The use case for whitelisting is when policies are kept in a monorepo and you have an IDE/workspace that loads the whole monorepo. In this world, blacklists are a pain to maintain. It's easier to say:

*
!policies

This is how gitignore works.

Hope this helps!

teq0 · 2019-01-12T02:13:25Z

Got it. So after https://github.com/open-policy-agent/opa/blob/master/loader/loader.go#L236 call another filter that checks against the .opaignore rules.

And I get the WithTempFS() now.

tsandall · 2019-01-12T17:06:21Z

Yes, I suppose that caller filters should override the .opaignore rules, so the caller filters should run first.

One thing to keep in mind/test is that --ignore should not cause the loader to ignore .opaignore files themselves. One of the main reasons we added --ignore in the first place was to let users ignore dotfiles (there is some odd behaviour when you volume mount configmaps in kubernetes that results in the configmap contents being duplicated under the volume mount directory.)

Looking forward to getting this into OPA.

teq0 · 2019-01-18T07:22:29Z

There are a few go implementations of .gitignore functionality, is it OK if I vendor one in rather than writing yet another one? If so, do you have a preference, or any guidelines for picking one?

tsandall · 2019-01-18T16:00:49Z

@teq0 It's fine to vendor as long as:

Minimal/no additional dependencies. In this case, I'm guessing there are no extra required deps.
ALv2 compatible license

teq0 · 2019-02-18T19:49:37Z

Just an update, I have it working, the actual code isn't that much, I'm a bit stuck on getting tests to work though and I've been a bit time poor. But it's coming...

tsandall · 2019-02-19T16:53:27Z

Great! If you get it to a point where you'd like to be done and want to hand it off, feel free to send me the patch and I can finish it off (with you as the git author.)

tsandall · 2021-11-23T22:35:53Z

With the new parser we released last year and the fact that opa can supports bundle layouts, I think this issue can be closed for now. If we want to revisit an .ignore file in the future or lazy loading we can do that with separate issues.

tsandall added enhancement help wanted labels Nov 26, 2018

tsandall mentioned this issue Jan 6, 2019

Add config setting to specify folders containing rego open-policy-agent/vscode-opa#12

Closed

patrick-east mentioned this issue Jun 18, 2019

Lazy load data files with opa eval #1502

Closed

patrick-east mentioned this issue Jul 24, 2019

Add bundle style directory loading for OPA CLI evaluations #1584

Closed

tsandall closed this as completed Nov 23, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Improve support for file loading in large workspaces #1087

Improve support for file loading in large workspaces #1087

tsandall commented Nov 26, 2018

tsandall commented Jan 8, 2019

teq0 commented Jan 11, 2019 •

edited

Loading

tsandall commented Jan 11, 2019

teq0 commented Jan 12, 2019

tsandall commented Jan 12, 2019

teq0 commented Jan 18, 2019

tsandall commented Jan 18, 2019

teq0 commented Feb 18, 2019

tsandall commented Feb 19, 2019

tsandall commented Nov 23, 2021 •

edited

Loading

Improve support for file loading in large workspaces #1087

Improve support for file loading in large workspaces #1087

Comments

tsandall commented Nov 26, 2018

tsandall commented Jan 8, 2019

teq0 commented Jan 11, 2019 • edited Loading

tsandall commented Jan 11, 2019

teq0 commented Jan 12, 2019

tsandall commented Jan 12, 2019

teq0 commented Jan 18, 2019

tsandall commented Jan 18, 2019

teq0 commented Feb 18, 2019

tsandall commented Feb 19, 2019

tsandall commented Nov 23, 2021 • edited Loading

teq0 commented Jan 11, 2019 •

edited

Loading

tsandall commented Nov 23, 2021 •

edited

Loading