Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve support for file loading in large workspaces #1087

Closed
tsandall opened this issue Nov 26, 2018 · 10 comments
Closed

Improve support for file loading in large workspaces #1087

tsandall opened this issue Nov 26, 2018 · 10 comments

Comments

@tsandall
Copy link
Member

It should be possible to easily run tools like opa run or opa eval or opa test in workspaces that contain many files, e.g., mono repos. Currently tools like opa run load all JSON and .rego files out of the provided directories, recursively. In large workspaces like mono repos this can be quite expensive.

One option would be introduce support for an .opaignore file that users could configure to control file loading. The .opaignore file could include both blacklist and whitelist directives. For example:

*
!policies

This would ignore all files and then explicitly load files under the path policies.

To implement this we could update the loader package to look for the .opaignore file and then alter loading behaviour accordingly.

@tsandall
Copy link
Member Author

tsandall commented Jan 8, 2019

/cc @teq0 if you have further thoughts on how to implement this, feel free to post on Slack or here.

@teq0
Copy link
Contributor

teq0 commented Jan 11, 2019

Not sure about the whitelist part, but in the loader there's already the concept of filters, but at the moment filters appear to implemented one at a time, even though there's an exclude() function that takes an array of filters and applies them in order. I can't find any usage of the exclude() function though.

It seems like the simplest approach would be to substitute exclude() into all the places in loader.go that call a single filter directly, and to prime the array of filters with an opaignore() filter as the first filter, which runs each file path against the .opaignore values.

The thing I don't have a good handle on is how to inject a fake .opaignore for testing purposes.

And I'm not sure of the use case for the whitelist. To include specific files in folders that are blacklisted?

@tsandall
Copy link
Member Author

Not sure about the whitelist part, but in the loader there's already the concept of filters, but at the moment filters appear to implemented one at a time, even though there's an exclude() function that takes an array of filters and applies them in order. I can't find any usage of the exclude() function though.

exclude is cruft, you can remove it. When I originally added filters I was planning to pass a set of them to the loader (hence exclude) but simplified to one because callers can just implement that logic themselves (and decide whether they want the filters to AND or OR.) I'd rather leave that behavior up to the caller.

It seems like the simplest approach would be to substitute exclude() into all the places in loader.go that call a single filter directly, and to prime the array of filters with an opaignore() filter as the first filter, which runs each file path against the .opaignore values.

The code in the loader basically runs allRec for each path received by the loader. allRec then runs recursively for each path to be loaded. allRec calls the filter to check if a path should be processed. My thinking was that the .opaignore could be implemented by another filter (which could just wrap/call the user-supplied filter if the answer was indeterminate.)

The thing I don't have a good handle on is how to inject a fake .opaignore for testing purposes.

See the loader_test.go file, there are a bunch of integration tests there that set up a temporary filesystem (and do cleanup properly). You could add another test case to exercise this.

And I'm not sure of the use case for the whitelist. To include specific files in folders that are blacklisted?

The use case for whitelisting is when policies are kept in a monorepo and you have an IDE/workspace that loads the whole monorepo. In this world, blacklists are a pain to maintain. It's easier to say:

*
!policies

This is how gitignore works.

Hope this helps!

@teq0
Copy link
Contributor

teq0 commented Jan 12, 2019

Got it. So after https://github.com/open-policy-agent/opa/blob/master/loader/loader.go#L236 call another filter that checks against the .opaignore rules.

And I get the WithTempFS() now.

@tsandall
Copy link
Member Author

Yes, I suppose that caller filters should override the .opaignore rules, so the caller filters should run first.

One thing to keep in mind/test is that --ignore should not cause the loader to ignore .opaignore files themselves. One of the main reasons we added --ignore in the first place was to let users ignore dotfiles (there is some odd behaviour when you volume mount configmaps in kubernetes that results in the configmap contents being duplicated under the volume mount directory.)

Looking forward to getting this into OPA.

@teq0
Copy link
Contributor

teq0 commented Jan 18, 2019

There are a few go implementations of .gitignore functionality, is it OK if I vendor one in rather than writing yet another one? If so, do you have a preference, or any guidelines for picking one?

@tsandall
Copy link
Member Author

@teq0 It's fine to vendor as long as:

  1. Minimal/no additional dependencies. In this case, I'm guessing there are no extra required deps.
  2. ALv2 compatible license

@teq0
Copy link
Contributor

teq0 commented Feb 18, 2019

Just an update, I have it working, the actual code isn't that much, I'm a bit stuck on getting tests to work though and I've been a bit time poor. But it's coming...

@tsandall
Copy link
Member Author

Great! If you get it to a point where you'd like to be done and want to hand it off, feel free to send me the patch and I can finish it off (with you as the git author.)

@tsandall
Copy link
Member Author

tsandall commented Nov 23, 2021

With the new parser we released last year and the fact that opa can supports bundle layouts, I think this issue can be closed for now. If we want to revisit an .ignore file in the future or lazy loading we can do that with separate issues.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants