Add GH action format check files with `JuliaFormatter` #2074

Saransh-cpp · 2022-10-02T16:41:04Z

Most of the Julia packages I have seen lack a proper style guide for their codebase. I feel JuliaFormatter coupled with a clever workflow can help us maintain a clean codebase. This PR adds a bot-like workflow that fixes the formatting issues associated with a PR and pushes the fixed files in the same branch.

PR Checklist

Tests are added
Entry in NEWS.md
Documentation, if applicable

ToucheSir · 2022-10-02T18:39:56Z

Perhaps the title would be better with s/lint/(auto)format/? That is, unless you're planning on adding some actions for static analysis tools as well.

On formatting, we should use the same style across all FluxML repos. Just copying https://github.com/FluxML/Metalhead.jl/blob/master/.JuliaFormatter.toml should be enough, I think.

Saransh-cpp · 2022-10-03T14:48:19Z

Perhaps the title would be better with s/lint/(auto)format/? That is, unless you're planning on adding some actions for static analysis tools as well.

Are you referring to the PR's title or the commit message of the bot's commit? (Should it be - autoformat: format jl files with JuliaFormatter?)

On formatting, we should use the same style across all FluxML repos. Just copying https://github.com/FluxML/Metalhead.jl/blob/master/.JuliaFormatter.toml should be enough, I think.

Thanks! I will add this in!

Saransh-cpp · 2022-10-03T16:28:27Z

This works now!

Working on a PR created from the base repository - Saransh-cpp#53
Working on a PR created from a fork - Saransh-cpp#55

Note

This will work once this PR is merged (cannot test it before due to security reasons, explained below).
To accomplish this, the repository secrets must contain a personal access token (PAT) of an account with write access to FluxML. The token should be added to repository secrets with the name - "AUTO_LINT_PAT".

Why is PAT required?

A bot commit without a PAT does not trigger workflows. This means that the reformatted files will not go through the conventional CI, which can potentially be dangerous. Let's say a bug in JuliaFormatter rewrites a file with some broken syntax -> this broken formatting update will not be detected by the CI as the CI will not run on the bot commit.

Is this a security concern?

Since this workflow runs on pull_request_target, the PAT itself cannot be leaked. I have successfully tested this here -

Saransh-cpp#55 (see the files changed in the last commit - I tried printing out the PAT that, in a sense, is an attempt to hack FluxML's private tokens) -> https://github.com/Saransh-cpp/Flux.jl/actions/runs/3175153155/jobs/5172838261 (see how the "HACK" step was not executed at all).

This shows that GitHub will run the "Lint" workflow according to the instructions written in the base FluxML repository and not the fork repository.

Due to this reason, we cannot test this action on this PR (as pull_request_merged does not run the workflow using the instructions written in a fork). For more reference - https://securitylab.github.com/research/github-actions-preventing-pwn-requests/

Having said that, mixing pull_request_target with an explicit PR checkout is not always vulnerable. The workflow may, for example:

Reformat and commit the code

Checkout both base and head repositories and generate a diff

Run grep on the checked out source.

Generally speaking, when the PR contents are treated as passive data, i.e. not in a position of influence over the build/testing process, it is safe. But the repository owners must be extra careful not to trigger any script that may operate on PR controlled contents like in the case of npm install.

DilumAluthge · 2022-10-04T05:35:10Z

.github/workflows/lint.yml

+    runs-on: ubuntu-latest
+
+    steps:
+      - uses: actions/checkout@v3


Combining pull_request_target workflow trigger with an explicit checkout of an untrusted PR is a dangerous practice that may lead to repository compromise.

https://securitylab.github.com/research/github-actions-preventing-pwn-requests/

I see that you already quoted this portion above:

But the repository owners must be extra careful not to trigger any script that may operate on PR controlled contents like in the case of npm install.

This is the problem. Combining pull_request_target with a checkout is a huge footgun. Can it be done safely? Potentially, yes. But it is so easy to make a mistake here. All it takes is for a committer to merge a PR with an innocent looking change (e.g. changing julia to julia --project in the lint workflow), and the repository has been compromised.

My recommendation would be to not go with this approach, because it is too easy to end up compromising the repository.

Instead, if the goal is to enforce a common format across all repositories in the FluxML organization, I would recommend:

Have a CI job that checks if the source code is formatted correctly, and errors if any formatting problems are detected.

Use the needs construct to force all other CI jobs to wait for the successful completion of the formatting check.

If the formatting check does not succeed, do not run any other CI jobs.

Actually, in my above comment, change "potentially compromise" to "compromise". This PR already compromises the repository.

Thanks, @DilumAluthge! Good thing I cross-posted this in ci-dev. I agree, this workflow is potentially risky. If someone merges a PR that changes the workflows then Flux would be at a security risk. Leaving out the PAT completely is equally risky as that would allow the GitHub actions bot to commit anything without the CI running on it.

This PR already compromises the repository.

Could you let me know? Is it that one can write a script to leak the PAT in another workflow? I tried this in one of my PRs, but GitHub does not allow the user to access the PAT -

- name: POTENTIAL HACK run: echo ${{ secrets.PAT }}

ToucheSir · 2022-10-04T22:57:52Z

After another read-through, I agree with @DilumAluthge's concerns and raise one of my own. If the bot is appending formatting commits to PR branches, how are contributors meant to continue working on their feature branches from another checkout without running into conflicts? #2074 (comment) seems like a more reasonable approach and is already used in projects like https://github.com/JuliaDiff/ChainRulesCore.jl/blob/main/.github/workflows/format.yml. To ensure code is formatted in the remote branch, we could also consider a pre-commit hook.

Are you referring to the PR's title or the commit message of the bot's commit?

Both. Your proposed rename looks fine.

Saransh-cpp · 2022-10-05T18:07:44Z

A pre-commit hook would ideally solve all of the problems. I will look into how one develops that and how pre-commit.ci can support JuliaFormatter. For the time being, I think the suggestion of adding an action that fails on wrong formatting sounds best.

Saransh-cpp · 2022-10-05T18:35:25Z

is already used in projects like https://github.com/JuliaDiff/ChainRulesCore.jl/blob/main/.github/workflows/format.yml.

ChainRulesCore.jl tries to do something similar but by using an existing tool - reviewdog. As expected, review dog requires a PAT too - see this CI run -

reviewdog: This GitHub token doesn't have write permission of Review API [1], 
so reviewdog will report results via logging command [2] and create annotations similar to
github-pr-check reporter as a fallback.
[1]: https://docs.github.com/en/actions/reference/events-that-trigger-workflows#pull_request_target, 
[2]: https://help.github.com/en/actions/automating-your-workflow-with-github-actions/development-tools-for-github-actions#logging-commands

But printing the diff in the CI is useful too. I think I will use reviewdog without the commenting thing for FluxML too. (@DilumAluthge could you confirm if it is safe to pass in PAT in the reviewdog workflow?)

mcabbott · 2022-10-05T19:44:16Z

FWIW the formatter on ChainRulesCore.jl seems more irritating than helpful. It clutters actual review with lots of comments telling you that [1,2,3] is wrong. Code by different authors disagreeing on minor things like that seems far down the list of style problems -- e.g. there are some easy-to-read straightforward tests, and some where it's a really clever puzzle to figure out what the test is doing.

An ideal auto-formatter that results in less time spent thinking about spacing might be nice. But we spend very little. And a half-baked one which creates more fiddling seems much worse than where we are now.

Other concerns here are:

If we adopt 4 spaces, has someone figured out how to preserve git blame, including on the website?
Doing so will mean every open PR needs a pretty big rebase. I presume these would be smaller if it's set to two spaces.
Note that it edits examples to e.g. write m = Upsample(; scale = (2, 3)), I'm not sure we want that as the documented thing to type in.
It also seems to want a pretty short line length, e.g. this is a lot of lines for a simple thing.
Edit: likewise many throw(...) lines. Other grips include breaking a group of one-line definitions as one was too long, also here, replacing in in (f() for _ in 1:n_alts) with = which is just preference but one is better, adding return print(io, ")") which is pretty close to actively misleading. Maybe better config could avoid these?

ToucheSir · 2022-10-06T00:24:14Z

IMO the most important part of an autoformatter is that it enforces format(file) is the identity operation for all checked-in source files. Especially on FluxML repos, it can get quite frustrating to make a small change, then spend 2-3 minutes reverting or ignoring parts of the diff because you accidentally (or intentionally) bumped the format shortcut (git add -p helps but does not solve the problem).

If we adopt 4 spaces, has someone figured out how to preserve git blame, including on the website?

GitHub supports using .git-blame-ignore-revs for this. Metalhead uses this post FluxML/Metalhead.jl#168.

Saransh-cpp · 2022-10-06T13:57:20Z

Maybe better config could avoid these?

Yes! Right now this uses the config specified in Metalhead.jl, but we can change everything according to our preferences and needs.

Saransh-cpp · 2022-10-06T14:25:51Z

TODO

add .git-blame-ignore-revs
discuss formatting options

Saransh-cpp · 2022-10-15T14:28:34Z

Looks better now. Should the formatting options be discussed in the next community call? Here is the current configuration -

style = "sciml"
whitespace_in_kwargs = true
format_docstrings = true
always_for_in = true
join_lines_based_on_source = true
separate_kwargs_with_semicolon = true
always_use_return = true
margin = 92
indent = 4

Saransh-cpp · 2022-10-16T11:35:14Z

separate_kwargs_with_semicolon is false now. See #2086.

Saransh-cpp · 2023-03-22T15:07:28Z

Given that the PR to add pre-commit support to Julia's main repository was closed, and several developers do not like how JuliaFormatter can be noisy and unconfigurable sometimes, I think it would be best to close this PR for now. There have been some initiatives (thank you!) to support Julia on pre-commit, and I am keeping a close eye on this issue - pre-commit/pre-commit#2689 - for further updates.

Pre-commit support will make setting up a formatting workflow extremely easy and reliable. Please let me know if this PR is still required. Closing this for now :(

ToucheSir · 2023-03-22T15:32:25Z

I think it would be better if we didn't have to rely on pre-commit (which is a heavy dependency and requires a Python interpreter). A solution which works with pure Julia would be better and should be doable, someone just has to write it.

The other blocker is the JuliaFormatter latency from the command line is really only acceptable on 1.9+. If the binary distribution story can be figured out before 1.9 becomes ubiquitous, then we can consider that route instead.

Saransh-cpp · 2023-03-22T17:37:50Z

I hadn't factored in adding pre-commit as a dependency.

Trying out JuliaFormatter after 1.9 is out of Beta sounds great. I think it would be better to create a new PR when 1.9 is out. Resolving conflicts in this PR is getting more and more difficult with every update in Flux's codebase 😅

Saransh-cpp closed this Oct 2, 2022

Saransh-cpp reopened this Oct 2, 2022

Saransh-cpp marked this pull request as draft October 2, 2022 18:20

Saransh-cpp marked this pull request as ready for review October 3, 2022 14:49

DilumAluthge reviewed Oct 4, 2022

View reviewed changes

Saransh-cpp force-pushed the lint-it branch from d97cc7a to 1d40095 Compare October 5, 2022 18:57

Saransh-cpp changed the title ~~Add GH Action to lint every PR~~ Add GH action to format files with JuliaFormatter Oct 5, 2022

Saransh-cpp force-pushed the lint-it branch from 5ea7255 to 289aa9d Compare October 6, 2022 13:59

Saransh-cpp mentioned this pull request Oct 8, 2022

[WIP] adding pre-commit CI and JuliaFormatter JuliaLang/julia#47094

Closed

5 tasks

Saransh-cpp changed the title ~~Add GH action to format files with JuliaFormatter~~ Add GH action format check files with JuliaFormatter Oct 15, 2022

Saransh-cpp force-pushed the lint-it branch from 95457d1 to 9cfa535 Compare October 15, 2022 14:26

Saransh-cpp force-pushed the lint-it branch from 9cfa535 to 3325e11 Compare October 16, 2022 11:34

Saransh-cpp added 2 commits October 19, 2022 14:09

Massive reformat

bf30ed7

Preserve git blame

f263ed7

Saransh-cpp force-pushed the lint-it branch from 3325e11 to f263ed7 Compare October 19, 2022 08:41

Saransh-cpp closed this Mar 22, 2023

Saransh-cpp mentioned this pull request Aug 29, 2023

Adding tooling for JuliaFormatter. #2323

Merged

3 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add GH action format check files with `JuliaFormatter` #2074

Add GH action format check files with `JuliaFormatter` #2074

Saransh-cpp commented Oct 2, 2022 •

edited

Loading

ToucheSir commented Oct 2, 2022

Saransh-cpp commented Oct 3, 2022 •

edited

Loading

Saransh-cpp commented Oct 3, 2022

DilumAluthge Oct 4, 2022

DilumAluthge Oct 4, 2022 •

edited

Loading

DilumAluthge Oct 4, 2022

DilumAluthge Oct 4, 2022 •

edited

Loading

Saransh-cpp Oct 4, 2022 •

edited

Loading

ToucheSir commented Oct 4, 2022

Saransh-cpp commented Oct 5, 2022

Saransh-cpp commented Oct 5, 2022 •

edited

Loading

mcabbott commented Oct 5, 2022 •

edited

Loading

ToucheSir commented Oct 6, 2022

Saransh-cpp commented Oct 6, 2022

Saransh-cpp commented Oct 6, 2022 •

edited

Loading

Saransh-cpp commented Oct 15, 2022

Saransh-cpp commented Oct 16, 2022

Saransh-cpp commented Mar 22, 2023

ToucheSir commented Mar 22, 2023

Saransh-cpp commented Mar 22, 2023

Add GH action format check files with JuliaFormatter #2074

Add GH action format check files with JuliaFormatter #2074

Conversation

Saransh-cpp commented Oct 2, 2022 • edited Loading

PR Checklist

ToucheSir commented Oct 2, 2022

Saransh-cpp commented Oct 3, 2022 • edited Loading

Saransh-cpp commented Oct 3, 2022

Note

Why is PAT required?

Is this a security concern?

DilumAluthge Oct 4, 2022

Choose a reason for hiding this comment

DilumAluthge Oct 4, 2022 • edited Loading

Choose a reason for hiding this comment

DilumAluthge Oct 4, 2022

Choose a reason for hiding this comment

DilumAluthge Oct 4, 2022 • edited Loading

Choose a reason for hiding this comment

Saransh-cpp Oct 4, 2022 • edited Loading

Choose a reason for hiding this comment

ToucheSir commented Oct 4, 2022

Saransh-cpp commented Oct 5, 2022

Saransh-cpp commented Oct 5, 2022 • edited Loading

mcabbott commented Oct 5, 2022 • edited Loading

ToucheSir commented Oct 6, 2022

Saransh-cpp commented Oct 6, 2022

Saransh-cpp commented Oct 6, 2022 • edited Loading

Saransh-cpp commented Oct 15, 2022

Saransh-cpp commented Oct 16, 2022

Saransh-cpp commented Mar 22, 2023

ToucheSir commented Mar 22, 2023

Saransh-cpp commented Mar 22, 2023

Add GH action format check files with `JuliaFormatter` #2074

Add GH action format check files with `JuliaFormatter` #2074

Saransh-cpp commented Oct 2, 2022 •

edited

Loading

Saransh-cpp commented Oct 3, 2022 •

edited

Loading

DilumAluthge Oct 4, 2022 •

edited

Loading

DilumAluthge Oct 4, 2022 •

edited

Loading

Saransh-cpp Oct 4, 2022 •

edited

Loading

Saransh-cpp commented Oct 5, 2022 •

edited

Loading

mcabbott commented Oct 5, 2022 •

edited

Loading

Saransh-cpp commented Oct 6, 2022 •

edited

Loading