Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Jupytext in a pre-commit disagreement with trailing-whitespace and prettier #580

Open
tordbb opened this issue Jul 30, 2020 · 13 comments
Open

Comments

@tordbb
Copy link

tordbb commented Jul 30, 2020

Hi,

I am trying to run jupytext as a first step in pre-commit, before running among others trailing-whitespace and prettier.

My goal is for jupytext to generate .md files based on .ipynb files, since these are easier to version control than .ipynb.
The idea is that pre-commit will pass if no new .md file is added, and if no existing .md file is changed by jupytext.

The way I implemented this seems to cause a problem with two other hooks, trailing-whitespace and prettier. The way jupytext formats the .md files cause trailing-whitespace and prettier to fail and adjusting the code.
Is there any way to adjust jupytext or the way I use it, so that these other hooks do not fail?

This is my file .pre-commit-config.yaml which lives in the root directory:

# NOTE: The versions can be updated by calling
#        pre-commit autoupdate
repos:
  - repo: local
    hooks:
      - id: jupytext
        name: jupytext
        entry: jupytext --to .md notebooks/*.ipynb
        files: .ipynb
        language: python

  - repo: https://github.com/psf/black
    rev: 19.10b0
    hooks:
      - id: black

  - repo: https://github.com/pre-commit/pre-commit-hooks
    rev: v3.1.0
    hooks:
      - id: trailing-whitespace
      - id: end-of-file-fixer

  - repo: https://gitlab.com/pycqa/flake8
    rev: 3.8.3
    hooks:
      - id: flake8
        additional_dependencies:
          - flake8-bugbear
          - flake8-docstrings

  - repo: https://github.com/prettier/prettier
    rev: 2.0.5
    hooks:
      - id: prettier
        args: [--prose-wrap=always, --print-width=88]

  - repo: local
    hooks:
      - id: pylint
        name: pylint
        entry: pylint
        language: system
        types: [python]
        additional_dependencies: ["-r requirements.txt"]

If I update an .ipynb file and run pre-commit, this happens:

image

Immediately re-running pre-commit causes all other notebooks to be updated by jupytext,.

@tordbb tordbb changed the title Jupytext in a pre-commit disagreement with trailing-whitespace and prettier Jupytext in a pre-commit disagreement with trailing-whitespace and prettier Jul 30, 2020
@tordbb tordbb closed this as completed Jul 30, 2020
@tordbb
Copy link
Author

tordbb commented Jul 30, 2020

Seems like the problem was that .ipynb files were not checked or modified by trailing-whitespace and prettier, which created a unhelpful loop that happens every time pre-commit is being run:

  1. Jupytext imports whitespace-errors from dirty .ipynb file into .md file, and pre-commit sees this as jupytext failing.
  2. trailing-whitespace fixes the whitespace-error , and pre-commit sees this as trailing-whitespace failing.

So the solution, I guess, is either to disband the goal mentioned above, or to run all hooks also on .ipynb files. Not sure which I should go for.

@mwouts
Copy link
Owner

mwouts commented Jul 30, 2020

Hi @tordbb , thanks for taking the time to report this.

I must say that I am not sure yet what is the best way to use pre-commit hooks on notebooks (see also #432 , #446). This is a topic on which I'd like to make progress, but, just like you did here, many times I encountered difficulties in applying pre-commit hooks on pairs of files.

Maybe you could consider doing the following:
a) Apply the pre-commit hook to .md files only, and let Jupytext's plugin for Jupyter do the sync to the .ipynb file, using paired notebooks
b) Or, add an additional step at the end of your pre-commit hook: jupytext --to ipynb --update *.md to update the .ipynb file - but again, I am not sure how well the pre-commit hook works with files that you don't include in the commit - or maybe you can git add and the reset the .ipynb file in the hook itself like we do here?
c) Or, apply the pre-commit hooks to .ipynb files only, using jupytext --pipe-fmt markdown --pipe [cmd]

@tordbb
Copy link
Author

tordbb commented Jul 31, 2020

Thanks again for your help!
I found a viable solution by following your suggestion of adding another jupytext hook at the end.

The problem turned out to be how prettier did two changes to .md notebooks that were not being imported to .ipynb notebooks by jupytext:

  • Changing single ticks into double ticks:
    image

  • Adding spaces between cell headlines and the cells:
    image

When I made prettier exclude all files in the notebooks directory from inspection, everything worked out great. Ideally I wanted to include ipynb files, but did not find a regex that only excluded .md files inside notebooks/. Feel free to share if you can think of one.

I am not sure if jupytext should adapt the formatting that prettier idolizes, but that may be up to you.

See my final .pre-commit-config.yaml below:

# NOTE: The versions can be updated by calling
#        pre-commit autoupdate
repos:
  - repo: local
    hooks:
      - id: jupytext
        name: jupytext --to md
        entry: jupytext --to .md notebooks/*.ipynb
        files: .ipynb
        language: python

  - repo: https://github.com/psf/black
    rev: 19.10b0
    hooks:
      - id: black

  - repo: https://github.com/pre-commit/pre-commit-hooks
    rev: v3.1.0
    hooks:
      - id: trailing-whitespace
      - id: end-of-file-fixer

  - repo: https://gitlab.com/pycqa/flake8
    rev: 3.8.3
    hooks:
      - id: flake8
        additional_dependencies:
          - flake8-bugbear
          - flake8-docstrings

  - repo: https://github.com/prettier/prettier
    rev: 2.0.5
    hooks:
      - id: prettier
        args: [--prose-wrap=always, --print-width=88]
        # Avoid prettier from harmfully changing .md notebooks
        exclude: notebooks/

  - repo: local
    hooks:
      - id: pylint
        name: pylint
        entry: pylint
        language: system
        types: [python]
        additional_dependencies: ["-r requirements.txt"]

  - repo: local
    hooks:
      - id: jupytext
        name: jupytext --to ipynb
        entry: jupytext --to ipynb --update notebooks/*.md
        files: .md
        language: python

@tordbb tordbb reopened this Jul 31, 2020
@mwouts
Copy link
Owner

mwouts commented Jul 31, 2020

Thank you @tordbb for sharing this! At some point I'd like to write a quick post or documentation page on how to best use pre-commit hooks, may I reuse your config above?

a regex that only excluded .md files inside notebooks/

Nice challenge... let me think about it...

I am not sure if jupytext should adapt the formatting that prettier idolizes

How that's interesting... Jupytext uses yaml to encode the YAML header, so I am a bit surprised that the quotes don't match those of prettier.. do you have any idea how prettier parses the YAML, and which one is the most standard?

Regarding the extra blank line, Jupytext should include a blank line between any two cells (and thus match prettier, right?)
When you don't get a line in between, this is because the cell has a cell metadata lines_to_next_cell=0, which is itself coming from the text file, if at some point the markdown cell and the code cell were not separated by a blank line.

@tordbb
Copy link
Author

tordbb commented Aug 3, 2020 via email

@Skylion007
Copy link
Contributor

Skylion007 commented Aug 18, 2020

@tordbb @mwouts I quite like my solution to using JuPyText as a pre-commit hook which can be found below:

-   repo: local
    hooks:
    -   id: jupytext-sync
        name: Sync scripts and notebooks
        files: '^examples/tutorials/(colabs|nb_python)/(.*\.py|.*\.ipynb)$'
        entry: jupytext --update-metadata '{"jupytext":{"notebook_metadata_filter":"all", "cell_metadata_filter":"-all"}, "accelerator":"GPU"}' --set-formats 'nb_python//py:percent,colabs//ipynb' --pipe black --pipe "sed s/[[:space:]]*\#[[:space:]]\%\%/\#\%\%/g" --pipe 'isort -' --pipe-fmt 'py:percent' --sync
        pass_filenames: true
        additional_dependencies:
            - 'jupytext==1.5.2'
            - black
            - isort
        always_run: false
        language: python

This has some benefits:

  • It will automatically install jupytext
  • It will run if the ipynb or the python file is modified. Meaning if the linters update the python file then those changes will be synced back to the notebook. I usually put JuPyText at or near the end for this reason
  • I can specify exactly which I want updated
  • Occasionally you can get into update loop where one will try to update the other since both sync directions are included, but this can be easily fixed by deleting one file.
  • The pipe commands ensure the linters are also run on the notebook, but they aren't strictly necessary. They are mostly to help get out of the loop mentioned before.

TLDR: Use --set-formats and --sync will solve a lot of your issues @tordbb since which file is newer will overwrite the changes in the other.

@mwouts
Copy link
Owner

mwouts commented Aug 18, 2020

Thank you @tordbb and @Skylion007 . This is getting interesting! When time permits I will experiment a bit more with this myself, and write a post with all due credit to your examples !

@Skylion007 , do we agree that --pipe "sed s/[[:space:]]*\#[[:space:]]\%\%/\#\%\%/g" was motivated by #562? With Jupytext from master (to be shipped in 1.6.0) you won't need this any more (and spaces are now allowed in pipe commands 😃 ).

Also I see you use --pipe 'isort -', don't you experience issues like #553 ? On my side I now use --pipe 'isort - --treat-comment-as-code "# %%" --float-to-top' (requires isort in dev version, as these two options were introduced very recently by @timothycrosley, the author of isort).

@Skylion007
Copy link
Contributor

@mwouts Yeah, but my pre-commit hook downloads from pip so until it's published I can't remove that workaround. 👍 I do experience issues like #553, but it hasn't broken anything, yet. Also waiting for that fix to be merged.

@Skylion007
Copy link
Contributor

@mwouts More than that, I would like to code up a simple pre-commit yaml for the repo. One that doesn't need to use the current the --pre-commit arg and uses the nice features baked into pre-commit.com

@mwouts
Copy link
Owner

mwouts commented Aug 30, 2020

I would like to code up a simple pre-commit yaml for the repo

Oh, that would be awesome!

@grst
Copy link
Contributor

grst commented Jul 8, 2022

I am also having an issue with jupytext and prettier generating conflicting formatting.
In my case, the issue is caused by our .editorconfig setting the default indentation to four spaces also for yaml files.
That's to say that unless jupytext respect all sort of config files when generating the markdown, there's no way to achieve perfect consistency with prettier or other formatting tools.

image

I think the best would be to pipe the output through prettier as well, as suggested for black:

-args: ['--pipe', 'black']
+args: ['--pipe', 'black', '--pipe', 'prettier -w {}']

However, it is the markdown file that would need to be piped through prettier, and I don't think one can have multiple --pipe-fmt in the same command:

jupytext --sync --pipe-fmt py:percent --pipe black --pipe-fmt md --pipe 'prettier -w {}' 

@mwouts
Copy link
Owner

mwouts commented Aug 15, 2022

Hi @grst , I do like the suggestion of allowing multiple --pipe-fmt arguments. We could implement that (I'd require that --pipe-fmt would be either of length 1, or the same length as --pipe), but I was trying to see if I could write a test, and I see that we already have an example of a pre-commit hook where Jupytext is called twice, with two different --pipe-fmt:

- id: jupytext
args: [--sync, --pipe-fmt, ipynb, --pipe, 'pandoc --from ipynb --to ipynb --markdown-headings=atx', --show-changes]
additional_dependencies:
- nbformat==5.0.8 # because pandoc 2.11.4 does not preserve yet the new cell ids
- id: jupytext
args: [--sync, --pipe, black, --show-changes]
additional_dependencies:
- black==22.3.0 # Matches black hook below
- nbformat==5.0.8 # for compatibility with the pandoc hook above

Do you think the same approach could be used to call prettier (say, in addition to black) when doing jupytext --sync?

@grst
Copy link
Contributor

grst commented Aug 16, 2022

I haven't tried it, but as long as it's two separate steps in pre-commit, I'd expect it to suffer from the same problem: If the first step generates a different formatting than the second one they cannot pass at the same time (pre-commit fails if a file got modified).

That being said, in the above example it could work, i.e. the first jupytext command only modifies an ipynb file (e.g. formatting it with black), the markdown is only updated in the second step.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants