File modification times and symbolic link ".ipynb seems more recent than .py" #696

Closed
siddalp-actual opened this issue Dec 28, 2020 · 9 comments · Fixed by #719

@siddalp-actual

I'm very new to jupytext and git, so I set up an odd pattern of working which seems to have exposed an issue.
I've installed jupytext and created a notebook for trying it out in the source tree I usually use for my jupyter projects.
I successfully paired the notebook with a .py file (I use JupyterLab, so I did this with the command palette entry 'pair notebook with light script').
So I have

 jupyter/notebook.ipynb
 jupyter/notebook.py

All works fine.

Now I went off piste as follows:

  • move the notebook.py file to a project in my github environment
  • put a symbolic link to that github/notebook.py file in my jupyter directory, so now I have:
jupyter/notebook.ipynb
jupyter/notebook.py -> github/notebook.py

Again, all works fine with git seeing just the input cells of my notebook and I get nice succinct diffs (great, thank you).

However, when I come to reload the notebook.ipynb or notebook.py, I get the file sync error.

It would appear that the modification timestamp of the notebook.py symbolic link is being compared, rather than the modification timestamp of the file it points at.
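
The difference between the two timestamps can be reproduced with the standard library alone. The following is a minimal sketch, assuming a POSIX system where unprivileged symlink creation works; the file names and the helper link_and_target_mtime are made up for illustration.

```python
import os
import tempfile


def link_and_target_mtime(path):
    """Return (mtime of the link itself, mtime of the file it resolves to)."""
    return os.lstat(path).st_mtime, os.stat(path).st_mtime


with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "github_notebook.py")  # stands in for github/notebook.py
    link = os.path.join(tmp, "notebook.py")           # stands in for jupyter/notebook.py
    with open(target, "w") as f:
        f.write("# a paired script\n")
    os.symlink(target, link)

    # Simulate a later save of the script. os.utime follows the link by
    # default, so only the target's timestamp moves forward.
    later = os.stat(target).st_mtime + 3600
    os.utime(link, (later, later))

    link_mtime, target_mtime = link_and_target_mtime(link)
    print("link looks older than its target:", link_mtime < target_mtime)
```

os.lstat reports on the link itself, whose modification time stays at the moment the link was created, while os.stat follows the link to the real file.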

(My reasoning for using the symbolic link to the .py file was that I wanted to version control the text and code portions of the notebook using git & github, while hiding from git the .ipynb file, .ipynb_checkpoints and any other files containing private data that I might have in my jupyter project tree. I can work around this by adding entries to .gitignore to explicitly hide those files, but that doesn't feel quite as safe as using links to explicitly expose the ones I want.)

@mwouts
Owner

mwouts commented Dec 28, 2020

Hi @siddalp-actual , thanks for sharing this!

Well, if I am correct, we are getting the timestamp from the original contents manager, through
notebook.services.contents.filemanager.FileContentsManager._base_model. That function seems to get the timestamp with this code:

info = os.lstat(os_path)
last_modified = tz.utcfromtimestamp(info.st_mtime)

Do you know how to get the timestamp of the file pointed to by the symbolic link? Maybe then we could make a PR on jupyter/notebook that would solve your issue.

Alternatively, you could

  • set outdated_text_notebook_margin to a large number (infinite), but you lose the corresponding security - if you modify the .ipynb file without having Jupytext activated, those changes will be lost the next time you open the notebook with Jupytext on (so use this carefully)
  • or, and maybe that is a better option, you could pair your py and ipynb files to different subfolders like in the third example here, i.e. pair your .ipynb notebooks in the folder notebooks with text files under another folder scripts, by adding this to your jupytext.toml file:
# Pair notebooks in subfolders of 'notebooks' to scripts in subfolders of 'scripts'
default_jupytext_formats = "notebooks///ipynb,scripts///py:light"
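
For context on the first option, the kind of check that the margin relaxes can be sketched as follows. This is a simplified illustration and not Jupytext's actual code: the function name text_seems_outdated and the example timestamps are hypothetical, and only the parameter name outdated_text_notebook_margin and its unit (seconds) come from the discussion above.

```python
import math
from datetime import datetime, timedelta


def text_seems_outdated(text_modified, ipynb_modified, margin_seconds=1.0):
    """Return True when the .ipynb file looks newer than the text file by more than the margin."""
    if math.isinf(margin_seconds):
        # An infinite margin disables the check (timedelta cannot hold infinity).
        return False
    return ipynb_modified > text_modified + timedelta(seconds=margin_seconds)


# Stale timestamp read from the symbolic link, versus a freshly saved .ipynb file.
text_time = datetime(2020, 12, 28, 10, 0, 0)
ipynb_time = datetime(2020, 12, 28, 10, 5, 0)

default_margin_result = text_seems_outdated(text_time, ipynb_time)
infinite_margin_result = text_seems_outdated(text_time, ipynb_time, float("inf"))
print(default_margin_result, infinite_margin_result)
```

With the default margin the stale link timestamp triggers the error; with an infinite margin the check never fires, which is also why genuine out-of-sync edits would go unnoticed.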

@siddalp-actual
Author

Thanks Marc,
I have created a notebook with a recreation of the issue and the gist of a fix here. A link can point to a link... so the fix really ought to have a WHILE construct with some max depth protection against infinite loops.
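
A WHILE loop of that kind could look like the sketch below. It is not the code from the linked notebook: the helper name resolve_link_chain and the limit of 40 hops are assumptions chosen for illustration. Note that os.stat and os.path.realpath already follow chains of links, so in practice calling os.stat instead of os.lstat may be all that is needed.

```python
import os
import tempfile


def resolve_link_chain(path, max_depth=40):
    """Follow symbolic links by hand, giving up after max_depth hops."""
    hops = 0
    while os.path.islink(path):
        if hops >= max_depth:
            raise OSError(f"more than {max_depth} levels of symbolic links: {path}")
        # A relative target is relative to the directory that holds the link;
        # os.path.join returns the target unchanged when it is absolute.
        path = os.path.join(os.path.dirname(path), os.readlink(path))
        hops += 1
    return path


with tempfile.TemporaryDirectory() as tmp:
    real = os.path.join(tmp, "real.py")
    open(real, "w").close()
    first = os.path.join(tmp, "first.py")
    second = os.path.join(tmp, "second.py")
    os.symlink("real.py", first)  # relative target
    os.symlink(first, second)     # absolute target, a link to a link
    resolved_ok = os.path.samefile(resolve_link_chain(second), real)

    # A cycle is reported instead of looping forever.
    a, b = os.path.join(tmp, "a"), os.path.join(tmp, "b")
    os.symlink(b, a)
    os.symlink(a, b)
    try:
        resolve_link_chain(a)
        cycle_detected = False
    except OSError:
        cycle_detected = True
```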

I will also look into your suggestion of using the triple / root in default_jupytext_formats

@mwouts mwouts added this to the 1.9.2 milestone Jan 14, 2021
@mwouts
Owner

mwouts commented Jan 14, 2021

Thanks @siddalp-actual .

I've added this to the next milestone - my objective here is to add a test that uses the jupytext.toml file documented above, and in which scripts is a symbolic link to another folder.

@siddalp-actual
Author

Thanks Marc, but I don't think the linked folder, on its own, will recreate the issue. My understanding is that when the notebook.py file is a symbolic link, it is the 'old' modification time in that symbolic link which leads to the problem. However, if I were to put notebook.py into a scripts folder, then the inode for the .py file would contain the correct modification time, regardless of whether the scripts folder is where I think it is, or is redirected via a link to somewhere else.
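
This can be checked with a small standard-library experiment. It is a sketch assuming a POSIX system, with made-up folder names: lstat only declines to follow a link in the last component of a path, so a regular file reached through a linked folder reports its own modification time.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    real_dir = os.path.join(tmp, "github_src")  # the folder that really holds the script
    os.mkdir(real_dir)
    linked_dir = os.path.join(tmp, "scripts")   # a symbolic link to that folder
    os.symlink(real_dir, linked_dir, target_is_directory=True)

    script = os.path.join(linked_dir, "notebook.py")
    with open(script, "w") as f:
        f.write("# a paired script\n")

    # Simulate a later save of the script.
    later = os.stat(script).st_mtime + 3600
    os.utime(script, (later, later))

    # The last path component is a regular file, so lstat and stat agree.
    same_mtime = os.lstat(script).st_mtime == os.stat(script).st_mtime
    print("lstat and stat agree through a linked folder:", same_mtime)
```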

@mwouts
Owner

mwouts commented Jan 15, 2021

Hi @siddalp-actual , we do agree - there should be no issue with symbolic links to folders. That's why I recommend using symbolic links to folders, instead of symbolic links to files 😄

A precise description of what it takes to use paired files in symlink folders is given in the test test_paired_files_and_symbolic_links to be added at https://github.com/mwouts/jupytext/pull/719/files (passes on Linux, not on Windows - apparently pytest's tmpdir object does not implement symbolic links on that platform).

As a reminder, fixing the timestamp of symbolic links to files needs to be done in the Jupyter Core programs - as discussed above, Jupytext takes the timestamp from the (parent) contents manager.

@siddalp-actual
Author

Thank you again Marc, I now entirely understand that you are solely adding tests to verify that symlinks to folders work correctly, and that this is the recommended way to tackle use cases such as mine. I won't bug you any more on this issue ;-) But I will leave it for your PR to close.
Regards, Pete

@mwouts
Owner

mwouts commented Jan 18, 2021

Sure, no problem! Yes I just need to adjust the test on Windows, and I'll take the opportunity of the PR to add a word about this question in the FAQ.

By the way, I was thinking about what brought you to try using symbolic links, and if I am correct you wanted to include the .py file in your GitHub project, and keep an .ipynb file somewhere else, is that correct?

Are the two folders under your Jupyter root, or not? Is it correct that the symbolic links are required only if you want to have either the text or the ipynb file outside of the Jupyter root - otherwise one could use the pairing in trees with three /// as documented at https://jupytext.readthedocs.io/en/latest/config.html#configuring-paired-notebooks-globally ?

@siddalp-actual
Author

Marc, thank you for your interest. Warning, long post with further details on my use case, what I've tried and what I think is going to work for me. I hope it helps.

You are correct: I have two project trees. One of git projects, and one of personal projects. Imagine I work on projectHybrid which I realise has code I want to expose on github, but the data etc that it runs against is personal to me. I'm trying to expose the .py part of a notebook in projectHybrid via github, but not the .ipynb part.

github/
  projectA/
    ...
  projectB/
  projectHybrid/
    interestingStuff.py
  etc

and

Dropbox/
  projects/
    projectX/
    projectY/
    projectHybrid/
      interestingStuff.ipynb
      personalData.db
    projectZ/    

I initially tried a hard-link of the .py file into the git tree, but the git mechanisms break them. Then I tried a soft-link and found this issue.

I start Jupyter with a --notebook-dir=<stuff>/Dropbox/projects, and I think Jupytext is finding .jupytext.yaml configuration in either my ~/.config directory, or specific project folders (eg where I override the script type from .py to .js depending on project language)

I don't think the /// pairing works directly in my case. (I did manage to use it to move a notebook into a notebooks folder outside of my git project with project level .jupytext configuration, but TBH I think the doc needs an example of the difference between the /// and // configuration cases)

I'm currently heading down a road of each project having sub-folders named notebooks and src where I pair respectively the .ipynb and .py parts. When I want to expose some code, I move the src sub-folder into the git tree, and in the Jupyter tree I create a symbolic link to that src folder. (It looks like I can even cross-link the other way too, so I can open the notebook under either tree, have a single copy of data, and have the partitioning of public and private data). Here's the diagram:

github/
  projectHybrid/
    .gitignore (notebooks/*)
    notebooks/ -> is a link to <jupyter projects>/projectHybrid/notebooks
    src/
      interestingStuff.py

and

Dropbox/
  projects/
    projectHybrid/
      notebooks/
        interestingStuff.ipynb
      src/ -> is a link to <gitprojects>/projectHybrid/src
      personalData.db

@mwouts
Owner

mwouts commented Jan 23, 2021

Thank you @siddalp-actual for the detailed explanation! Sure, I completely agree that symbolic links give you a lot of freedom in how you can distribute/segregate your files, more than what you can get with /// (files will remain under a common root, which needs to be under the Jupyter root).

Yes I'll see how I can update the documentation on ///. Maybe it will be an opportunity to indicate that // may become deprecated soon. The difference is that /// can handle nested subfolders, while // cannot.
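
To make the nested-subfolder case concrete, here is a simplified sketch of the path mapping that a /// pairing describes. It is an illustration only, not Jupytext's implementation: the function paired_path_in_tree is hypothetical, and it assumes paths are given relative to the common root and start with the source folder name.

```python
from pathlib import PurePosixPath


def paired_path_in_tree(path, source_root, target_root, target_ext):
    """Map source_root/<subfolders>/<name>.<ext> to target_root/<subfolders>/<name><target_ext>."""
    # relative_to raises ValueError when path is not under source_root.
    relative = PurePosixPath(path).relative_to(source_root)
    return str(PurePosixPath(target_root) / relative.with_suffix(target_ext))


paired = paired_path_in_tree(
    "notebooks/projectHybrid/interestingStuff.ipynb", "notebooks", "scripts", ".py"
)
print(paired)  # scripts/projectHybrid/interestingStuff.py
```

The subfolder structure below notebooks is carried over unchanged to scripts, which is the part that a flat pairing cannot express.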

> and I think Jupytext is finding .jupytext.yaml configuration in either my ~/.config directory, or specific project folders

Oh if you want to double check which config file is being used, see the Python commands at
https://jupytext.readthedocs.io/en/latest/config.html#possible-locations-and-formats

> specific project folders (eg where I override the script type from .py to .js depending on project language)

For that one you can use the auto extension... You could have for instance

default_jupytext_formats = "ipynb,auto:percent"

in your main jupytext.toml file, and this will pair your Javascript notebooks to js:percent files, and Python notebooks to py:percent files.
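
As a rough illustration of how an auto extension can be resolved: notebooks record the file extension of their language under language_info.file_extension in their metadata. The sketch below assumes that this field drives the choice; the helper auto_text_format is hypothetical and is not part of Jupytext's API.

```python
def auto_text_format(notebook_metadata, format_name="percent"):
    """Build a format string such as 'py:percent' from notebook metadata."""
    ext = notebook_metadata.get("language_info", {}).get("file_extension")
    if not ext:
        raise ValueError("notebook metadata has no language_info.file_extension")
    return f"{ext.lstrip('.')}:{format_name}"


# Minimal metadata dictionaries, reduced to the fields used here.
python_nb = {"language_info": {"name": "python", "file_extension": ".py"}}
js_nb = {"language_info": {"name": "javascript", "file_extension": ".js"}}

python_format = auto_text_format(python_nb)
js_format = auto_text_format(js_nb)
print(python_format, js_format)
```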
