Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GriffeLoader breaks dealing with hidden .venv folder #185

Closed
rsenseman opened this issue Jul 24, 2023 · 19 comments
Closed

GriffeLoader breaks dealing with hidden .venv folder #185

rsenseman opened this issue Jul 24, 2023 · 19 comments

Comments

@rsenseman
Copy link

rsenseman commented Jul 24, 2023

Describe the bug

During my CI a .venv folder is created locally to run tests. The presence of this .venv folder then breaks the docs generation.

When I attempt to run mkdocs build when the .venv folder is present, I get the following error message:

ERROR    -  Error reading page '<redacted filepath>/<redacted filename>.md': ''
Traceback (most recent call last):
  File "/Users/rsenseman/Development/<redacted project name>/docs/.venv/lib/python3.8/site-packages/griffe/loader.py", line 550, in _get_or_create_parent_module
    parent_module = parent_module[parent_part]
  File "/Users/rsenseman/Development/<redacted project name>/docs/.venv/lib/python3.8/site-packages/griffe/mixins.py", line 31, in __getitem__
    return self.members[parts[0]][parts[1:]]  # type: ignore[attr-defined]
KeyError: ''

This is unintuitive because the error message makes it look like there is some issue with the legitimate file <redacted filename>.md, but in reality what is breaking is some code in the GriffeLoader class that as far as I can tell is attempting to built out the directory structure of the repo.

Here's what the file structure of the Python project looks like with this .venv folder:
Screenshot 2023-07-24 at 4 35 41 PM

I'm not exactly sure why the .env folder specifically causes an issue, but regardless hidden/dot folder should just be ignored. I tried to use the mkdocs-exclude plugin to see if the exclusions would work with mkdocstrings but that was unsuccessful.

Debug logs
Relevant logs with sensitive project information excluded:

ERROR    -  Error reading page '<redacted filepath>/<redacted filename>.md': ''
Traceback (most recent call last):
  File "/Users/rsenseman/Development/<redacted project name>/docs/.venv/lib/python3.8/site-packages/griffe/loader.py", line 550, in _get_or_create_parent_module
    parent_module = parent_module[parent_part]
  File "/Users/rsenseman/Development/<redacted repo name>/docs/.venv/lib/python3.8/site-packages/griffe/mixins.py", line 31, in __getitem__
    return self.members[parts[0]][parts[1:]]  # type: ignore[attr-defined]
KeyError: ''

To Reproduce
Steps to reproduce the behavior:

  1. Have virtual environment .venv folder in Python project files.
  2. Run mkdocs build
  3. This will raise the error

Expected behavior

I would expect all contents of hidden repos to be ignored.

System (please complete the following information):

  • griffe version: 0.32.3
  • Python version: 3.8.13
  • OS: mac

Additional context

I can implement a fix for this after opening the issue

@pawamoy
Copy link
Member

pawamoy commented Jul 25, 2023

Hello, thanks for the report. Unfortunately this is not enough to reproduce the problem. Creating a .venv in my projects does not break the mkdocs build command.

I would have to see how you configure mkdocstrings, how your package is structured and how you inject documentation into your docs to have a better idea on what's going on 🙂

If you are able to build a minimal reproduction example that would be best, of course.

@rsenseman
Copy link
Author

The problem with the .venv folder is that it's full of .py files that the GriffeLoader tries to parse. Instead of trying to describe our work project, I made a small repo that replicates the problem: https://github.com/rsenseman/mkdocs_simple

After cloning the repo, from <your_path>/mkdocs_simple, run:

# Step 1: run some Python code via Poetry. Poetry will create the `.venv` folder and associated virtual environment files
cd mkdocs-simple && poetry run python mkdocs_simple/add.py

# Step 2: Build the docs. This will generate the error described above
cd ../docs && poetry run mkdocs build

@pawamoy
Copy link
Member

pawamoy commented Jul 25, 2023

Could you update the repo so that Poetry uses an in-project venv? I'm not sure to remember how to configure it to do that.

@pawamoy
Copy link
Member

pawamoy commented Jul 25, 2023

Also, the issue seems to be that you add the root folder of the repository to the paths of the Python handler. But the actual package is one folder below it, in mkdocs-simple again. With such a folder structure you should define paths: [../mkdocs-simple].

@rsenseman
Copy link
Author

The poetry related files for the project are in the mkdocs-simple folder

The only reason I did it this way for this simple example is it mirrors the actual project I'm working on that has multiple projects nested below the main repo-level project.

Once the docs are built, everything works fine and renders as expected, the issue arrises during the build phase of the docs if the virtual environment files are present.

@pawamoy
Copy link
Member

pawamoy commented Jul 25, 2023

Have you tried my suggestion above?

rsenseman pushed a commit to rsenseman/griffe that referenced this issue Jul 26, 2023
When searching module subpaths to load files for parsing, skip hidden folders that start with the `.` character.

Issue mkdocstrings#185: mkdocstrings#185
@rsenseman
Copy link
Author

We do not want to change the poetry venv creation strategy or rearrange the project structure.

Do you find it agreeable that the finder should not examine files in hidden folders? I think this is correct for two reasons:

  • It is the expected behavior. Hidden folders that are ignored by everything except the tools that created them are not expected to get picked up in other tools' workstreams
  • It will provide a performance improvement by preventing the finder process from delving into a bunch of unnecessary Python files

I opened a PR to that effect that will preempt this issue: #188

Let me know what you think of this implementation

@pawamoy
Copy link
Member

pawamoy commented Jul 26, 2023

I was referring to this comment.

(My comment about Poetry venvs was about updating your example repo so that I could replicate, because Poetry does not always create venvs in the project folder. I was not asking you to change your setup)

Let me try to rephrase.

Here's your layout:

root/
    docs/
        mkdocs.yml
    project-one/
        project_one/  # actual package
        pyproject.toml

In mkdocs.yml, you have:

plugins:
- mkdocstrings:
    handlers:
      python:
        paths: [".."]

The paths are relative to the config file parent directory, so .. ends up being the root/ directory.
In your docs files, you inject documentation for project-one.project_one.add.
So Griffe searches for a package called project-one in the root folder and finds it. Then it iterates on every Python module it can find in this directory, recursively. This is the expected behavior.

What's wrong here, or at least unsupported, is that you inject docs for something that is not an actual Python package: project-one is a folder. The actual Python package is project-one/project_one.

Instead, you must set paths: [../project-one] in mkdocs.yml and inject docs with project_one.add.

Let me know what you think!


Thanks for the PR, but I will not merge it since it wouldn't fix anything for a venv named venv.

@rsenseman
Copy link
Author

rsenseman commented Jul 26, 2023

I see, I thought the poetry.toml file would correctly setup the virtualenv config. This can also be set by running:

poetry config virtualenvs.create true
poetry config virtualenvs.in-project true

The only problem I see with specifying the inner folders as paths is you will run into ambiguous selectors, for example with the structure:

root/
    docs/
        mkdocs.yml
    project-one/
        project_one/  # actual package
        tests/
            test_queries.py
        pyproject.toml
    project-two/
        project_two/  # actual package
        tests/
            test_queries.py
        pyproject.toml

I don't know what the behavior is if you try to select the test_queries.py file in either project with ::: tests.test_queries

If somebody wants to name a folder venv that's fine, that may be part of their project that they want documented. I think there is a more broad issue/solution here which is simply to ignore folders that are already considered "hidden".

@rsenseman
Copy link
Author

Do you think what I'm describing is a valid expected behavior? That folders named starting with a . should be ignored?

@pawamoy
Copy link
Member

pawamoy commented Jul 26, 2023

I don't know what the behavior is if you try to select the test_queries.py file in either project with ::: tests.test_queries

The first specified path would take precedence in both cases of regular and namespace packages, probably.

Do you publish API docs of your tests?

@rsenseman
Copy link
Author

Yes, this is a great benefit of "autodocs". In the old days I wouldn't bother with publishing docs for tests, but mkdocs makes it easy to have a landing page for the tests

@pawamoy
Copy link
Member

pawamoy commented Jul 27, 2023

I see, thanks :)

The issue with this is that it goes against the concept of creating an objects inventory. Obviously one cannot add duplicate objects in such an inventory, not even cross reference them reliably within the project docs. And if you prefix these objects with the parent folder name to make them unique, then their paths/ids are wrong, and others won't be able to get automatic cross references for them.

So in the end I'm inclined to say that this is an unsupported use case.

A workaround would be to put your test folders inside the Python packages, and maybe exclude them from the distributions you build (sdist/wheel).

root/
    project-one/
        project_one/
            __init__.py
            tests/
                __init__.py
                etc.

@rsenseman
Copy link
Author

I just wanted to suggest what I think is a clear improvement to the project. It even extends an existing patter in the code: where you exclude __pycache__ folders, I extend it to ignore hidden folders.

All your comments are ignoring the original intent of the issue to nit pick some reason why our project structure is untenable even though we are already successfully using mkdocs and mkdocstrings. The only thing that doesn't work is the finder gets tripped up when it runs into virtualenv related folders and files; sorting our why exactly this happens would be nice, I'm not sure exactly why it fails.

Next steps for me:

  1. If you want to go back and forth about our project structure and not address the intent of the issue, I will leave the issue open for someone else to take a look at it.
  2. If you think what I'm describing is a valid expected behavior, that folders named starting with a . should be ignored, I am happy to help implement it.
  3. If you think what I'm describing is an invalid expected behavior, that folders named starting with a . should not be ignored, I will close the issue and get on with my life.

In the cases of 1 and 3, I would ask that you add a more thoughtful error message than

ERROR    -  Error reading page '<redacted filepath>/<redacted filename>.md': ''
Traceback (most recent call last):
  File "/Users/rsenseman/Development/<redacted project name>/docs/.venv/lib/python3.8/site-packages/griffe/loader.py", line 550, in _get_or_create_parent_module
    parent_module = parent_module[parent_part]
  File "/Users/rsenseman/Development/<redacted repo name>/docs/.venv/lib/python3.8/site-packages/griffe/mixins.py", line 31, in __getitem__
    return self.members[parts[0]][parts[1:]]  # type: ignore[attr-defined]
KeyError: ''

which is hard to for the developer to debug. Something more helpful about why the file finding gets tripped up would be greatly appreciated.

@pawamoy
Copy link
Member

pawamoy commented Jul 28, 2023

It's true that I have answered your questions with other questions a few times. Sorry about that.

Now I wouldn't say I am nitpicking. I can understand that from your side, you focus on treating the symptom, i.e. modifying the code so that your use-case is better supported. You should however understand that I'm more interested in treating the cause. It may seem like ignoring dot-files/directories is an improvement, but doing that would actually validate a use-case that I consider unsupported. Accepting this change is opening the door to more requests for ignoring other files and folders,, but this is not how Griffe was designed. Then later, I'll get issues telling me that using Griffe that way does not generate a correct inventory in mkdocstrings and prevents cross-referencing objects. Maybe you don't care about that today, but you probably will in the future, or someone else will. You probably also know that blindly accepting changes (even one-liners) from contributors who don't know much about the project is generally not a good idea. There's a maintenance cost to everything. I'll be the one paying it. So yes, I think back and forth is needed, and this is generally the process that is followed in open-source projects. We discuss about an issue before designing the necessary changes and sending pull requests. So my answer to your previous question "do you think ignoring hidden folders is a good idea" is, no, I don't think so. That would be option 3.

You do make a point about improving the error message though, or improving handling of the error, and that's what I will do: I'll probably catch the key error that is triggered by trying to get a member with an empty name, coming from dot-splitting the member name (.venv -> ['', 'venv']). The rest of the code will already ignore all found modules inside this folder because it doesn't have a top-level __init__.py module.

@rsenseman
Copy link
Author

Fair enough. Thank you for considering the change and providing a thoughtful response to the pros and cons in addition to identifying the actual root cause of the issue.

@pawamoy
Copy link
Member

pawamoy commented Sep 1, 2023

Hey, sorry for the delay. Looking at the current state of the code, it should actually already be handled:

            try:
                parent_module = parent_module.get_member(parent_part)
            except KeyError as error:
                if parent_module.is_namespace_package or parent_module.is_namespace_subpackage:
                    next_parent_module = self._create_module(parent_part, [module_filepath])
                    parent_module.set_member(parent_part, next_parent_module)
                    parent_module = next_parent_module
                else:
                    raise UnimportableModuleError(f"Skip {subpath}, it is not importable") from error

That's weird because the exception handler was already there in 0.32.3 I believe. Can you confirm the issue still happens with latest version of Griffe?

@pawamoy
Copy link
Member

pawamoy commented Sep 1, 2023

OK I was able to confirm. The traceback was incomplete. I don't understand what's happening even with the full traceback so am currently debugging.

@rsenseman
Copy link
Author

Nice! Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants