Better documentation for how files are included in built artifacts #2015

Closed
1 task done
AstraLuma opened this issue Feb 10, 2020 · 38 comments
Labels
area/build-system: Related to PEP 517 packaging (see poetry-core)
area/docs: Documentation issues/improvements

Comments

@AstraLuma

  • I have searched the issues of this repo and believe that this is not a duplicate.

Issue

While helping with licensezero/cli#3, I was looking for "how to do package data with poetry" but found the docs were unclear or at least extremely implicit about it.

@AstraLuma AstraLuma added the area/docs Documentation issues/improvements label Feb 10, 2020
@finswimmer
Member

Hello @astronouth7303,

could you please add more details about what you want to add to a package and/or what information you are missing?

fin swimmer

@AstraLuma
Author

What setuptools calls package data: Non-module files.

The specific discussion was about licensezero.json, but this varies from static web assets to templates to UI definitions.

@thejohnfreeman
Contributor

From what I understand, every file in your package directory will be included in the package, unless it is ignored by your .gitignore. If you want to include a file that is ignored by .gitignore, e.g. generated source code in my case, then add it to [tool.poetry].include.

@meshy

meshy commented Feb 26, 2020

@thejohnfreeman that's not working for me -- if I explicitly include a file that's in .gitignore, I'm seeing this when running poetry build:

[IndexError]
list index out of range

Specifically, the sdist step seems to succeed, but the wheel step fails.

EDIT:

While the sdist step does not appear to throw an error, the built .tar.gz does not include the file as directed.
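One way to confirm whether a given file actually made it into a built artifact is to list the archive members directly. The following is a minimal sketch using only the standard library; the helper names `archive_members` and `contains` are hypothetical and not part of poetry.

```python
from __future__ import annotations

import tarfile
import zipfile
from pathlib import Path


def archive_members(archive: Path) -> list[str]:
    """Return the file names stored in an sdist (.tar.gz) or a wheel (.whl)."""
    if archive.suffix == ".whl":
        # Wheels are plain zip archives.
        with zipfile.ZipFile(archive) as zf:
            return sorted(zf.namelist())
    # tarfile detects the compression on its own.
    with tarfile.open(archive) as tf:
        return sorted(m.name for m in tf.getmembers() if m.isfile())


def contains(archive: Path, relative_path: str) -> bool:
    """True if some member equals relative_path or ends with '/' + relative_path."""
    return any(
        name == relative_path or name.endswith("/" + relative_path)
        for name in archive_members(archive)
    )
```

For the case above, one could run `poetry build` and then call `contains()` on the files under `dist/` with the path of the file that was expected to be included.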

@thejohnfreeman
Contributor

Do you have a link to the failing example, maybe in a branch on GitHub?

@meshy

meshy commented Feb 27, 2020

@thejohnfreeman yes! Thanks for the quick reply.

Here's the branch: https://github.com/meshy/django-schema-graph/tree/poetry
And here's the related PR: meshy/django-schema-graph#12

EDIT:

I've noticed that there's a stacktrace leading to an IndexError in the failed CI build on that branch. Perhaps that will help diagnose this issue.

@meshy

meshy commented Apr 29, 2020

Here's the stacktrace from the failed CI build -- hopefully it'll help:

Traceback (most recent call last):
  File "/.../pip/_vendor/pep517/_in_process.py", line 257, in <module>
    main()
  File "/.../pip/_vendor/pep517/_in_process.py", line 240, in main
    json_out['return_val'] = hook(**hook_input['kwargs'])
  File "/.../pip/_vendor/pep517/_in_process.py", line 110, in prepare_metadata_for_build_wheel
    return hook(metadata_directory, config_settings)
  File "/.../poetry/masonry/api.py", line 39, in prepare_metadata_for_build_wheel
    builder = WheelBuilder(poetry, SystemEnv(Path(sys.prefix)), NullIO())
  File "/.../poetry/masonry/builders/wheel.py", line 44, in __init__
    super(WheelBuilder, self).__init__(poetry, env, io)
  File "/.../poetry/masonry/builders/builder.py", line 64, in __init__
    self._module = Module(
  File "/.../poetry/masonry/utils/module.py", line 68, in __init__
    PackageInclude(
  File "/.../poetry/masonry/utils/package_include.py", line 15, in __init__
    self.check_elements()
  File "/.../poetry/masonry/utils/package_include.py", line 37, in check_elements
    root = self._elements[0]
IndexError: list index out of range

@thejohnfreeman
Contributor

Sorry I didn't see this earlier. I could have solved this for you back in February! You need to change some lines in your pyproject.toml:

packages = [{ include = "schema_graph" }]
include = ["schema_graph/static/schema_graph/main.js"]

The include setting under the [tool.poetry] section is what I was talking about when I said [tool.poetry].include (and I had linked to the documentation).

@meshy

meshy commented Apr 29, 2020

Oh excellent! -- I didn't quite get that the first time round. Thank you for clarifying!

@brianbruggeman

@thejohnfreeman

How do I include a full set of sub folders like botocore?

@thejohnfreeman
Contributor

@brianbruggeman include accepts globs (example).
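To preview what a glob would pick up before building, the pattern can be evaluated against the project directory. This is only a sketch: it assumes patterns are resolved relative to the directory holding pyproject.toml and follow pathlib's glob rules, which may not match poetry's behavior in every detail. The helper name `preview_include` is hypothetical.

```python
from __future__ import annotations

from pathlib import Path


def preview_include(project_root: Path, pattern: str) -> list[str]:
    """List files under project_root matched by an include-style glob.

    Assumption: the pattern is relative to the directory that contains
    pyproject.toml, and "**" recurses into subdirectories as in pathlib.
    """
    return sorted(
        p.relative_to(project_root).as_posix()
        for p in project_root.glob(pattern)
        if p.is_file()
    )
```

For a full set of subfolders, a call such as `preview_include(Path("."), "my_package/data/**/*")` would show every file below `my_package/data/`; the path here is a placeholder.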

@hyliu1989
Contributor

hyliu1989 commented Nov 13, 2020

@thejohnfreeman It seems that include resolves paths relative to the directory containing pyproject.toml. Could you add an example to the include and exclude section showing what the relative paths are based on?

For example,

dummy_folder/
    pyproject.toml
    CHANGE.log
    my_package/
        __init__.py
        my_data.csv

You would specify include = ['my_package/my_data.csv'] in pyproject.toml. However, this is not compatible with setup.py. Would poetry fail when the user specifies include = ['CHANGE.log'], because only the things in my_package/ will be installed to site-packages?

@hyliu1989 hyliu1989 mentioned this issue Nov 13, 2020
2 tasks
@sinoroc

sinoroc commented Nov 14, 2020

@hyliu1989
As far as I can tell, the paths in include and exclude are all relative to the location of pyproject.toml.
I know it is not comparable with the flexibility of setuptools. What are you missing in Poetry compared to setuptools? For files to be installed, they have to belong to a package. One notable exception is the license file, which is somehow installed as well, even though it is not part of any package. So a file right next to pyproject.toml such as your CHANGE.log will not be installed.

@hyliu1989
Contributor

hyliu1989 commented Nov 15, 2020

Thanks for the clarification. However, contrary to your explanation,

  1. poetry actually puts the CHANGE.log into the site-packages/. This happens when include = ["CHANGE.log"] is set in pyproject.toml (2) in the file structure below. The result is both env_path/site-packages/my_dependency/ and env_path/site-packages/CHANGE.log! This is quite unexpected.
  2. poetry does not complain if I specify non-existent files, for example include = ["CHANGE.log", "XXX_not_exist"] in pyproject.toml (2) below. A developer migrating from setup.py may conclude that their old way of specifying files works when it does not.

I reproduced this with poetry 1.1.4 on macOS 10.14.6. The Python version is 3.8.5.

File structure

dummy_folder/
    pyproject.toml (1)
    my_package/
        __init__.py
        my_data.csv
    lib/
        my_dependency_folder/
            CHANGE.log
            pyproject.toml (2)
            setup_cython.py
            my_dependency/
                __init__.py

Example pyproject.toml (2)

[tool.poetry]
name = "my_dependency"
version = "0.1.0"
description = ""
authors = [""]
build = "setup_cython.py"
include = ["CHANGE.log", "XXX_not_exist"]

[tool.poetry.dependencies]
python = "^3.7"
numpy = "^1.19.1"
cython = "^0.29.21"

[build-system]
requires = ["poetry_core>=1.0.0","cython","setuptools"]
build-backend = "poetry.core.masonry.api"

Here setup_cython.py is just a script I used, following the undocumented feature mentioned in #11. I don't think it is critical here, and it can be safely removed when you try to reproduce this.

However, I do have to specify setuptools in build-system requires in order for poetry to install the package correctly.

Example pyproject.toml (1)

I have the following chunk:

[tool.poetry.dependencies]
my_dependency = {path = "./lib/my_dependency_folder"}

@sinoroc

sinoroc commented Nov 15, 2020

@hyliu1989

poetry actually puts the CHANGE.log into the site-packages/

You are right. Looks like the CHANGE.log file is installed. I wrote my previous comment too hastily.

It could be that everything in the wheel gets installed, which is not something poetry can change. That may simply be how wheels are standardized; you (or I) should read the standard (PEP 427) in more detail to be sure.
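Since the root of a wheel is unpacked into site-packages, listing the top-level names in the archive shows what would land there. A minimal sketch, assuming a wheel built by `poetry build`; the helper name `site_packages_top_level` is hypothetical.

```python
from __future__ import annotations

import zipfile
from pathlib import Path


def site_packages_top_level(wheel: Path) -> set[str]:
    """Return the top-level names a wheel would unpack into site-packages.

    Entries under *.dist-info (metadata) and *.data (relocated by the
    installer) are skipped so that only project files are reported.
    """
    names = set()
    with zipfile.ZipFile(wheel) as zf:
        for member in zf.namelist():
            top = member.split("/", 1)[0]
            if top.endswith((".dist-info", ".data")):
                continue
            names.add(top)
    return names
```

Any name in the result that is not one of the project's own packages or modules, such as a stray CHANGE.log, is a file that would be dropped next to other distributions' packages.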

It looks like files in poetry can either be in both distributions (sdist and wheel) or in neither, and that's it, which is not ideal from my point of view. Maybe it is possible to get more flexibility, and I just do not know how yet.

poetry does not complain if I specify non-existent files

It might be a feature so that files created during custom build steps are added to the wheel. If that is not the case, then yes, maybe poetry should emit a warning. But what to do if a wildcard is used, such as include = ['my_package/data/*.txt']? I do not know.
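Until such a warning exists, a project can run its own check. The sketch below reports every pattern that matches nothing, whether or not it contains a wildcard. It assumes patterns are relative to the directory of pyproject.toml and should run after any custom build step, so generated files already exist. The helper name `unmatched_patterns` is hypothetical.

```python
from __future__ import annotations

from pathlib import Path


def unmatched_patterns(project_root: Path, patterns: list[str]) -> list[str]:
    """Return the patterns that match no path under project_root."""
    # any() stops at the first match, so large trees are not fully walked.
    return [p for p in patterns if not any(project_root.glob(p))]
```

A CI job could fail the build whenever the returned list is non-empty.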

However, I do have to specify setuptools in build-system requires in order for poetry to install the package correctly.

I am not following this bit. Would you mind sharing more details about this?


Personal opinion:

It looks like a lot of flexibility is missing. It should be possible to specify files to be included in either the sdist, or the wheel, or both, or neither. Even more so when a project has custom build steps (as seems to be your case).

  • it should be possible to have files that are placed in the sdist but not in the wheel (i.e. not installed)
    • typically you would want to have the test suite in the sdist, but not install it, so tests should not be in the wheel
  • the other way around, it should also be possible to have files that are in the wheel, but not in the sdist
    • it might be needed if a project has custom build steps, for example typically you may want to have some gettext *.po files that are transformed into *.mo files during a custom build step (i.e. build takes sdist as input and gives a wheel as output), so the sdist should contain only the *.po files and the wheel should contain only the *.mo files.

See how I do this with setuptools for example:

@hyliu1989
Contributor

I am not following this bit. Would you mind sharing more details about this?

It is the same issue as #3153, for which my workaround is to add setuptools to the build-system requires.

@sinoroc

sinoroc commented Nov 17, 2020

Correction...

Example project here: https://github.com/sinoroc/poetry-gh-2015

Project file structure:

$ tree Thing
Thing
├── CHANGELOG.rst
├── LICENSE.txt
├── pyproject.toml
├── README.rst
├── test
│   └── test_unit.py
└── thing
    ├── data
    │   ├── file.all
    │   ├── file.bin
    │   ├── file.not
    │   └── file.src
    └── __init__.py

All files are in the git repository except *.bin data files (supposed to be a build artifact).

We want the *.src data files and CHANGELOG.rst in the sdist only. We want *.bin data files in the wheel only. We want *.all data files in both the sdist and the wheel. We do not want any of the *.not data files in the distributions.

We also want the test package, but only in the sdist.

.gitignore

/thing/data/*.bin

pyproject.toml:

packages = [
    { include = 'thing' },
    { include = 'test', format = 'sdist' }
]
include = [
    { path = 'CHANGELOG.rst', format = 'sdist' },
    { path = 'thing/data/*.bin', format = 'wheel' },
    { path = 'thing/data/*.src', format = 'sdist' },
]
exclude = [
    { path = 'CHANGELOG.rst', format = 'wheel' },
    { path = 'thing/data/*.src', format = 'wheel' },
    'thing/data/*.not',
]

Content of the sdist:

$ python3 -m tarfile -l dist/Thing-0.1.0.tar.gz 
Thing-0.1.0/CHANGELOG.rst 
Thing-0.1.0/LICENSE.txt 
Thing-0.1.0/README.rst 
Thing-0.1.0/pyproject.toml 
Thing-0.1.0/test/test_unit.py 
Thing-0.1.0/thing/__init__.py 
Thing-0.1.0/thing/data/file.all 
Thing-0.1.0/thing/data/file.src 
Thing-0.1.0/setup.py 
Thing-0.1.0/PKG-INFO 

Content of the wheel:

$ python3 -m zipfile -l dist/Thing-0.1.0-py3-none-any.whl 
File Name                                             Modified             Size
thing/__init__.py                              1980-01-01 00:00:00            0
thing/data/file.all                            1980-01-01 00:00:00            0
thing/data/file.bin                            1980-01-01 00:00:00            0
thing/data/file.src                            1980-01-01 00:00:00            0
thing-0.1.0.dist-info/LICENSE.txt              1980-01-01 00:00:00            8
thing-0.1.0.dist-info/WHEEL                    2016-01-01 00:00:00           83
thing-0.1.0.dist-info/METADATA                 2016-01-01 00:00:00         1515
thing-0.1.0.dist-info/RECORD                   2016-01-01 00:00:00          577

Somehow the thing/data/file.src appears in the wheel, which is unexpected and not what we want.
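The comparison above can also be automated. The following sketch splits project files into sdist-only, wheel-only, and shared sets using the standard library; the function name `compare_artifacts` is hypothetical, and it assumes the sdist wraps everything in a single top-level directory, as in the listing above.

```python
from __future__ import annotations

import tarfile
import zipfile
from pathlib import Path


def compare_artifacts(sdist: Path, wheel: Path) -> dict[str, set[str]]:
    """Split project files into sdist-only, wheel-only, and shared sets."""
    with tarfile.open(sdist) as tf:
        # Drop the leading "<name>-<version>/" directory of the sdist.
        sdist_files = {
            m.name.split("/", 1)[1]
            for m in tf.getmembers()
            if m.isfile() and "/" in m.name
        }
    with zipfile.ZipFile(wheel) as zf:
        # Ignore wheel metadata so only project files are compared.
        wheel_files = {
            n
            for n in zf.namelist()
            if not n.endswith("/") and ".dist-info/" not in n
        }
    return {
        "sdist_only": sdist_files - wheel_files,
        "wheel_only": wheel_files - sdist_files,
        "both": sdist_files & wheel_files,
    }
```

With the two archives listed above, thing/data/file.src would show up under "both", which is exactly the unexpected result.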

These things are undocumented, or at least I could not find the documentation. I found out about them from #2789 (and this code).

@thejohnfreeman
Contributor

All that's left for this use case is an option for a post-install script that can put data files where they need to be on the system, outside the site-packages directory (and a corresponding uninstall script to remove them).

@zeelot

zeelot commented Mar 8, 2021

Hey, I think I might be blocked by this too, or by something similar. I'm just testing poetry as an option today and simply not getting what I need, and I'm extremely confused by the behavior I am seeing. The exact example in the docs here: https://python-poetry.org/docs/pyproject/#include-and-exclude seems like a super dangerous and bad practice, and I see no docs about how to avoid it. If I have files at the root of my repo that I want included, they're just going to be dumped into the root of site-packages/ and stomp over whatever else is there. Is the example in the docs showing people how to drop a CHANGELOG.md file at the root of site-packages/? That doesn't seem like something anyone would ever want to do, so what is this feature used for? I must be missing something obvious, so any help would be appreciated!

What I'm looking for is an option to include files with the ability to specify the destination path in the wheel. Can someone help me understand if this is related to this issue or some other known thing?

Here's my directory structure:

├── pyproject.toml
├── README.md
├── resources
│   ├── foo1.txt
│   ├── foo2.txt
│   └── foo3.txt
└── src
    └── thing
        ├── __init__.py
        ├── one.py
        └── two.py

  • how do I include the README.md file without it ending up in site-packages/?
  • how do I include the resources/* directory without it being at the root of site-packages/?

I specifically don't control that resources directory, as it's used by more than the Python side of things, but I would like to make sure it installs into a directory inside my distribution folder on target machines because I want to make sure I'm not breaking anything. In setup.py, the source and destination of files are configurable, so I can at least attempt to be a good citizen.

@sinoroc

sinoroc commented Mar 8, 2021

@zeelot

The easiest is to make the directory structure of your source code repository look exactly like the directory structure of your packages. Maybe use symbolic links to achieve this, should work perfectly fine with git.

Another thing I want to add is that what you describe is not necessarily "bad practice", even if it is not the best practice either. One might have very good reasons to add some files directly to site-packages; examples are *.pth files and, of course, top-level modules (not packages). There are also files that you want in the sdist (those are not necessarily installed) and files you want in the wheel (those are usually installed, but not always). All that to say: things are complicated, and there is no one size fits all.

To conclude, maybe show us your pyproject.toml and we can start from there to help you get where you want, if the solution with symbolic links does not fit your requirements.

@zeelot

zeelot commented Mar 12, 2021

Thanks for the response, @sinoroc. Unfortunately, I can't make my repo structure match the structure that I want in my package because my repo is used for more than python. Here's an example of my current pyproject.toml file:

[tool.poetry]
name = "thing"
version = "0.1.0"
description = ""
packages = [
    {include = "thing", from = "src"},
]

# these get placed in an unacceptable place but I cannot move them in my repo
include = ["resources/**/*", "README.md"]

[tool.poetry.dependencies]
python = ">3.7"
PyYAML = "^5.4.1"
# Dev Extras
tox = { version = "^3.23.0", optional = true }
pytest = { version = "^6.2.2", optional = true }

[tool.poetry.extras]
dev = ["tox", "pytest"]

[build-system]
requires = ["poetry-core>=1.0.0"]
build-backend = "poetry.core.masonry.api"

If I can't fix this with configuration, it is a blocker for me: I cannot deploy a package that simply throws stuff into site-packages/resources, but these files are not owned by me in the repo and contain configs shared across languages and tools.

It seems like, in order to unblock myself, I should write a script that shuffles files around into a different structure during the packaging process in some tmp directory. Is that what some folks do?

@zeelot

zeelot commented Mar 12, 2021

I guess it's not clear to me why I am able to specify a custom location of my python (with from) but I can't do that for any of the other files. I get the need to have the ability to place things in the root of site-packages, but that doesn't seem like it should be the default and only supported behavior.

@cmungall

We are trying to get this to work and we share @tribals's frustration.

Some of the solutions suggested in this thread:

  1. Use [tool.poetry].packages
    • We tried this and it still pollutes the top level.
  2. Some people seem to have had success using this undocumented syntax: { path = "MY_FILE", format = "sdist"}
  3. We haven't tried symlinks, but this seems fraught with issues, as mentioned above.

It seems the best solution is the one outlined in #5539.

In our case, we want files in src/schema to be available in our_package, where we keep our python in src/our_package

We add this to our pyproject.toml

[tool.poetry.build]
generate-setup-file = false
script = "build.py"

Caution: this seems to be a new, undocumented feature and may change. I also don't know the full implications of setting generate-setup-file to false.

We then have:

from distutils.dir_util import copy_tree

if __name__ == "__main__":
    copy_tree("src/schema", "our_package/schema")

I don't love this though, as it leaves the local tree with duplicate copies of the files, which may cause confusion when developers locally run code that uses pkgutil.

I would really love to see first-class support for this in poetry, or at least have the documentation advertise the problems with poetry.include polluting the top level.
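Note that the build script above imports distutils, which was removed from the standard library in Python 3.12. A minimal equivalent sketch with shutil, assuming Python 3.8 or later; the function name `copy_schema` is hypothetical, and the paths are the ones from the comment above.

```python
import shutil
from pathlib import Path


def copy_schema(src: Path, dest: Path) -> None:
    """Copy the src tree into dest, overwriting files that already exist."""
    # dirs_exist_ok (Python 3.8+) tolerates an existing target directory,
    # similar to what distutils' copy_tree did.
    shutil.copytree(src, dest, dirs_exist_ok=True)
```

In the build script this would be called as `copy_schema(Path("src/schema"), Path("our_package/schema"))`. The drawback of duplicate copies in the local tree remains.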

@clintonroy
Contributor

poetry aims to create wheels. Things like data_files don't work in wheels, hence there's no hook poetry can use for files destined outside the Python package area. If you want this capability, the only place that will happen is upstream of poetry.

@abn
Member

abn commented May 21, 2022

I suspect there are two camps here. One that wants data files that need to be installed outside the system site packages and the other that just wants the include files to be placed into the package directory in the wheel during build. The former has many thorns when it comes to universal wheels, and tools that install wheels do not really support it consistently across the board. The latter is less controversial I reckon.

However it should also be noted that this will need to be handled somehow for editable installs as well since adding these directories to the .pth file is insufficient and might actually cause more problems than you solve.

@cmungall I'm also curious: is there a reason why src/schema is not placed as src/package/schema in your source tree? This is better for editable installs as well.

@cmungall

is there a reason why src/schema is not placed as src/package/schema in your source tree? This is better for editable installs as well.

Many of our repos are not Python-centric; the schemas (specified as LinkML) are language-independent. By forcing the source to be in src/package/schema we are forcing Pythonic conventions on a broad community of data modelers.

@neersighted neersighted changed the title Package data? Better documentation for how files are included in built artifacts Oct 4, 2022
@neersighted neersighted added the area/build-system Related to PEP 517 packaging (see poetry-core) label Oct 4, 2022
@finswimmer
Member

Closing this, because several things seem to be discussed here and the OP hasn't come back to say whether their question is answered.

Feel free to open a new ticket if you have any explicit question.

@finswimmer finswimmer closed this as not planned Oct 4, 2022
@AlecThomson

Hey there - I've come here via Google seeking a replacement for package_data in setup.py. I'm encountering the same problem: the solutions mentioned in this thread put data into site-packages/.

Other than the build script above, is it possible to configure included files to be placed in the site-packages/package directory? My use case would be accessing static resource files using pkg_resources.resource_filename at runtime.

@sinoroc

sinoroc commented Oct 18, 2022

@AlecThomson Place your data files inside an importable package instead of placing them at the root. This is straightforward.
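Once the data files live inside an importable package, they can be read at runtime with the standard library instead of pkg_resources. A minimal sketch, assuming Python 3.9 or later for `importlib.resources.files`; the helper name `read_package_text` is hypothetical.

```python
from importlib import resources  # files() requires Python 3.9+


def read_package_text(package: str, relative_path: str) -> str:
    """Read a text data file that ships inside an importable package."""
    # files() returns a traversable handle on the package's location.
    target = resources.files(package).joinpath(relative_path)
    return target.read_text(encoding="utf-8")
```

With a layout like the thing/data/ example earlier in this thread, the call would look like `read_package_text("thing", "data/file.all")`.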

@AlecThomson

Ok @sinoroc. I had actually been attempting to follow some suggestions above in this thread -- symlinking my required files into the package dir e.g.

$ tree Thing
Thing
├── data
└── thing
    ├── data -> ../data
    └── __init__.py

Which was placing data into site-packages/. Moving data directly rectified the issue, however. Symlinks (at least with relative paths) would seem to cause issues.

feluelle added a commit to astronomer/astro-sdk that referenced this issue Jan 10, 2023
After reading python-poetry/poetry#2015, it seems the best solution is to include the "include" directory directly in the project.
feluelle added a commit to astronomer/astro-sdk that referenced this issue Jan 10, 2023
# Description

## What is the current behavior?

See the referenced issue for details.

closes: #1440 

## What is the new behavior?

After reading python-poetry/poetry#2015, it
seems the best solution is to include the "include" directory directly
in the project.

## Does this introduce a breaking change?

No.

### Checklist
- [x] Created tests which fail without the change (if possible)
- [ ] Extended the README / documentation, if necessary
utkarsharma2 pushed a commit to astronomer/astro-sdk that referenced this issue Jan 17, 2023
carlio pushed a commit to pylint-dev/pylint-plugin-utils that referenced this issue May 20, 2023
…section, as this had unintended consequences of installing them into the site-packages directory instead of along side the package code. This is related to python-poetry/poetry#2015 . Since the LICENSE is available in the dist-info directory along side the package, then this is sufficient.

Additionally, the python version classifiers will be automatically added by poetry from the version range specification in the tool.poetry.dependencies section, so no need to explicitly define them.
carlio pushed a commit to prospector-dev/requirements-detector that referenced this issue May 20, 2023
delta003 added a commit to spiraldb/ziggy-pydust-template that referenced this issue Sep 11, 2023
every file in your package directory will be included in the package,
unless it is ignored by your .gitignore (see
python-poetry/poetry#2015)

since we gitignore *.so files, we need to explicitly include them when
building wheel

This issue has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Feb 29, 2024