Library Usage Comes With Almost 1 GiB of Dependencies #213

Closed
janhicken opened this issue May 21, 2024 · 5 comments

@janhicken
Contributor

Adding the datacontract-cli package as a dependency to a Python project pulls in a lot of transitive dependencies. After adding it, my application's Docker image grew from 330 MiB to 1.2 GiB.

My application only uses SodaCL with a PostgreSQL database; however, other frameworks such as pyspark (340 MB), pyarrow (123 MB) and deltalake (75 MB) are installed as well.

Would it be possible to split the package per target technology, as Soda does? Alternatively, extras could be used for this.

@RobertLD
Contributor

@simonharrer I think it's likely worth splitting a lot of these packages into optional dependencies (aka extras)?

@jochenchrist
Contributor

I think this is a fair point now, and we should add extras.

@RobertLD
Contributor

> I think this is a fair point now, and we should add extras.

Drafting out the changes here #234

RobertLD pushed a commit to RobertLD/datacontract-cli that referenced this issue May 31, 2024
…ble by technology type

- Add an all option to dependencies
RobertLD pushed a commit to RobertLD/datacontract-cli that referenced this issue May 31, 2024
jochenchrist added a commit that referenced this issue Jun 4, 2024
* #213 Split optional deps into extras to shrink dep tree

* #213 Further split optional dependencies as much as possible by technology type

- Add an all option to dependencies

* Correct order of optional dependencies

* #213 Update readme explaining all optional extras incl. all

* Make some imports local

* Update Dockerfile to use all extras

* Update changelog

---------

Co-authored-by: Robert DeRienzo <RDeRienzo@voloridge.com>
Co-authored-by: jochen <jochen.christ@innoq.com>
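
The "Make some imports local" step in the commit message above usually goes hand in hand with extras: a heavy dependency is imported only inside the function that needs it, so users who did not install the corresponding extra can still import the library. The following is a minimal sketch of that pattern, not the code from the commit. The helper `require_optional`, the function `row_count_with_spark`, and the extra name `spark` are hypothetical and chosen only for illustration.

```python
import importlib


def require_optional(module_name, extra):
    """Import an optional module, or fail with a hint about which extra to install.

    `extra` is a hypothetical extra name used only to build the error message.
    """
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        raise ImportError(
            f"'{module_name}' is not installed. It is an optional dependency; "
            f"install it with an extra, e.g. pip install 'datacontract-cli[{extra}]'"
        ) from exc


def row_count_with_spark(path):
    """Hypothetical feature that needs pyspark only when it is actually called."""
    # Local import: importing the enclosing module does not require pyspark.
    pyspark_sql = require_optional("pyspark.sql", "spark")
    session = pyspark_sql.SparkSession.builder.getOrCreate()
    return session.read.parquet(path).count()
```

With this structure, a missing optional dependency produces an error only when the affected feature is used, and the message points the user at the extra to install.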
RobertLD pushed a commit to RobertLD/datacontract-cli that referenced this issue Jun 4, 2024
- This ought to remove some larger dependencies
- Add additional extra to readme file
@RobertLD
Contributor

RobertLD commented Jun 4, 2024

@jochenchrist following up on this: moving deltalake into an extra cut out 200 MB of the larger dependencies I mentioned in the previous PR. I posted a PR here: #240 #242

(I remade the PR because rebasing is hard haha)

I think going from 1.5 GB to 300 MB ought to be enough to close this issue out.

RobertLD pushed a commit to RobertLD/datacontract-cli that referenced this issue Jun 4, 2024
RobertLD pushed a commit to RobertLD/datacontract-cli that referenced this issue Jun 4, 2024
RobertLD pushed a commit to RobertLD/datacontract-cli that referenced this issue Jun 4, 2024
jochenchrist pushed a commit that referenced this issue Jun 5, 2024
Co-authored-by: Robert DeRienzo <RDeRienzo@voloridge.com>
@jochenchrist
Contributor

I'd agree, with 1/5 of the dependency size, the library is much more efficient now :)
