-
Notifications
You must be signed in to change notification settings - Fork 1.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[CT-2876] [Feature] add possibility to copy dbt local packages instead of make it symlink #8223
Comments
Thanks for suggesting this idea @iamtodor ! We always want to be careful with Chesterton's fence, and I don't know the original rationale that went into the decision to utilize symlinks for local packages in abdcdef. But I'm guessing that it was to avoid duplicating data. If that were the case, then it seems totally fine to me to replace it with a deep copy instead since it wouldn't take up any more additional space than if it had been installed from a remote source. In fact, doing a deep copy seems like it would just bring local packages in line with the way other install methods are written to the file system so it seems safe rather than risky. I'm wondering if using a deep copy might even be more safe when using I'm going to label this as |
Hello @dbeatty10 Thank you for the reply! I would like to clarify that I do not propose to remove this logic. What I propose is to add the opportunity to control the logic: let's say by default DBT creates symlinks. However, I am able to change this default behavior to install packages with Does it seem reasonable to you? |
#7447 proposes to just try creating a symlink and fall back to the deep copy in the case that it doesn't work (i.e., raises an error because it isn't supported on the underlying system). This seems like it would solve the problem without the user needing to add any configuration. Would that approach work for your use case, @iamtodor? |
Thank you @dbeatty10 for keeping your attention to this issue. The proposed way wouldn't work for us due to the fact we run So for us would be really convenient to have the configuration option that we could tweak |
@gshank may I ask you to re-open the issue? As I have explicitly confirmed that solution coming from #7447 would not fit our needs. |
@gshank @dbeatty10 Our Jenkins CI/CD OS allows us to create the symlink, hence #7447 is not applicable. The issue is while doing copy from the Jenkins machine to GCS. I pointed out these points in my very first message: The reasons for this aim are:
Please, re-open the issue as it still exists. These are 2 very different use cases. Shall you have more concerns or questions please let me know. |
Re-opening per request. Ideally, we'd be able to implement a solution that is both simple for a user and also simple from a code maintenance perspective. To achieve both of those simultaneously, I'm attracted to just removing the symlinks altogether. Doing so would duplicate the code for installed packages, but dbt packages don't tend to take up that much space. |
This is exactly what I had on my mind, but I thought it might be not the very best possible solution, that's why I didn't mention and presented it to you. Usually, we write packages quite slim, hence I do not see any potential pitfalls in this particular implementation. |
@graciegoheen labeling this as
See discussion above for more details and justification. Assuming you will tap in others to weigh in as-needed 🙏 If desired, I can do a more detailed write-up of pros/cons. |
@dbeatty10 I have found this issue a bit by coincidence, but I was terrified by the proposal of removing symlinks for local packages... It's absolutely critical functionality when working with packages in monorepo setup, when you always want to refer to the state of packages at the current commit and not run To resolve the issue and not break any workflows - just introduce the similar option like in *To be more precise - in the Python case it's a bit more sophisticated than relying literally on symlinks: That's also why I think the current fallback approach should be completely removed - there's a significant difference between symlinking and copying files, these two mechanisms simply shouldn't be mixed when running exactly the same command. Here I would again refer to Python editable installs, when you explicitly decide which mechanism you want to use. So to sum up:
|
Is this your first time submitting a feature request?
Describe the feature
Hello team,
This is a feature request that comes from the following issues/discussions:
packages.yml
are converted to absolute rather than relative symlinks #4538In this piece of code https://github.com/dbt-labs/dbt-core/blob/main/core/dbt/deps/local.py#L62
we are evaluating whether symlink can be setuped. In case the OS does not support symlink (i.e. Windows) we do a deep copy.
We would like to be able to do a dbt local packages deep copy on Linux machines, for instance.
The reasons for this aim are:
gsutils rsync
does not support symlinks as well asrsync
does not support symlinksGoogle Cloud Storage
does not support symlinksDescribe alternatives you've considered
As a workaround, I had to write custom Python code, that iterates thru all the packages we have, find the symlink, and make a deep copy of all the content.
Who will this benefit?
In my humble opinion, at least, GCP users would benefit a lot
Are you interested in contributing this feature?
There is a chance, with all the context and pitfalls to be provided
Anything else?
Probably, no. However, if I miss something please feel free to ask all the questions and raise all the concerns you might be having
The text was updated successfully, but these errors were encountered: