Atlantis unlock fails to delete plan of environment with project name #3845

Closed
the-nando opened this issue Oct 11, 2023 · 27 comments · Fixed by #4192
Labels
bug (Something isn't working), regression (Bug introduced in a new version)

Comments

@the-nando

the-nando commented Oct 11, 2023

#3750 changed the behaviour of atlantis unlock: instead of deleting the whole directory where the PR is checked out, it now deletes only the plan file.
However, the delete lock command doesn't handle projects that have a name configured, so the deletion fails: https://github.com/runatlantis/atlantis/blob/v0.26.0/server/events/delete_lock_command.go#L60

Reproduction Steps

Atlantis version: v0.26.0

Given an atlantis.yaml:

---
version: 3
projects:
  - name: dev 
    dir: clusters/dev/config
    workflow: dev
    terraform_version: v1.5.7
    autoplan:
      enabled: true
      when_modified:
      - "*.tf*"

When an atlantis unlock is issued:

{"level":"info","ts":"2023-10-11T15:53:45.296Z","caller":"events/unlock_command_runner.go:35","msg":"Unlocking all locks","json":{"repo":"<repo>","pull":"1004"}}
{"level":"info","ts":"2023-10-11T15:53:45.300Z","caller":"events/working_dir.go:418","msg":"Deleting plan: /home/atlantis/.atlantis/repos/<repo>/1004/default/clusters/dev/config/default.tfplan","json":{}}
{"level":"warn","ts":"2023-10-11T15:53:45.300Z","caller":"events/delete_lock_command.go:68","msg":"Failed to delete plan: remove /home/atlantis/.atlantis/repos/<repo>/1004/default/clusters/dev/config/default.tfplan: no such file or directory","json":{},"stacktrace":"github.com/runatlantis/atlantis/server/events.(*DefaultDeleteLockCommand).DeleteLocksByPull\n\t/tmp/atlantis-v0.26.0/server/events/delete_lock_command.go:68\ngithub.com/runatlantis/atlantis/server/events.(*UnlockCommandRunner).Run\n\t/tmp/atlantis-v0.26.0/server/events/unlock_command_runner.go:37\ngithub.com/runatlantis/atlantis/server/events.(*DefaultCommandRunner).RunCommentCommand\n\t/tmp/atlantis-v0.26.0/server/events/command_runner.go:328"}
{"level":"error","ts":"2023-10-11T15:53:45.300Z","caller":"events/unlock_command_runner.go:40","msg":"failed to delete locks by pull remove /home/atlantis/.atlantis/repos/<repo>/1004/default/clusters/dev/config/default.tfplan: no such file or directory","json":{"repo":"<repo>","pull":"1004"},"stacktrace":"github.com/runatlantis/atlantis/server/events.(*UnlockCommandRunner).Run\n\t/tmp/atlantis-v0.26.0/server/events/unlock_command_runner.go:40\ngithub.com/runatlantis/atlantis/server/events.(*DefaultCommandRunner).RunCommentCommand\n\t/tmp/atlantis-v0.26.0/server/events/command_runner.go:328"}

The plan file is actually named dev-default.tfplan, following fmt.Sprintf("%s-%s.tfplan", projName, workspace), so the path that unlock tries to remove never exists.
A workaround is to remove project names from the Atlantis config and rely only on dir names.
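
To make the mismatch concrete, here is a minimal Go sketch (not Atlantis code) that builds both paths for the configuration in this report. The helper names expectedByUnlock and writtenByPlan are made up for illustration; only the two file name formats come from the logs and the format string in this issue.

```go
package main

import (
	"fmt"
	"path/filepath"
)

// expectedByUnlock builds the path that the unlock log shows being removed:
// the workspace name alone, with no project name in it.
// Hypothetical helper, named for this sketch only.
func expectedByUnlock(dir, workspace string) string {
	return filepath.Join(dir, fmt.Sprintf("%s.tfplan", workspace))
}

// writtenByPlan builds the path for a named project, using the format
// string quoted in this issue.
// Hypothetical helper, named for this sketch only.
func writtenByPlan(dir, projName, workspace string) string {
	return filepath.Join(dir, fmt.Sprintf("%s-%s.tfplan", projName, workspace))
}

func main() {
	dir := "clusters/dev/config"
	unlockPath := expectedByUnlock(dir, "default")
	planPath := writtenByPlan(dir, "dev", "default")

	fmt.Println("unlock removes:", unlockPath)
	fmt.Println("plan wrote:    ", planPath)
	fmt.Println("same file:     ", unlockPath == planPath)
}
```

The two paths differ as soon as a project name is set, which matches the "no such file or directory" error in the logs.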

@the-nando the-nando added the bug Something isn't working label Oct 11, 2023
@jamengual
Contributor

@X-Guardian do you think you could check if this is the case?

@X-Guardian
Contributor

Yes, the Atlantis locks controller has no knowledge of projects.

@X-Guardian
Contributor

Several tickets have been raised about this same issue, so here is a clarification of the current situation:

The file name of the terraform plan file produced by Atlantis for each project differs depending upon the following conditions:

  1. If there is no repo level atlantis.yaml file or within the atlantis.yaml file the project that is being planned has no name property, then the plan file will be called <workspace>.tfplan.
  2. If within the atlantis.yaml file the project has a name property, then the plan file will be called <projectName>-<workspace>.tfplan

You can see the code for this here:

// GetPlanFilename returns the filename (not the path) of the generated tf plan
// given a workspace and project name.
func GetPlanFilename(workspace string, projName string) string {
	if projName == "" {
		return fmt.Sprintf("%s.tfplan", workspace)
	}
	projName = strings.Replace(projName, "/", planfileSlashReplace, -1)
	return fmt.Sprintf("%s-%s.tfplan", projName, workspace)
}

The problem is that the locks controller functions have no knowledge of the project name, so if a name is set, the unlock functions look for the wrong file and the plan deletion fails. Fixing this would require passing project details, including the project name, into the locks controller functions and the higher-level functions that call them.
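
As a rough illustration of what a project-aware deletion could look like, here is a self-contained Go sketch. It is not the Atlantis implementation: deletePlan and runDemo are hypothetical names, the "::" value for planfileSlashReplace is an assumption, and treating an already missing plan file as success is a design choice of this sketch, not something decided in this thread.

```go
package main

import (
	"errors"
	"fmt"
	"io/fs"
	"os"
	"path/filepath"
	"strings"
)

// planFilename follows the naming rule shown above.
// The "::" separator is an assumed value for planfileSlashReplace.
func planFilename(workspace, projName string) string {
	if projName == "" {
		return fmt.Sprintf("%s.tfplan", workspace)
	}
	projName = strings.ReplaceAll(projName, "/", "::")
	return fmt.Sprintf("%s-%s.tfplan", projName, workspace)
}

// deletePlan is a hypothetical helper, not Atlantis code. It removes the plan
// file of one project and treats an already missing file as success.
func deletePlan(repoDir, projDir, workspace, projName string) error {
	path := filepath.Join(repoDir, projDir, planFilename(workspace, projName))
	err := os.Remove(path)
	if err != nil && !errors.Is(err, fs.ErrNotExist) {
		return fmt.Errorf("deleting plan %s: %w", path, err)
	}
	return nil
}

// runDemo recreates the layout from this issue in a temporary directory and
// checks that a project-aware delete removes the file that plan wrote.
func runDemo() error {
	repoDir, err := os.MkdirTemp("", "unlock-sketch")
	if err != nil {
		return err
	}
	defer os.RemoveAll(repoDir)

	projDir := filepath.Join("clusters", "dev", "config")
	if err := os.MkdirAll(filepath.Join(repoDir, projDir), 0o755); err != nil {
		return err
	}
	plan := filepath.Join(repoDir, projDir, "dev-default.tfplan")
	if err := os.WriteFile(plan, []byte("plan"), 0o600); err != nil {
		return err
	}

	// Passing the project name targets the file that actually exists.
	if err := deletePlan(repoDir, projDir, "default", "dev"); err != nil {
		return err
	}
	if _, err := os.Stat(plan); !errors.Is(err, fs.ErrNotExist) {
		return fmt.Errorf("plan file still present (stat error: %v)", err)
	}

	// A repeated unlock finds nothing to delete and still succeeds.
	return deletePlan(repoDir, projDir, "default", "dev")
}

func main() {
	if err := runDemo(); err != nil {
		fmt.Println("sketch failed:", err)
		os.Exit(1)
	}
	fmt.Println("plan removed; repeated unlock is a no-op")
}
```

The key point is only that the caller has to supply the project name; how that name would reach the locks controller in Atlantis is exactly the open question described above.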

I expect there are more bugs within the locks controller when multiple projects are specified for the same directory and different project names.
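
A small Go sketch of why that case is likely to be fragile: any component that identifies a project only by directory and workspace cannot tell two named projects in the same directory apart. The project and lockKey types, the collisions function, and the example project names are all hypothetical and exist only for this illustration.

```go
package main

import "fmt"

// project is a hypothetical, trimmed-down view of one atlantis.yaml project entry.
type project struct {
	Name      string
	Dir       string
	Workspace string
}

// lockKey identifies a project the way a name-unaware component would:
// by directory and workspace only.
type lockKey struct {
	Dir       string
	Workspace string
}

// collisions groups project names by (dir, workspace) and returns only the
// groups that contain more than one project.
func collisions(projects []project) map[lockKey][]string {
	groups := make(map[lockKey][]string)
	for _, p := range projects {
		k := lockKey{Dir: p.Dir, Workspace: p.Workspace}
		groups[k] = append(groups[k], p.Name)
	}
	for k, names := range groups {
		if len(names) < 2 {
			delete(groups, k) // deleting during range is allowed in Go
		}
	}
	return groups
}

func main() {
	// Made-up example configuration with two projects in one directory.
	projects := []project{
		{Name: "network", Dir: "terraform/app", Workspace: "staging"},
		{Name: "database", Dir: "terraform/app", Workspace: "staging"},
		{Name: "dev", Dir: "clusters/dev/config", Workspace: "default"},
	}
	for k, names := range collisions(projects) {
		fmt.Printf("%s (workspace %s) is shared by projects %v\n", k.Dir, k.Workspace, names)
	}
}
```

Each colliding project writes its own <projectName>-<workspace>.tfplan file, so a delete keyed only on directory and workspace has no way to pick the right one.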

An easy workaround for now: if you are not using multiple projects for the same directory within your atlantis.yaml, don't specify the name property for the project. In that setup it adds no value and will cause this issue.

@jamengual
Contributor

@GenPage

@velvetzhero

velvetzhero commented Dec 8, 2023

We have multiple projects for the same directories, and we want to use Atlantis to lock PRs; it's very convenient.
So I've tested around. It's true that it prints a simple Failed to delete PR locks message, and the logs point to the wrong tfplan file (for us too it should be prj-name-workspace.tfplan):

{"level":"error","ts":"2023-12-08T13:45:05.176Z","caller":"events/unlock_command_runner.go:40","msg":"failed to delete locks by pull remove /atlantis/repos/XXX/ops/1969/staging/terraform/XXX/staging.tfplan no such file or directory","json":

The lock is removed!
Can you confirm that the problem is effectively not about the lock, but about the plan file being removed? What are the implications of running a pre_hook to remove the plan if it exists? Just to avoid the error message, I mean.
My second question may sound a bit naive, but as you may have noticed I am looking for a quick workaround. Would using an extra_args such as -out=<workspace>.tfplan do it?

Not a big deal really, just trying my luck, frankly.

@aston-r

aston-r commented Dec 14, 2023

@X-Guardian

Easy workaround for now is if you are not using multiple projects for the same directory within your atlantis.yaml, don't specify the name property for the project, as it is not adding any value and will cause this issue.

Unfortunately this workaround does not suit me. Without a project name it is really inconvenient: the path is really long, and the GitHub summary (the CI check block) shows the base path that is common to all projects + ..., so I cannot tell which project it refers to. Also, I previously used atlantis plan -p; without a project name the command is different and I need to pass the directory and workspace.

Moreover, I liked that Atlantis removed the whole workspace during unlock (when I need to free disk space), and for me it would be better to control this via an option (like -cleanup-workspace-on-unlock): #3751

Are you planning to fix this?

@jakubjakubik

Easy workaround for now is if you are not using multiple projects for the same directory within your atlantis.yaml, don't specify the name property for the project, as it is not adding any value and will cause this issue.

The workaround doesn't cover my use case either. Is there any way to get back to the previous behaviour?

@jamengual
Contributor

For those who have issues with the way unlock works now: if you have the time and Go knowledge to fix it, please create a PR for it.
#3751 fixed other issues with unlock that people took for granted, but in fact, this behaviour was not correct from the beginning.

We are trying to avoid adding more flags, so we could add this to the repo.config if it is needed.

@grimm26
Contributor

grimm26 commented Jan 9, 2024

For those who have issues with the way that unlock is working now, I will ask that if you have the time and golang knowledge to fix it, please create a PR for it. #3751 fixed other issues with unlock that people took for granted, but in fact, this behaviour was not correct from the beginning.

We are trying to avoid adding more flags so we could add this to the repo.config if is needed.

Sure. Shall I create a revert of #3751?

@X-Guardian
Contributor

No

@grimm26
Contributor

grimm26 commented Jan 9, 2024

No

OK, meanwhile I will have to rollback to 0.25.0 because I have a pileup of PRs that won't unlock that I keep having to unlock manually.

@albertorm95
Contributor

No

OK, meanwhile I will have to rollback to 0.25.0 because I have a pileup of PRs that won't unlock that I keep having to unlock manually.

+1 to this. We use multiple projects in the same directory and it's not removing the plan file, so we will need to roll back to 0.25.0 😭

@velvetzhero

velvetzhero commented Jan 11, 2024

I expect there are more bugs within the locks controller when multiple projects are specified for the same directory and different project names.

If it was a quick fix, it would have been done already. Better to rollback to 0.25 and wait for someone to pick this up someday...

@grimm26
Contributor

grimm26 commented Jan 22, 2024

FYI, I do not use the name parameter and I still hit this issue. Snippet from my root-level atlantis.yaml:

version: 3
projects:
- dir: align/production/align-home
  terraform_version: v1.3.8
  workflow: terragrunt
  autoplan:
    enabled: true
    when_modified:
    - '*.hcl'
    - '*.tf'
    - atlantis.yaml

@GenPage
Member

GenPage commented Jan 26, 2024

Hello everyone, we will revert #3751. I approved the change and was actually worried about this very issue. We have been working to better understand the differences between locks and filesystem operations. I apologize for the disruptions.

The code is spread out across the lock controllers, command controllers, and a special library for file system operations for "WorkingDir". I appreciate everyone's patience as we continue to streamline and optimize these operations. It makes more sense to properly support project locks across the codebase before making these changes.

@GenPage
Member

GenPage commented Jan 26, 2024

To clarify, this may not be a "full revert" of the PR, but at a minimum the previous functionality will be restored. We are still debating options internally on the core team. The most likely solution is a new PR that keeps both behaviors, gated by a configuration flag.

@dimisjim
Contributor

dimisjim commented Mar 15, 2024

This issue still seems to be present in the latest v0.27.2.
When trying "atlantis unlock", I get the reply "Failed to delete PR locks",

but lock is properly unlocked:
image

My atlantis repo config looks like this:

version: 3
projects:
- name: main
  workflow: main
  dir: .
  autoplan:
    when_modified: ["*.tf", "env-vars/main.tfvars"]

workflows:
  main:
    plan:
      steps:
      - init:
          extra_args: ["-backend-config=backend-configs/main.hcl", "-reconfigure", "-upgrade"]
      - plan:
          extra_args: ["-var-file=env-vars/main.tfvars"]
    apply:
      steps:
      - apply

@jamengual
Contributor

jamengual commented Mar 15, 2024 via email

@dimisjim
Contributor

@jamengual Indeed, this did not happen in a new PR, ensuring that it was opened after atlantis was upgraded to 0.27.2

Thanks!

@jamengual
Contributor

I'm glad to hear it is working.

@ritaCanavarro

Hi @jamengual :)

We just upgraded our Atlantis version to get the fix for this issue, and unfortunately the problem came back after some time: we ran atlantis unlock on a PR and got the Failed to delete PR locks message. Checking our Atlantis logs, we found the following:

failed to delete locks by pull remove /atlantis-data/repos/*****/****/1006/default/staging/*****.tfplan: no such file or directory

Have you seen any similar errors since this fix went live?

@dimisjim
Contributor

I've also noticed this happening, but only on a closed PR

@ritaCanavarro

I've also noticed this happening, but only on a closed PR

Thanks for sharing. In our case, the PR had only been opened the day before :/

@jamengual
Contributor

Was the unlock run on a project that had a different project/workflow configuration?

@amontalban

Just upgraded to 0.27.2 from 0.25.0 today and saw the same Failed to delete PR locks error, even though the locks disappeared from the Atlantis UI. Unfortunately, I do not have access to the filesystem because we use EFS and ECS.

@ritaCanavarro

the unlock was run on a project that had a different project/workflow configuration?

No, in this case it was on the same project.

@jippi
Contributor

jippi commented May 3, 2024

There is a fair chance #4502 will fix this (and a couple of other "ensure stuff is deleted" issues we've seen).
