Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

libgit2: decommission unmanaged transport #819

Merged
merged 5 commits into from
Jul 20, 2022

Conversation

aryan9600
Copy link
Member

@aryan9600 aryan9600 commented Jul 7, 2022

Decommission libgit2 unmanaged transport and remove the related feature
gate, making managed transport the only available option.

Update initRepoWithRemote() so that it overwrites the remote url with
the provided url if the remote already exists, instead of erroring out.

Attach context to http requests to enforce timeouts.

Relates to: #782
Fixes: #697 (since the flaky test was a part of the unmanaged transport)

Signed-off-by: Sanskar Jaiswal jaiswalsanskar078@gmail.com

Comment on lines -33 to -40

// GitManagedTransport implements a managed transport for GitRepository
// objects that use the libgit2 implementation.
//
// When enabled, improves the reliability of libgit2 reconciliations,
// by enforcing timeouts and ensuring libgit2 cannot hijack the process
// and hang it indefinitely.
GitManagedTransport = "GitManagedTransport"
Copy link
Member

@pjbgf pjbgf Jul 7, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In terms of phasing out feature gates we probably have two options:

  1. Completely remove the option (as per the current change).
  2. Warn an option is no longer supported and remove it a few minors down the line.

The former is the easiest, but it will cause folks upgrading with said option to fail indefinitely until they change their setup. The latter is more work, and would be needed for post GA.

Given how short-lived this feature gate was, and that we are pre-GA. I would favour the former approach, with the caveat that we should add a note on the release.

main.go Outdated Show resolved Hide resolved
@pjbgf
Copy link
Member

pjbgf commented Jul 7, 2022

As part of this, we probably want to rename the tests with ManagedTransport_ as that is now the only option. And also ensure that no tests calls to InitManagedTransport(), but instead that takes place at test setup (or we just have that in init() instead).

pkg/git/libgit2/checkout.go Outdated Show resolved Hide resolved
pkg/git/libgit2/checkout.go Outdated Show resolved Hide resolved
pkg/git/libgit2/checkout_ssh_test.go Show resolved Hide resolved
pkg/git/libgit2/checkout_test.go Outdated Show resolved Hide resolved
pkg/git/libgit2/managed/http.go Show resolved Hide resolved
@darkowlzz darkowlzz added the area/git Git related issues and pull requests label Jul 7, 2022
@aryan9600
Copy link
Member Author

aryan9600 commented Jul 8, 2022

This is what a GitRepository object's conditions looks like if the transports aren't enabled (manually tested by removing the call to managed.InitManagedTransport() in main.go):

Status:
  Conditions:
    Last Transition Time:  2022-07-08T05:32:43Z
    Message:               libgit2 managed transport not initialized
    Observed Generation:   1
    Reason:                Libgit2TransportNotEnabled
    Status:                True
    Type:                  Stalled
    Last Transition Time:  2022-07-08T05:32:43Z
    Message:               libgit2 managed transport not initialized
    Observed Generation:   1
    Reason:                Libgit2TransportNotEnabled
    Status:                False
    Type:                  Ready
  Observed Generation:     1
Events:
  Type     Reason                      Age   From               Message
  ----     ------                      ----  ----               -------
  Warning  Libgit2TransportNotEnabled  56m   source-controller  libgit2 managed transport not initialized

Copy link
Member

@hiddeco hiddeco left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks @aryan9600 🙇

Copy link
Member

@pjbgf pjbgf left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Thanks @aryan9600 🙇

@darkowlzz
Copy link
Contributor

We need to update the spec docs where we mention about the managed transport feature flag - Managed transport for libgit2 Git implementation .

@aryan9600 aryan9600 force-pushed the remove-unmanaged-transport branch 2 times, most recently from 782b0d1 to b6b0886 Compare July 8, 2022 09:35
// and transport level code.
// Performing all fetch operations with the TransportOptionsURL as the URL, lets the managed
// transport action use it to fetch the registered transport options which contains the
// _actual_ target URL and the correct credentials to use.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This description seems to be for what's happening in registerManagedTransportOptions(). It may be better to move it there.

@darkowlzz
Copy link
Contributor

I too did some manual testing.
I was interested in the case where SC restarts with existing GitRepos, and the transport initialization fails and what happens after another restart when it starts working again.

With an existing working GitRepo, when I restart with failing transport, the status shows:

  conditions:
  - lastTransitionTime: "2022-07-08T12:29:43Z"
    message: libgit2 managed transport not initialized
    observedGeneration: 2
    reason: Libgit2TransportNotEnabled
    status: "True"
    type: Stalled
  - lastTransitionTime: "2022-07-08T12:29:43Z"
    message: libgit2 managed transport not initialized
    observedGeneration: 2
    reason: Libgit2TransportNotEnabled
    status: "False"
    type: Ready
  - lastTransitionTime: "2022-07-08T12:16:43Z"
    message: stored artifact for revision 'master/b76b1a38c9c03c763dc56e996daf216606f4c088'
    observedGeneration: 2
    reason: Succeeded
    status: "True"
    type: ArtifactInStorage
  contentConfigChecksum: sha256:fcbcf165908dd18a9e49f7ff27810176db8e9f63b4352213741664245224f8aa
  observedGeneration: 2
  url: http://source-controller.flux-system.svc.cluster.local./gitrepository/flux-system/podinfo/latest.tar.gz

After restarting with working transport, the status shows:

  conditions:
  - lastTransitionTime: "2022-07-08T12:31:15Z"
    message: stored artifact for revision 'master/b76b1a38c9c03c763dc56e996daf216606f4c088'
    observedGeneration: 2
    reason: Succeeded
    status: "True"
    type: Ready
  - lastTransitionTime: "2022-07-08T12:16:43Z"
    message: stored artifact for revision 'master/b76b1a38c9c03c763dc56e996daf216606f4c088'
    observedGeneration: 2
    reason: Succeeded
    status: "True"
    type: ArtifactInStorage
  contentConfigChecksum: sha256:fcbcf165908dd18a9e49f7ff27810176db8e9f63b4352213741664245224f8aa
  observedGeneration: 2
  url: http://source-controller.flux-system.svc.cluster.local./gitrepository/flux-system/podinfo/latest.tar.gz

So, the conditions are handled well. But the artifact URL is advertised even when there's nothing in the storage in stalled state. This results in a kustomization that uses this GitRepo to fail with 404 status:

{"level":"error","ts":"2022-07-08T12:34:31.572Z","logger":"controller.kustomization","msg":"Reconciliation failed after 7.484495ms, next try in 5m0s","reconciler group":"kustomize.toolkit.fluxcd.io","reconciler kind":"Kustomization","name":"podinfo","namespace":"flux-system","revision":"master/b76b1a38c9c03c763dc56e996daf216606f4c088","error":"failed to download artifact from http://source-controller.flux-system.svc.cluster.local./gitrepository/flux-system/podinfo/b76b1a38c9c03c763dc56e996daf216606f4c088.tar.gz, status: 404 Not Found"}

Looks like we'll have to run reconcileStorage() before marking the object as stalled. So, maybe we should move the managed transport enabled check in reconcileSource(), that would ensure we don't advertise a non-existing artifact.

@aryan9600
Copy link
Member Author

aryan9600 commented Jul 9, 2022

Moving the managed transport enabled check to reconcileSource() addresses the issue posted above by @darkowlzz

GitRepo conditions when managed transport enabled check passes:

Status:
  Conditions:
    Last Transition Time:   2022-07-09T07:28:55Z
    Message:                stored artifact for revision 'master/6555980e9667cd46cad2b99b22d854d4fa0b087a'
    Observed Generation:    1
    Reason:                 Succeeded
    Status:                 True
    Type:                   Ready
    Last Transition Time:   2022-07-09T07:28:55Z
    Message:                stored artifact for revision 'master/6555980e9667cd46cad2b99b22d854d4fa0b087a'
    Observed Generation:    1
    Reason:                 Succeeded
    Status:                 True
    Type:                   ArtifactInStorage
  Content Config Checksum:  sha256:fcbcf165908dd18a9e49f7ff27810176db8e9f63b4352213741664245224f8aa
  Observed Generation:      1
  URL:                      http://source-controller.flux-system.svc.cluster.local./gitrepository/flux-system/bb/latest.tar.gz

GitRepo conditions after the controller restarts and the check isn't satisfied:

Status:
  Conditions:
    Last Transition Time:   2022-07-09T07:30:10Z
    Message:                libgit2 managed transport not initialized
    Observed Generation:    1
    Reason:                 Libgit2TransportNotEnabled
    Status:                 True
    Type:                   Stalled
    Last Transition Time:   2022-07-09T07:30:10Z
    Message:                libgit2 managed transport not initialized
    Observed Generation:    1
    Reason:                 Libgit2TransportNotEnabled
    Status:                 False
    Type:                   Ready

@pjbgf
Copy link
Member

pjbgf commented Jul 14, 2022

As part of the decommissioning, we should move into using the base imagegolang-with-libgit2-only and remove the references to libssh2 and openssl from Dockerfile and Makefile.

@aryan9600
Copy link
Member Author

aryan9600 commented Jul 14, 2022

As part of the decommissioning, we should move into using the base imagegolang-with-libgit2-only and remove the references to libssh2 and openssl from Dockerfile and Makefile.

This change will be made in a follow up PR, as this PR is already targeting quite a few things now.

@stefanprodan
Copy link
Member

Please hold merging until we release 0.25.10, this PR should be included in SC 0.26.0

@darkowlzz darkowlzz added the hold Issues and pull requests put on hold label Jul 14, 2022
@pjbgf pjbgf added this to the GA milestone Jul 14, 2022
@darkowlzz darkowlzz removed the hold Issues and pull requests put on hold label Jul 20, 2022
Copy link
Contributor

@darkowlzz darkowlzz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!
Thanks a lot.

@darkowlzz
Copy link
Contributor

@aryan9600 can you rebase please?
Hopefully that'll fix the CI failures and then we can merge this.

Decommission libgit2 unmanaged transport and remove the related feature
gate, making managed transport the default.

Signed-off-by: Sanskar Jaiswal <jaiswalsanskar078@gmail.com>
Update initRepoWithRemote() so that it overwrites the remote url with
the provided url if the remote already exists, instead of erroring out.

Signed-off-by: Sanskar Jaiswal <jaiswalsanskar078@gmail.com>
Attach context to HTTP requests at the transport level to honour
timeouts.

Signed-off-by: Sanskar Jaiswal <jaiswalsanskar078@gmail.com>
Signed-off-by: Sanskar Jaiswal <jaiswalsanskar078@gmail.com>
…on knowledge

Signed-off-by: Sanskar Jaiswal <jaiswalsanskar078@gmail.com>
@darkowlzz darkowlzz merged commit a6072c3 into fluxcd:main Jul 20, 2022
@pjbgf pjbgf mentioned this pull request Aug 16, 2022
3 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/git Git related issues and pull requests
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

Fix test x509Callback/Valid_certificate_authority_bundle before upgrade to go 1.18
5 participants