Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adding git_remote fallback for gitlab_remote use without full API access (Resolves #604) #608

Open
wants to merge 32 commits into
base: main
Choose a base branch
from

Conversation

dgkf
Copy link
Contributor

@dgkf dgkf commented May 1, 2021

As described in #604, the current gitlab_remote makes use of API endpoints that are not available to tokens generated for use within gitlab CI (stored in the $CI_JOB_TOKEN env var), throwing errors when these tokens are used.

This PR adds code to first ping the API at a generic endpoint (querying for /version). If that request fails and isTRUE(getOption("remotes.gitlab_git_fallback", TRUE)), a git_remote is returned.

If git2r is available, a credentials object is created from the auth_token. Otherwise, the token is embedded in the url in the form of http://gitlab-ci-token:$TOKEN@example.com/namespace/project.git.

This allows install_gitlab to be used within CI jobs on non-public deployments of GitLab without the creation and embedding of personal tokens. Pipeline engineers need only to run export GITLAB_PAT=$CI_JOB_TOKEN prior to installing remotes.

Changelog

  • install_gitlab will defer to using install_git when authentication doesn't provide adequate API access to download a source archive
    • will create a git2r::cred_user_pass if git2r is available
      • Username is set to gitlab-ci-token. When providing a PAT, GitLab ignores the username unless one is using a CI_JOB_TOKEN token within a CI job, in which case it must be gitlab-ci-token. Because of this, it covers both scenarios to pass gitlab-ci-token in both cases. Unfortunately I wasn't able to find any documentation to reference for this behavior, it was only narrowed down through testing.
    • otherwise, will embed authentication in the git url (http://<username>:<password>@host.com/repo.git)
  • Because urls may contain access tokens, I wrote some handlers to strip these from messages and from the "Remote*" fields in DESCRIPTION (Design Feedback Request: is it preferred to keep the full url for updates, or to exclude the password so that it isn't leaked through things like renv?)
    • git() updated to take an optional display_args command to provide output using censored git url as to not display access tokens in console output. This is used in remote_download.xgit_remote to display git commands without printing passwords to console.
    • parse_git_url() updated to also extract a username and password, though it might be worth taking on a dependency to handle url parsing since this is regex is getting pretty involved
    • git_anon_url() introduced to strip out username and password components from a url
    • git_censored_url() introduced to replace the password component with asterisks
  • Added a bunch of tests for url parsing, anonymization and censoring
  • Added some tests for falling back to a git_remote when the GitLab host API requests fail
  • Added git_fallback = getOption("remotes.gitlab_git_fallback", TRUE) parameter to install_gitlab
  • Updated NEWS to describe new behavior

This is an initial pass just to experiment with implementation. Please let me know if this looks like a reasonable approach, and then I can polish this PR with

  • Documenting new behaviors
  • Add tests
  • Get tests working in CI

@jimhester
Copy link
Member

jimhester commented Jun 2, 2021

My main worry with making the GitLab remote more complex is our team doesn't use GitLab, so it is possible this will break in the future without us realizing it.

We would definitely need some tests to avoid this.

@dgkf
Copy link
Contributor Author

dgkf commented Jun 3, 2021

Thanks @jimhester - I'm happy to add in tests as much as possible. If you think the implementation looks sound, then I can get to work on tests and updating docs. I was just hesitant to invest more time fleshing out the peripheral bits until getting some impressions on the approach.

@dgkf dgkf changed the title Adding git_remote fallback for gitlab_remote use without full API access (Resolves #604) WIP: Adding git_remote fallback for gitlab_remote use without full API access (Resolves #604) Jun 25, 2021
@dgkf dgkf changed the title WIP: Adding git_remote fallback for gitlab_remote use without full API access (Resolves #604) Adding git_remote fallback for gitlab_remote use without full API access (Resolves #604) Jul 9, 2021
@dgkf
Copy link
Contributor Author

dgkf commented Jul 30, 2021

@jimhester - this PR is ready whenever you have an opportunity to take a look. The only CI errors are ones that also exist on master. Overall, the design feels a bit clunky, but I'm struggling to come up with anything better.

To trace through the changes, it is easiest to start with install_gitlab's call out to gitlab_to_git_remote, and then look at the uses of $url in install_git.R as the url may contain a url-embedded username and token.

Just to highlight a critical design choice:

Design Feedback Request: is it preferred to keep the full url including username and password (https://dgkf:12345@gitlab.com/...) when storing remote_metadata or printing to console?

  • the good: this would allow remotes::update_packages to update a package which requires authentication
  • the bad: this might put a user at risk of leaking an access token through things like execution logs if printed to console or an renv lockfile if included in a DESCRIPTION file.

For now, I chose to scrub the username and password from the url before this is added to a DESCRIPTION file to prioritize safety of access tokens over the update experience.


The self-hosted GitLab issues are currently a big pain point at my org, so some help in moving this forward would be greatly appreciated.

@statnmap
Copy link
Contributor

statnmap commented Sep 23, 2021

Thank you for this PR. This is a must when working on private GitLab instances.
I approve its improvements.

I tried it in a CI instance with the following classical use cases I guess. The PR solves the problems encountered with current version of {remotes}. This can be accepted as is.

Use CI_JOB_TOKEN set up with {git2r}

Clone and install_local()

  • current {remotes} OK
  • PR OK
tempclone <- tempfile(pattern = "conjdown")
dir.create(tempclone)

git2r::clone(url = "https://git.lab.sspcloud.fr/propre-conj/conjdown",
             local_path = tempclone,
             credentials = git2r::cred_user_pass(username = "gitlab-ci-token",
                                                 password = Sys.getenv("CI_JOB_TOKEN"))
)

remotes::install_local(tempclone)

install_git()

  • current {remotes} FAIL
  • PR OK
options(remotes.git_credentials = git2r::cred_user_pass("gitlab-ci-token", Sys.getenv("CI_JOB_TOKEN")))
  remotes::install_git("https://myprivategitlab.com/user/repos")

install_gitlab()

  • current {remotes} FAIL
  • PR OK with message
  remotes::install_gitlab(host = "https://myprivategitlab.com",
                          repo = "user/repos",
                          auth_token = Sys.getenv("CI_JOB_TOKEN"))

message:

auth_token does not have scopes 'read-repository' and 'api' for host
  'https://myprivategitlab.com" required to install using
  gitlab_remote.
Attempting git_remote

install from another package DESCRIPTION file with git2r creds and git::

  • current {remotes} FAIL
  • PR OK

DESCRIPTION file

Imports: 
    repos
Remotes:
    git::https://myprivategitlab.com/user/repos"
options(remotes.git_credentials = git2r::cred_user_pass("gitlab-ci-token", Sys.getenv("CI_JOB_TOKEN")))
remotes::install_deps(dependencies = TRUE)

Set GITLAB_PAT

install_gitlab()

  • current {remotes} FAIL
  • PR OK with message
Sys.setenv(GITLAB_PAT = Sys.getenv("CI_JOB_TOKEN"))
  remotes::install_gitlab(host = "https://myprivategitlab.com",
                          repo = "user/repos")

message:

Using GitLab PAT from envvar GITLAB_PAT
auth_token does not have scopes 'read-repository' and 'api' for host
  'https://myprivategitlab.com" required to install using
  gitlab_remote.
Attempting git_remote

install from another package DESCRIPTION file with GITLAB_PAT and gitlab::

This is a try. I know this is not the aim of this PR, but that could be a future enhancement, maybe.

  • current {remotes} FAIL
  • PR FAIL.

DESCRIPTION file

Imports: 
    repos
Remotes:
    gitlab::https://myprivategitlab.com/user/repos"
Sys.setenv(GITLAB_PAT = Sys.getenv("CI_JOB_TOKEN"))
remotes::install_deps(dependencies = TRUE)

Error

Error: Unknown remote type: gitlab
  Invalid git repo specification: 'https://myprivategitlab.com/user/repos'
Execution halted

@statnmap
Copy link
Contributor

Do you think that it could be a good idea to allow gitlab_pat() to also look for CI_JOB_TOKEN environment variable if GITLAB_PAT is empty ? This may solve a lot pain using CI.

@dgkf
Copy link
Contributor Author

dgkf commented Sep 27, 2021

Thanks for considering this PR, @jimhester.

Just wanted to highlight this bit in the PR thread for your consideration. I tried my best to dig into how remotes/renv use the Remotes* fields in the DESCRIPTION file, but wasn't totally sure what the preferred solution would be for access tokens in urls and want to make sure it was brought to your attention in case there are any security concerns with how it's handled.

Design Feedback Request: is it preferred to keep the full url including username and password (https://dgkf:12345@gitlab.com/...) when storing remote_metadata or printing to console?

  • the good: this would allow remotes::update_packages to update a package which requires authentication
  • the bad: this might put a user at risk of leaking an access token through things like execution logs if printed to console or an renv lockfile if included in a DESCRIPTION file

Currently user-specific url components are stripped to minimize any printing/saving of tokens.

@jimhester
Copy link
Member

I guess we should strip them, though it would then break updating packages later.

However if you still set the GITLAB_PAT when you run update_packages() would the update work?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants