
Go Modules version and dependency support! #2528

Merged: 11 commits merged on Jun 17, 2020


@tiegz (Contributor) commented Jun 5, 2020

This adds support for "Releases" and "Dependencies" for Go Modules, which most Go projects are using now. It switches us from "go-search.org" (which doesn't work anymore) to "pkg.go.dev" (the new Go Module directory) and "proxy.golang.org" (the official Go Module proxy).

Example from k8s:

[Screenshot from 2020-06-05 13:04:22]

@tiegz requested a review from @katzj on June 5, 2020 17:31
```ruby
    number: info["Version"],
    published_at: info["Time"].presence && Time.parse(info["Time"])
  }
end
```
@tiegz (Contributor Author):

This is the only questionable part. It's a choice between two options:

  1. this approach, where we make a superfast request for each version to get its actual published_at, or
  2. a single request to the pkg.go.dev page, which gives each version with only a date-only version of its published_at.

🤷‍♂️
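Option 1 can be sketched against the module proxy's documented endpoints: `/@v/list` returns the tagged versions, one per line, and `/@v/<version>.info` returns a small JSON object with `Version` and `Time` fields. The Ruby below is a minimal standalone sketch using only the standard library, not the PR's code (the PR uses its own `get_raw` helper and ActiveSupport's `presence`); the names `proxy_versions`, `GO_PROXY` and `HTTP_FETCH` are hypothetical.

```ruby
require "json"
require "net/http"
require "time"
require "uri"

GO_PROXY = "https://proxy.golang.org"

# Default fetcher (hypothetical): plain GET via Net::HTTP, raising on non-2xx responses.
HTTP_FETCH = lambda do |url|
  res = Net::HTTP.get_response(URI(url))
  raise "GET #{url} failed with #{res.code}" unless res.is_a?(Net::HTTPSuccess)
  res.body
end

# Option 1: one request for the version list, then one request per version to
# its .info endpoint, e.g. {"Version":"v1.0.0","Time":"2020-01-02T03:04:05Z"}.
# Assumes `escaped_module` is already case-encoded for the proxy and that the
# version strings need no further escaping. `fetch` can be swapped out so the
# logic runs without network access.
def proxy_versions(escaped_module, fetch: HTTP_FETCH)
  list = fetch.call("#{GO_PROXY}/#{escaped_module}/@v/list")
  list.each_line.map(&:strip).reject(&:empty?).map do |version|
    info = JSON.parse(fetch.call("#{GO_PROXY}/#{escaped_module}/@v/#{version}.info"))
    time = info["Time"]
    {
      number: info["Version"],
      published_at: time.nil? || time.empty? ? nil : Time.parse(time)
    }
  end
end
```

With N tagged versions this makes N + 1 requests, in exchange for full timestamps rather than dates.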

@katzj (Member):

I like the approach here of doing a request per version. This is a background job, so I'd rather get the full and correct info.

@tiegz (Contributor Author) commented Jun 5, 2020:

@katzj I just switched this to use pkg.go.dev scraping instead, because I noticed pkg.go.dev indexes packages that aren't technically Go Modules (i.e. they don't have a go.mod). For example, https://pkg.go.dev/github.com/BurntSushi/toml works, whereas the Go proxy URL fails 😕:

```
$ curl http://proxy.golang.org/github.com/BurntSushi/toml/@v/list
bad request: invalid escaped module path "github.com/BurntSushi/toml"
```

Does that seem OK?
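The error message in that curl output refers to an escaped module path. The module proxy protocol case-encodes module paths: every uppercase letter is replaced by `!` followed by its lowercase form, so the proxy expects `github.com/!burnt!sushi/toml` rather than `github.com/BurntSushi/toml`. Whether encoding alone would make the proxy serve this particular package is not verified here. A minimal sketch of the encoding, with a hypothetical helper name that is not part of this PR:

```ruby
# Hypothetical helper: case-encode a module path for the Go module proxy.
# Each uppercase ASCII letter becomes "!" followed by its lowercase form.
def escape_module_path(path)
  path.gsub(/[A-Z]/) { |c| "!#{c.downcase}" }
end

escape_module_path("github.com/BurntSushi/toml") # => "github.com/!burnt!sushi/toml"
```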

@tiegz requested a review from @jsonperl on June 5, 2020 19:07
Commit: Use the chronic library to parse version timestamps like "1 day ago"
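The commit above relies on the chronic gem for natural-language dates. As a dependency-free illustration of the two formats mentioned in the PR's code comment (date-only, and relative phrases such as "1 day ago"), here is a hypothetical `parse_version_time` helper built only on Ruby's standard library. It is not what the PR does, and it covers only simple "N unit(s) ago" phrases:

```ruby
require "date"

RELATIVE_UNIT_SECONDS = {
  "second" => 1, "minute" => 60, "hour" => 3_600,
  "day" => 86_400, "week" => 604_800
}.freeze

# Hypothetical helper: parse either a relative phrase ("3 hours ago") or a
# date-only string ("Jun 5, 2020") into a UTC Time. `now` is injectable so the
# relative case is deterministic. Unrecognized text raises from Date.parse.
def parse_version_time(text, now: Time.now.utc)
  if (m = text.strip.match(/\A(\d+)\s+(second|minute|hour|day|week)s?\s+ago\z/i))
    now - m[1].to_i * RELATIVE_UNIT_SECONDS.fetch(m[2].downcase)
  else
    d = Date.parse(text)
    Time.utc(d.year, d.month, d.day)
  end
end
```

Date-only values come back as midnight UTC, so they carry less precision than the proxy's `.info` timestamps.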
@katzj (Member) left a comment:

Will we want to re-process all Go packages after this goes in, to make the data more current and correct? I could even see trying to clean up some of the old stuff.

```ruby
end

def self.install_instructions(project, version = nil)
  "go get #{project.name}"
end

def self.project_names
  get("http://go-search.org/api?action=packages")
  # Currently the index only shows the last <=2000 modules from the date given. (https://proxy.golang.org/)
```
@katzj (Member):

About how many are there per day, usually?

@tiegz (Contributor Author) commented Jun 5, 2020:

Uh oh, I just realized this is actually package-specific (not module-specific), and it's version-specific too, so this list is going to change very frequently. I left a note about that in the Go issue above, though.
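The comment in the diff refers to the Go module index (served at index.golang.org and described on the proxy.golang.org page). It returns newline-delimited JSON records with `Path`, `Version` and `Timestamp` fields, at most 2000 per request, and takes a `since` parameter. Because each record is one module version, the same path can appear several times. A hypothetical helper, not part of the PR, that turns one page into unique paths plus the cursor for the next request:

```ruby
require "json"

# Hypothetical helper: parse one page of the module index into unique module
# paths, plus the last Timestamp to pass as ?since= for the following page
# (nil when the page is empty).
def parse_index_page(body)
  entries = body.each_line.map(&:strip).reject(&:empty?).map { |line| JSON.parse(line) }
  [entries.map { |e| e["Path"] }.uniq, entries.last&.fetch("Timestamp")]
end
```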

```ruby
end

def self.project(name)
  get("http://go-search.org/api?action=package&id=#{name}")
  if pkg_html = get_html("https://pkg.go.dev/#{name}?tab=doc")
    go_module, _latest_version = pkg_html.css('a[data-test-id="DetailsHeader-infoLabelModule"]').first&.text&.split('@', 2)
```
@katzj (Member):

This seems fragile. How often is name != go_module? If they're usually the same, we could just use the version list from the proxy and figure out the latest version ourselves.
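If the proxy's version list were used directly, the latest version could be picked locally. A simplified, hypothetical `latest_version` helper, assuming the list body holds one `vMAJOR.MINOR.PATCH[-pre][+build]` tag per line. Like the `go` command when resolving `@latest`, it prefers releases over pre-releases, but it compares pre-release labels as plain strings rather than with full semver precedence:

```ruby
# Tags of the form vMAJOR.MINOR.PATCH with optional -prerelease and +build parts.
SEMVER_TAG = /\Av(\d+)\.(\d+)\.(\d+)(?:-([0-9A-Za-z.-]+))?(?:\+[0-9A-Za-z.-]+)?\z/

# Hypothetical helper: choose the latest tag from a proxy @v/list body.
# Non-matching lines are skipped; returns nil when nothing matches.
def latest_version(list_body)
  parsed = list_body.each_line.map do |line|
    tag = line.strip
    m = SEMVER_TAG.match(tag) or next
    [tag, [m[1].to_i, m[2].to_i, m[3].to_i], m[4]]
  end
  candidates = parsed.compact
  releases = candidates.select { |_tag, _nums, pre| pre.nil? }
  pool = releases.empty? ? candidates : releases
  best = pool.max_by { |_tag, nums, pre| [nums, pre.to_s] }
  best && best.first
end
```

Comparing numeric tuples avoids the string-ordering trap where "v1.10.0" would sort before "v1.2.0".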

@tiegz (Contributor Author):

We're currently indexing both packages and modules, so name could be either the package or the parent module.

This part isn't meant to get the latest_version, though; it just strips the latest_version from the module name. I might be wrong and it isn't there, so I'll circle back in a bit and check whether the splitting is necessary.
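For reference, `String#split('@', 2)` is harmless when no version is present: it returns a one-element array, so `_latest_version` is simply `nil`. A hypothetical wrapper showing both cases, assuming the header label text reads either `module` or `module@version`:

```ruby
# Hypothetical helper mirroring the diff's split: returns [module, version],
# with version nil when the label has no "@", and [nil, nil] for a missing label.
def split_module_label(label)
  return [nil, nil] if label.nil?
  mod, version = label.strip.split("@", 2)
  [mod, version]
end
```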

```ruby
# NB this requires a quick request for each version, but we could alternatively fetch from
# https://pkg.go.dev/mod/#{go_module}?tab=versions and the '.Versions-commitTime', selector,
# but the dates are a combination of date-only or natural language (e.g. '1 day ago')
versions = get_raw("http://proxy.golang.org/#{go_module}/@v/list")
```
@katzj (Member):

Suggested change:

```diff
- versions = get_raw("http://proxy.golang.org/#{go_module}/@v/list")
+ versions = get_raw("https://proxy.golang.org/#{go_module}/@v/list")
```

app/models/package_manager/go.rb (resolved)
@tiegz changed the title from "Go Modules version support!" to "Go Modules version and dependency support!" on Jun 6, 2020
@tiegz (Contributor Author) commented Jun 6, 2020:

> Will we want to be re-looking at all go packages after this goes in to make the data more current/correct? I could even see trying to clean up some of the old stuff

Def! I'm afraid of overwhelming pkg.go.dev, though. Maybe something like this?

```ruby
Project.where(platform: "Go").find_each.with_index do |p, i|
  # Stagger the jobs one second apart so pkg.go.dev isn't hit all at once
  PackageManagerDownloadWorker.perform_at((i * 1.second).from_now, p.platform, p.name)
end
```

With 1.8 million Go packages, that'd take about 21 days.
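The estimate follows from scheduling one job per second across all packages:

$$
\frac{1{,}800{,}000\ \text{jobs} \times 1\ \text{s/job}}{86{,}400\ \text{s/day}} \approx 20.8\ \text{days}
$$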

I think the slowest part of an update() is fetching the go.mod file for each version, at about 2 seconds per request. As that Go issue above says, we can fetch deps from pkg.go.dev, but only for the latest version :\
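Each version's go.mod can be fetched from the proxy at `/@v/<version>.mod`; extracting dependencies from it then comes down to reading its `require` directives. A simplified, hypothetical `parse_go_mod_requires` helper (not the PR's parser) that handles both the single-line and parenthesized `require` forms and drops comments such as `// indirect`:

```ruby
# Hypothetical helper: extract [module, version] pairs from a go.mod file's
# require directives. Simplified: replace/exclude directives are ignored and
# quoted module paths are not supported.
def parse_go_mod_requires(go_mod)
  requires = []
  in_block = false
  go_mod.each_line do |raw|
    line = raw.sub(%r{//.*}, "").strip # drop trailing comments like "// indirect"
    next if line.empty?
    if in_block
      if line == ")"
        in_block = false
      else
        mod, version = line.split(/\s+/)
        requires << [mod, version]
      end
    elsif line.match?(/\Arequire\s*\(\z/)
      in_block = true
    elsif (m = line.match(/\Arequire\s+(\S+)\s+(\S+)\z/))
      requires << [m[1], m[2]]
    end
  end
  requires
end
```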

@tiegz requested a review from @katzj on June 9, 2020 14:26
@katzj (Member) left a comment:

I don't love all the HTML parsing, since changes to their CSS will obviously break us, but it's better than where we are now.

We should think about how to detect such a change so that we know when things are broken, but I'm okay doing that in a follow-up.
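One possible shape for that follow-up: treat an empty result for a selector that should always match as an error, instead of silently storing blank data, so markup changes surface in error tracking. A minimal sketch with a hypothetical `ScrapeLayoutChanged` error and `require_scraped!` helper (neither exists in the PR); a caller would pass in the text already extracted with the CSS selector:

```ruby
# Hypothetical error raised when scraped markup no longer matches expectations.
class ScrapeLayoutChanged < StandardError; end

# Return the scraped value unchanged, or raise if it is nil or blank.
def require_scraped!(value, selector:, url:)
  return value unless value.nil? || value.to_s.strip.empty?
  raise ScrapeLayoutChanged, "selector #{selector} matched nothing on #{url}"
end
```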

@tiegz merged commit 30e0602 into master on Jun 17, 2020
@tiegz deleted the tiegz/go-mod-version-support branch on June 17, 2020 15:15