New source: SageMath distribution #1118

Closed
mkoeppe opened this issue Jan 31, 2021 · 18 comments
Comments

@mkoeppe
Contributor

mkoeppe commented Jan 31, 2021

Would there be interest in adding the SageMath distribution? https://www.sagemath.org/

About 300 packages, see https://github.com/sagemath/sage/tree/develop/build/pkgs
Continuously maintained for about 15 years. Releases 2-3 times a year. "develop" branch updates every 1-2 weeks.

Plain text metadata, trivial to parse. (Also, a python library that parses it is available at https://github.com/sagemath/sage/tree/develop/build/sage_bootstrap)

We are adding a mapping to repology's package names in plain text files named distros/repology.txt in https://trac.sagemath.org/ticket/31114

@AMDmi3
Member

AMDmi3 commented Jan 31, 2021

Would there be interest in adding the SageMath distribution?

Sure, but unfortunately it currently doesn't look very usable.

  • Versions contain custom suffixes, such as .p0. I assume these are package revisions and maybe we can strip these as long as these do not intersect with upstream versions in the same format.
  • There's no way to parse either homepage or download URLs
    • The former reside in .rst files in unstructured format, are optional and may be mixed with unrelated URLs
    • The latter, as far as I can see, can be obtained from some checksums.ini files which contain upstream_url, but too few packages have these (which makes me curious - you do have information on checksums, but don't always have information on how to retrieve tarballs, is this so?)
  • There's no way to distinguish python modules from other software and map them to python:* names
  • Dependencies contain variable expansions (this is not critical as repology doesn't use depends yet, but this is planned)
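As a rough illustration of the first point, stripping the revision suffix could look like the sketch below. It assumes revisions always take the form `.pN` at the very end of the version string; `strip_revision` is a hypothetical helper name, not part of Repology or sage_bootstrap.

```python
import re

# Assumption: a Sage package revision is ".p" followed by digits,
# and it only ever appears at the end of the version string.
_REVISION_SUFFIX = re.compile(r"\.p\d+$")


def strip_revision(version: str) -> str:
    """Return the version with a trailing .pN revision removed, if present."""
    return _REVISION_SUFFIX.sub("", version.strip())


if __name__ == "__main__":
    for raw in ("3.1.2.p0", "1.0", "20210131.p12"):
        print(raw, "->", strip_revision(raw))
```

This is only safe as long as no upstream project uses a genuine `.pN` component in its own version numbers, which is the intersection concern raised above.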

We are adding a mapping to repology's package names in plain text files

I'd very much prefer to avoid this. It introduces an unwanted feedback loop and makes name normalization rules behave in unexpected ways that are hard to fix. It also makes Repology dependent on the repository: I'd have less freedom to change naming schemes and would have to synchronize massive changes between Repology and SageMath, which is impossible to do cleanly and would pollute histories/feeds.

In fact, naming discrepancies are not a problem at all - there are not too many packages, most of them will match names with repology projects, and for others rules can be introduced on repology side as usual. It's even positive as these rules may be useful for other repositories. Distinguishing (python and other) modules is critical though, but if you're willing to introduce repology project names to your repository I don't think it would be too hard to introduce a flag which conveys this info instead. In fact, we can go on with introducing repology names, but I'll only use the fact that "python:" (or other) prefix is present.

Summarizing show stoppers and possible ways to fix:

  • Versioning: OK if genuine upstream versions can be obtained by simply stripping revisions
  • Naming: OK if we can distinguish python modules (and other modules if there are any)
  • URLs: OK if we can reliably parse either upstream homepages or downloads

I'll write an experimental parser for SageMath for the time being.

@mkoeppe
Contributor Author

mkoeppe commented Jan 31, 2021

Thanks for the quick reaction!

  • Versions contain custom suffixes, such as .p0. I assume these are package revisions and maybe we can strip these as long as these do not intersect with upstream versions in the same format.

That's right.

  • There's no way to parse either homepage or download URLs
    • The former reside in .rst files in unstructured format, are optional and may be mixed with unrelated URLs

We are generating a page for each package at:
https://doc.sagemath.org/html/en/reference/spkg/PACKAGENAME.html
for example
https://doc.sagemath.org/html/en/reference/spkg/jupyter_client.html
In the next release the contents of these pages will be enriched.
These are intended as package homepages.

Sage hosts the package tarballs at http://files.sagemath.org/spkg/upstream/ (and its mirrors). This uses the tarball field of checksums.ini.

  • The latter, as far as I can see, can be obtained from some checksums.ini files which contain upstream_url, but too few packages have these (which makes me curious - you do have information on checksums, but don't always have information on how to retrieve tarballs, is this so?)

That's right, we are adding upstream_url whenever we upgrade a package - this field is new (from 2020) and not all packages have picked it up yet. This is for use by SageMath distribution developers, not users.
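For illustration, reading upstream_url out of a checksums.ini file might be sketched as follows. The sketch assumes the file is a flat list of `key=value` lines and that the URL contains a literal `VERSION` placeholder to be filled in from the package version; both points, and the helper names `parse_checksums` and `download_url`, are assumptions rather than something confirmed in this thread.

```python
from typing import Dict, Optional


def parse_checksums(text: str) -> Dict[str, str]:
    """Parse flat key=value lines (assumed checksums.ini layout) into a dict."""
    fields: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines, comments, and anything that is not key=value.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        fields[key.strip()] = value.strip()
    return fields


def download_url(fields: Dict[str, str], version: str) -> Optional[str]:
    """Return upstream_url with the assumed VERSION placeholder substituted.

    Returns None for packages that have not picked up the field yet.
    """
    template = fields.get("upstream_url")
    if template is None:
        return None
    return template.replace("VERSION", version)


if __name__ == "__main__":
    # Hypothetical file contents, for demonstration only.
    sample = (
        "tarball=example-VERSION.tar.gz\n"
        "sha1=0000000000000000000000000000000000000000\n"
        "upstream_url=https://example.org/dist/example-VERSION.tar.gz\n"
    )
    print(download_url(parse_checksums(sample), "1.2.3"))
```

Returning None for a missing field keeps the "not yet migrated" case explicit, so a consumer can count how many packages still lack a usable download URL.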

  • There's no way to distinguish python modules from other software and map them to python:* names

Python packages can be recognized by the file install-requires.txt, which contains the pypi name (with possible version constraints)
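A minimal sketch of that check, assuming the first non-comment line of install-requires.txt starts with the PyPI name and that version constraints follow it. The helper names and the lowercasing of the name are illustrative assumptions, not Repology's actual normalization rules.

```python
import re
from pathlib import Path
from typing import Optional

# Leading run of characters allowed in a distribution name; the match stops
# at the first space or version-constraint character such as ">", "=", "<".
_NAME = re.compile(r"^\s*([A-Za-z0-9][A-Za-z0-9._-]*)")


def pypi_name(requirement_line: str) -> Optional[str]:
    """Extract the bare PyPI name from a line such as 'foo >=1.0'."""
    match = _NAME.match(requirement_line)
    return match.group(1) if match else None


def python_project_name(pkg_dir: Path) -> Optional[str]:
    """Return 'python:<name>' if pkg_dir has install-requires.txt, else None."""
    req_file = pkg_dir / "install-requires.txt"
    if not req_file.is_file():
        return None
    for line in req_file.read_text().splitlines():
        line = line.split("#", 1)[0]  # drop trailing comments
        name = pypi_name(line)
        if name:
            # Lowercasing is an assumption made for this sketch.
            return "python:" + name.lower()
    return None
```

A package directory without install-requires.txt simply yields None here, so any Python module that lacks the file would be treated as non-Python software.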

@mkoeppe
Contributor Author

mkoeppe commented Jan 31, 2021

We are adding a mapping to repology's package names in plain text files

I'd very much prefer to avoid this.

OK, let's ignore this at least for now.

@AMDmi3
Member

AMDmi3 commented Feb 1, 2021

We are generating a page for each package at

This is useful for another purpose, but Repology needs parsable upstream URLs to match related projects under different names and to split unrelated packages under the same name.

That's right, we are adding upstream_url whenever we upgrade a package - this field is new (from 2020) and not all packages have picked it up yet. This is for use by SageMath distribution developers, not users.

If there's an ongoing trend toward adding these, and they can be added proactively in certain cases, it should be fine. For instance, these packages in sagemath need some kind of URL to be classified properly:

  • atlas
  • bliss
  • ecm
  • gambit
  • mpc
  • rw
  • surf

Python packages can be recognized by the file install-requires.txt, which contains the pypi name (with possible version constraints)

This doesn't seem to be true in all cases. For instance, these are python modules but lack install-requires.txt:

  • wheel
  • texttable

In general the repository parses fine and produces a tolerable amount of incorrect versions and unmatched packages, so if these 9 cases are improved I can add it right away.

@mkoeppe
Contributor Author

mkoeppe commented Feb 1, 2021

Great! I'll work on this in https://trac.sagemath.org/ticket/31321

@mkoeppe
Contributor Author

mkoeppe commented Feb 1, 2021

And https://trac.sagemath.org/ticket/29152 adds the missing upstream_url fields for rw, cliquer, meataxe in the course of package updates

@mkoeppe
Contributor Author

mkoeppe commented Feb 1, 2021

https://trac.sagemath.org/ticket/30350 will remove atlas

@AMDmi3
Member

AMDmi3 commented Mar 14, 2021

Timeout. Please reopen when named issues are resolved.

@AMDmi3 AMDmi3 closed this as completed Mar 14, 2021
@mkoeppe
Contributor Author

mkoeppe commented Mar 14, 2021

And https://trac.sagemath.org/ticket/29152 adds the missing upstream_url fields for rw, cliquer, meataxe in the course of package updates

Quick update: This ticket has been merged in the latest beta (= HEAD of the develop branch).

I'll reopen once https://trac.sagemath.org/ticket/31321 is merged as well.

@mkoeppe
Contributor Author

mkoeppe commented Mar 14, 2021

I'll work on this in https://trac.sagemath.org/ticket/31321

This ticket is now merged in the new beta just released (= HEAD of the develop branch).

Might be worth taking another look.

@AMDmi3 AMDmi3 reopened this Mar 15, 2021
@AMDmi3
Member

AMDmi3 commented Mar 15, 2021

It looks mostly good now, so I'm deploying it today.

@AMDmi3 AMDmi3 closed this as completed in 0294729 Mar 15, 2021
@mkoeppe
Contributor Author

mkoeppe commented Mar 15, 2021

Great, thanks a lot!

AMDmi3 added a commit to repology/repology-webapp that referenced this issue Mar 15, 2021
@AMDmi3
Member

AMDmi3 commented Mar 15, 2021

Done. You might consider adding homepage/download information for remaining unclassified packages:

https://repology.org/projects/?search=-unclassified&maintainer=&category=&inrepo=sagemath&notinrepo=&repos=&families=1&repos_newest=&families_newest=

@mkoeppe
Contributor Author

mkoeppe commented Mar 15, 2021

Awesome. Yes, will do (most likely in the course of package upgrades).

@mkoeppe
Contributor Author

mkoeppe commented May 10, 2021

We now have the first stable version of the SageMath distribution (on the master branch) that includes the above fixes. Would it be possible for repology to distinguish between the SageMath stable repository (master branch) and development repository (develop branch) from now on?

@AMDmi3
Member

AMDmi3 commented May 12, 2021

Done.

@mkoeppe
Contributor Author

mkoeppe commented May 12, 2021

Thank you!

Note the "News highlights" link to SageMath is now broken

@AMDmi3
Member

AMDmi3 commented May 13, 2021

Nice catch, fixed!
