New package name fields, take 2 #944

Closed
AMDmi3 opened this issue Nov 6, 2019 · 7 comments

@AMDmi3
Member

AMDmi3 commented Nov 6, 2019

Follow up to #439. Though keyname helped to solve some tasks, it has proven itself ambiguous, so the next idea is to drop it in favor of different fields with more concrete meaning.

  • trackname - name used to track package between repository updates (Track package lifetimes through keynames #527)
  • refname - name used to reference the package from outside (Implement endpoints based on original package name repology-webapp#66, wikidata bot). There may be multiple refnames, but these should not be mixed up. At the very least we'll need a source reference name and a binary reference name, but we may need a general reference name if the source/binary terms do not apply, or other types. We can start with 3 optional fields (and partial indexes in the DB, so there isn't much overhead for each new field).

Needless to mention, #931 is required so we can easily route and reroute repository-specific name types into our name classification.
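
The "3 optional fields plus partial indexes" idea can be sketched roughly as follows. This is only an illustration: it uses SQLite from Python's standard library as a stand-in for the real database, and the table, column and index names (packages, src_refname, bin_refname, gen_refname) are made up for the example.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    """Create an in-memory table with three optional refname columns."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE packages (
            repo        TEXT NOT NULL,
            trackname   TEXT,
            src_refname TEXT,  -- hypothetical column names
            bin_refname TEXT,
            gen_refname TEXT
        );
        -- Partial indexes only cover rows where the field is set,
        -- so a mostly-empty optional column costs little.
        CREATE INDEX packages_src_refname_idx
            ON packages (repo, src_refname) WHERE src_refname IS NOT NULL;
        CREATE INDEX packages_bin_refname_idx
            ON packages (repo, bin_refname) WHERE bin_refname IS NOT NULL;
        CREATE INDEX packages_gen_refname_idx
            ON packages (repo, gen_refname) WHERE gen_refname IS NOT NULL;
        """
    )
    return conn


def find_by_src_refname(conn: sqlite3.Connection, repo: str, refname: str) -> list:
    """Return tracknames of packages in a repo matching a source reference name."""
    rows = conn.execute(
        "SELECT trackname FROM packages WHERE repo = ? AND src_refname = ?",
        (repo, refname),
    )
    return [row[0] for row in rows]
```

Keeping the name types in separate columns means a lookup by source name never matches a binary or general name by accident, which is the "should not be mixed up" requirement above.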

@AMDmi3
Member Author

AMDmi3 commented Nov 8, 2019

Also list these fields on repository/fields page in admin mode

@AMDmi3
Member Author

AMDmi3 commented Nov 11, 2019

To retain (some) compatibility and avoid inventing new long names, let's use name, srcname and binname as "refnames". Each of these is optional, but at least one of them should be provided (eventually). The goal is to make trackname mandatory too and deprecate basename and keyname.
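
A minimal sketch of that rule, assuming a plain dataclass rather than the project's actual package class (the class name PackageNames is hypothetical; only the field names come from this comment):

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PackageNames:
    """Hypothetical container for the name fields discussed here."""

    trackname: str                 # intended to be mandatory
    name: Optional[str] = None     # generic reference name
    srcname: Optional[str] = None  # source reference name
    binname: Optional[str] = None  # binary reference name

    def __post_init__(self) -> None:
        if not self.trackname:
            raise ValueError("trackname is mandatory")
        # Each refname is optional, but at least one must be present.
        if not (self.name or self.srcname or self.binname):
            raise ValueError("at least one of name, srcname, binname is required")
```

For example, `PackageNames(trackname="games/0ad", srcname="games/0ad")` is accepted, while `PackageNames(trackname="games/0ad")` raises `ValueError`.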

AMDmi3 added a commit to repology/repology-webapp that referenced this issue Nov 12, 2019
AMDmi3 added a commit that referenced this issue Nov 12, 2019
This includes *bsd, gentoo, npackd, wikidata, libregamewiki and arch

- Switch from name/basename/keyname to name/srcname/binname
- Fix pkglinks accordingly
- Remove 'base' extrafield from Arch, it's no longer needed
AMDmi3 added a commit that referenced this issue Nov 12, 2019
Since we no longer have reliable name, find a best matching package in parsed packages for each sample and compare sample with it
@AMDmi3
Member Author

AMDmi3 commented Nov 13, 2019

Ok, it's gradually becoming more consistent. The next questions are:

  • ❓ Should trackname really be mandatory? E.g. could it happen that we don't want to track some repositories due to the absence of stable names, or are we always OK acting on a best-effort basis here?
    • 💡 Probably yes, at least for now
  • ❓ Should we use name/srcname/binname as 3 fixed fields, or should we go with a more flexible pattern, e.g. store custom names in a dict (and maybe use repository-specific names, such as origin/FreeBSD, fmri/OpenIndiana etc.)?
    • 💡 I lean towards the former (and maybe even shrink this to 2 columns, see below), as using custom names requires the extra work of discovering and syncing them (and they may still be ambiguous, think name/pkgname/package name/pname), hinders stable package-specific URLs, prevents uniform handling of repositories and takes more space in the DB (as jsonb, with extra space taken by key names).
  • ❓ Which of name and srcname should we use in ambiguous cases, e.g. when it's not clear whether source or binary names are provided by the repository, or there's no difference between source and binary names?
    • It looks like most repositories for now use either name or srcname + binname pattern. So we could only use 2 fields.
    • That's simpler architecturally, but not quite correct from the user standpoint, as src and bin do not apply to all sources (for instance, web sites).
    • However, using name where there ARE packages, but no src/bin dichotomy is not correct either.
    • Not all sources are converted yet, and it may turn out that some need a generic name (in the form of some internal ID) in addition to package names (openindiana? nix?) - let's finish that and think this over again.

AMDmi3 added a commit that referenced this issue Nov 13, 2019
AMDmi3 added a commit that referenced this issue Nov 13, 2019
AMDmi3 added a commit that referenced this issue Nov 13, 2019
AMDmi3 added a commit that referenced this issue Nov 13, 2019
AMDmi3 added a commit that referenced this issue Nov 13, 2019
AMDmi3 added a commit that referenced this issue Nov 13, 2019
AMDmi3 added a commit that referenced this issue Nov 13, 2019
AMDmi3 added a commit that referenced this issue Nov 13, 2019
AMDmi3 added a commit that referenced this issue Nov 13, 2019
AMDmi3 added a commit that referenced this issue Nov 13, 2019
AMDmi3 added a commit that referenced this issue Jan 21, 2020
While here, switch from shlex to re for severe improvement in parsing
speed and switch to PackageMaker context manager
@AMDmi3
Member Author

AMDmi3 commented Jan 22, 2020

Mostly done. The remaining thing is to revisit the name vs srcname dichotomy. Probably name should remain for non-repositories (news sites), and most repositories which still set name should be switched to srcname unless their naming schema is unknown (nix?).

AMDmi3 added a commit that referenced this issue Jan 22, 2020
Remove all mentions of keyname and basename, make trackname mandatory
@AMDmi3
Member Author

AMDmi3 commented Sep 19, 2022

Getting back to it, in order to finalize repology/repology-webapp#66. I've come up with the following policy:

  • name is to be removed
  • either binname or srcname is mandatory
  • srcname is what is used to identify a package source, i.e. an entity editable by a human. It may be part of a path in a repository (e.g. games/0ad for gentoo and BSDs), a significant part of a URL (e.g. Q161234 for wikidata and 0_A.D. for wiki) and obviously the source package name when that's clearly defined.
    • That given, there's no more problem with applying it to news sites and module collections
    • Some repos may not have a source name, such as when a repository is filled with prebuilt binary packages (chocolatey may be an example; I'm not 100% sure, but we can start with the binary name and add an (identical, if it makes sense) srcname later)
    • There are still some unclear cases such as nix, which may need e.g. an alternative source package name. I'll revisit it in the end.
  • binname is what is used when a user installs a binary package (note, however, that some package managers, such as FreeBSD's, allow referring to a package by source name as well; this does not cause a contradiction).
    • Obviously, news sites and wikidata won't have a binname
    • Not completely sure about module collections. PyPI is likely fine with srcname = binname = module name, but I don't know much about the others (hackage, rubygems), so I'll probably stay on the safe side and start with srcname only.
    • binnames may still be used instead of binname for some repos (e.g. Debian); in fact it would make sense to always use binnames instead of binname, but I'll leave it for later
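
To make the policy concrete, here is a rough sketch of how a few of the cases above might be expressed and checked. The RefNames type, the is_valid helper and the placeholder names "example-module" and "example-package" are assumptions for illustration; treating binnames as satisfying the "either binname or srcname" requirement is also my reading, not something stated above.

```python
from typing import NamedTuple, Optional, Tuple


class RefNames(NamedTuple):
    """Hypothetical holder for the reference names of one package."""

    srcname: Optional[str] = None
    binname: Optional[str] = None
    binnames: Tuple[str, ...] = ()


def is_valid(names: RefNames) -> bool:
    """Either a source name or some binary name must be present."""
    return bool(names.srcname or names.binname or names.binnames)


# Illustrative classification; gentoo, wikidata and wiki values are the
# examples from the comment, the other names are placeholders.
EXAMPLES = {
    "gentoo": RefNames(srcname="games/0ad"),
    "wikidata": RefNames(srcname="Q161234"),  # no binname for wikidata
    "wiki": RefNames(srcname="0_A.D."),
    "pypi": RefNames(srcname="example-module", binname="example-module"),
    "chocolatey": RefNames(binname="example-package"),  # binary-only repo
}
```

With this, `all(is_valid(n) for n in EXAMPLES.values())` holds, while an empty `RefNames()` is rejected.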

AMDmi3 added a commit that referenced this issue Sep 21, 2022
Introduce a new type of generic mapping for both src and bin names
and add a warning when using generic name
AMDmi3 added a commit that referenced this issue Sep 21, 2022
Most set srcname, pypi and cran also set binname to the same value,
as these repositories are known to distribute binary artifacts.
AMDmi3 added a commit that referenced this issue Sep 21, 2022
AMDmi3 added a commit that referenced this issue Sep 21, 2022
AMDmi3 added a commit that referenced this issue Sep 21, 2022
AMDmi3 added a commit that referenced this issue Sep 21, 2022
AMDmi3 added a commit that referenced this issue Sep 26, 2022
- File names are not unique, so use full path as trackname and
  binname
- Extensions seem to matter, so include these in visiblename
- Split all (e.g. .tar.gz) extensions (as opposed to the last one)
  from the file name for the purpose of project name seed
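
A rough sketch of what this commit message describes, not the actual parser code. The function name names_for_file and the returned dict keys are hypothetical, and cutting at the first dot is a simplification: it would also truncate file names that contain dots elsewhere (for example in a version number).

```python
import posixpath


def names_for_file(path: str) -> dict:
    """Derive name fields for a file-listing entry (illustrative only)."""
    filename = posixpath.basename(path)
    # Strip every extension (".tar.gz", not just ".gz") for the seed;
    # fall back to the whole file name for dotfiles such as ".hidden".
    seed = filename.split(".", 1)[0] or filename
    return {
        "trackname": path,        # full path, since file names are not unique
        "binname": path,
        "visiblename": filename,  # extensions kept, as they seem to matter
        "project_seed": seed,
    }
```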
AMDmi3 added a commit that referenced this issue Sep 26, 2022
AMDmi3 added a commit that referenced this issue Sep 26, 2022
AMDmi3 added a commit that referenced this issue Sep 26, 2022
AMDmi3 added a commit that referenced this issue Sep 26, 2022
AMDmi3 added a commit that referenced this issue Sep 26, 2022
AMDmi3 added a commit that referenced this issue Sep 26, 2022
@AMDmi3
Member Author

AMDmi3 commented Mar 29, 2023

Remaining repositories (which are non-trivial to classify) are:
