Metadata file #202

Closed
finestructure opened this issue May 29, 2020 · 18 comments
Labels
enhancement New feature or request epic

Comments

@finestructure
Member

finestructure commented May 29, 2020

We've been referring to the need for a metadata file here and there, so I thought I'd add a dumping ground for things we'd like it to include:

  • author info (list)
@daveverwer
Member

Great idea:

  • Supports Linux (Boolean)
  • Home page (URL)
  • Documentation Page (URL)
  • Deprecated (Boolean) (So package authors can officially deprecate a package, without removing it)
  • Deprecated in favour of (URL) if it was superseded or merged with another project.
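
As a rough illustration of the fields listed above, here is a minimal sketch of how such a file might be modelled and decoded. The JSON format, the `PackageMetadata` type and every field name are assumptions made for this example; nothing in this thread fixes a file format or schema.

```swift
import Foundation

// Hypothetical field names for the suggestions above; nothing here is an
// agreed format for the metadata file.
struct PackageMetadata: Codable, Equatable {
    var supportsLinux: Bool?
    var homepage: URL?
    var documentation: URL?
    var deprecated: Bool?
    var deprecatedInFavourOf: URL?
}

// Decodes a metadata document. Keys that are absent decode to nil.
func decodeMetadata(_ json: String) throws -> PackageMetadata {
    try JSONDecoder().decode(PackageMetadata.self, from: Data(json.utf8))
}

// Made-up example document using placeholder URLs.
let exampleJSON = """
{
  "supportsLinux": true,
  "homepage": "https://example.com/my-package",
  "deprecated": true,
  "deprecatedInFavourOf": "https://example.com/successor"
}
"""

if let metadata = try? decodeMetadata(exampleJSON) {
    print(metadata.deprecated == true ? "deprecated" : "active")
}
```

Making every field optional keeps the file cheap to fill in: authors only state what applies to their package.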

@finestructure
Member Author

  • date/version cutoff for historic indexing (i.e. ability to say: don't index older than 2018 or versions prior to 2.0.0)
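
A cutoff like this could be applied as a simple filter over a package's releases. The sketch below is only an assumption about how that might look; `Version`, `Release` and `IndexingCutoff` are hypothetical types, not part of the Swift Package Index code base.

```swift
import Foundation

// Minimal semantic version, compared field by field.
struct Version: Comparable {
    let major: Int
    let minor: Int
    let patch: Int

    static func < (lhs: Version, rhs: Version) -> Bool {
        (lhs.major, lhs.minor, lhs.patch) < (rhs.major, rhs.minor, rhs.patch)
    }
}

struct Release {
    let version: Version
    let date: Date
}

// Hypothetical cutoff an author could declare: either limit is optional.
struct IndexingCutoff {
    var minimumVersion: Version?
    var earliestDate: Date?

    // A release is indexed only if it passes every limit that is set.
    func shouldIndex(_ release: Release) -> Bool {
        if let minimumVersion = minimumVersion, release.version < minimumVersion {
            return false
        }
        if let earliestDate = earliestDate, release.date < earliestDate {
            return false
        }
        return true
    }
}

// Example: ignore everything before 2.0.0.
let cutoff = IndexingCutoff(minimumVersion: Version(major: 2, minor: 0, patch: 0),
                            earliestDate: nil)
```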

@MaxDesiatov
Contributor

For posterity, here's what Apple folks say about indicating Linux support in Package.swift:

I have real concerns about establishing the idea that Linux is a monolith. ktoso is right to say that Linux does not have “versions” per-se, but a bunch of the Linux userspace is versioned, and in principle we can take a userspace-based approach to defining the Linux platform. I want to avoid us painting ourselves into a corner we can’t easily get out of again.

I hope that SPI allows rich metadata for indicating platform support, not just a single supportsLinux boolean.

@MaxDesiatov
Contributor

Another metadata field could indicate whether a package is looking for funding/sponsorship with a link to the sponsorship page, which NPM currently supports, as far as I'm aware.

One thing I'd quite like to see is security advisories, again from my experience with NPM. Interestingly enough, the Package Registry pitch (see the "Security auditing" section) suggests storing that data in the registry. It's an interesting question overall: do you want to store some metadata in the package, in the registry, or purely in the index? (cc @mattt)

@mattt

mattt commented Jun 7, 2020

@MaxDesiatov How registries communicate and how SPM acts upon advisories is still up in the air. swift package audit is one way to do this, but it'd be better if this were automatically resolved for you by Dependabot. I'd recommend punting on this until we all have a better idea of how this will work.

@daveverwer @finestructure A few things to consider with your metadata file:

  • Metadata changes across releases (e.g. a package can be renamed or transferred or deprecated at a particular version). This is why, for the Swift Package Registry proposal, I decided to associate metadata for each release. However, having one metadata record to reflect the current / latest release is also valid; just something to consider or mention explicitly.
  • For the non-Swift-specific fields, I'd recommend adopting a standard like Schema.org's SoftwareSourceCode for representing metadata, as we do for the Swift Package Registry proposal. It anticipates what I'd expect would be >99% of what you might want to represent, and saves you the trouble of working that out for yourself.
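
To make the Schema.org suggestion concrete, here is a sketch that decodes a small JSON-LD document into a Swift type. The property names (`name`, `description`, `codeRepository`, `programmingLanguage`, `author`) come from Schema.org's SoftwareSourceCode and Person types; the Swift types, the chosen subset and the sample values are assumptions for illustration and are not taken from the registry proposal.

```swift
import Foundation

// Subset of Schema.org's Person type.
struct Person: Codable {
    var name: String
    var url: URL?
}

// Subset of Schema.org's SoftwareSourceCode, as it might appear in JSON-LD.
struct SoftwareSourceCode: Codable {
    var context: String
    var type: String
    var name: String
    var description: String?
    var codeRepository: URL?
    var programmingLanguage: String?
    var author: Person?

    // JSON-LD keywords start with "@", so they need explicit key names.
    enum CodingKeys: String, CodingKey {
        case context = "@context"
        case type = "@type"
        case name, description, codeRepository, programmingLanguage, author
    }
}

// Made-up record with placeholder names and URLs.
let jsonLD = """
{
  "@context": "https://schema.org",
  "@type": "SoftwareSourceCode",
  "name": "ExamplePackage",
  "codeRepository": "https://example.com/example/ExamplePackage",
  "programmingLanguage": "Swift",
  "author": { "@type": "Person", "name": "Jane Doe", "url": "https://example.com/jane" }
}
"""

func decodeRecord(_ json: String) throws -> SoftwareSourceCode {
    try JSONDecoder().decode(SoftwareSourceCode.self, from: Data(json.utf8))
}
```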

@daveverwer
Member

daveverwer commented Jun 7, 2020

@daveverwer @finestructure A few things to consider with your metadata file:

  • Metadata changes across releases (e.g. a package can be renamed or transferred or deprecated at a particular version). This is why, for the Swift Package Registry proposal, I decided to associate metadata for each release. However, having one metadata record to reflect the current / latest release is also valid; just something to consider or mention explicitly.

Absolutely, our database design already stores metadata next to every version and we'd want this additional metadata to be stored in exactly the same way. I agree that the history of this is important.

  • For the non-Swift-specific fields, I'd recommend adopting a standard like Schema.org's SoftwareSourceCode for representing metadata, as we do for the Swift Package Registry proposal. It anticipates what I'd expect would be >99% of what you might want to represent, and saves you the trouble of working that out for yourself.

This is fantastic, thank you @mattt!

@daveverwer
Member

More metadata:

  • Categories

We need to think carefully about this. Is it a predefined list? Is it just any keyword the authors want to include? Is it supplemented by GitHub tags for a project?

I think it'd be really nice to do a mix of both. Maybe have a broad category which is a fixed list, but then also let people include tags, and bring the GitHub tags into that data.

@daveverwer
Member

I think it'd be really nice to do a mix of both. Maybe have a broad category which is a fixed list, but then also let people include tags, and bring the GitHub tags into that data.

In case it's not clear, I didn't mean both of those in a single field. They'd be separated.

There are also exciting ideas around here for auto-categorisation based on:

  • What frameworks packages import
  • What classes they use (much harder to conclude from!)

I think figuring out the frameworks could lead to some interesting auto-categorisation though.
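
A first pass at the framework-based idea could be as simple as scanning source files for `import` lines. This is a deliberately naive sketch: it only recognises plain `import Module` lines (not `@testable import` or `import struct Foo.Bar`), and the framework-to-category mapping is invented for the example.

```swift
// Collects module names from plain `import Module` lines in Swift source.
func importedModules(in source: String) -> Set<String> {
    var modules = Set<String>()
    for line in source.split(whereSeparator: { $0.isNewline }) {
        let words = line.split(whereSeparator: { $0.isWhitespace })
        if words.count == 2, words[0] == "import" {
            modules.insert(String(words[1]))
        }
    }
    return modules
}

// Hypothetical mapping from framework name to a broad category.
let categoryByFramework = [
    "CoreGraphics": "graphics",
    "ArgumentParser": "command-line",
]

// Suggests categories for a source file based on what it imports.
func suggestedCategories(for source: String) -> Set<String> {
    Set(importedModules(in: source).compactMap { categoryByFramework[$0] })
}
```

A real implementation would more likely work from the package's resolved targets and dependencies than from text matching, but the shape of the output (a set of candidate categories) would be similar.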

@erica

erica commented Jun 17, 2020

Moving the conversation here as you requested, @daveverwer

For me, the priorities are an array of tags, which are strings, probably kebab (which can be freeform or documented or whatever), and to a much lesser degree a single string abstract. Those two tweaks massively enhance discoverability and documentation.

Supports Linux (Boolean)

Addressed with tag: ubuntu-18.04

Home page (URL)

The hosting repo IMO is the home page.

Documentation Page (URL)

Ditto

Deprecated (Boolean) (So package authors can officially deprecate a package, without removing it)

Tag: deprecated

Deprecated in favour of (URL) if it was superseded or merged with another project.

Tag: deprecated
Abstract: "blah blah, superseded by whatever project name"

What frameworks packages import

Tags: ArgumentParser, CoreGraphics

What classes they use (much harder to conclude from!)

Not a big fan of this

Categories

Tags

Another metadata field could indicate whether a package is looking for funding/sponsorship with a link to the sponsorship page, which NPM currently supports, as far as I'm aware.

Tags: Patreon, Sponsored

Sometimes a single good hammer is as good as a few dozen individual fields.
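
The tags-plus-abstract approach described in this comment might be sketched as follows. The tag rule is an assumption (lowercase kebab-style groups, with dots allowed so that a tag like `ubuntu-18.04` passes), and `TaggedPackage` is a hypothetical type.

```swift
// Hypothetical tag rule: lowercase ASCII letters, digits and dots, in
// non-empty groups separated by single hyphens.
func isValidTag(_ tag: String) -> Bool {
    let groups = tag.split(separator: "-", omittingEmptySubsequences: false)
    return groups.allSatisfy { group in
        !group.isEmpty && group.allSatisfy { character in
            character.isASCII
                && (character.isLowercase || character.isNumber || character == ".")
        }
    }
}

// The two pieces of metadata proposed: freeform tags and a short abstract.
struct TaggedPackage {
    var abstract: String
    var tags: [String]
}

// Exact-match tag search over a list of packages.
func packages(_ all: [TaggedPackage], taggedWith tag: String) -> [TaggedPackage] {
    all.filter { $0.tags.contains(tag) }
}
```

Validating the spelling of tags at submission time is one way to limit the drift that freeform tags are prone to, at the cost of rejecting mixed-case tags such as framework names.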

@daveverwer
Member

Thank you Erica!

I wasn't necessarily thinking all of these would be independent fields. I realise that I did start my original suggestions with data types, but it was more about capturing all the different things we might want to think about. Your point is very well taken though!

@erica

erica commented Jun 17, 2020

Think about this too: try to search for my package now, which is on the SPI, without using the word "now". And then think about how good tagging would promote its visibility for visitors looking for such a utility.

@mattt

mattt commented Jun 18, 2020

What @erica's proposing here can be described as a Folksonomy. And short of developing a more comprehensive taxonomy / ontology, I agree that this would probably be the best-fit solution for the problems you're most interested in solving. You get 80% of the benefit of classification with 20% of the effort.

@daveverwer @finestructure As you consider adopting this, I'd encourage you to take a look at that Wikipedia page for Folksonomies to understand the specific trade-offs you're making and challenges you're most likely to encounter.

@daveverwer
Member

daveverwer commented Jun 18, 2020

I don't have a huge amount of time right now, but just to chime in. If search were the only issue here, that'd be one thing, but it's about more than search. Being able to give rich, actionable data on the package pages will require more structure to some of these bits of metadata.

@erica

erica commented Jun 18, 2020

Do you have a list of goals driving your need for metadata? For example, take the mention of listing Authors -- which I think has the worthy outcomes of being able to give due credit, being able to contact individuals who participated in the development, being able to see what an individual has contributed to. What would be the driving use of this specifically within the SPI project?

@daveverwer
Member

Do you have a list of goals driving your need for metadata?

Yes, there are two:

  1. Search - As you correctly surmised, things like categories and framework names go to drive better search relevancy.
  2. In pursuit of the main goal of the site, which is to allow people to make better decisions about the packages they choose - Giving structure to some metadata will make it easier for people to make decisions as it can be presented better on the package pages.

We want to bring as much of the information that's needed to judge the quality of a package into one place. For example, instead of having to check how many pull requests/issues there are and when the last one was closed, we bring that in automatically, right alongside information about what versions of Swift the package supports, and whether the stable release is the right one to target, or if there's actually a beta which would better suit your needs.

All of that data so far is structured as it comes from the manifest, from GitHub, and from the repository itself. There's a place for unstructured/tag-based data, but I don't think it completely replaces the need for structure.

We also want to use some of this structured data to drive a "quality score" for a package. I don't think it's clear yet whether this quality score would be made public or just used internally for search ranking (we have a version of this already); there are pros and cons to both. But if metadata is just tag-based, it's much harder to do that, especially when tags can be typed incorrectly or interpreted in different ways (do linux and ubuntu-18.04 get points for supporting Linux, where ubuntu1804 doesn't?). It's definitely a trade-off. Just a note: I'm not saying packages would definitely get an increased score for supporting Linux, it's just an illustration.
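
The tag-spelling problem can be shown with a small sketch. Both functions and the `Platform` enum are hypothetical; the point is only that a tag check depends on a list of recognised spellings, while a structured field does not.

```swift
// Hypothetical structured platform field.
enum Platform: String, Codable, Hashable {
    case iOS, macOS, linux
}

// Tag-based check: only spellings someone thought to list are recognised.
func tagsIndicateLinux(_ tags: [String]) -> Bool {
    let recognised: Set<String> = ["linux", "ubuntu-18.04"]
    return tags.contains { recognised.contains($0.lowercased()) }
}

// Structured check: there is exactly one way to say "Linux".
func platformsIndicateLinux(_ platforms: Set<Platform>) -> Bool {
    platforms.contains(.linux)
}

print(tagsIndicateLinux(["ubuntu1804"]))        // unlisted spelling is missed
print(platformsIndicateLinux([.macOS, .linux]))
```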

For example, take the mention of listing Authors -- which I think has the worthy outcomes of being able to give due credit, being able to contact individuals who participated in the development, being able to see what an individual has contributed to.

This is absolutely one of the areas where I'd want structured metadata. It would allow people who have not necessarily committed code to be credited, and let people define a custom link so they can be credited in the way they'd like, rather than us just assuming they want a link to their GitHub profile.

But it's more than that. We get platform data for Apple platforms from the package manifest, so if we are able to define Linux support as structured data rather than just a tag, we can place that information on the package page next to the Apple platforms, rather than having the Apple platforms in one place and Linux mixed in with the other tag data.

That's not a comprehensive list of where I think structured data will be necessary, and I think we can make that decision when we have a better sense of what metadata people would find useful. I'd expect some of it to be represented as tags/searchable metadata.

We also need to be careful not to prematurely go towards everything being a tag. Once people start filling in this file, it's going to be hard to effect widespread changes to it, so I'd rather get it right before giving it a push in terms of getting it adopted.

At the same time, we don't want this metadata file to be overly onerous to fill in. I think we'll find a balance. But starting with defining what data people might like to see feels like a good place to start.

@erica

erica commented Jun 19, 2020

Then I think the most feasible way forward is to co-locate a metadata file with Package.swift (at the same level where one normally finds README, CHANGELOG, LICENSE), whose structure we define and hope takes off. I can see where tags alone aren't going to get you where you need to go, even though they are the path of least resistance for modifying the Package spec through the formal review process.

@kiliankoe

The hosting repo IMO is the home page.
~ @erica

As an aside, the Rust ecosystem's package index crates.io has a distinction between homepage, documentation and repository, which I personally find quite valuable. Of course many crates don't have a specific homepage or just list their repo there as well, but the option of pointing to a specific landing page for a project is quite comfortable. See the page for the popular crate serde for an example.

@daveverwer
Member

Just a note here that I've done some clean up, aggregation and work on this today and started a new issue to track it - #435

Before launch, Sven and I were working mostly alone on this project, and chatting on a call most days. We were very much on the same page, so many of the issues here start with nothing more than a word or two. The first couple of posts in this issue are a perfect example of that, and they can be confusing to new people coming into the thread.

Now that more people are involved those few words can seem short/abrupt as they have no context. That's why I'm making a new issue to clarify our original intent. I've linked relevant comments in this thread from that new issue.

I'll close this issue, but please do feel free to move the conversation to #435.
