Some crates in the crates.io index have invalid json #1168
Comments
Yes, we should amend the spec.
@alexcrichton how would you feel about rewriting these hundred records? That seems nicer than carving out a weird exception to "kind" being required just because a small number of bad records were created.
I'd personally prefer not to update old entries in the registry. Presumably with Serde this is a one-line fix, no?
@alexcrichton ultimately it should be a one-liner, but right now it is not. Generally I'm against editing existing crates (crates.io being append-only is a really good thing), but there are a few issues in the crates.io index where I think some light editing is a good idea. This is one such issue; others are #1201, #1177 and #1179.
If we want to allow editing, we would want a strict policy about when it's appropriate. At minimum, any cleanups should not affect older versions of Cargo. One thing to keep in mind is that wholesale editing of the index creates commits that further bloat the repository. We might want to avail ourselves of the option to flatten the repository back to a single commit whenever we do any rewriting, but as we've never done it before, we need to investigate what the logistics of that would be.
We discussed this in the cargo meeting today and we decided that rather than rewrite the history, we'd prefer to modify the spec for the index format. In particular, we think that when writing entries to an index, implementations MUST write the
Having a policy on this would be a great idea. I've seen edits by both carol and alex, so I assumed that edits which only clean up after a technical mishap or bug, and which maintain the general spirit of crates.io, are okay. By "spirit" I mean, for example, that people who want to force their users onto the latest versions of their libraries by asking the crates.io team to delete their older versions get refused. All I am asking for is an edit similar to its predecessors. A policy that states this would be a good idea. If we are talking about this, I'd also love transparency about admin interactions. E.g. why was the nom_lua crate deleted? When did admins have to delete crates due to a CoC violation? When did admins give a crate to someone else, or help someone who was locked out of their account? lobste.rs has a very nice moderation log. Discussions on this probably fit better into their own RFC; I'd love to write one if there is interest in the team. I think this particular issue can be resolved by changing the spec alone, but for issues like #1177, #1201 or #1179 there is no fitting fix in the spec... Do you really think that allowing two hashes for one (crate, version) tuple is a good idea :)? Or keeping crates in the index that can't be downloaded from S3? I think if you can't recover a crate's content, you should admit that it is irrecoverably lost, or, if there was a conscious decision to delete those crates, you should delete them from the index as well.
Rewriting the history would be really bad. The index is the only public record of when, and in which order, crates were added or modified. (The API exists as well, but you can't easily bulk-download its entire history the way you can download the index, and its responses are full of repetitive, predictable content; the API is less a document and more a service.) To keep the index download small, shallow clones are enough: they fetch only the blobs used by the latest commit. Also, and very importantly, one could imagine a "time machine" feature for cargo where you say "get me crates.io as of Oct 3, 2015". All it would have to do is fetch the full history from GitHub and use the last index commit before that date instead of the latest master.
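A rough sketch of the "time machine" lookup described above: given the index history as (commit id, commit time) pairs sorted oldest-first, pick the last commit at or before the cutoff. The IndexCommit type and snapshot_at function are hypothetical, and the sample commit ids and timestamps are illustrative only. In practice git can do this lookup itself; for example, `git rev-list -1 --before="2015-10-03 23:59:59" master` prints roughly the newest commit on master older than that date, which a tool could then check out.

```rust
/// Hypothetical record of one index commit: its id and commit time
/// in Unix seconds. Real tooling would read these from git history.
#[derive(Debug, Clone, PartialEq)]
struct IndexCommit {
    id: String,
    timestamp: i64,
}

/// Returns the last commit made at or before `cutoff`, assuming `history`
/// is sorted oldest-first. Returns None if every commit is newer.
fn snapshot_at(history: &[IndexCommit], cutoff: i64) -> Option<&IndexCommit> {
    // partition_point returns the index of the first commit strictly after the cutoff.
    let idx = history.partition_point(|c| c.timestamp <= cutoff);
    if idx == 0 {
        None
    } else {
        Some(&history[idx - 1])
    }
}

fn main() {
    let history = vec![
        IndexCommit { id: "a1".into(), timestamp: 1_400_000_000 },
        IndexCommit { id: "b2".into(), timestamp: 1_443_830_400 }, // 2015-10-03 00:00:00 UTC
        IndexCommit { id: "c3".into(), timestamp: 1_450_000_000 },
    ];
    // End of 2015-10-03 (23:59:59 UTC).
    let cutoff = 1_443_916_799;
    match snapshot_at(&history, cutoff) {
        Some(c) => println!("check out {}", c.id),
        None => println!("no commit before cutoff"),
    }
}
```

Binary search keeps the lookup cheap even over the index's long history; the sorted-oldest-first requirement is the caller's responsibility.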
With rust-lang/rfcs#2301 and #1168 (comment), it looks like this issue is covered, so I'll go ahead and close this now.
I'm trying to parse the crates.io index using the serde and json crates. As we have a specification now, I know what I can require! I've created some structs and am iterating over the directory hierarchy, parsing every single file.
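A minimal, dependency-free sketch of the walk described above, assuming a local clone of the index: it recursively collects every file, skipping hidden entries such as .git and any file named config.json, then splits each crate file into lines, since each line of a crate file is one JSON object describing a single published version. Turning each line into structs (with serde_json or similar) is left out so the sketch needs only std; the function names and the default path crates.io-index are hypothetical.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Recursively collects every regular file under `dir`, skipping hidden
/// entries (e.g. `.git`) and any file named `config.json`, which is
/// registry configuration rather than a crate file.
fn collect_crate_files(dir: &Path, out: &mut Vec<PathBuf>) -> io::Result<()> {
    for entry in fs::read_dir(dir)? {
        let entry = entry?;
        let path = entry.path();
        let name = entry.file_name();
        let name = name.to_string_lossy();
        if name.starts_with('.') || name == "config.json" {
            continue;
        }
        if entry.file_type()?.is_dir() {
            collect_crate_files(&path, out)?;
        } else {
            out.push(path);
        }
    }
    Ok(())
}

/// Reads one crate file and returns its non-empty lines; each line is
/// expected to be a JSON object describing a single published version.
fn version_records(path: &Path) -> io::Result<Vec<String>> {
    let text = fs::read_to_string(path)?;
    Ok(text
        .lines()
        .filter(|l| !l.trim().is_empty())
        .map(str::to_owned)
        .collect())
}

fn main() -> io::Result<()> {
    // Hypothetical default location of a local index clone.
    let root = std::env::args().nth(1).unwrap_or_else(|| "crates.io-index".to_string());
    let root = Path::new(&root);
    if !root.is_dir() {
        println!("no index checkout found at {}", root.display());
        return Ok(());
    }
    let mut files = Vec::new();
    collect_crate_files(root, &mut files)?;
    let mut total = 0;
    for file in &files {
        total += version_records(file)?.len();
    }
    println!("{} crate files, {} version records", files.len(), total);
    Ok(())
}
```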
Now I have encountered some crates where the crates.io index violates that specification.
There are 100 older (read: 2014) crates whose dependency lists contain entries that either lack a "kind" attribute or have it set to null (the only crate where kind is set to null is rope, due to being yanked).
As the index repo doesn't have a bug tracker, I chose to file the bug here. The absence of that attribute seems to violate that specification. If you disagree that this is a violation, I'd really love to see an amendment to the spec that explicitly states that missing or null deps fields are allowed, together with a statement about the value you can assume if the deps field is not set.
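A minimal sketch of a reader that tolerates these entries, written without external crates: it maps the raw kind value of one dependency, where None covers both a missing field and an explicit null, to a dependency kind. Treating missing or null as a normal dependency is an assumption here (it matches our understanding of how Cargo reads the index, but the spec should be the authority); DepKind and parse_kind are hypothetical names. With serde derive, the equivalent is probably the "one line fix" mentioned in the thread: declaring the field as Option<String>, since serde deserializes both a missing Option field and a JSON null as None.

```rust
/// The three dependency kinds used by the index format.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum DepKind {
    Normal,
    Build,
    Dev,
}

/// Maps the raw "kind" value of one dependency entry to a DepKind.
/// `None` stands for both a missing field and an explicit JSON null;
/// both are treated as a normal dependency (assumed default).
/// Unknown strings are reported as errors instead of being guessed.
fn parse_kind(raw: Option<&str>) -> Result<DepKind, String> {
    match raw {
        None | Some("normal") => Ok(DepKind::Normal),
        Some("build") => Ok(DepKind::Build),
        Some("dev") => Ok(DepKind::Dev),
        Some(other) => Err(format!("unknown dependency kind: {other}")),
    }
}

fn main() {
    for raw in [Some("dev"), None, Some("normal"), Some("weird")] {
        println!("{:?} -> {:?}", raw, parse_kind(raw));
    }
}
```

Rejecting unknown strings, rather than silently mapping them to Normal, keeps genuinely malformed entries visible while still accepting the 2014-era records.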