Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add doc aliases for python and Java collections #81989

Closed
wants to merge 1 commit into from

Conversation

jyn514
Copy link
Member

@jyn514 jyn514 commented Feb 11, 2021

No description provided.

@jyn514 jyn514 added T-libs-api Relevant to the library API team, which will review and decide on the PR/issue. A-docs Area: documentation for any part of the project, including the compiler, standard library, and tools labels Feb 11, 2021
@rust-highfive
Copy link
Collaborator

r? @Mark-Simulacrum

(rust-highfive has picked a reviewer for you, use r? to override)

@rust-highfive rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Feb 11, 2021
@jyn514 jyn514 changed the title Add aliases for python collections Add doc aliases for python collections Feb 11, 2021
@jyn514 jyn514 changed the title Add doc aliases for python collections Add doc aliases for python and Java collections Feb 11, 2021
@@ -137,6 +137,7 @@ pub(super) const MIN_LEN: usize = node::MIN_LEN_AFTER_SPLIT;
/// *stat += random_stat_buff();
/// ```
#[stable(feature = "rust1", since = "1.0.0")]
#[doc(alias = "map")]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given that you're adding map to BTreeMap, do you want to add set to BTreeSet? Either way they should show up in the results though, so not really necessary.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea, thanks. Added.

Either way they should show up in the results though, so not really necessary.

This will make it show up in the top, right now it's a long way down. image

@@ -1486,6 +1488,7 @@ impl<T, A: Allocator> Vec<T, A> {
/// ```
#[inline]
#[stable(feature = "rust1", since = "1.0.0")]
#[doc(alias = "append")]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this makes sense to add, but it is unfortunate that it will pollute results for Vec::append.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you know if the alias or Vec::append will be shown first in the results?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also could potentially be worth adding a line to Vec::append's docs like:

Not to be confused with Vec::push. Vec::append is different from .append in Python.

Copy link
Contributor

@ojeda ojeda Feb 14, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you know if the alias or Vec::append will be shown first in the results?

Aliases should always be shown last in the search results, I'd say, if they are not already.

Vec::append is different from .append in Python.

I like the idea, but I think adding references to other languages in the documentation for the Rust function is going a bit too far. Instead, I'd go for #82000, allowing a comment field for each alias too, where this comment can be written. Then, the search can show it however is best, i.e. something like:

#[doc(alias = "append", lang = "Python", comment =
    "Not to be confused with `Vec::push`. `Vec::append` is different from `.append` in Python."
)]

What do you think? I can add it to the RFC.

Copy link
Member Author

@jyn514 jyn514 Feb 14, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, seems reasonable. I'll drop append from this PR.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Aliases should always be shown last in the search results, I'd say, if they are not already.

That seems really strange, why do you say that? That puts them behind very unrelated things because of fuzzy search.

image

Copy link
Contributor

@ojeda ojeda Feb 15, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That seems really strange, why do you say that? That puts them behind very unrelated things because of fuzzy search.

Ah, yeah, there are too many results [*]. What about before the fuzzy ones? i.e.:

  • Exact matches (non-aliases)
  • Exact matches (aliases)
  • Fuzzy search (any, assuming it sorts by some "similarity" index)

If we get #82000, then we could even do:

  • Exact matches (non-aliases)
  • Exact matches (Rust aliases)
  • Exact matches (non-Rust aliases)
  • Fuzzy search (any, assuming it sorts by some "similarity" index)

[*] Why is matching all that for free? Is it due to substrings (e.g. fREm -> fREe) or Hamming distance or something like that? I think for words < 4 in size, it should not match if at least 3 characters match. But anyway, this is off-topic... :-) The general point is that the search results seem like they could be improved (i.e. instead of working around it with the aliases).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, yeah, there are too many results [*]. What about before the fuzzy ones? i.e.:

I like this. Can you open an issue for it? I think it makes a lot of sense.

[*] Why is matching all that for free? Is it due to substrings (e.g. fREm -> fREe) or Hamming distance or something like that? I think for words < 4 in size, it should not match if at least 3 characters match. But anyway, this is off-topic...

Can you open an issue for that too? 😆

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done! #82131 and #82132.

@jyn514 jyn514 force-pushed the doc-alias-collections branch from cd04c59 to 1a79e76 Compare February 14, 2021 22:14
@rust-log-analyzer

This comment has been minimized.

@jyn514 jyn514 force-pushed the doc-alias-collections branch 2 times, most recently from dd41345 to 0188363 Compare February 14, 2021 22:24
@Mark-Simulacrum
Copy link
Member

Hm, it seems not bad necessarily but I'm a bit worried about eventually everything in std gaining 3-4 different aliases, some conflicting, depending on perspective. It seems like we should at least put a comment next to each one for origin language -- I wonder if aliasing as "python dict" would be a bad idea.

cc @rust-lang/libs, curious on thoughts.

@jyn514
Copy link
Member Author

jyn514 commented Feb 20, 2021

It seems like we should at least put a comment next to each one for origin language -- I wonder if aliasing as "python dict" would be a bad idea.

See also #82000.

@m-ou-se
Copy link
Member

m-ou-se commented Feb 20, 2021

Hm, it seems not bad necessarily but I'm a bit worried about eventually everything in std gaining 3-4 different aliases, some conflicting, depending on perspective. It seems like we should at least put a comment next to each one for origin language -- I wonder if aliasing as "python dict" would be a bad idea.

Yeah I feel like adding all the equivalents of everything in popular other langues quickly gets out of hand. I think I'd prefer to only add those where the word is common enough that it's often used as a regular noun/verb, like "memcpy it to some newly malloc'd buffer" or "put the pairs in a map". So for this specific PR, I'd say map, dict, and set are good examples, but I don't think we should add ArrayList.

I also think we should be extra careful with names for things that already have another related meaning in Rust. For example, depending on the languages someone might already be familiar with, the difference between an array and Vec could be pretty confusing. Adding an alias like Array to Vec might make that even more confusing. In those cases, I rather add some notes to the array and Vec documentation linking each other.

@Mark-Simulacrum Mark-Simulacrum added S-waiting-on-team Status: Awaiting decision from the relevant subteam (see the T-<team> label). and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Feb 20, 2021
@BurntSushi
Copy link
Member

I agree with @m-ou-se and would strongly opposed adding specific aliases like ArrayList to the docs. In addition to such a thing getting quickly out of hand, it could also be misleading. If we start aliasing specific data structures from other ecosystems, while there are assuredly similarities (perhaps even almost identical), there may also be differences between them. Those differences, depending on one's perspective, may or may not make aliases surprising to folks.

Using abstractions like "map" and "set" as aliases seems fine to me. Nobody owns those words.

I'd say "dict" is somewhat border line since it's very strongly associated with one ecosystem in particular. Ecosystems use lots of different names for their map types. "associative array" (PHP) and "hash" (Ruby) come to mind. If we add "dict," should we also add those? On the other hand, at least "associative array" and "dict" are fairly abstract words on their own. So I'm torn on those in particular.

@jyn514 jyn514 force-pushed the doc-alias-collections branch from 0188363 to 6591010 Compare February 23, 2021 05:32
@jyn514
Copy link
Member Author

jyn514 commented Feb 23, 2021

I removed ArrayList and append, changed Map to map (not that it matters, but it means it looks like the concept rather than the interface from Java), and added associated array wherever map is used. I added map/dict/associative array to BTreeMap. I didn't add hash because the first search result is currently for https://doc.rust-lang.org/stable/std/hash/index.html, which I think is a better fit. That module doesn't currently link to HashMap; I'll make a follow-up PR adding that.

@ojeda
Copy link
Contributor

ojeda commented Feb 23, 2021

dictionary would be a bit better (if typing dict makes the entry appear too).

@KamilaBorowska
Copy link
Contributor

KamilaBorowska commented Feb 26, 2021

I think ArrayList alias is fine myself. It's a very specific name that in all programming languages it is used in it means a contiguous growable array type. Additionally, doc aliases aren't fuzzy searched for, if an user searches for ArrayLis, ArrayLit, ArryList or whatever the documentation simply won't find it, so there is no risk of user finding a Vec when searching for a term that happens to be very similar to ArrayList, it needs to be ArrayList specifically.

If we start aliasing specific data structures from other ecosystems, while there are assuredly similarities (perhaps even almost identical), there may also be differences between them. Those differences, depending on one's perspective, may or may not make aliases surprising to folks.

Well... yes, there are differences between Vec and ArrayList, I just don't think they are particularly meaningful and likely to cause confusion. For instance, resizing strategy is different. Rust's growth factor is 2, Java's growth factor is 1.5. It's a good arguement for considering aliases carefully, especially with data structures more complicated than Vec, it just doesn't really apply here.

I think I'd prefer to only add those where the word is common enough that it's often used as a regular noun/verb, like "memcpy it to some newly malloc'd buffer" or "put the pairs in a map". So for this specific PR, I'd say map, dict, and set are good examples, but I don't think we should add ArrayList.

I'm not seeing myself how ArrayList doesn't meet this criteria. C programmers (and to some extent Rust programmers due to C FFI support) will often use memcpy and malloc as regular words because those identifiers are very often encountered in C programs. Meanwhile Java programmers will often use ArrayList as a regular word because it's a very common identifier in Java programs, similarly to how Rust programmers often use "vector" as a regular word - the Rust documentation even explicitly points that out saying "pronounced 'vector'" in the first line of Vec's documentation.

Anyway, I believe adding ArrayList would help Java programmers who are learning the language. They could try searching for ArrayList if they happen to forget what the dynamically allocated type was called in Rust. Currently if an user searches for ArrayList, this is the resulting page.

image

Clicking "Try on DuckDuckGo" is not very useful, it will point the user to an array (nope, that's not what the user wanted) and LinkedList (which is honestly worse than pointing nowhere because it kinda works as a vector, but I don't think we want to point beginners to it).

@bors

This comment has been minimized.

`append` sounds strange because Rust already has append and it does
something different. The reason I added it is one of my friends knew
from python that it added a single element to the end of an array. When
they started learning Rust, rather than looking for pop(), they wrote
something like

```rust
my_vec.append(&mut vec![x]);
```

which is clearly not ideal. Hopefully this would have pointed them in
the right direction.
@jyn514 jyn514 force-pushed the doc-alias-collections branch from 6591010 to a8c8025 Compare March 2, 2021 01:21
Dylan-DPC-zz pushed a commit to Dylan-DPC-zz/rust that referenced this pull request Mar 17, 2021
Add more links between hash and btree collections

- Link from `core::hash` to `HashMap` and `HashSet`
- Link from HashMap and HashSet to the module-level documentation on
  when to use the collection
- Link from several collections to Wikipedia articles on the general
  concept

See also rust-lang#81989 (comment).
@crlf0710 crlf0710 added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. S-waiting-on-team Status: Awaiting decision from the relevant subteam (see the T-<team> label). and removed S-waiting-on-team Status: Awaiting decision from the relevant subteam (see the T-<team> label). S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Mar 20, 2021
@crlf0710 crlf0710 added S-waiting-on-team Status: Awaiting decision from the relevant subteam (see the T-<team> label). and removed S-waiting-on-team Status: Awaiting decision from the relevant subteam (see the T-<team> label). labels Apr 9, 2021
@m-ou-se
Copy link
Member

m-ou-se commented Apr 21, 2021

We discussed the doc(alias) PRs in the library team meeting, and are not sure what direction we want to take here. At first it seemed good and useful to have things like doc(alias = "memcpy") on the Rust equivalents, but now that more PRs are coming in for more doc aliases, it feels like it's getting a bit out of hand and we need a policy for this first.

We didn't reach a clear conclusion about which kind of aliases we should and shouldn't have, but we did feel like adding more aliases like this is not the way to go.

If someone doesn't know about e.g. Vec and types in list, they're probably better served by a query like "rust list" on their favourite search engine, which will point them to https://doc.rust-lang.org/book/ch08-01-vectors.html and some other resources which can explain them in which ways a Vec is and isn't the kind of list they need.

Similarly for dict or anything else that doesn't map 1:1 from another language to Rust. A web search for "dict rust" will result in blog posts and Stack Overflow answers that explain things about HashMaps, and points them to suggestions for other crates that provide something closer to what they might be looking for. Just applying alias = "dict" to HashMap could be misleading, since they have different properties. (E.g. whether the keep the insertion order.)

We didn't make any final decision, but we do want some kind of clear guideline/policy that we can all agree on before accepting more aliases. Feel free to open an issue on https://github.com/rust-lang/libs-team or ping us on Zulip if you (anyone) want to work on this.

(Edit: For anyone interested, some more details/thoughts on aliases here: #81988 (comment))

@jyn514
Copy link
Member Author

jyn514 commented Apr 21, 2021

That makes sense to me. I'm not planning to put time into making a long-term policy.

@jyn514 jyn514 closed this Apr 21, 2021
@jyn514 jyn514 deleted the doc-alias-collections branch April 21, 2021 15:20
eggyal pushed a commit to eggyal/copse that referenced this pull request Jan 9, 2023
Add more links between hash and btree collections

- Link from `core::hash` to `HashMap` and `HashSet`
- Link from HashMap and HashSet to the module-level documentation on
  when to use the collection
- Link from several collections to Wikipedia articles on the general
  concept

See also rust-lang/rust#81989 (comment).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-docs Area: documentation for any part of the project, including the compiler, standard library, and tools S-waiting-on-team Status: Awaiting decision from the relevant subteam (see the T-<team> label). T-libs-api Relevant to the library API team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging this pull request may close these issues.