Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Multidimensional arrays #1839

Closed

Conversation

foxtran
Copy link

@foxtran foxtran commented Jul 30, 2022

I opened #1661 and many folks thumbs it. I tried to implement some quantum chemistry (which is performance-critical) code and look which features are required for it.
Here, it is my point of view how Carbon may became popular in HPC community in comparison with C++, which has a lot application, but new Fortran applications are born and growing nowadays.

@foxtran foxtran force-pushed the proposal-multidimensional-array branch from 239368c to 03d48f0 Compare July 30, 2022 14:21
@foxtran foxtran changed the title Multidimensional array Multidimensional arrays Jul 30, 2022
@foxtran foxtran marked this pull request as ready for review July 30, 2022 16:03
@emlai
Copy link

emlai commented Jul 30, 2022

There's some related discussion about multidimensional arrays in the array type syntax proposal #1787, starting from here.


## Proposal

We should add support of multidimensional arrays in Carbon via syntax extension
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As with #1787 (see this comment), we've put this proposal in a procedurally awkward position, because we haven't yet adopted a proposal for one-dimensional arrays yet. That being the case, I'm not sure what the best way forward is, but it might make sense to defer this proposal until we've adopted a one-dimensional array proposal that this can build on.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for late reply and thank you for your review! As I can see, #1787 is just about array initialization syntax. Should I create a new proposal for 1D arrays?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should I create a new proposal for 1D arrays?

I think either you or @asoffer should; see the discussion in the #process channel on Discord.

proposals/p1839.md Outdated Show resolved Hide resolved
proposals/p1839.md Outdated Show resolved Hide resolved
proposals/p1839.md Outdated Show resolved Hide resolved
Comment on lines +82 to +85
```carbon
var x: [i32; :, :];
```
For avoiding Undefined Behavior, `x` has shape `(0, 0)`.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does : mean "deduce this dimension from the initializer", or does it mean "this dimension can vary at run-time"? I'd strongly prefer the first meaning, but if that's the case, this code should be a compile error, just like var x: auto; is a compile error.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It means that dimensions may vary at run-time. The most simple example is string (C# example):

string s = "Hello";
s += ", world!";

In Fortran, it oftenly used for allocating arrays when sizes are becoming known:

type(atom) :: atoms(:)
integer :: Natoms
Natoms = get_atoms_len()
allocate(atoms(Natoms))

In carbon, for simplifying (Sorry for Fortran-style):

var n_atoms: i64;
var atoms: [atom; :];
n_atoms = get_atoms_len();
allocate(atoms, shape = ( n_atoms ))

It can be rewritten as:

var n_atoms: i64 = get_atoms_len();
var atoms: [atom; n_atoms];

But I would prefer to have such constructions since they may be actively used for class fields.

Copy link
Contributor

@geoffromer geoffromer Aug 6, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That seems like it would make the type of var x: [i32; 3, 2] = ((0, 1, 2), (3, 4, 5)); very different from the type of var y: [i32; :, :] = ((0, 1, 2), (3, 4, 5));.

  • shape(x) would presumably be a compile-time constant, but shape(y) could not be.
  • x could store all its elements directly in the object (like a std::array does), but y has to be implemented with a pointer to heap-allocated memory.
  • The generated code for x[2, 1] can consist of a single memory access, but I believe the generated code for y[2, 1] requires at least two memory accesses, even if you don't count the indirection through the pointer mentioned above.
  • x provides pointer stability, whereas y does not: &x[0] will never change, but &y[0] could have different values over the course of y's lifetime.

Those differences are so basic that I think it would be misleading to use such similar syntax to represent them both.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In my point of view, var x: [i32; 3, 2] = ((0, 1, 2), (3, 4, 5)); and var y: [i32; :, :] = ((0, 1, 2), (3, 4, 5)); are the same. In y, : is used for skipping direct mention of dimensions' lengths. So,

  • shape(x) and shape(y) would presumably be a compile-time constant.
  • x and y could store all its elements directly in the object (like a std::array does)
  • x and y provides pointer stability

I did not get why accessing for y is as y[2][1], not y[2,1]. It is not a nested array, so accessing will be via single memory access. For nested array, you are right about two memory accesses.

Let me separate var y: [i32; :, :] = ((0, 1, 2), (3, 4, 5)); and var z: [i32; :, :];. In the case of y, : is a syntax sugar for skipping direct mention of dimensions' lengths. For z, : means that this array can be allocated/reallocated. And for z array, it is true what you wrote about y, except (3).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should probably move this discussion to your 1D array proposal, but to answer your questions:

I did not get why accessing for y is as y[2][1], not y[2,1].

Sorry, that was a typo. I've fixed it now.

It is not a nested array, so accessing will be via single memory access. For nested array, you are right about two memory accesses.

Even if it's not nested I believe it needs two memory accesses. Let's assume the 2D array is implemented as a 1D array in row-major order. That means the element at row i and column j is located at index i + N * j in the underlying array, where N is the number of columns. So in order to access that element, I need to know the number of columns. If the number of columns can vary at run-time, that means I need to load the number of columns from memory before I can access the element. On the other hand, if the number of columns is fixed at compile time, I can avoid doing that load.

x[:, 0] = x[:, 1]; // x = ((3, 4, 5), (3, 4, 5));
```

### Iterators (?)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The use cases for this seem likely to be rare enough that named functions would be clearer, and would avoid the need for a new core-language syntax. In particular, I'm concerned about the ... syntax conflicting with variadics.

@@ -0,0 +1,331 @@
# Multidimensional arrays
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As a piece of high-level feedback, I'd encourage you to look for ways to make this proposal smaller. A lot of the features you're proposing here seem like they could be follow-up proposals once the main multidimensional array feature is in place. Some of these features seem like they could be controversial, or at least require substantial discussion, and I don't want the main proposal to get bogged down.

proposals/p1839.md Outdated Show resolved Hide resolved
}
```

#### Elemental functions
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't really understand the background and rationale for this feature. For example, what problems does it solve? Can those problems be solved with libraries instead of language features? Is there precedent for this feature in other languages?

This might be a good piece to split out into a separate proposal.

}
```

### Standard library
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems like another good piece to postpone to a future proposal. I'm not sure if we even have the right people on the project to properly review a full library design for multidimensional arrays, so it might be better to focus for now on the core-language functionality.

@jonmeow jonmeow added proposal A proposal proposal rfc Proposal with request-for-comment sent out labels Aug 24, 2022
@jonmeow jonmeow marked this pull request as draft August 24, 2022 18:08
@jonmeow
Copy link
Contributor

jonmeow commented Aug 24, 2022

I relabeled this as a proposal because its only content is a proposal file. After a closer look, I see it's been a few weeks and geoffromer's requests for more explanation haven't been addressed. I've converted this back to draft so that you have time to address comments and add more detail to the proposal.

@jonmeow
Copy link
Contributor

jonmeow commented Aug 24, 2022

Note, if you have any questions, #proposal-prs-and-process on Discord may be a good place to discuss proposal writing.

@github-actions
Copy link

We triage inactive PRs and issues in order to make it easier to find active work. If this PR should remain active, please comment or remove the inactive label.
This PR is labeled inactive because the last activity was over 90 days ago. This PR will be closed and archived after 14 additional days without activity.

@github-actions github-actions bot added the inactive Issues and PRs which have been inactive for at least 90 days. label Nov 23, 2022
@github-actions
Copy link

github-actions bot commented Dec 8, 2022

We triage inactive PRs and issues in order to make it easier to find active work. If this PR should remain active or becomes active again, please reopen it.
This PR was closed and archived because there has been no new activity in the 14 days since the inactive label was added.

@github-actions github-actions bot closed this Dec 8, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
inactive Issues and PRs which have been inactive for at least 90 days. proposal rfc Proposal with request-for-comment sent out proposal A proposal
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants