Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rework of gl memory / buffer handling #2822

Merged
merged 19 commits into from
Jun 20, 2019
Merged

Rework of gl memory / buffer handling #2822

merged 19 commits into from
Jun 20, 2019

Conversation

kyren
Copy link
Contributor

@kyren kyren commented Jun 12, 2019

Allocates native buffers to back memory as soon as allocate_memory is called,
hal buffers become logical sub-ranges of native buffers.

This is still WIP, there is a lot of room for improvement here. Separate INDEX
memory could be reserved only for webgl, there could be separate upload /
download memory, and the end of buffer sub-ranges could be checked with asserts
in debug mode.

Fixes #2812
PR checklist:

  • make succeeds (on *nix)
  • make reftests succeeds
  • tested examples with the following backends: gl
  • rustfmt run on changed code

@kyren
Copy link
Contributor Author

kyren commented Jun 12, 2019

This is still highly WIP, I just went ahead and created the PR to get feedback.

Notably, this doesn't fix wgpu-rs with rendy yet, rendy errors (when trying the wgpu-rs cube example) with NoSuitableMemory(0, CPU_VISIBLE). I need to look into exactly why that is, and it might either need further gfx changes or potentially rendy changes as well. (Edit: actually, never mind, it might be wgpu asking for an INDEX + non-INDEX buffer, which would make sense!) (Edit 2: never mind, I just don't know how to properly use bitflags, working on it...)

@kyren
Copy link
Contributor Author

kyren commented Jun 12, 2019

Okay, so the cube example in wgpu-rs is still broken, and it has something to do with the fact that you can't call glTexSubImage2D with a buffer bound to GL_PIXEL_UNPACK_BUFFER that is also currently memory mapped.

In the cube example, this is coming from uploading texture data to a new buffer and calling copy_buffer_to_texture on that buffer. I believe this ends up being allocated on a rendy-memory "dynamic" heap, which actually does not unmap its buffers when DynamicBlock::unmap is called. HOWEVER, even if you patch rendy's DynamicBlock::unmap to call device.unmap_memory, the cube example will still not work.

I don't know how opengl mapping of a buffer range really works, I guess you can't unmap a buffer range, you can only unmap all of the mapped buffer ranges at once? Does opengl support mapping multiple different ranges of a single buffer simultaneously?

Curiously, if you patch rendy memory to call gfx unmap_memory on DynamicBlock::unmap as described above, the cube example errors with the exact same error! AND, even more curiously, if you unmap the buffer AGAIN as a hack in gl/src/queue.rs handling of Command::CopyBufferToTexture, the call to glTexSubImage2D will not error. I don't think I'm missing a call to glMapBufferRange anywhere, so my current theory is that glUnmapBuffer unmaps the most recent call to glMapBufferRange, but not all of them.

In any case, I think the best way forward might be not to use the rendy dynamic allocator with the opengl backend and instead substitute the dedicated one? I'm not really super sure!

@kyren kyren changed the title [DRAFT] WIP rework of gl memory / buffer handling [DRAFT] rework of gl memory / buffer handling Jun 13, 2019
Copy link
Member

@kvark kvark left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks very well written! I got a few suggestions here, but most importantly we need the persistent mapping for the cpu-visible buffers to become useful.

// Active index type, set by the current index buffer.
index_type: Option<hal::IndexType>,
// Active index type and buffer range, set by the current index buffer.
index_type_range: Option<(hal::IndexType, Range<u64>)>,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we could use hal::buffer::Offset typedef here

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good catch, done!

@@ -1088,7 +1087,9 @@ impl command::RawCommandBuffer<Backend> for RawCommandBuffer {
}

unsafe fn dispatch_indirect(&mut self, buffer: &n::Buffer, offset: buffer::Offset) {
self.push_cmd(Command::DispatchIndirect(buffer.raw, offset));
let (raw_buffer, range) = buffer.borrow().as_bound();
assert_eq!(range.start, 0, "buffer offset unsupported in indirect draw");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we actually support indirect buffer offset, it's a part of DispatchIndirect variant.
What we do not support is draw_indexed_indirect when index buffer offset is non-zero

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, okay that makes sense, I was confused between DispatchIndirect and draw_indirect and draw_indexed_indirect. OpenGL doesn't support draw_indexed_indirect at all right now so that's not an issue.

gl.buffer_storage(target, buffer.requirements.size as _, None, flags);
gl.bind_buffer(target, None);
} else {
let flags = if cpu_can_read && cpu_can_write {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removing these flags appears to be a regression. It's fine to catch up to this after the fact, since your change is much more important, but need to keep it on the radar.

Copy link
Contributor Author

@kyren kyren Jun 13, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The way the gl backend was before, allocated memory had n::Memory properties hard-coded to memory::Properties::CPU_VISIBLE | memory::Properties::CPU_CACHED, which means that it was always really allocated with glow::MAP_READ_BIT | glow::MAP_WRITE_BIT or glow::DYNAMIC_DRAW, the rest of it seemed to be dead code?

Now, it's still hard-coded, but since the allocation takes place in the same place that the n::Memory flags are set, there's nothing to check to to see what flags to use when allocating.

I agree it would probably be better if you could control the buffer flags more than this, but it should have the same behavior as it did before.

};
},
// Memory type for uses other than images and INDEX
hal::MemoryType {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would be good to also expose a device-local memory type for buffers

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That sounded right to me too, but I didn't know enough to know how to do it correctly. If we have a device-local buffer type, what should the gl backend do when copying from a cpu-visible buffer (backed by a real native buffer) to a device-local buffer? I imagine the correct answer is nothing, but what happens then when the user tries to copy from a fake device-local buffer to e.g. an image buffer? This might be another case where if I knew more about how HAL works it would make more sense to me, but as of right now I need it explained a bit slower :D

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, now that I think about it I suppose just having an actual, real buffer without any memory mapping would be the right thing to do, even if it needlessly duplicates memory on platforms with emulated memory mapping?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay I think I pushed something that works, it at least fixes the wgpu-rs shadow example

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copying from one buffer to another would be going through glCopyBufferSubData.

what happens then when the user tries to copy from a fake device-local buffer to e.g. an image buffer

Device-local buffer is just a GL buffer (or a sub-range of it, if we have a buffer associated with Memory) without MAP_XXX flags. Copying from it, to it, using as index/vertex, all those things are done the same way as usual.

Copy link
Contributor Author

@kyren kyren Jun 14, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, that makes perfect sense, and that's where I ended up.

properties: Properties::CPU_VISIBLE
| Properties::COHERENT
| Properties::CPU_CACHED,
heap_index: 2,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this doesn't need to be a separate heap index. We should have 0 for device local types and 1 for cpu visible types.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah that makes perfect sense, done.

@@ -612,7 +612,7 @@ impl CommandQueue {
r.image_extent.height as _,
glow::RGBA,
glow::UNSIGNED_BYTE,
0,
r.buffer_offset as i32,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we should use the offset in CopyTextureToBuffer as well

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Whoops! I did in fact forget about CopyTextureToBuffer, and it's now waiting on grovesNL/glow#13

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done, but it requires depending on a git rev of glow again.

@kyren
Copy link
Contributor Author

kyren commented Jun 13, 2019

I think getting this to work on webgl will require mapping without COHERENT and relying on explicit flushing. Currently working on it.

bors bot added a commit to grovesNL/glow that referenced this pull request Jun 13, 2019
13: Add support for pixel buffer offset in get_tex_image r=grovesNL a=kyren

Necessary for this gfx PR: gfx-rs/gfx#2822

Co-authored-by: kyren <kerriganw@gmail.com>
@kyren
Copy link
Contributor Author

kyren commented Jun 13, 2019

I'm very unsure about whether the non-coherent gl buffer mapping is correct...

@kyren
Copy link
Contributor Author

kyren commented Jun 14, 2019

Okay, I have tested this PR via wgpu-rs and can confirm the cube example works again!

I have also tested this PR with several other required patches and can confirm this allows wgpu-rs to work on webgl again as it did before! Only simple translucent quad is tested, and this requires several additional in-flight patches to wgpu-rs, wgpu-native, and gfx-* (start here if you're reading along and are interested).

@kyren
Copy link
Contributor Author

kyren commented Jun 14, 2019

Now the wgpu-rs shadow example works again (to the same extent it used to work, with some graphical errors).

@@ -23,7 +23,7 @@ bitflags = "1"
log = { version = "0.4" }
gfx-hal = { path = "../../hal", version = "0.2" }
smallvec = "0.6"
glow = { version = "0.2.0" }
glow = { git = "https://github.com/grovesNL/glow", rev = "054aabc1bfbad472e4a08bc55912552d8211daee" }
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@grovesNL what's the plan here? Are we waiting for more fixes to glow before a patch can be released?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nope, 0.2.0 was published a week ago but there have been a few PRs since then. We can publish whenever this branch is ready to replace the git dependency

@@ -515,7 +514,6 @@ impl d::Device<B> for Device {

Ok(n::Memory {
properties: memory::Properties::CPU_VISIBLE
| memory::Properties::COHERENT
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm sorry to bring it up, but it looks like we should expose both coherent and non-coherent memory types for CPU visible buffers (where persistent mapping is available), given that GL has the corresponding MAP_FLUSH_EXPLICIT_BIT bit that we can control.

It is totally fine to delay for later. My only concern would be that some of gfx-hal users may expect at least one coherent memory type exposed, since this is a guarantee of Vulkan.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can add that as well.

Copy link
Contributor Author

@kyren kyren Jun 15, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done (in personal branch), waiting on glow. (edit: merged into this PR now)

};
},
// Memory type for uses other than images and INDEX
hal::MemoryType {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copying from one buffer to another would be going through glCopyBufferSubData.

what happens then when the user tries to copy from a fake device-local buffer to e.g. an image buffer

Device-local buffer is just a GL buffer (or a sub-range of it, if we have a buffer associated with Memory) without MAP_XXX flags. Copying from it, to it, using as index/vertex, all those things are done the same way as usual.

if self.share.private_caps.emulate_map {
let ptr = mem.emulate_map_allocation.get().unwrap();
let slice = slice::from_raw_parts_mut(ptr.offset(offset as isize), size as usize);;
gl.buffer_data_u8_slice(mem.target, slice, glow::DYNAMIC_DRAW);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should probably use buffer_sub_data instead of rewriting the whole buffer

Copy link
Contributor Author

@kyren kyren Jun 14, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wow, that's an awful thing to miss, I'm really sorry!

Turns out, there is no buffer_sub_data in glow, I will add that.

Copy link
Contributor Author

@kyren kyren Jun 15, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done in personal branch, waiting on glow (edit: merged into this PR now)


if let Err(err) = self.share.check() {
panic!("Error unmapping memory: {:?} for memory {:?}", err, memory);
}
}

unsafe fn flush_mapped_memory_ranges<'a, I, R>(&self, _: I) -> Result<(), d::OutOfMemory>
unsafe fn flush_mapped_memory_ranges<'a, I, R>(&self, ranges: I) -> Result<(), d::OutOfMemory>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we are also missing invalidate_mapped_memory_ranges implementation, which is required to be sensible for non-coherent memory

Copy link
Contributor Author

@kyren kyren Jun 15, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added glInvalidateBufferSubData to glow and used that (in personal branch), when using emulated memory maps I'm currently doing... nothing? I guess that's the right thing to do? (edit: merged now)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, flush() updates the data from mapping to GPU, while invalidate() should update the data from GPU to the mapping. It sounds like this still needs to be done with the emulated maps?

Copy link
Contributor Author

@kyren kyren Jun 17, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay yeah that makes sense. This will also require a new glow method, I'll merge into this PR once that's merged into glow. (edit: done now)

@kyren
Copy link
Contributor Author

kyren commented Jun 14, 2019

I've added an implementation of invalidate_mapped_memory_ranges, added both incoherent and coherent memory types, and changed to use glBufferSubData for flushing buffer in my personal branch of gfx. Waiting on glow to merge my PRs and I will update this PR with those changes.

@kyren
Copy link
Contributor Author

kyren commented Jun 14, 2019

I have probably added a lot of portability issues with this PR, I'm open to suggestions about what needs to be fixed right now, but I don't have a great way of testing everything I might do.

I know that with the changes I'm going to add as soon as the glow PRs are merged, the gl backend will be using glInvalidateBufferSubData which I guess is core only since 4.3. A possible improvement here is to either ONLY provide incoherent memory if that function is available, or potentially provide incoherent memory for platforms with emulated buffer mapping and coherent memory only for platforms with native buffer mapping.

I'm open to suggestions here. I don't mind doing more work on this, but I do feel like the PR has gotten a bit more complex than I was really intending from the outset (and I feel a smidge unqualified to be doing some of the things I'm doing).

@kyren kyren changed the title [DRAFT] rework of gl memory / buffer handling Rework of gl memory / buffer handling Jun 15, 2019
Copy link
Member

@kvark kvark left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for driving this forward! I think we've narrowed down most of rough spots, there is just a few left (noted below)

size,
emulate_map_allocation: RefCell::new(std::ptr::null_mut()),
})
if (1 << mem_type.0) & IMAGE_MEM_TYPE_MASK != 0 {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should this just be a comparison 1 << mem_type.0 == IMAGE_MEM_TYPE_MASK?

Copy link
Contributor Author

@kyren kyren Jun 17, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For this case, yes, but that assumes that there is only one type of image memory. For the other roles there are multiple types of memory that could work, so the intention was to see whether the given type was in the bit mask at all.

map_flags: 0,
})
} else {
assert!(self.share.private_caps.buffer_role_change);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wait, so this code can't allocate buffer memory at all if buffer_role_change is false?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I know it's bad, I figured you'd ask about this. We could either offer a unique type of memory for each role and work differently whether or not buffer_role_change was allowed, OR we could force mmap emulation if buffer_role_change is not there. I'm fine with doing either one of those now, this change was just getting to be larger than I expected it to be so I wasn't sure what the best way forward right now was.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OR we could force mmap emulation if buffer_role_change is not there.

Is this really an option though? If the user wants a buffer to support both being a vertex buffer and a uniform buffer, no amount of mmap emulation would give you that.

So it looks to me that we just need to expose this "no role change" policy in the exposed memory types and live with that, for now. Later, if we find that applications are too restricted with this, we could think of some workarounds involving copying data around to temporary buffers, but I don't want us to go there just now.

Copy link
Contributor Author

@kyren kyren Jun 19, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, just to clarify, you're saying that the gl backend should have memory types based on no role changes at all, or that this should be dynamic depending on buffer_role_change? Doing the first one is... relatively easy, but the second one is a more involved change. It's easier for me to do the first one and has a higher probability of me getting it right, but if you think the that the gl backend should definitely support multi purpose buffers if available, I can do that. (If it's dynamic, I might have to move a lot of the buffer type logic around, because most of those mask constants will no longer be constant. As a plus, this would also enable mixing index buffers when not on webgl).

Edit: And yeah, you're right that mmap emulation doesn't help. Native gl buffers are still be attached to memory rather than hal buffers, and hal buffers are attached to sub-ranges of native buffers, so no matter what you will hit the buffer role change limitation.

However, I remember now why I thought this: it DOES solve one of the buffer role change problems I ran into on webgl prior to this PR (but not the new ones related to buffer sub-ranges). Currently, gfx-hal binds native buffers to the PIXEL_PACK_BUFFER target during Device::map_memory, which runs afoul of the index role change limitations on webgl. Does this mean that the gl platform doesn't work currently on !buffer_role_change platforms, or is the PIXEL_PACK_BUFFER target special somehow on non-webgl and allowed?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should be dynamic depending on buffer_role_change

We definitely need the exposed memory types being different, based on the buffer_role_change capability.

is the PIXEL_PACK_BUFFER target special somehow on non-webgl and allowed

We only have to use this binding when doing texture-buffer copies. Technically, this is subset of TRANSFER_DST capability (while PIXEL_UNPACK_BUFFER is a subset with TRANSFER_SRC). So with this in mind there is pretty much 1:1 correspondence between a buffer binding point ("role") and buffer::Usage.

When the implementation just needs to bind a buffer, regardless of the particular bind point, it should pick the bind point that the buffer supports. For example, if the user created a buffer with only VERTEX usage, we'll use the GL_ARRAY_BUFFER binding point for it, etc.

} else {
assert!(self.share.private_caps.buffer_role_change);

let target = if (1 << mem_type.0) & INDEX_MEM_TYPE_MASK != 0 {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

similarly, could we compare directly instead?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's more than one type of index memory, so the intention here is to check whether the given memory type is any of the index memory types. Is there a clearer way to write this?

glow::ARRAY_BUFFER
};

let is_device_local = (1 << mem_type.0) & DEVICE_LOCAL_MASK != 0;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: since this flag is needed in both branches, let's move this assignment up to the beginning of the function?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done, and I went ahead and gave all of the properties clearer names at the top of the function because I felt that was easier to follow.


Ok(n::Memory {
properties: memory::Properties::CPU_VISIBLE
| memory::Properties::CPU_CACHED,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in the future, we'd want to expose "read" memory type independent of the "write", but it doesn't need to happen here

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, agreed!

// Buffers are only allowed to be INDEX usage XOR another type of usage, they are not
// allowed to have both INDEX and non-INDEX usage.
if !(buffer::Usage::INDEX | buffer::Usage::TRANSFER_SRC | buffer::Usage::TRANSFER_DST).contains(usage) {
(0, 1)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's error! log here for the users to find out what happened earlier :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done!

// TODO: Access
gl.buffer_data_u8_slice(target, mapped, glow::DYNAMIC_DRAW);
let _ = *Box::from_raw(raw);
gl.buffer_sub_data_u8_slice(memory.target, 0, &allocation);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we shouldn't be doing this on unmap. In Vulkan and gfx-hal, mapping as an operation doesn't actually involve any synchronization. I.e. the only thing that synchronizes is:

  • submit() for coherent memory (implicitly synchronizes all active coherent mappings)
  • explicit flush() and invalidate() for non-coherent memory

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That makes sense, thanks for being patient with me and helping me learn all of the different memory semantics for modern gfx apis!

@kyren
Copy link
Contributor Author

kyren commented Jun 17, 2019

Thank you for driving this forward! I think we've narrowed down most of rough spots, here is just a few left (noted below)

You're very very welcome, thanks for being patient with me and teaching me a lot about modern gfx APIs in the process 😛

Copy link
Member

@kvark kvark left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alright, I suppose there is only getting glow published and maybe reordering the memory types that's left to do here? Or do you think we should land this as is now and follow-up?

// omitting the GL_MAP_FLUSH_EXPLICIT_BIT flag.
if !self.0.private_caps.emulate_map {
memory_types.push(
// Coherent CPU_VISIBLE memory for INDEX buffers
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's move these to the beginning of memory_types. In Vulkan there is an interesting requirement that a memory type with more flags should be exposed first (i.e one with flags A | B should come before A).

Copy link
Contributor Author

@kyren kyren Jun 19, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That makes the memory type masks dynamic based on capabilities, darn.

kyren added 11 commits June 19, 2019 04:26
Allocates native buffers to back memory as soon as allocate_memory is called,
hal buffers become logical sub-ranges of native buffers.

This is still WIP, there is a lot of room for improvement here.  Separate INDEX
memory could be reserved only for webgl, there could be separate upload /
download memory, and the end of buffer sub-ranges could be checked with asserts
in debug mode.
Currently requires master branch of `glow` from git
I believe this is necessary for emulated mapping to work properly, as emulated
mapping cannot be coherent.

Currently unimplemented for emulated mapping, will implement soon.
@kyren
Copy link
Contributor Author

kyren commented Jun 19, 2019

Alright, I suppose there is only getting glow published and maybe reordering the memory types that's left to do here? Or do you think we should land this as is now and follow-up?

I guess it would be better to fix the memory types thing and the brokenness of !buffer_role_change before merging...

* Handles the case where `buffer_role_change` capability is false by generating
  memory types for every buffer role.

* Does not add separate INDEX memory on non-webgl platforms.
@kyren
Copy link
Contributor Author

kyren commented Jun 19, 2019

Okay, I have a new system for handling memory types that is dynamic based on capabilities rather than the old simplistic system. I've also fixed what I think are several bugs that I caught in review, so there's kind of a lot of new changes.

This should fix handling the !buffer_role_change case, improve the backend on non-webgl, and order the memory types according to vulkan requirements.

I haven't tested this with the webgl backend just yet because I need to rebase onto my personal branch to try it out and stash what I'm currently working on in the project that uses it. I'll report back once I've done that.

Edit: Everything seems to work fine under webgl as well!

Move type mask logic into gfx_backend_gl::Share so that it can be shared
Copy link
Member

@kvark kvark left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few last questions/concerns :)

if is_readable_memory {
map_flags |= glow::MAP_READ_BIT;
}
if !is_coherent_memory {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the condition appears wrong

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh! It definitely is wrong, it used to control the explicit flush bit before I figured out that it should control the coherent flag instead, and I forgot to reverse it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And it was hiding another bug, which is now fixed 😥

properties: memory::Properties::DEVICE_LOCAL,
buffer: None,
size,
target: 0,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's move this field behind the buffer option?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I almost did that before actually...

// Alignment of 4 covers indexes of type u16 and u32
(INDEX_MEM_TYPE_MASK, 4)
}
let alignment = if usage.contains(buffer::Usage::INDEX) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why don't we just return 4 unconditionally here? It's a reasonable alignment to adhere to.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure thing, that seems reasonable.

properties: memory::Properties::CPU_VISIBLE | memory::Properties::CPU_CACHED | memory::Properties::COHERENT,
heap_index: CPU_VISIBLE_HEAP,
});
buffer_role_memory_types.push(hal::MemoryType {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

are we missing the CPU-visible non-cached one?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You know what, I actually did all this still with the assumption that there would only be R/W memory types this patch and not write-only ones (and leave that for a later patch). I guess though, with the way everything else is set up, the ONLY THING left to do is just add an entry here and it will work? I guess that's really all that's left to do so it's silly not to do it!

memory_types.push((
buffer_role_memory_type,
MemoryRole::Buffer {
allowed_usage: buffer::Usage::TRANSFER_SRC
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could we use buffer::Usage::all() instead?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh sure thing I just forgot it existed :P

* Fix inverted condition around coherent memory, and fix the bug that it was
  hiding (also need MAP_COHERENT_BIT on storage flags!)
* Move buffer target with raw buffer inside the Option in native::Memory
* Hard-code buffer alignment of 4
* Actually go ahead and add cpu-visible non-cached coherent memory type, since
  it will now work by just adding it to the list!
* Simplify using buffer::Usage flags a bit
@kyren
Copy link
Contributor Author

kyren commented Jun 20, 2019

Okay all of your new comments are addressed. Based on the discussion in gitter (I think I have this right?) I added a new memory type (set) that is cpu-visible, non-cached, and coherent, so there are now also non-cached memory types available (non-cached implies coherent, right?).

@kvark
Copy link
Member

kvark commented Jun 20, 2019 via email

bors bot added a commit that referenced this pull request Jun 20, 2019
2822: Rework of gl memory / buffer handling r=kvark a=kyren

Allocates native buffers to back memory as soon as allocate_memory is called,
hal buffers become logical sub-ranges of native buffers.

This is still WIP, there is a lot of room for improvement here.  Separate INDEX
memory could be reserved only for webgl, there could be separate upload /
download memory, and the end of buffer sub-ranges could be checked with asserts
in debug mode.

Fixes #2812
PR checklist:
- [x] `make` succeeds (on *nix)
- [x] `make reftests` succeeds
- [x] tested examples with the following backends: gl
- [ ] `rustfmt` run on changed code


Co-authored-by: kyren <kerriganw@gmail.com>
@bors
Copy link
Contributor

bors bot commented Jun 20, 2019

Build succeeded

@bors bors bot merged commit 01807ec into gfx-rs:master Jun 20, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

GL panics with "No buffer has been bound yet, can't map memory!"
3 participants