Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Automatically round CnC file size to increment #387

Closed
bcaimano opened this issue Aug 17, 2017 · 14 comments
Closed

Automatically round CnC file size to increment #387

bcaimano opened this issue Aug 17, 2017 · 14 comments

Comments

@bcaimano
Copy link

As is, the CnC file size seems to be determined by the individual sizes of four buffers and its header. Unfortunately, my version of hugetblfs only seems to allow passing truncate sizes that are multiples of the hugepage size (in my case, 2MB). This causes Aeron to be unable to construct its cnc file on a hugetblfs mount. I would expect that this condition is not unique to hugetblfs but might be seen on other less common file systems. I would like the option to round the CnC file size to a specific increment, hopefully through the same system properties system as other values like the buffer sizes mentioned above.

@mjpt777
Copy link
Contributor

mjpt777 commented Aug 25, 2017

This also needs to happen for log buffers. A log is 3 times term length plus a trailer which is a 4k page in size.

@bcaimano
Copy link
Author

bcaimano commented Sep 6, 2017

I did a little investigation of my own (sorry for the silence).

The log/term length seems a tricky thing to change. In several places, either log length is determined from term length or term length is determined from log length. It seems like it might make sense to start storing the term length in the log. The math becomes then much easier: you can determine the the log length from the term length and round that to the increment. To get the term length from a log, you just read it from the meta information.

The Loss buffer is also an independent file and thus potentially out of increment, however it can be set manually so people can figure that out.

No idea if my thoughts here align with the Aeron design philosophy.

@tmontgomery
Copy link
Contributor

@bcaimano yes, you are correct. The layout will need to change to account for padding. One option is to simply pad out and mark the real length as the last 8 bytes in the file. That would preserve all the previous logic. But it is ugly.

We want to do several improvements in this area. Such as refcnt logbuffers, etc. So, sorry for the silence, we've just been busy.

mjpt777 added a commit that referenced this issue Sep 28, 2017
tmontgomery added a commit that referenced this issue Sep 28, 2017
tmontgomery added a commit that referenced this issue Sep 28, 2017
tmontgomery added a commit that referenced this issue Sep 29, 2017
…. Enforce filePageSize on logbuffers. Move metadata to end of file. For logbuffers requiring multiple mappings, make sure that all mappings are on offsets of filePageSize. Read term length and page size from metadata before mapping on clients.
tmontgomery added a commit that referenced this issue Sep 30, 2017
…. Unmap if termLength (or pageSize) is invalid. Modify alignment for 1GB termLength. Add validation of pageSize to be between 4KB and 1GB and a power of 2. Align logmetadata length to min page size of 4KB.
tmontgomery added a commit that referenced this issue Oct 1, 2017
…ew alignment and termLength field in logbuffers.
tmontgomery added a commit that referenced this issue Oct 2, 2017
@bcaimano
Copy link
Author

bcaimano commented Oct 3, 2017

I've done a quick test of the Java driver on 3e36a0c with our usual config and a hugepages mount. It seems that truncate is failing on a size param of 8,392,704 (8 * 1,024 * 1,024 + 4,096). My quickest guess is that the metadata is getting added onto the end of the file after the size is determined.

@tmontgomery
Copy link
Contributor

tmontgomery commented Oct 3, 2017

@bcaimano did you set aeron.file.page.size to the hugepages page size?

Edit: For example,

$ java -cp aeron-all/build/libs/aeron-all-1.5.0-SNAPSHOT.jar -Daeron.file.page.size=2097152 io.aeron.driver.MediaDriver

Produces logbuffers that are multiples of 2MB in length. In this case, the logbuffers are:

$ ls -l /var/folders/_p/pkchjzk95vg057my92f1drp00000gn/T/aeron-tmont/publications/
total 102400
-rw-r--r--  1 tmont  staff  52428800 Oct  3 14:45 UDP-00000000-0-7f000001-40123-a922db5-a-1.logbuffer

@bcaimano
Copy link
Author

bcaimano commented Oct 4, 2017

@tmontgomery, I'll give it a try tomorrow. (Apologies, I realized about an hour ago that I was probably missing something exactly like this.)

@tmontgomery
Copy link
Contributor

@bcaimano BTW, be sure to place the -D arg before the MediaDriver class name. Java will think it is a MediaDriver program arg if not. ;)

@bcaimano
Copy link
Author

bcaimano commented Oct 4, 2017

Forming robust command line schemas is a lost art! :)

@bcaimano
Copy link
Author

bcaimano commented Oct 4, 2017

I was able to successfully start up a single server system sitting on top of hugetlbfs backed Aeron today. I'll try to verify for full multi-server tomorrow.

@bcaimano
Copy link
Author

bcaimano commented Oct 6, 2017

Okay, I've verified our full set up. Looks good.

There was one interesting semi-bug on our end due to a very low media driver timeout (10ms). It appears that while general operation is perfectly satisfied with that timeout, the initial driver connection may not be. I had to increase timeout to 1s to avoid the failure. I suspect that opening a hugepage-backed CnC file as a client provokes a massive page fault/mmap sync operation that stalls the initial connection. It ended up being a trivial fix wherein we mutate the media driver timeout after connection. I found it interesting at least.

@bcaimano
Copy link
Author

bcaimano commented Oct 6, 2017

Assuming you two don't want us to run specific tests to check for implementation hiccups or the like, I think this ticket can be closed.

@tmontgomery
Copy link
Contributor

@bcaimano you mean the timeout you intentionally set very low, correct? Not that it was too low in the client code.

@bcaimano
Copy link
Author

Yes, that is correct. I believe @sthornington wanted to use it as a canary for data disruption.

@tmontgomery
Copy link
Contributor

Closing this as I believe it is done now.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants