Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix message sending of tuples sent through interfaces #1614

Merged
merged 1 commit into from
Mar 17, 2017

Conversation

Praetonus
Copy link
Member

This change fixes a case overlooked in fa8f7ad which occurred when sending a tuple containing a boxed type.

This change also adds tests for code generation of GC tracing.

PS: No changelog entry since this is covered by the one from fa8f7ad.

@SeanTAllen
Copy link
Member

@Praetonus how are the 2 changes related?

@Praetonus
Copy link
Member Author

The fix relates to how the tracing is generated.

@Praetonus Praetonus force-pushed the gentrace-tests branch 2 times, most recently from ceba3ca to c3b2d7a Compare February 24, 2017 19:24
@Praetonus
Copy link
Member Author

The compiler test suite seems to crash when running the TraceObjectDynamic test on AppVeyor. There isn't any log, would anyone on Windows be able to run the test and see if there is more information?

@chalcolith
Copy link
Member

I'll take a look.

@chalcolith
Copy link
Member

It's crashing for me in CodegenTraceTest.TraceNumberBoxed, in libponyrt/mem/pool.c:436 in a send_msg() call. ponyint_virt_alloc() is returning null. GetLastError() returns 1455: The paging file is too small for this operation to complete.

I suspect there's a memory leak somewhere, since VS2015 fills up my disk and crashes when trying to create a dump file :-)

I'm not yet very familiar with the pony runtime, so it will take me a while to try to figure out what all the threads are doing.

@Praetonus
Copy link
Member Author

@kulibali That's strange, I don't really see what could cause a memory leak in these tests, and why it would only occur on Windows. Could you try running the tests with a memory profiler, to see where all that memory is allocated?

@Praetonus
Copy link
Member Author

I think I know what's going on. The memory allocated with ponyint_virt_alloc is allocated per thread and is never freed. In a real pony program this is fine since each thread lives until the end of the program and everything is reclaimed when the program terminates. But in the JIT tests, the runtime is started and stopped for every test which results in new threads being created. Then, every new thread will call ponyint_virt_alloc at least once and the memory usage will keep growing.

I can see three different ways of fixing this problem

  • Use ponyint_virt_alloc at the global level instead of the thread level. There would be some overhead from synchronisation. I'm unsure how much, but it probably won't be small.
  • Reclaim allocated memory at thread termination. Since memory allocated by one thread can be accessed and even freed by other threads, this isn't an easy problem. We'll have to have some kind of "ownership marker" for each block allocated by ponyint_virt_alloc. I think that'd mean some overhead on ponyint_pool_free to set the owning mark for the current thread and block. It would most likely be an atomic operation so there would be additional overhead from the synchronisation. Owning marks would be cleared at thread termination and the last thread clearing its mark would free the block. We'd also have to keep track of allocated memory (i.e. by ponyint_pool_alloc) and I'm not sure how to do that efficiently.
  • Manage threads carefully. If we can make sure that memory allocated by the runtime never escapes the thread group of the runtime then we can use the above method without ownership marking. Since the thread that started the runtime isn't internal to the runtime and the pool allocator is used outside of the runtime in the test suite, this solution would need a careful audit.

Anybody have any thoughts on that?

@dipinhora
Copy link
Contributor

@Praetonus I might not be understanding the runtime correctly, but wouldn't it be possible to free all memory allocated by ponyint_virt_alloc (regardless of which thread did the allocation) as part of ponyint_sched_shutdown after all threads have been joined? Or is this the same as your third suggestion and I'm just not fully understanding things? At that point, there's no need to worry about multiple threads and their related complexities and there's no need for any ownership markers because there's only 1 thread left and it owns everything.

@Praetonus
Copy link
Member Author

@dipinhora That's roughly the same thing as my third suggestion, except that I was thinking about freeing memory before joining. This is because most of the allocation bookkeeping data is thread local and destroyed at thread exit, and setting up a global data structure to free everything from one thread is equivalent to each thread freeing the memory it allocated.

I'm currently doing some tests and it seems that 3. is the best alternative.

@dipinhora
Copy link
Contributor

@Praetonus Makes sense. Thanks for the explanation.

@Praetonus
Copy link
Member Author

It turns out it's not exactly equivalent. The memory cannot be returned to the OS before the thread joining, because some things containing shared memory (e.g. the message queues) are accessed after joining all threads. We'll have to free the memory in the main thread after joining.

@Praetonus
Copy link
Member Author

I've made a PR (#1621) which I think should solve the problem. In the end I took a completely different approach and added the ability for runtime threads to push their free memory to other threads when they terminate.

@Praetonus
Copy link
Member Author

I've rebased on latest master to pick up the changes merged from #1621.

@jemc
Copy link
Member

jemc commented Mar 5, 2017

Not sure what's wrong with the windows CI.

@sylvanc
Copy link
Contributor

sylvanc commented Mar 6, 2017

CodegenTraceTest.TraceTupleDynamic is crashing on Windows.

@Praetonus
Copy link
Member Author

It looks like #1621 didn't fix the whole problem (same thing for #1629, I think). I'll look into it.

@dipinhora
Copy link
Contributor

@Praetonus There's two possible fixes to this:

  • First, which is what I think you were trying with Push local pool memory to a global list at thread termination #1621, would be to explicitly free all memory upon termination. However, I think Push local pool memory to a global list at thread termination #1621 was only half the solution. In order to fully free the memory, there would need to be calls to ponyint_virt_free to free the memory allocated via ponyint_virt_alloc after all the threads have been joined.

  • Second, would be to change the windows implementation of ponyint_virt_alloc to not MEM_COMMIT and to instead make additional calls to VirtualAlloc to commit memory on an as needed basis as part of the pool allocation logic. The following comment from https://msdn.microsoft.com/en-us/library/windows/desktop/aa366803(v=vs.85).aspx explains the current issue you're seeing (emphasis added):

    As an alternative to dynamic allocation, the process can simply commit the entire region instead of only reserving it. Both methods result in the same physical memory usage because committed pages do not consume any physical storage until they are first accessed. The advantage of dynamic allocation is that it minimizes the total number of committed pages on the system. For very large allocations, pre-committing an entire allocation can cause the system to run out of committable pages, resulting in virtual memory allocation failures.

    With the change to not using dynamic allocation, the processes wouldn't run out of virtual memory because the entire memory wouldn't be committed immediately (or likely ever for most programs) and the behavior of memory allocation would be similar to the posix based platforms using mmap which only allocate pages on demand.

@Praetonus
Copy link
Member Author

My intent with #1621 was only to make memory managed by the pool allocator available to future threads. Returning memory to the OS is tricky because memory in free lists can move between threads.

Your second solution seems interesting but I think it would require some work to implement.

However, I'm not sure it will be necessary. With the change in #1621, threads should reuse blocks allocated by previous threads and there shouldn't be any virtual allocation after the first run of the runtime. There probably is a bug in the implementation of #1621. I'll look into it.

@dipinhora
Copy link
Contributor

@Praetonus Ah, I see. I missed that bit as I was focusing more on the freeing of memory during the pony_unregister_thread prior to the cleanup function. Returning the memory to Windows doesn't seem to be as tricky using VirtualFree as long as we know the original memory allocation address and size of the call to VirtualAlloc (which we don't currently keep track of so it would need to be tracked in order to ensure it can be used for VirtualFree). The documentation states that VirtualFree can free the full range originally allocated by VirtualAlloc regardless of the state of individual pages within the range (i.e. committed/reserved and used or not) so it wouldn't matter which thread the free block is part of because once it is freed, it is safe to forget all the blocks and start from scratch again.

It seems after #1621, ponyint_pool_thread_cleanup stores the blocks in pool_global[index] according to https://github.com/ponylang/ponyc/blob/master/src/libponyrt/mem/pool.c#L949 but the next time pool_global is initialized when gen_jit_and_run is called to start the pony program again, it is set to default values which would override whatever was saved from the previous run during cleanup according to https://github.com/ponylang/ponyc/blob/master/src/libponyrt/mem/pool.c#L107-L125.

I'm not sure of the solution, but this might be the bug preventing reuse of blocks allocated by previous threads.

dipinhora added a commit to dipinhora/ponyc that referenced this pull request Mar 9, 2017
This commit changes how the Windows memory allocation/commitment
is done by switching from committing all memory immediately
to only committing memory dynamically as it gets allocated.

This resolves the Windows errors that PR ponylang#1614 and PR ponylang#1629 have
been encountering related to `ponyint_virt_alloc` and `The paging
file is too small for this operation to complete.`
@SeanTAllen SeanTAllen force-pushed the master branch 3 times, most recently from 4ccafba to 362d1d0 Compare March 11, 2017 13:53
@SeanTAllen SeanTAllen force-pushed the master branch 3 times, most recently from 97b1943 to 21d37aa Compare March 11, 2017 14:51
@Praetonus Praetonus added the do not merge This PR should not be merged at this time label Mar 14, 2017
@Praetonus
Copy link
Member Author

I suspect there is a data race in pool_block_pull. I've added a mutex around the two concurrent pool_block_* functions to verify that assumption. If it is indeed a data race, I'll fix it without the mutex.

@dipinhora pool_global is a static variable, which means it is initialised only once at program startup. We'd have the bug you describe if pool_global was declared with _Thread_local storage.

@dipinhora
Copy link
Contributor

dipinhora commented Mar 14, 2017

@Praetonus Thanks for the clarification. I was confused because gen_jit_and_run calls gen_main which then generates a call to pony_init to initialize the runtime and I wasn't sure if that would in turn cause pool_global to be re-initialized or not.

@Praetonus
Copy link
Member Author

I think I found the problem. I'll open another PR with the fix.

@Praetonus
Copy link
Member Author

I've pushed the fix here to ensure that it is correct. If it is, I'll open the other PR.

@Praetonus
Copy link
Member Author

The tests pass on Windows, I've cancelled the Linux and OSX builds to avoid overloading Travis since it has been quite slow lately.

This change fixes a case overlooked in fa8f7ad which occurred when
sending a tuple containing a boxed type.

This change also adds tests for code generation of GC tracing.
@Praetonus Praetonus removed the do not merge This PR should not be merged at this time label Mar 17, 2017
@Praetonus
Copy link
Member Author

I've rebased on latest master after the merge of #1677. This should be good to merge if CI passes.

@SeanTAllen SeanTAllen merged commit de947cc into ponylang:master Mar 17, 2017
@Praetonus Praetonus deleted the gentrace-tests branch March 17, 2017 13:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants