
Heap corruption issue #1929

Closed
garygumdrops opened this issue Feb 7, 2020 · 18 comments
Labels
state: stale the issue has not been updated in a while and will be closed automatically soon unless it is updated

Comments

@garygumdrops

Hi

I've seen some heap corruption on our embedded platform inside the parser function with large-ish strings (around 6k chars).

The platform is an i.MX RT1050 using GCC and the newlib C library. We use FreeRTOS as well, but I've tried removing that. I saw the problem when using newlib's malloc/free, and also when using the heap allocation routines provided by FreeRTOS. I have been using the latter more recently because I have a modified version of heap4.c that adds canaries around the allocated blocks to detect buffer overruns.
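For readers unfamiliar with the canary technique mentioned above, here is a minimal sketch of the idea: wrap an allocator so that guard words are placed on either side of each block and checked at free time. This is illustrative only, not the modified heap4.c; the names `guarded_malloc`/`guarded_free` are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>
#include <cstring>

// Illustrative canary-guarded allocator (NOT the FreeRTOS heap4.c code):
// block layout is [size][front canary][user bytes...][back canary].
// An overrun past the user area clobbers the back canary, which is
// detected when the block is freed.
static const std::uint32_t kCanary = 0xDEADBEEFu;

void* guarded_malloc(std::size_t n)
{
    std::uint8_t* raw = static_cast<std::uint8_t*>(
        std::malloc(sizeof(std::size_t) + 2 * sizeof(kCanary) + n));
    if (raw == nullptr) return nullptr;
    std::memcpy(raw, &n, sizeof n);                                   // stored size
    std::memcpy(raw + sizeof(std::size_t), &kCanary, sizeof kCanary); // front canary
    std::memcpy(raw + sizeof(std::size_t) + sizeof(kCanary) + n,
                &kCanary, sizeof kCanary);                            // back canary
    return raw + sizeof(std::size_t) + sizeof(kCanary);
}

// Returns true if both canaries were intact, false if the block was overrun.
bool guarded_free(void* p)
{
    std::uint8_t* user = static_cast<std::uint8_t*>(p);
    std::uint8_t* raw  = user - sizeof(kCanary) - sizeof(std::size_t);
    std::size_t n = 0;
    std::memcpy(&n, raw, sizeof n);
    std::uint32_t front = 0, back = 0;
    std::memcpy(&front, raw + sizeof(std::size_t), sizeof front);
    std::memcpy(&back, user + n, sizeof back);
    const bool ok = (front == kCanary) && (back == kCanary);
    std::free(raw);
    return ok;
}
```

A real heap4.c port would raise an assert or log the offending block instead of returning a flag, but the detection mechanism is the same.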

It's repeatable on my platform but hard to track down where the actual problem is. With the modified heap allocator I can see the area of memory that is getting overwritten, and when I trace back to where that memory is being freed, it occurs in the destroy function of basic_json, specifically line 1041 (I am using version 3.7.3). This is a while loop emptying a stack. This area of the code may be a red herring; it's difficult to pinpoint where the issue is.
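For context, the "while loop emptying a stack" pattern referred to here is iterative destruction: children of a nested structure are moved onto an explicit stack and destroyed one level at a time, so that destroying a deeply nested document cannot overflow the call stack. A simplified sketch of the pattern (hypothetical `Node` type, not the library's actual basic_json code):

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Simplified sketch of iterative destruction. Instead of letting ~Node
// recurse through arbitrarily deep nesting, children are moved onto an
// explicit stack and torn down one at a time.
struct Node
{
    std::vector<Node> children;
};

inline std::size_t destroy_iteratively(Node root)
{
    std::size_t destroyed = 0;
    std::vector<Node> stack;
    stack.push_back(std::move(root));

    while (!stack.empty())
    {
        Node current = std::move(stack.back());
        stack.pop_back();

        // Move the children out before 'current' dies, so the destructor
        // recursion depth stays bounded regardless of nesting depth.
        for (Node& child : current.children)
        {
            stack.push_back(std::move(child));
        }
        current.children.clear();
        ++destroyed;
    } // 'current' is destroyed here on each iteration, one level at a time
    return destroyed;
}
```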

@garygumdrops
Author

Just a quick update before I disappear for the weekend :-)

I commented out the entire contents of the destroy function and I don't see any issues, even after lots of parsing. Obviously I do see memory leaks after each pass, though.

@nickaein
Contributor

nickaein commented Feb 7, 2020

I imagine you are referring to this line:

current_item.m_value.object->clear();

Can you comment it out to see if it has any effect? Clearing a moved-from vector is safe, but I know embedded compilers can sometimes have quirks.
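As a quick sanity check of that claim, clearing a moved-from std::vector is well-defined: the vector is left in a valid but unspecified state, and clear() has no preconditions. A tiny self-contained check:

```cpp
#include <cassert>
#include <string>
#include <vector>

// Demonstrates that calling clear() on a moved-from std::vector is legal:
// the moved-from object is valid (unspecified contents), and clear() may be
// called on any valid vector.
inline bool moved_from_clear_is_safe()
{
    std::vector<std::string> v{"a", "b"};
    std::vector<std::string> w = std::move(v);
    v.clear();  // well-defined on a moved-from vector
    return v.empty() && w.size() == 2;
}
```

If a compiler misbehaves on this pattern, that points at a toolchain bug rather than the library.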

Another test would be to comment out the code from the start of the destroy() function up to switch (t) (lines 1001 to 1046, which are basically the changes introduced in PR #1436; you are fine without them if your object is not deeply nested) and see if you still experience heap corruption or memory leaks.

Also, are you sure that the program doesn't run out of memory? Note that the memory requirement for a parsed JSON object can sometimes be higher than the JSON content itself (#1613).

@garygumdrops
Author

garygumdrops commented Feb 8, 2020 via email

@garygumdrops
Author

garygumdrops commented Feb 10, 2020 via email

@nickaein
Contributor

There might be memory corruption somewhere else in the program that remains undetected and causes a crash some time later during execution, e.g. when calling free()/delete.

Can you run the code under ASan (and TSan if you suspect a race condition) to get more diagnostics? I'm not sure how well they support the ARM architecture, though; you might have to build and run the code on x86/64.

@garygumdrops
Author

garygumdrops commented Feb 10, 2020 via email

@garygumdrops
Author

So I've played around a bit more this morning, and if I leave the first half of the destroy function untouched but comment out lines 1048 to 1078 (the switch statement), then I have no heap corruption, just memory leaks, which is expected.

My C++ knowledge is a bit weak regarding the use of those std::allocator_traits objects; is there maybe another way I could destroy the top-level t object? Given that the while loop runs with no problems, the flattened object created in the stack local variable (and current_item as it walks through the stack) must be getting cleaned up nicely. The leaked memory is about 9K per string that I'm parsing, so it doesn't take long for the heap to start noticeably shrinking.
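For anyone following along who is also unfamiliar with std::allocator_traits: the destroy-then-deallocate sequence the library uses is the standard two-step teardown of an object created through an allocator. Here is a minimal sketch with a hypothetical `Payload` type (not the library's code):

```cpp
#include <cassert>
#include <memory>

// Sketch of the std::allocator_traits lifecycle: allocate raw memory,
// placement-construct an object in it, then destroy() and deallocate()
// as two separate steps. This mirrors the pattern used in basic_json's
// destroy function, but with a made-up Payload type.
struct Payload
{
    int value = 0;
};

inline int allocator_roundtrip(int v)
{
    std::allocator<Payload> alloc;
    using traits = std::allocator_traits<std::allocator<Payload>>;

    Payload* p = traits::allocate(alloc, 1);   // raw, uninitialized storage
    traits::construct(alloc, p, Payload{v});   // run the constructor in place
    int out = p->value;
    traits::destroy(alloc, p);                 // run the destructor only
    traits::deallocate(alloc, p, 1);           // then release the storage
    return out;
}
```

Skipping destroy() leaks anything the object owns (hence the ~9K per string leak when the calls are commented out), while skipping deallocate() leaks the storage itself; both steps are needed.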

@gary-metalle-rvl

I further reduced the memory leaks by commenting out just lines 1053, 1061 and 1069, while leaving the deallocate calls in.

@nickaein
Contributor

Using ASan/TSan on non-x86/64 platforms can indeed be very cumbersome, if not impossible. What we did was run the code on an x86 platform (simulating all hardware dependencies) for testing. Building the program in debug mode and using the gdb debugger can also be helpful.

I don't think commenting out different parts of the destroy() function will lead to anything helpful, as most of the code there is required; without it you get memory leaks. And even ignoring the memory leaks, the program still has memory corruption, which can lead to incorrect behavior.

The memory corruption is either a bug in the program itself (buffer overrun, double-free, etc.) or a side effect of something like a stack overflow. Here are a few examples of such bugs where the program crashed not at the problematic code, but a while later during destruction/free:

https://stackoverflow.com/questions/11726746
https://stackoverflow.com/questions/3945167
https://stackoverflow.com/questions/16119762
https://stackoverflow.com/questions/27230890

@gary-metalle-rvl

I'm 90% sure it's not a corruption issue in the main program, because I can run it for days with no leaks or corruption. I have one major line of interest in a try/catch block:

try
{
    json object = json::parse(candidate); // candidate is a std::string

    if (callback)
    {
        callback(object);
    }
}
catch (detail::parse_error& ex)
{
    LOG_WARN("JSON parse error: %s.", ex.what());
}

If I comment out the contents of the try block, I can run indefinitely; but if I uncomment just the single line that parses a string into a json object (leaving out the call to the callback function that does the processing), I get heap problems. So that's just creating a json object from a string and not using it. There is plenty of stack and heap available at the time parse is called.

I'd be willing to accept that it's something iffy with the compiler version I'm using, but I'm not sure how much work would be involved in trying a newer GCC release, especially given it might not be the problem.

@nickaein
Contributor

You might then record the content of each candidate before parsing (e.g. dumping it to a file or sending it over the network) so that the bug can be reproduced more easily.
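One possible sketch of that suggestion, assuming the `candidate` string from the snippet above: append each candidate to a log file with a length prefix before handing it to json::parse, so a crashing input can be replayed on a desktop build. The function name, file path, and framing here are arbitrary choices.

```cpp
#include <cassert>
#include <fstream>
#include <string>

// Hypothetical helper: append a candidate JSON string to a log file before
// parsing it, so the exact input that triggers a crash can be replayed later
// on an x86 build under ASan/gdb. Length-prefixed framing is used so that
// candidates containing newlines survive intact.
void record_candidate(const std::string& path, const std::string& candidate)
{
    std::ofstream log(path, std::ios::app | std::ios::binary);
    log << candidate.size() << '\n' << candidate << '\n';
}
```

The call would go immediately before `json::parse(candidate)` in the try block; on an embedded target without a filesystem, the same framing works over a serial port or socket.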

@garygumdrops
Author

garygumdrops commented Feb 12, 2020 via email

@stale

stale bot commented Mar 13, 2020

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the state: stale the issue has not been updated in a while and will be closed automatically soon unless it is updated label Mar 13, 2020
@stale stale bot closed this as completed Mar 20, 2020
@muellerryan

@garygumdrops Did you manage to resolve your issue? I believe I am running into the same thing on an x64 build for Windows.

@garygumdrops
Author

garygumdrops commented Feb 20, 2023 via email

@muellerryan

Thank you for getting back to me on a 3-year-old post; it's much appreciated.

After looking around, jsoncpp was looking to be my next option, too. Thanks for confirming my gut feeling.

@nlohmann
Owner

Sorry that we could not help you. We never had any memory issues on other platforms, so it's hard to tell whether this was a library issue or one caused by the infrastructure around it.

@muellerryan

I can confirm that when utilizing jsoncpp in the same project, heap corruption is no longer an issue.

I do not consider myself an advanced programmer by any means, so the memory problem may still be lurking in my code, but until I see another corruption issue with jsoncpp, I will assume this is an issue in nlohmann/json. If I encounter the issue later with jsoncpp, I will update this comment with my findings.
