1.17 beta crash mystery thread #18705
There are also a couple of shutdown hangs. I believe I've already solved one related to ManagedTexture, but here's one where EmuThread is stuck somewhere in __NetShutdown. Unfortunately the stack is missing some detail:
There are a number of threads that are joined by various functions called by __NetShutdown, and it seems one of them is stuck. It seems to be the upnp thread:
|
And here's another one where it appears stuck in vsnprintf, or perhaps more likely the exception is getting triggered over and over:
The snprintf is just this one: line 331
    } else if (index_pre) {
        snprintf(instr->text, sizeof(instr->text), "%s%s%s %c%d, [x%d, #%d]!", opname[opc], signExt, sizeSuffix[size], r, Rt, Rn, SignExtend9(imm9));
This is from a loadstore, which checks out. |
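For reference, the `#%d` operand in that format string is the signed 9-bit immediate of the pre-indexed load/store. Below is a minimal sketch of how such an operand could be produced. `SignExtend9Sketch` and `FormatPreIndexOperand` are hypothetical stand-ins written for illustration, not the actual disassembler code, and the buffer size is an assumption.

```cpp
#include <cstdint>
#include <cstdio>
#include <string>

// Hypothetical stand-in for SignExtend9: treats the low 9 bits of imm9
// as a two's complement value (bit 8 is the sign bit).
inline int32_t SignExtend9Sketch(uint32_t imm9) {
    imm9 &= 0x1FF;
    if (imm9 & 0x100) {
        return static_cast<int32_t>(imm9) - 0x200;
    }
    return static_cast<int32_t>(imm9);
}

// Formats only the "[xN, #imm]!" part of a pre-indexed load/store.
// snprintf never writes more than sizeof(text) bytes, including the terminator.
inline std::string FormatPreIndexOperand(int Rn, uint32_t imm9) {
    char text[32];  // assumed size, large enough for this operand
    std::snprintf(text, sizeof(text), "[x%d, #%d]!", Rn, static_cast<int>(SignExtend9Sketch(imm9)));
    return std::string(text);
}
```

Since the formatting itself is bounded by the buffer size, a stack that keeps showing vsnprintf fits better with the exception being re-triggered than with snprintf itself hanging.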
Additional hang, DrainAndBlockCompileQueue vs CompileThread seem to have a possible deadlock:
|
Hmm.. the connection to the router might be stalled or have a problem, so it waited until the timeout (I set the default timeout to 2000 ms, as there are slow routers that need at least 1 second to be detected). |
Yeah, likely it's some kind of one-off - I only see a single report of this. I tried to see if I could find a path where there would be more than 1 timeout between two checks of the thread-exit variable, but couldn't find such a path, so not convinced there's anything we can do about it... |
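To illustrate the reasoning: if the worker checks its exit flag after every blocking call, and every blocking call is bounded by the timeout, then a join during shutdown should wait at most about one timeout. A minimal sketch of that pattern follows. `PollingWorker` and its members are hypothetical names, and the sleep is only a stand-in for a network call with a timeout; this is not the actual upnp thread code.

```cpp
#include <atomic>
#include <chrono>
#include <thread>

// Hypothetical worker that polls an exit flag between bounded blocking calls.
struct PollingWorker {
    std::atomic<bool> exitRequested{false};
    std::atomic<int> polls{0};
    std::thread thread;

    void Start(std::chrono::milliseconds timeout) {
        thread = std::thread([this, timeout] {
            while (!exitRequested.load()) {
                // Stand-in for one blocking call that returns after at most `timeout`.
                std::this_thread::sleep_for(timeout);
                polls.fetch_add(1);
            }
        });
    }

    // Requests exit, then joins. The join can only wait for the call
    // that is currently in flight, so roughly one timeout at worst.
    void Shutdown() {
        exitRequested.store(true);
        if (thread.joinable()) {
            thread.join();
        }
    }
};
```

A hang like the one reported would need either a blocking call that ignores its timeout or several bounded calls in a row without a flag check in between.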
Beta 2 from now on. The shutdown race condition still doesn't seem completely cured, and I got a shutdown hang I haven't seen before:
Some pool worker stacks:
Weird stuff, almost like there's a hang in the memory allocator (jemalloc)? Or it's just stuck performing the same thing over and over somehow... |
Another thread hang, interesting:
vs
I think this will be fixed by one of my upcoming changes. |
Report from beta 1:
Also beta 1:
|
beta 3:
Maybe just slowness in scoped storage land, a work thread has the following top of a stack (but missing the rest):
|
This crashes here:
    PSPPointer<SceKernelVplBlock> SplitBlock(PSPPointer<SceKernelVplBlock> b, u32 allocBlocks) {
        u32 prev = b.ptr;
        b->sizeInBlocks -= allocBlocks;
        b += b->sizeInBlocks;
        b->sizeInBlocks = allocBlocks; // << CRASH HERE
        b->next = prev;
        return b;
    }
Suspicious... Probably the block header got corrupted.
Another:
Crash here:
    for (TexCache::iterator iter = cache_.begin(), end = cache_.end(); iter != end; ++iter) {
        if (iter->second->GetHashStatus() == TexCacheEntry::STATUS_RELIABLE) {
            iter->second->SetHashStatus(TexCacheEntry::STATUS_HASHING);
        }
        iter->second->invalidHint++;
    } |
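On the SplitBlock crash above: if the header really is corrupted, the write lands wherever the bad sizeInBlocks points. One way to turn that into a reportable error is to validate the header against the pool bounds before splitting. The sketch below uses a simplified model where the pool is a plain vector indexed in block units; `BlockHeader` and `SplitBlockChecked` are hypothetical and do not match the real PSPPointer-based code.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Simplified stand-in for a VPL block header, indexed in block units.
struct BlockHeader {
    uint32_t next;
    uint32_t sizeInBlocks;
};

// Splits allocBlocks off the end of the block at `index`.
// Returns false instead of writing out of bounds when the header looks corrupted.
inline bool SplitBlockChecked(std::vector<BlockHeader> &pool, uint32_t index,
                              uint32_t allocBlocks, uint32_t *newIndex) {
    if (index >= pool.size()) {
        return false;  // block pointer itself is outside the pool
    }
    BlockHeader &b = pool[index];
    if (allocBlocks == 0 || b.sizeInBlocks <= allocBlocks) {
        return false;  // nothing would remain of the original block
    }
    if (b.sizeInBlocks > pool.size() - index) {
        return false;  // header claims more blocks than the pool holds
    }
    b.sizeInBlocks -= allocBlocks;
    const uint32_t split = index + b.sizeInBlocks;  // in bounds by the checks above
    pool[split].sizeInBlocks = allocBlocks;
    pool[split].next = index;
    *newIndex = split;
    return true;
}
```

This does not explain how the header got corrupted, but a failed check could at least be logged with the block contents instead of crashing on the write.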
Crash in libpng, not good. libpng17 seems unmaintained, no updates since 2017 :( This is on line
|
Maybe switch to another PNG library? |
Yes, I'm thinking of trying spng instead. |
Although, I now think it's really due to a data loading race condition in GameInfoCache... Ugh. I tried spng, and it's pretty nice, just lacking the ability to specify a byte stride in encode/decode. So will probably switch to it later anyway since it's faster, but not for the 1.17 series. |
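For what it's worth, a missing byte stride can be worked around by decoding into a tightly packed temporary buffer and then copying row by row into the strided destination, at the cost of an extra copy. A minimal sketch of that copy step is below. `CopyRowsWithStride` is a hypothetical helper and makes no assumptions about the decoder's API.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Copies `height` tightly packed rows of `rowBytes` bytes each from src
// into dst, where consecutive destination rows start dstStride bytes apart.
// dst must hold at least dstStride * (height - 1) + rowBytes bytes.
inline bool CopyRowsWithStride(const uint8_t *src, size_t rowBytes, size_t height,
                               uint8_t *dst, size_t dstStride) {
    if (dstStride < rowBytes) {
        return false;  // rows would overlap in the destination
    }
    for (size_t y = 0; y < height; y++) {
        std::memcpy(dst + y * dstStride, src + y * rowBytes, rowBytes);
    }
    return true;
}
```

The padding bytes between rows are left untouched, so the destination can be a sub-rectangle of a larger image.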
Pretty sure I've solved the png loading crash now. Here's a savestate load-from-rewind problem, hm:
|
Now with the beta program, we can do these before the release instead of after! There are already enough beta testers (I think because I once enabled registration without actually having any builds) that we get a usable number of crash reports.
I've fixed a bunch of low-hanging fruit; here come the tricky ones.
First, an oldie-but-goodie assert. I really don't understand this one: it should not be possible for b.originalAddress to be null in FinalizeBlock. Though on the other hand, the block number should be equal to b.num, no? Weird.
Next, there's another oldie I've seen before but never figured out. Might simply be some kind of memory corruption, but I think we can add some checks.
This crashes here:
So I suppose evt->type might have gotten corrupted?
(A curiosity here is how the name of the function has survived from pre-open-source Dolphin, which I took the original timing system from.. there is no fifo :) )
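As an example of the kind of check that could be added for the evt->type case: validate the type index against the table of registered event types before calling through it, so a corrupted value is reported instead of jumping through a garbage pointer. This is a sketch under assumptions. `EventType`, `Event`, and `DispatchChecked` are hypothetical simplified types, not the actual timing code.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <string>
#include <vector>

// Hypothetical simplified event tables.
struct EventType {
    std::string name;
    std::function<void(uint64_t userdata, int cyclesLate)> callback;
};

struct Event {
    int type;
    uint64_t userdata;
};

// Returns false (without calling anything) if the event's type index is
// out of range or the registered callback is empty.
inline bool DispatchChecked(const std::vector<EventType> &types, const Event &evt, int cyclesLate) {
    if (evt.type < 0 || static_cast<size_t>(evt.type) >= types.size()) {
        return false;  // type index corrupted or never registered
    }
    const EventType &t = types[static_cast<size_t>(evt.type)];
    if (!t.callback) {
        return false;
    }
    t.callback(evt.userdata, cyclesLate);
    return true;
}
```

A failed check would be a good place to log the raw evt->type value, which might hint at what overwrote it.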