Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Port over the Exynos cacheline size fix from Dolphin. #8965

Merged
merged 1 commit into from
Sep 10, 2016
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
46 changes: 31 additions & 15 deletions Common/Arm64Emitter.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -313,24 +313,40 @@ void ARM64XEmitter::FlushIcache()

void ARM64XEmitter::FlushIcacheSection(u8* start, u8* end)
{
if (cpu_info.sBugs.bExynos8890Invalidation)
{
// Over invalidate to force this CPU to listen.
start = m_startcode + 4096 < start ? start - 4096 : m_startcode;
end += 4096;
}

#if defined(IOS)
// Header file says this is equivalent to: sys_icache_invalidate(start, end - start);
sys_cache_control(kCacheFunctionPrepareForExecution, start, end - start);
#else
#if (defined(__clang__) && !defined(_M_IX86) && !defined(_M_X64)) || defined(ANDROID)
__clear_cache(start, end);
#else
#if !defined(_M_IX86) && !defined(_M_X64)
__builtin___clear_cache(start, end);
#endif
#endif
#elif !defined(_M_IX86) && !defined(_M_X64)
// Code from Dolphin, contributed by the Mono project.

// Don't rely on GCC's __clear_cache implementation, as it caches
// icache/dcache cache line sizes, that can vary between cores on
// big.LITTLE architectures.
u64 addr, ctr_el0;
static size_t icache_line_size = 0xffff, dcache_line_size = 0xffff;
size_t isize, dsize;

__asm__ volatile("mrs %0, ctr_el0" : "=r"(ctr_el0));
isize = 4 << ((ctr_el0 >> 0) & 0xf);
dsize = 4 << ((ctr_el0 >> 16) & 0xf);

// use the global minimum cache line size
icache_line_size = isize = icache_line_size < isize ? icache_line_size : isize;
dcache_line_size = dsize = dcache_line_size < dsize ? dcache_line_size : dsize;
Copy link
Collaborator

@unknownbrackets unknownbrackets Sep 10, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, if we're not using the line size we just read, then this can mean a race condition, right?

For example, let's say FlushIcacheSection() is called on BIG cpu, possibly several times. When it swaps to the LITTLE cpu:

  1. If it's before we check the line size, it's fine, we'll get the right one.
  2. If it's after we invalidate, then hopefully it's handled correctly.
  3. If it's while we're invalidating, we're screwed if it doesn't handle the swap correctly. We still have the BIG size (having never been exposed to the LITTLE core.)

If it's necessary to take the lowest, then we must need to re-read the line size for every iteration of the loop. It seems to me we shouldn't cache at all?

-[Unknown]

Copy link

@lewurm lewurm Sep 10, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, this is correct (it's unlikely though). Even if you would read it on every loop iteration, there would be still a race, because the loop iteration itself isn't atomic.

The right way would be to iterate over all available cores on startup and determine the minimum cache line size. I couldn't figure out a reliable way to do that on Android. Or, since you already have infrastructure for device specific quirks, you could hard code it...

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But... since we save the minimum, and a core jump will wipe the icache anyway, case 3 can't possibly be an issue?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hah, good point! maybe we don't need to try to figure out the minimum cache line size after all then.

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lewurm yeah. I removed it from our copy in a followup commit:
0b8a3e8

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And, reverted it. Was convinced by some comments in the HN thread - in case the cores are used simultaneously it might still be an issue.


addr = (u64)start & ~(u64)(dsize - 1);
for (; addr < (u64)end; addr += dsize)
// use "civac" instead of "cvau", as this is the suggested workaround for
// Cortex-A53 errata 819472, 826319, 827319 and 824069.
__asm__ volatile("dc civac, %0" : : "r"(addr) : "memory");
__asm__ volatile("dsb ish" : : : "memory");

addr = (u64)start & ~(u64)(isize - 1);
for (; addr < (u64)end; addr += isize)
__asm__ volatile("ic ivau, %0" : : "r"(addr) : "memory");

__asm__ volatile("dsb ish" : : : "memory");
__asm__ volatile("isb" : : : "memory");
#endif
}

Expand Down
15 changes: 8 additions & 7 deletions Common/CPUDetect.h
Original file line number Diff line number Diff line change
Expand Up @@ -89,14 +89,15 @@ struct CPUInfo {
bool bXBurst1;
bool bXBurst2;

// Bugs
// Quirks
struct {
// Samsung Galaxy S7 devices (Exynos 8890) have a bug that causes invalidation to work incorrectly.
// This may be caused by interaction between the separate CPU cores.
// Padding jit blocks and over-invalidating seems to "solve" it.
// Only affects ARM64.
bool bExynos8890Invalidation;
} sBugs;
// Samsung Galaxy S7 devices (Exynos 8890) have a big.LITTLE configuration where the cacheline size differs between big and LITTLE.
// GCC's cache clearing function would detect the cacheline size on one and keep it for later. When clearing
// with the wrong cacheline size on the other, that's an issue. In case we want to do something different in this
// situation in the future, let's keep this as a quirk, but our current code won't detect it reliably
// if it happens on new archs. We now use better clearing code on ARM64 that doesn't have this issue.
bool bExynos8890DifferingCachelineSizes;
} sQuirks;

// Call Detect()
explicit CPUInfo();
Expand Down
8 changes: 0 additions & 8 deletions Core/MIPS/ARM64/Arm64Jit.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -333,14 +333,6 @@ const u8 *Arm64Jit::DoJit(u32 em_address, JitBlock *b) {
if (dontLogBlocks > 0)
dontLogBlocks--;

if (cpu_info.sBugs.bExynos8890Invalidation) {
// What a waste. If we don't do both this and over-invalidate, the device crashes.
// This space won't ever get run, but it's wasted jit cache space.
for (int i = 0; i < 32; ++i) {
HINT(HINT_NOP);
}
}

// Don't forget to zap the newly written instructions in the instruction cache!
FlushIcache();

Expand Down
8 changes: 0 additions & 8 deletions GPU/Common/VertexDecoderArm64.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -278,14 +278,6 @@ JittedVertexDecoder VertexDecoderJitCache::Compile(const VertexDecoder &dec, int

RET();

if (cpu_info.sBugs.bExynos8890Invalidation) {
// Apparently the vertex cache hasn't been the problem, but adding this here for the same
// reasons as the standard jit.
for (int i = 0; i < 32; ++i) {
HINT(HINT_NOP);
}
}

FlushIcache();

if (log) {
Expand Down
2 changes: 1 addition & 1 deletion android/jni/app-android.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -526,7 +526,7 @@ extern "C" void Java_org_ppsspp_ppsspp_NativeApp_init
// Unfortunately, on the Samsung Galaxy S7, this isn't in /proc/cpuinfo.
// We also can't read it from __system_property_get.
if (buildBoard == "universal8890") {
cpu_info.sBugs.bExynos8890Invalidation = true;
cpu_info.sQuirks.bExynos8890DifferingCachelineSizes = true;
}

NativeGetAppInfo(&app_name, &app_nice_name, &landscape, &version);
Expand Down