Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Clingo/gringo/clasp: incorrect behavior on Android 11+ #475

Closed
mbalduccini opened this issue Dec 15, 2023 · 10 comments
Closed

Clingo/gringo/clasp: incorrect behavior on Android 11+ #475

mbalduccini opened this issue Dec 15, 2023 · 10 comments
Assignees
Milestone

Comments

@mbalduccini
Copy link

mbalduccini commented Dec 15, 2023

Problem description
Clingo/gringo/clasp do not work properly on Android 11+. Grounding produces no aspif rules and solving produces empty answer sets even when given non-empty aspif files.
The issue is due clingo's use of some memory pointers in a way that is incompatible with ARM Memory Tagging Extension (MTE) support. Details can be found at https://source.android.com/docs/security/test/tagged-pointers.

This affects, for instance, clasp's StrRef (file clasp/src/shared_context.cpp). While StrRef's modification to bit 63 of the allocated pointer is only temporary and does not flow back to the call to free(), StrRef assumes that bit 63 of the pointer returned by malloc() is always 0. That is not guaranteed under Android 11+ and in fact the bit is often set to 1. As a result, clasp gets confused as to what type of data structure is stored in that memory area and ends up displaying empty answer sets. Interestingly, the underlying computation is correct to the best of my knowledge and only the displaying of the answer sets is affected. I have tested this by writing a small test C++ program that calls solve() and handles on its own the task of displaying the answer sets.

Note that this issue is not limited to Android. In principle, it may affect any ARM-based device whose OS adopts MTE.

Symptoms
The tests below are run in a termux terminal running on Galaxy Z Fold5, Galaxy S10e and Galaxy Tab S7 with the same results.
Clingo was compiled with clang 17.0.6, the stock compiler in termux. So far, I have been unable to build clingo with gcc on Android.

gringo issue

% echo "p." | build/bin/gringo
asp 1 0 0
0

clingo/clasp issue

% echo "p." | build/bin/clingo
clingo version 5.6.2
Reading from stdin
Solving...
Answer: 1

SATISFIABLE

Models       : 1
Calls        : 1
Time         : 0.001s (Solving: 0.00s 1st Model: 0.00s Unsat: 0.00s)
CPU Time     : 0.001s

Possible solutions
A few possible solutions exist, but none is entirely pain-free.

Solution 1 (Tested)
Use a malloc() replacement such as dmalloc:

  1. Download and install dmalloc (https://dmalloc.com)
  2. Add /path/to/dmalloc/libdmalloc.a to the linker flags when you configure clingo, i.e.:
cmake -H. -Bbuild -DCMAKE_BUILD_TYPE=Release -DCMAKE_EXE_LINKER_FLAGS="/path/to/dmalloc/libdmalloc.a" -DCMAKE_EXE_LINKER_FLAGS_RELEASE="/path/to/dmalloc/libdmalloc.a"
cmake --build build
  1. Any application that uses a clingo shared object will need to also be linked against libdmalloc.a

Solution 2 (Tested)
Compile clingo statically against the static version of libc found here: https://android.googlesource.com/toolchain/prebuilts/ndk/r17/+/refs/heads/main/sysroot/usr/lib/aarch64-linux-android
Needless to say, this solution does not produce shared objects.
The steps are similar to Solution 1. I turned off python and lua and I am not sure this solution supports them.

cmake -H. -Bbuild -DCMAKE_BUILD_TYPE=Release -DCLINGO_BUILD_WITH_PYTHON=Off -DCLINGO_BUILD_WITH_LUA=Off -DCLINGO_BUILD_WITH_LUA=Off -DCLINGO_BUILD_SHARED=Off -DCLASP_BUILD_WITH_THREADS=Off -DCMAKE_VERBOSE_MAKEFILE=On -DCMAKE_BUILD_TYPE=release -DCMAKE_EXE_LINKER_FLAGS="-static -L/path/to/ndk/libc/" -DCMAKE_EXE_LINKER_FLAGS_RELEASE="-static -L/path/to/ndk/libc/"
cmake --build build

Solution 3 (Not tested)
Building a 32-bit version of clingo should eliminate the problem. See here for a discussion: termux/termux-packages#7332

Solution 4 (Not tested)
Another solution provided at termux/termux-packages#7332 relies on disabling tagged pointers via code. I hypothesize that clingo/gringo/clasp could incorporate the code given in this post: termux/termux-packages#7332 (comment).
Possibly a headache for the Potassco team since it introduces architecture/OS-dependent code.

Solution 5 (Not tested)
A more seamless solution for users -- but more work for the Potassco team! ;) -- is to modify clingo/gringo/clasp and move all tags to the second byte.
Discussion: swiftlang/swift#40779

@rkaminsk
Copy link
Member

Just had a quick look. To me it seems that we should actually aim for solution 5. @BenKaufmann, can't we simply use bit 0 instead of 63 here? The bit should be guaranteed to be zero because of alignment. Or am I missing something with the string literals?

@mbalduccini
Copy link
Author

mbalduccini commented Dec 15, 2023

To be clear: StrRef is not the only location where Android's memory handling gets in the way. To the best of my knowledge, it is the only location in clasp, but there is something in gringo as well. While I haven't been able to identify the exact spot, statement auto it = atoms_.find(repr.eval(undefined, log)); in libgringo/gringo/domain.hh:486 yields incorrect results (it returns 0) on Android for program p. leading to no ground rules being emitted. And there may be a different issue leading to the lack of #show statements, I just haven't had the time to investigate. However, all of these issues disappear if the Android/bionic malloc() is replaced by a different one whose pointers have the top byte set to 0.
(I am happy to share the notes I took while tracing gringo's execution through atoms_.find() if it helps.)
To note: by enabling various diagnostic output statements in gringo and tracing through it, I came to the conclusion that, just like clasp, gringo computes the grounding correctly and only fails to emit it.

@BenKaufmann
Copy link
Contributor

Just had a quick look. To me it seems that we should actually aim for solution 5. @BenKaufmann, can't we simply use bit 0 instead of 63 here? The bit should be guaranteed to be zero because of alignment. Or am I missing something with the string literals?

No, the way that ConstString is designed, we can't use the lsb. This is because the idea is that it can be constructed from arbitrary string literals (without copying) and hence with pointers having arbitrary alignment.
E.g.

const char* foo = "12345";
auto ok  = ConstString::fromLiteral(foo); // lsb not set
auto bad = ConstString::fromLiteral(foo + 1); // lsb set

The current implementation is indeed broken and not following good practice (see e.g. https://muxup.com/2023q4/storing-data-in-pointers).
I currently don't have the time to deep dive into ConstString but from a quick glance it looks like:

  • fromLiteral() is not used at all
  • ConstString is also not otherwise constructed with Ownership_t::Retain.

Hence, maybe this could be turned into a non-issue by just dropping the (unused?) no-copy optimization.

@rkaminsk
Copy link
Member

To be clear: StrRef is not the only location where Android's memory handling gets in the way. To the best of my knowledge, it is the only location in clasp, but there is something in gringo as well. While I haven't been able to identify the exact spot, statement auto it = atoms_.find(repr.eval(undefined, log)); in libgringo/gringo/domain.hh:486 yields incorrect results (it returns 0) on Android for program p. leading to no ground rules being emitted. And there may be a different issue leading to the lack of #show statements, I just haven't had the time to investigate. However, all of these issues disappear if the Android/bionic malloc() is replaced by a different one whose pointers have the top byte set to 0. (I am happy to share the notes I took while tracing gringo's execution through atoms_.find() if it helps.) To note: by enabling various diagnostic output statements in gringo and tracing through it, I came to the conclusion that, just like clasp, gringo computes the grounding correctly and only fails to emit it.

I had a quick look, clingo indeed uses the upper bits for symbols. The good news is that this is going to change in some future release. The bad news is that this is going to take a while.

@rkaminsk
Copy link
Member

Solution 1 (Tested) Use a malloc() replacement such as dmalloc:

1. Download and install dmalloc (https://dmalloc.com)
2. Add `/path/to/dmalloc/libdmalloc.a` to the linker flags when you configure clingo, i.e.:
cmake -H. -Bbuild -DCMAKE_BUILD_TYPE=Release -DCMAKE_EXE_LINKER_FLAGS="/path/to/dmalloc/libdmalloc.a" -DCMAKE_EXE_LINKER_FLAGS_RELEASE="/path/to/dmalloc/libdmalloc.a"
cmake --build build
3. Any application that uses a clingo shared object will need to also be linked against `libdmalloc.a`

Are you sure about point 3. The clingo library manages it's memory completely by itself. It should be safe to use the shared object together with other libraries relying on different malloc implementations.

@mbalduccini
Copy link
Author

Solution 1 (Tested) Use a malloc() replacement such as dmalloc:
3. Any application that uses a clingo shared object will need to also be linked against libdmalloc.a

Are you sure about point 3. The clingo library manages it's memory completely by itself. It should be safe to use the shared object together with other libraries relying on different malloc implementations.

I am. Here is a minimal example:

#include <iostream>
#include <clingo.hh>

using namespace Clingo;

int main(int argc, char * const argv[]) {
	try {
		Logger logger = [](Clingo::WarningCode, char const *message) {
			std::cerr << message << std::endl;
		};
		Control ctl{{argv+1, size_t(argc-1)}, logger, 20};

		std::cout << "Ground" << std::endl;
		ctl.add("base", {}, "p.");
		ctl.ground({{"base", {}}});

		std::cout << "Solve" << std::endl;
		clingo_propagator_t prop = {
		  NULL,
		  NULL,
		  NULL,
		  NULL,
		  NULL,
		};
		if (!clingo_control_register_propagator(ctl.to_c(), &prop, NULL, false))
			throw std::runtime_error("could not register C propagator");

		ctl.solve();
	}
	catch (std::exception const &e) {
		std::cerr << "example failed with: " << e.what() << std::endl;
	}
}

Linked with dmalloc(), everything works fine.
Linked without dmalloc(), it terminates with the error that is typical of tagged pointer issues on Android:

Ground
Solve
Pointer tag for 0x9400007100418440 was truncated, see 'https://source.android.com/devices/tech/debug/tagged-pointers'.
Abort

What seems to trigger the problem is the registration of the propagator, or some code that is executed because of it. If you remove the registration, the error disappears. (I created a NULL propagator to keep the program small, but the error occurs even if you provide pointers to valid callback functions.)

@rkaminsk
Copy link
Member

I am not sure what is happening here. Maybe further options are required when building the libclingo.so library.

In any case, I can promise that these issues will go away with future releases. Tagging the upper bits of pointers was not a good idea in the first place. There are more and more extensions that increase the virtual address space. We have one machine that supports 57bit virtual addresses (because of some Intel extension).

@mbalduccini
Copy link
Author

Great! I am happy to run tests on Android on beta versions, just keep me in the loop.
Incidentally, is there a tentative date for the next release?

@rkaminsk
Copy link
Member

Great! I am happy to run tests on Android on beta versions, just keep me in the loop. Incidentally, is there a tentative date for the next release?

In the beginning of next year. The wip branch should be quite stable by now. There won't be any patches regarding MTE, however.

@rkaminsk rkaminsk added this to the v5.7.0 milestone Jan 10, 2024
@rkaminsk rkaminsk modified the milestones: v5.7.0, unknown Apr 22, 2024
@rkaminsk
Copy link
Member

Closing this. Future clingo versions won't touch the high bits of pointers anymore. There will still be some time before releases however.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants