Support for RISC-V #1702

Open
rushi47 opened this issue Dec 18, 2021 · 37 comments

@rushi47

rushi47 commented Dec 18, 2021

Hello Team,

I am fairly new to CRIU and quite interested in porting it to RISC-V on my own, but I feel kind of lost as there are a lot of touch points, and I am unable to wrap my head around the code.

So I would love to know if there are any plans to support RISC-V; I would love to pitch in.

Also, I would appreciate any help if someone can point me to documentation.
I use this link: https://criu.org/Category:Under_the_hood but the pages are sorted alphabetically rather than in the order they are used in the code, and I am unable to find any documentation on how CRIU works overall. So it's difficult to read about a component and work out where it's used in the code.

I would love to contribute to this project and would very much appreciate any help.

Looking forward to it.

Thank you in advance.

Here I am trying a few things, but I could be very wrong and naive; apologies, I am also very new to systems programming:
criu-dev...rushi47:mod_criu

@mihalicyn
Member

mihalicyn commented Dec 20, 2021

@rushi47 awesome idea to port CRIU to RISC-V.

Please tell me: do you have RISC-V hardware, or do you want to port and test CRIU in a virtualized RISC-V environment?

First of all, I think you need to play with compel. Just forget about CRIU itself for now. Try the examples in the compel/test folder, run them, and understand the code. Then you need to make compel work on the RISC-V arch.
The reason is simple: compel is the concentration point of the arch-specific code in CRIU, because it needs to dump/restore task registers, make remote mappings, deal with signals, and so on. So once you get compel working, that will be about 80% of the whole job.
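
To make "dump/restore task registers" a bit more concrete, here is a minimal sketch in plain C of reading a stopped task's general-purpose registers with ptrace(PTRACE_GETREGSET). It is not compel's code: the function name traced_child_gpr_bytes is made up for illustration, and it only reports how many bytes the kernel returned for the register set instead of interpreting them, since the register layout is exactly the part that differs per architecture.

```c
#include <elf.h>        /* NT_PRSTATUS */
#include <signal.h>
#include <stdint.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Fork a child, let it stop under ptrace, and fetch its general-purpose
 * register set. Returns the number of bytes the kernel filled in, or -1
 * if the child could not be traced.
 */
long traced_child_gpr_bytes(void)
{
	unsigned char buf[1024]; /* larger than any GPR set we expect */
	struct iovec iov = { .iov_base = buf, .iov_len = sizeof(buf) };
	long ret = -1;
	int status;
	pid_t pid = fork();

	if (pid < 0)
		return -1;

	if (pid == 0) {
		/* Child: ask to be traced by the parent, then stop. */
		ptrace(PTRACE_TRACEME, 0, NULL, NULL);
		raise(SIGSTOP);
		_exit(0);
	}

	/* WUNTRACED makes the wait return even if PTRACE_TRACEME failed. */
	if (waitpid(pid, &status, WUNTRACED) == pid && WIFSTOPPED(status) &&
	    ptrace(PTRACE_GETREGSET, pid, (void *)(uintptr_t)NT_PRSTATUS, &iov) == 0)
		ret = (long)iov.iov_len; /* the kernel shrinks iov_len to the real size */

	kill(pid, SIGKILL);
	waitpid(pid, &status, 0);
	return ret;
}
```

The returned size differs between architectures, which is one small reason the register handling has to be written per arch.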

Feel free to reach us.

Regards,
Alex

@mihalicyn
Member

mihalicyn commented Dec 20, 2021

The 2nd step will be the criu/pie folder. It contains two interesting pieces: the parasite (used at the dump stage) and the restorer (used at the restore stage). Both are just PIEs that are compiled using compel and injected into victim processes to do useful work. For instance, the parasite is used on dump to call prctl() syscalls to get information about the victim processes. The restorer is called when we need to restore, for instance, prctl-related things, signals, the aio ring, etc.
The idea is simple: every dump/restore action that can be done externally is done from the CRIU processes, but if some syscall or action has to be done from the victim process context, then we use the parasite/restorer blobs.
The code in criu/pie is mostly generic, but it has some arch-specific parts (see the criu/arch directory).

@rushi47
Author

rushi47 commented Dec 20, 2021

@mihalicyn Hello Alex :)
Thank you so much for the detailed explanation and for extending support.

To answer your question: yes, as of now I will test it in a virtualised environment. But I do have provision to get an FPGA with RISC-V, so we can test there as well once it runs in the virtual environment.

And thanks for this tip; let me try to play with the examples in compel :). Last night I was looking at the flow, but I think it makes more sense to get the test examples running first. I will update you with my findings once I try to run these examples.

@rushi47
Author

rushi47 commented Dec 20, 2021

And I would love to know if there is anything else I should be trying or reading to build up my understanding and get this running on RISC-V as soon as we can.

@mihalicyn
Member

You are welcome :)

Off-topic: ah, I've also thought about getting a RISC-V motherboard, but it's sooo expensive (and mostly just not in stock) right now... :(

@nirousseau

nirousseau commented Dec 26, 2021

I would like to join the effort too. I have a SiFive Unmatched board for testing, if that helps.
I will try to build compel with modifications based on aarch64.

I am also quite a newbie to systems programming, but I will try to do my best.

Opened a MR for traceability and reviewing: #1713

@rushi47
Author

rushi47 commented Dec 27, 2021

@nirousseau Hey, thanks for joining the effort.
I had a branch in my fork and had already made some changes,
but I hadn't opened a PR. Do you mind joining this one? MR: #1714
It already contains all the changes from your PR, along with further modifications.

@rushi47
Author

rushi47 commented Dec 27, 2021

@mihalicyn Hello Alex,

So I was trying to build compel as you suggested and made some changes.

While compiling, I am facing the issue below (detailed logs are attached at the end).

Reading the log, one strange thing I notice is what happens when the function fini_sigreturn is compiled:
https://github.com/rushi47/criu/blob/mod_criu/compel/plugins/std/infect.c#L84

It's supposed to use the arch-specific ARCH_RT_SIGRETURN definition; in this case I would expect it to use this macro: https://github.com/rushi47/criu/blob/mod_criu/compel/arch/riscv64/src/lib/include/uapi/asm/sigframe.h#L36-L43

But from the log it appears to expand ARCH_RT_SIGRETURN_NATIVE and ARCH_RT_SIGRETURN_COMPAT. After searching, I found that these only exist in the x86 arch (https://github.com/rushi47/criu/blob/mod_criu/compel/arch/x86/src/lib/include/uapi/asm/sigframe.h#L180-L212).

So I am not sure why it is picking up x86-specific code.

I am not sure if I am doing anything wrong while compiling; your help will be much appreciated.

Below are the logs:

Note: Building without setproctitle() and strlcpy() support.
      To enable these features, please install libbsd-devel (RPM) / libbsd-dev (DEB).
sh: 1: pkg-config: not found
sh: 1: pkg-config: not found
sh: 1: pkg-config: not found
Note: Building without GnuTLS support
sh: 1: pkg-config: not found
Makefile.config:45: Warn: you have no libnftables installed
Makefile.config:46: Warn: Building without nftables support
  DEP      compel/arch/riscv64/plugins/std/syscalls/syscalls.d
  DEP      compel/arch/riscv64/plugins/std/parasite-head.d
  DEP      compel/plugins/std/infect.d
  DEP      compel/plugins/std/string.d
  DEP      compel/plugins/std/log.d
  DEP      compel/plugins/std/fds.d
  DEP      compel/plugins/std/std.d
  DEP      compel/plugins/shmem/shmem.d
  DEP      compel/plugins/fds/fds.d
  CC       compel/plugins/std/std.o
  CC       compel/plugins/std/fds.o
  CC       compel/plugins/std/log.o
  CC       compel/plugins/std/string.o
  CC       compel/plugins/std/infect.o
In file included from compel/plugins/std/infect.c:14:
compel/plugins/std/infect.c: In function 'fini_sigreturn':
compel/include/uapi/compel/asm/sigframe.h:181:9: error: unknown register name 'rax' in 'asm'
  181 |         asm volatile(                                                   \
      |         ^~~
compel/include/uapi/compel/asm/sigframe.h:209:17: note: in expansion of macro 'ARCH_RT_SIGRETURN_NATIVE'
  209 |                 ARCH_RT_SIGRETURN_NATIVE(new_sp);                       \
      |                 ^~~~~~~~~~~~~~~~~~~~~~~~
compel/plugins/std/infect.c:84:9: note: in expansion of macro 'ARCH_RT_SIGRETURN'
   84 |         ARCH_RT_SIGRETURN(new_sp, sigframe);
      |         ^~~~~~~~~~~~~~~~~
compel/include/uapi/compel/asm/sigframe.h:190:9: error: unknown register name 'r11' in 'asm'
  190 |         asm volatile(                                                   \
      |         ^~~
compel/include/uapi/compel/asm/sigframe.h:211:17: note: in expansion of macro 'ARCH_RT_SIGRETURN_COMPAT'
  211 |                 ARCH_RT_SIGRETURN_COMPAT(new_sp);                       \
      |                 ^~~~~~~~~~~~~~~~~~~~~~~~
compel/plugins/std/infect.c:84:9: note: in expansion of macro 'ARCH_RT_SIGRETURN'
   84 |         ARCH_RT_SIGRETURN(new_sp, sigframe);
      |         ^~~~~~~~~~~~~~~~~
compel/include/uapi/compel/asm/sigframe.h:190:9: error: unknown register name 'r10' in 'asm'
  190 |         asm volatile(                                                   \
      |         ^~~
compel/include/uapi/compel/asm/sigframe.h:211:17: note: in expansion of macro 'ARCH_RT_SIGRETURN_COMPAT'
  211 |                 ARCH_RT_SIGRETURN_COMPAT(new_sp);                       \
      |                 ^~~~~~~~~~~~~~~~~~~~~~~~
compel/plugins/std/infect.c:84:9: note: in expansion of macro 'ARCH_RT_SIGRETURN'
   84 |         ARCH_RT_SIGRETURN(new_sp, sigframe);
      |         ^~~~~~~~~~~~~~~~~
compel/include/uapi/compel/asm/sigframe.h:190:9: error: unknown register name 'r9' in 'asm'
  190 |         asm volatile(                                                   \
      |         ^~~
compel/include/uapi/compel/asm/sigframe.h:211:17: note: in expansion of macro 'ARCH_RT_SIGRETURN_COMPAT'
  211 |                 ARCH_RT_SIGRETURN_COMPAT(new_sp);                       \
      |                 ^~~~~~~~~~~~~~~~~~~~~~~~
compel/plugins/std/infect.c:84:9: note: in expansion of macro 'ARCH_RT_SIGRETURN'
   84 |         ARCH_RT_SIGRETURN(new_sp, sigframe);
      |         ^~~~~~~~~~~~~~~~~
compel/include/uapi/compel/asm/sigframe.h:190:9: error: unknown register name 'r8' in 'asm'
  190 |         asm volatile(                                                   \
      |         ^~~
compel/include/uapi/compel/asm/sigframe.h:211:17: note: in expansion of macro 'ARCH_RT_SIGRETURN_COMPAT'
  211 |                 ARCH_RT_SIGRETURN_COMPAT(new_sp);                       \
      |                 ^~~~~~~~~~~~~~~~~~~~~~~~
compel/plugins/std/infect.c:84:9: note: in expansion of macro 'ARCH_RT_SIGRETURN'
   84 |         ARCH_RT_SIGRETURN(new_sp, sigframe);
      |         ^~~~~~~~~~~~~~~~~
compel/include/uapi/compel/asm/sigframe.h:190:9: error: unknown register name 'eax' in 'asm'
  190 |         asm volatile(                                                   \
      |         ^~~
compel/include/uapi/compel/asm/sigframe.h:211:17: note: in expansion of macro 'ARCH_RT_SIGRETURN_COMPAT'
  211 |                 ARCH_RT_SIGRETURN_COMPAT(new_sp);                       \
      |                 ^~~~~~~~~~~~~~~~~~~~~~~~
compel/plugins/std/infect.c:84:9: note: in expansion of macro 'ARCH_RT_SIGRETURN'
   84 |         ARCH_RT_SIGRETURN(new_sp, sigframe);
      |         ^~~~~~~~~~~~~~~~~
make[1]: *** [/root/criu/scripts/nmk/scripts/build.mk:215: compel/plugins/std/infect.o] Error 1
make: *** [Makefile.compel:56: compel/plugins/std.lib.a] Error 2

@rst0git
Member

rst0git commented Jan 3, 2022

pkg-config: not found

@rushi47 you need to install pkg-config. All build dependencies are described in https://criu.org/Installation.

@rushi47
Author

rushi47 commented Jan 6, 2022

@rst0git Thank you for pointing that out :). I thought it was unrelated, but I did install it as you suggested.
But it's still the same error :(

root@debian:~/criu# dpkg -l | grep 'pkg-config'
ii  pkg-config                        0.29.2-1                       riscv64      manage compile and link flags for libraries

I am trying to figure out why it's still going into the same unrelated macros.

root@debian:~/criu# make install-compel
Note: Building without setproctitle() and strlcpy() support.
      To enable these features, please install libbsd-devel (RPM) / libbsd-dev (DEB).
Note: Building without GnuTLS support
Makefile.config:45: Warn: you have no libnftables installed
Makefile.config:46: Warn: Building without nftables support
  DEP      compel/arch/riscv64/plugins/std/syscalls/syscalls.d
  DEP      compel/arch/riscv64/plugins/std/parasite-head.d
  DEP      compel/plugins/std/infect.d
  DEP      compel/plugins/std/string.d
  DEP      compel/plugins/std/log.d
  DEP      compel/plugins/std/fds.d
  DEP      compel/plugins/std/std.d
  DEP      compel/plugins/shmem/shmem.d
  DEP      compel/plugins/fds/fds.d
  CC       compel/plugins/std/std.o
  CC       compel/plugins/std/fds.o
  CC       compel/plugins/std/log.o
  CC       compel/plugins/std/string.o
  CC       compel/plugins/std/infect.o
In file included from compel/plugins/std/infect.c:14:
compel/plugins/std/infect.c: In function 'fini_sigreturn':
compel/include/uapi/compel/asm/sigframe.h:181:9: error: unknown register name 'rax' in 'asm'
  181 |         asm volatile(                                                   \
      |         ^~~
compel/include/uapi/compel/asm/sigframe.h:209:17: note: in expansion of macro 'ARCH_RT_SIGRETURN_NATIVE'
  209 |                 ARCH_RT_SIGRETURN_NATIVE(new_sp);                       \
      |                 ^~~~~~~~~~~~~~~~~~~~~~~~
compel/plugins/std/infect.c:84:9: note: in expansion of macro 'ARCH_RT_SIGRETURN'
   84 |         ARCH_RT_SIGRETURN(new_sp, sigframe);
      |         ^~~~~~~~~~~~~~~~~
compel/include/uapi/compel/asm/sigframe.h:190:9: error: unknown register name 'r11' in 'asm'
  190 |         asm volatile(                                                   \
      |         ^~~
compel/include/uapi/compel/asm/sigframe.h:211:17: note: in expansion of macro 'ARCH_RT_SIGRETURN_COMPAT'
  211 |                 ARCH_RT_SIGRETURN_COMPAT(new_sp);                       \
      |                 ^~~~~~~~~~~~~~~~~~~~~~~~
compel/plugins/std/infect.c:84:9: note: in expansion of macro 'ARCH_RT_SIGRETURN'
   84 |         ARCH_RT_SIGRETURN(new_sp, sigframe);
      |         ^~~~~~~~~~~~~~~~~
compel/include/uapi/compel/asm/sigframe.h:190:9: error: unknown register name 'r10' in 'asm'
  190 |         asm volatile(                                                   \
      |         ^~~
compel/include/uapi/compel/asm/sigframe.h:211:17: note: in expansion of macro 'ARCH_RT_SIGRETURN_COMPAT'
  211 |                 ARCH_RT_SIGRETURN_COMPAT(new_sp);                       \
      |                 ^~~~~~~~~~~~~~~~~~~~~~~~
compel/plugins/std/infect.c:84:9: note: in expansion of macro 'ARCH_RT_SIGRETURN'
   84 |         ARCH_RT_SIGRETURN(new_sp, sigframe);
      |         ^~~~~~~~~~~~~~~~~
compel/include/uapi/compel/asm/sigframe.h:190:9: error: unknown register name 'r9' in 'asm'
  190 |         asm volatile(                                                   \
      |         ^~~
compel/include/uapi/compel/asm/sigframe.h:211:17: note: in expansion of macro 'ARCH_RT_SIGRETURN_COMPAT'
  211 |                 ARCH_RT_SIGRETURN_COMPAT(new_sp);                       \
      |                 ^~~~~~~~~~~~~~~~~~~~~~~~
compel/plugins/std/infect.c:84:9: note: in expansion of macro 'ARCH_RT_SIGRETURN'
   84 |         ARCH_RT_SIGRETURN(new_sp, sigframe);
      |         ^~~~~~~~~~~~~~~~~
compel/include/uapi/compel/asm/sigframe.h:190:9: error: unknown register name 'r8' in 'asm'
  190 |         asm volatile(                                                   \
      |         ^~~
compel/include/uapi/compel/asm/sigframe.h:211:17: note: in expansion of macro 'ARCH_RT_SIGRETURN_COMPAT'
  211 |                 ARCH_RT_SIGRETURN_COMPAT(new_sp);                       \
      |                 ^~~~~~~~~~~~~~~~~~~~~~~~
compel/plugins/std/infect.c:84:9: note: in expansion of macro 'ARCH_RT_SIGRETURN'
   84 |         ARCH_RT_SIGRETURN(new_sp, sigframe);
      |         ^~~~~~~~~~~~~~~~~
compel/include/uapi/compel/asm/sigframe.h:190:9: error: unknown register name 'eax' in 'asm'
  190 |         asm volatile(                                                   \
      |         ^~~
compel/include/uapi/compel/asm/sigframe.h:211:17: note: in expansion of macro 'ARCH_RT_SIGRETURN_COMPAT'
  211 |                 ARCH_RT_SIGRETURN_COMPAT(new_sp);                       \
      |                 ^~~~~~~~~~~~~~~~~~~~~~~~
compel/plugins/std/infect.c:84:9: note: in expansion of macro 'ARCH_RT_SIGRETURN'
   84 |         ARCH_RT_SIGRETURN(new_sp, sigframe);
      |         ^~~~~~~~~~~~~~~~~
make[1]: *** [/root/criu/scripts/nmk/scripts/build.mk:215: compel/plugins/std/infect.o] Error 1
make: *** [Makefile.compel:56: compel/plugins/std.lib.a] Error 2

@rst0git
Member

rst0git commented Jan 6, 2022

@rushi47 x86 and RISC-V use different assembly registers.
You can find more information about the RISC-V registers in the specifications.

@rushi47
Author

rushi47 commented Jan 6, 2022

@rst0git yes, that's correct. My doubt is this: I copied the aarch64 code as riscv64 in compel/arch and am trying to compile it for RISC-V. From my understanding, it should pick up the riscv64 code and break there, so I am wondering why it is expanding ARCH_RT_SIGRETURN_COMPAT and ARCH_RT_SIGRETURN_NATIVE when I am compiling for RISC-V, since, as I pointed out before, these definitions only exist in the x86 arch.
Please correct me if I am wrong about this.
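
A possible angle on this, offered as a guess rather than a diagnosis: the failing path in the log is compel/include/uapi/compel/asm/sigframe.h, not a file under compel/arch/. If that asm directory is a link created by the build (an assumption here), a leftover link from an earlier x86 configuration would pull in the x86 macros. A quick sanity check is to confirm what the compiler itself thinks it is targeting. The sketch below relies only on the compiler's predefined macros; the function names are made up for illustration.

```c
#include <string.h>

/* Name of the architecture the compiler is targeting, from predefined macros. */
const char *target_arch_name(void)
{
#if defined(__riscv) && defined(__riscv_xlen) && (__riscv_xlen == 64)
	return "riscv64";
#elif defined(__riscv)
	return "riscv32";
#elif defined(__aarch64__)
	return "aarch64";
#elif defined(__x86_64__)
	return "x86_64";
#elif defined(__i386__)
	return "x86";
#else
	return "unknown";
#endif
}

/* Returns 1 if a header written for header_arch suits this build, else 0. */
int header_matches_target(const char *header_arch)
{
	return strcmp(header_arch, target_arch_name()) == 0;
}

/*
 * The same idea works at compile time. A guard like the following at the top
 * of an x86-only header would turn the confusing "unknown register name"
 * errors into one clear message:
 *
 *   #if !defined(__x86_64__) && !defined(__i386__)
 *   #error "x86 sigframe.h included in a non-x86 build"
 *   #endif
 */
```

If the compiler reports riscv64 but x86 macros are still expanded, the toolchain is fine and the include path is the thing to look at.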

@rushi47
Author

rushi47 commented Jan 9, 2022

I think I am getting there; it hits the right spot now.

root@debian:~/criu# make install-compel
Note: Building without setproctitle() and strlcpy() support.
      To enable these features, please install libbsd-devel (RPM) / libbsd-dev (DEB).
Note: Building without GnuTLS support
Makefile.config:45: Warn: you have no libnftables installed
Makefile.config:46: Warn: Building without nftables support
  GEN      compel/include/asm
  GEN      compel/include/version.h
touch .config
  GEN      include/common/config.h
  GEN      include/common/asm
  GEN      compel/plugins/include/uapi/std/asm/syscall-types.h
  GEN      compel/arch/riscv64/plugins/std/syscalls/syscalls.S
perl: warning: Setting locale failed.
perl: warning: Please check that your locale settings:
        LANGUAGE = (unset),
        LC_ALL = (unset),
        LC_TERMINAL = "iTerm2",
        LC_NUMERIC = "C",
        LC_COLLATE = "C",
        LANG = "en_US.UTF-8"
    are supported and installed on your system.
perl: warning: Falling back to the standard locale ("C").
  DEP      compel/arch/riscv64/plugins/std/syscalls/syscalls.d
  DEP      compel/arch/riscv64/plugins/std/parasite-head.d
  DEP      compel/plugins/std/infect.d
  DEP      compel/plugins/std/string.d
  DEP      compel/plugins/std/log.d
  DEP      compel/plugins/std/fds.d
  DEP      compel/plugins/std/std.d
  DEP      compel/plugins/shmem/shmem.d
  DEP      compel/plugins/fds/fds.d
  CC       compel/plugins/std/std.o
  CC       compel/plugins/std/fds.o
  CC       compel/plugins/std/log.o
  CC       compel/plugins/std/string.o
  CC       compel/plugins/std/infect.o
In file included from compel/include/uapi/compel/asm/sigframe.h:4,
                 from compel/plugins/std/infect.c:14:
/usr/include/riscv64-linux-gnu/asm/sigcontext.h:17:8: error: redefinition of 'struct sigcontext'
   17 | struct sigcontext {
      |        ^~~~~~~~~~
In file included from /usr/include/signal.h:288,
                 from compel/include/uapi/compel/plugins/std/syscall-types.h:12,
                 from compel/include/uapi/compel/plugins/std/syscall.h:4,
                 from compel/include/uapi/compel/plugins/std.h:5,
                 from compel/plugins/std/infect.c:1:
/usr/include/riscv64-linux-gnu/bits/sigcontext.h:25:8: note: originally defined here
   25 | struct sigcontext {
      |        ^~~~~~~~~~
In file included from compel/plugins/std/infect.c:14:
compel/include/uapi/compel/asm/sigframe.h:16:31: error: field 'fpsimd' has incomplete type
   16 |         struct fpsimd_context fpsimd;
      |                               ^~~~~~
compel/include/uapi/compel/asm/sigframe.h:18:29: error: field 'end' has incomplete type
   18 |         struct _aarch64_ctx end;
      |                             ^~~
In file included from compel/plugins/std/infect.c:14:
compel/plugins/std/infect.c: In function 'fini':
compel/include/uapi/compel/asm/sigframe.h:59:95: error: 'mcontext_t' has no member named 'pc'
   59 |  RT_SIGFRAME_REGIP(rt_sigframe)       ((long unsigned int)(rt_sigframe)->uc.uc_mcontext.pc)
      |                                                                                        ^

compel/plugins/std/infect.c:10:53: note: in expansion of macro 'RT_SIGFRAME_REGIP'
   10 | #define pr_debug(fmt, ...) print_on_level(4, fmt, ##__VA_ARGS__)
      |                                                     ^~~~~~~~~~~
compel/plugins/std/infect.c:94:9: note: in expansion of macro 'pr_debug'
   94 |         pr_debug("%ld: new_sp=%lx ip %lx\n", sys_gettid(), new_sp, RT_SIGFRAME_REGIP(sigframe));
      |         ^~~~~~~~
make[1]: *** [/root/criu/scripts/nmk/scripts/build.mk:215: compel/plugins/std/infect.o] Error 1
make: *** [Makefile.compel:56: compel/plugins/std.lib.a] Error 2
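
The last error in this log ('mcontext_t' has no member named 'pc') comes from the copied aarch64 accessor: glibc's riscv64 mcontext_t keeps the program counter in its __gregs array (index REG_PC) rather than in a pc field. The sketch below is a standalone program-counter accessor for a few architectures, exercised from a signal handler. It is not CRIU's RT_SIGFRAME_REGIP macro, and the names ucontext_pc and capture_interrupted_pc are made up for illustration.

```c
#define _GNU_SOURCE /* exposes REG_RIP on x86-64 glibc */
#include <signal.h>
#include <stdint.h>
#include <string.h>
#include <ucontext.h>

#if defined(__x86_64__) && !defined(REG_RIP)
#define REG_RIP 16 /* glibc's gregset_t index for RIP, hidden without _GNU_SOURCE */
#endif

static volatile uintptr_t captured_pc;

/* Program counter saved in a signal context, per architecture (glibc layouts). */
static uintptr_t ucontext_pc(const ucontext_t *uc)
{
#if defined(__riscv)
	return (uintptr_t)uc->uc_mcontext.__gregs[REG_PC];
#elif defined(__aarch64__)
	return (uintptr_t)uc->uc_mcontext.pc;
#elif defined(__x86_64__)
	return (uintptr_t)uc->uc_mcontext.gregs[REG_RIP];
#else
#error "add a program-counter accessor for this architecture"
#endif
}

static void on_signal(int sig, siginfo_t *info, void *ctx)
{
	(void)sig;
	(void)info;
	captured_pc = ucontext_pc((const ucontext_t *)ctx);
}

/* Raise SIGUSR1 and return the PC at which this thread was interrupted (0 on failure). */
uintptr_t capture_interrupted_pc(void)
{
	struct sigaction sa, old;

	memset(&sa, 0, sizeof(sa));
	sa.sa_sigaction = on_signal;
	sa.sa_flags = SA_SIGINFO;
	sigemptyset(&sa.sa_mask);

	captured_pc = 0;
	if (sigaction(SIGUSR1, &sa, &old) < 0)
		return 0;
	raise(SIGUSR1);
	sigaction(SIGUSR1, &old, NULL);
	return captured_pc;
}
```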

@rushi47
Author

rushi47 commented Jan 19, 2022

[Update] Got as far as the parasite code; trying to figure it out:

root@debian:~/criu# make install-compel
Note: Building without setproctitle() and strlcpy() support.
      To enable these features, please install libbsd-devel (RPM) / libbsd-dev (DEB).
Note: Building without GnuTLS support
Makefile.config:45: Warn: you have no libnftables installed
Makefile.config:46: Warn: Building without nftables support
  DEP      compel/arch/riscv64/plugins/std/syscalls/syscalls.d
  DEP      compel/arch/riscv64/plugins/std/parasite-head.d
  DEP      compel/plugins/std/infect.d
  DEP      compel/plugins/std/string.d
  DEP      compel/plugins/std/log.d
  DEP      compel/plugins/std/fds.d
  DEP      compel/plugins/std/std.d
  DEP      compel/plugins/shmem/shmem.d
  DEP      compel/plugins/fds/fds.d
  CC       compel/plugins/std/std.o
  CC       compel/plugins/std/fds.o
  CC       compel/plugins/std/log.o
  CC       compel/plugins/std/string.o
  CC       compel/plugins/std/infect.o
  CC       compel/arch/riscv64/plugins/std/parasite-head.o
compel/arch/riscv64/plugins/std/parasite-head.S: Assembler messages:
compel/arch/riscv64/plugins/std/parasite-head.S:4: Error: unrecognized symbol type ""
compel/arch/riscv64/plugins/std/parasite-head.S:5: Error: unrecognized opcode `bl parasite_service'
compel/arch/riscv64/plugins/std/parasite-head.S:6: Error: unrecognized opcode `brk '
/tmp/ccJSDRmN.s: Error: .size expression for __export_parasite_head_start does not evaluate to a constant

@rushi47
Author

rushi47 commented Jan 26, 2022

@mihalicyn hey Alex, would you be able to help me with the assembly for parasite code injection? I am a bit confused about how it's done for RISC-V.

@github-actions

A friendly reminder that this issue had no activity for 30 days.

@rushi47
Author

rushi47 commented Mar 29, 2022

I think I am able to compile the compel part; it looks successful (at least syntactically, I think).
@mihalicyn @rst0git Would you be able to help me test?
https://criu.org/Compel
I am trying to follow the instructions from here, but I couldn't follow much.

@rushi47
Author

rushi47 commented Apr 1, 2022

@adrianreber @avagin would you guys be able to take a look?
I am trying to test it, but ld is giving me an issue:

root@debian:~/criu/compel/test/fdspy# ld -r -z noexecstack -T ../../../compel/arch/riscv64/scripts/compel-pack.lds.S -o parasite.po parasite.o ../../../compel/plugins/std.lib.a ../../../compel/plugins/fds.lib.a
ld: cannot represent machine `riscv64'

@mihalicyn
Member

Hi @rushi47

Feel free to reach me on Gitter [https://gitter.im/save-restore/CRIU]

  1. Are you trying to run ld on the RISC-V machine, or are you trying to do cross-compilation?
    Please show your changes in compel.

@mihalicyn hey alex will you be able to help me, with assembly for parasite code injection. I am bit confused with, how it's done for riscv

If it's still relevant, ping me. I will try to help you.

@rushi47
Author

rushi47 commented Apr 15, 2022

@mihalicyn I am running it natively on a RISC-V machine.
I was able to fix the linker issue with this commit:
rushi47@95ba915


It now breaks here when I try to run the test:
https://github.com/rushi47/criu/blob/mod_criu/compel/src/lib/handle-elf.c#L640

Ah, OK, let me join Gitter; it looks like people are more responsive there.

@mihalicyn
Member

mihalicyn commented Apr 15, 2022

@rushi47

I was able to fix the linker issue by this commit :

yep, this change to the linker script looks correct.

Breaking here now, when am trying to run test:

Yep, you need to support relocations for RISC-V. First of all, I suggest listing all the relocation types that you hit here.
You can change goto err to break and collect all the relocation types that the compiler uses in the final PIE. Then we will think about how to handle them properly.

  1. Enumerate all relocation types.
  2. Assign proper constants for them, as is done for other architectures: R_RISCV_JAL, R_RISCV_ADD32, etc.
    It will be really useful to take a look at
    https://github.com/torvalds/linux/blob/5bfc75d92efd494db37f5c4c173d3639d4772966/arch/riscv/kernel/module.c#L312
  3. Provide an implementation for the needed relocation types. We can get some inspiration from the kernel, because when the Linux kernel loads a module into memory it also applies relocations (LKMs are relocatable objects too).
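
The first step in the list can be sketched as a small histogram over RELA entries. This is a standalone illustration, not the loop in compel's handle-elf.c: the names count_reloc_types, print_reloc_types and MAX_RELOC_TYPE are made up here, and the caller is assumed to have already located a relocation section and its entry count.

```c
#include <assert.h>
#include <elf.h>
#include <stddef.h>
#include <stdio.h>

#define MAX_RELOC_TYPE 256 /* hypothetical bound, larger than the type numbers in use */

/* Count how often each relocation type occurs in an array of RELA entries. */
void count_reloc_types(const Elf64_Rela *rela, size_t n,
		       unsigned int counts[MAX_RELOC_TYPE])
{
	size_t i;

	assert(rela != NULL || n == 0);

	for (i = 0; i < MAX_RELOC_TYPE; i++)
		counts[i] = 0;

	for (i = 0; i < n; i++) {
		/* The low 32 bits of r_info hold the relocation type. */
		unsigned long type = ELF64_R_TYPE(rela[i].r_info);

		if (type < MAX_RELOC_TYPE)
			counts[type]++;
	}
}

/* Print only the relocation types that actually occurred. */
void print_reloc_types(const unsigned int counts[MAX_RELOC_TYPE])
{
	size_t i;

	for (i = 0; i < MAX_RELOC_TYPE; i++)
		if (counts[i])
			printf("relocation type %zu: %u occurrence(s)\n", i, counts[i]);
}
```

The printed type numbers can then be matched against the R_RISCV_* constants (for example, 17 is R_RISCV_JAL and 19 is R_RISCV_CALL_PLT in the RISC-V ELF psABI) to decide which handlers are needed.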

@mihalicyn
Member

@rushi47 and I had a private text discussion about relocation types and how to handle them.

I think at first we can go the "naive way":

#ifdef ELF_RISCV
			/*
			 * Each case has its own { } block: before C23 a declaration
			 * cannot directly follow a case label, and "offset" would
			 * otherwise be declared three times in one scope.
			 */
			case R_RISCV_BRANCH: {
				/* B-type: imm[12|10:5] go to bits 31:25, imm[4:1|11] to bits 11:7 */
				ptrdiff_t offset = value64 + addend64 - place;
				u32 imm12 = (offset & 0x1000) << (31 - 12);
				u32 imm11 = (offset & 0x800) >> (11 - 7);
				u32 imm10_5 = (offset & 0x7e0) << (30 - 10);
				u32 imm4_1 = (offset & 0x1e) << (11 - 4);

				*((int32_t *)where) = (*((int32_t *)where) & 0x1fff07f) | imm12 | imm11 | imm10_5 | imm4_1;
				break;
			}

			case R_RISCV_JAL: {
				/* J-type: imm[20|10:1|11|19:12] go to bits 31:12 */
				ptrdiff_t offset = value64 + addend64 - place;
				u32 imm20 = (offset & 0x100000) << (31 - 20);
				u32 imm19_12 = (offset & 0xff000);
				u32 imm11 = (offset & 0x800) << (20 - 11);
				u32 imm10_1 = (offset & 0x7fe) << (30 - 10);

				*((int32_t *)where) = (*((int32_t *)where) & 0xfff) | imm20 | imm19_12 | imm11 | imm10_1;
				break;
			}

			case R_RISCV_CALL_PLT: {
				/* auipc + jalr pair: upper 20 bits go into auipc, lower 12 into jalr */
				ptrdiff_t offset = value64 + addend64 - place;
				s32 fill_v = offset;
				u32 hi20, lo12;

				/* The pair can only reach offsets that fit in 32 bits */
				if (offset != fill_v) {
					pr_err("Unsupported relocation of type R_RISCV_CALL_PLT with offset != fill_v\n");
					goto err;
				}

				/* +0x800 compensates for jalr sign-extending lo12 */
				hi20 = (offset + 0x800) & 0xfffff000;
				lo12 = (offset - hi20) & 0xfff;
				*((int32_t *)where) = (*((int32_t *)where) & 0xfff) | hi20;
				/* jalr is the 32-bit word right after auipc */
				*((int32_t *)where + 1) = (*((int32_t *)where + 1) & 0xfffff) | (lo12 << 20);
				break;
			}
#endif

This code is not tested because I have no RISC-V machine yet. So @rushi47, please try playing with it. It's a "naive" translation of the kernel implementation for these relocation types.
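
Since the snippet could not be run on hardware, the bit manipulation itself can at least be checked on any machine. The sketch below lifts the JAL and hi20/lo12 arithmetic into standalone functions with matching decoders so that a round trip can be verified. The function names are made up for illustration and nothing here touches CRIU's data structures.

```c
#include <assert.h>
#include <stdint.h>

/* Patch a J-type (JAL) instruction with a PC-relative byte offset. */
uint32_t riscv_encode_jal(uint32_t insn, int32_t offset)
{
	uint32_t off = (uint32_t)offset;
	uint32_t imm20 = (off & 0x100000) << (31 - 20);
	uint32_t imm19_12 = (off & 0xff000);
	uint32_t imm11 = (off & 0x800) << (20 - 11);
	uint32_t imm10_1 = (off & 0x7fe) << (30 - 10);

	/* JAL reaches +-1 MiB in 2-byte steps. */
	assert((offset & 1) == 0 && offset >= -(1 << 20) && offset < (1 << 20));

	/* Keep opcode and rd (bits 11:0), replace the immediate. */
	return (insn & 0xfff) | imm20 | imm19_12 | imm11 | imm10_1;
}

/* Recover the signed byte offset from a J-type instruction. */
int32_t riscv_decode_jal(uint32_t insn)
{
	uint32_t imm = (((insn >> 31) & 0x1) << 20) | (((insn >> 12) & 0xff) << 12) |
		       (((insn >> 20) & 0x1) << 11) | (((insn >> 21) & 0x3ff) << 1);
	int32_t value = (int32_t)imm;

	if (imm & 0x100000) /* sign-extend from bit 20 */
		value -= 0x200000;
	return value;
}

/* Split a 32-bit offset into the auipc (hi20) and jalr (lo12) parts. */
void riscv_split_hi20_lo12(int32_t offset, uint32_t *hi20, uint32_t *lo12)
{
	*hi20 = ((uint32_t)offset + 0x800) & 0xfffff000;
	*lo12 = ((uint32_t)offset - *hi20) & 0xfff;
}

/* What the CPU computes: hi20 plus the sign-extended lo12. */
int32_t riscv_join_hi20_lo12(uint32_t hi20, uint32_t lo12)
{
	int32_t lo = (int32_t)lo12;

	if (lo12 & 0x800) /* jalr sign-extends its 12-bit immediate */
		lo -= 0x1000;
	return (int32_t)(hi20 + (uint32_t)lo);
}
```

The +0x800 in the split is what makes the join exact: whenever lo12 has its top bit set, the hardware subtracts 0x1000 while sign-extending, so hi20 is rounded up by one page to compensate.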

@rushi47
Author

rushi47 commented Apr 29, 2022

@mihalicyn thanks a ton for this. As per our private conversation, I tested this: compel compiles fine again, but it still throws an error. I think more relocation types need to be added; working on it.


On the other hand, this Docker image can be used as an emulator to test RISC-V:
https://github.com/DavidBurela/riscv-emulator-docker-image

@mihalicyn
Member

Hi @rushi47!

How are things going with RISC-V? I plan to play with it soon, as soon as I get access to a real RISC-V device to test on.

Please let me know if you have serious plans to work on it.

@rushi47
Author

rushi47 commented Jul 9, 2022

Hi @rushi47!

how things are going with RISC-V? I have some plan to play with that soon, as far as I'll get access to the real RISC-V device to test on it.

Please let me know if you have a serious plans to work on it.

Hey,

  • So compel was compiling fine, syntactically at least, I would say. And
    I think the majority of the assembly code was in compel.

I had to pause the project for some time. I am quite keen on it and more than happy to work on it, so if you are starting to work on it, I am happy to resume. If someone can actively mentor me, I am happy to dedicate some hours to it daily.

felixonmars added a commit to felixonmars/archriscv-packages that referenced this issue Oct 2, 2022
Disable criu support until upstream has support
(checkpoint-restore/criu#1702)
@thesaadmemon

Hello everyone,

I have been working on RISC-V recently, and was wondering if we could really advance on this feature. I would be happy to provide any RISC-V based simulations and testing.

@felicitia

felicitia commented Mar 14, 2023

Yep, you need to support relocations for RISC-V. First of all, I suggest listing all relocation types that you've met here. You can change goto err to break and collect all relocation types that the compiler uses in the final PIE. Then we will think about how to handle it properly.

  1. enumerate all relocation types
  2. assign proper constants for it like it done for other architectures R_RISCV_JAL, R_RISCV_ADD32, etc
    it will be really useful to take a look on
    https://github.com/torvalds/linux/blob/5bfc75d92efd494db37f5c4c173d3639d4772966/arch/riscv/kernel/module.c#L312
  3. provide an implementation for needed relocation types. We can get some inspiration from the kernel. Because when the Linux kernel loads the module into the memory it also applies relocations (because LKM is also relocatable objects).

@mihalicyn I have followed this suggestion and added the further relocation types I encountered (doing a break for now, per your suggestion). By reusing and modifying @rushi47's previous code, I'm able to compile compel and run the test in compel/test/infect. I'm working in a cross-compiling environment (x86 -> RISC-V) and have a RISC-V QEMU to test on. Below is the result of running ./spy in compel/test/infect. The errors are expected since I haven't added all the RISC-V support yet, but I wanted to see first whether we have an environment to test compel. Looks like we do now! :) These runtime errors can give us a lot of insight for moving forward. I'll keep working on it!

If other people are interested in this, @mihalicyn @rushi47 and I also have a group chat on Gitter to discuss the details. It's quite active :)

# ./spy
Checking the victim alive
1, want 1
42, want 42
Infecting the victim
Stopping task
Preparing parasite ctl
        LC4: Preparing seqsk for 219
Configuring contexts
Infecting
        LC4: ** delivering signal 4 si_code=1
        LC1: Error (compel/src/lib/infect.c:609): Unexpected 219 task interruptiog
[  377.932091] spy[218]: unhandled signal 11 code 0x1 at 0x0000000000000000 in li]
[  377.933221] CPU: 5 PID: 218 Comm: spy Not tainted 5.15.0 #1
[  377.933752] Hardware name: XXXXXXXXXXXXX
[  377.934211] epc : 0000003fbb153344 ra : 0000002ae5bab716 sp : 0000003fde5d2a70
[  377.934814]  gp : 0000002ae5bb1800 tp : 0000003fbb2073c0 t0 : 0000000000000000
[  377.935440]  t1 : 0000002ae5ba857c t2 : 00000000000a66e6 s0 : 0000002aec7658a0
[  377.936082]  s1 : 0000000000006480 a0 : 0000000000000000 a1 : 0000002ae5bac3b0
[  377.936698]  a2 : 0000000000001dc0 a3 : 0000000000000200 a4 : 0000000000000000
        LC3: Putting parasite blob into (nil)->(nil)
[  377.939224]  a5 : 0000000000000000 a6 : fefefefefefefeff a7 : 0000000000000040
[  377.939967]  s2 : 0000000000006480 s3 : 0000000000000001 s4 : 0000000000001fff
[  377.940817]  s5 : 0000000000002000 s6 : ffffffffffffffff s7 : 0000000000000007
[  377.941479]  s8 : ffffffffffffffda s9 : 00000000000000db s10: 0000003fde5d2a80
[  377.942121]  s11: 0000000000000001 t3 : 0000003fbb153334 t4 : 0000000000000000
[  377.942803]  t5 : 000000000000002a t6 : 0000003fbb1dcc40
[  377.943292] status: 8000000200004620 badaddr: 0000000000000000 cause: 00000000f
Segmentation fault
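
The cause field in the register dump above is the RISC-V exception code, and a small lookup table makes such dumps quicker to read. The sketch below is standalone: riscv_exception_name is a made-up helper, not part of CRIU or the kernel, and the strings follow the exception codes in the RISC-V privileged specification. The cause line in the log appears cut off by the terminal; assuming it reads 0xf, it decodes to a store/AMO page fault, which together with badaddr 0 suggests a write through a NULL pointer in spy.

```c
#include <limits.h>
#include <stddef.h>

/* Map a RISC-V scause value to a human-readable exception name. */
const char *riscv_exception_name(unsigned long cause)
{
	static const char *const names[] = {
		[0] = "instruction address misaligned",
		[1] = "instruction access fault",
		[2] = "illegal instruction",
		[3] = "breakpoint",
		[4] = "load address misaligned",
		[5] = "load access fault",
		[6] = "store/AMO address misaligned",
		[7] = "store/AMO access fault",
		[8] = "environment call from U-mode",
		[9] = "environment call from S-mode",
		[12] = "instruction page fault",
		[13] = "load page fault",
		[15] = "store/AMO page fault",
	};
	/* The top bit marks an interrupt; this assumes unsigned long is as wide as scause. */
	const unsigned long interrupt_bit = 1UL << (sizeof(unsigned long) * CHAR_BIT - 1);

	if (cause & interrupt_bit)
		return "interrupt";
	if (cause >= sizeof(names) / sizeof(names[0]) || names[cause] == NULL)
		return "reserved or unknown";
	return names[cause];
}
```

For example, riscv_exception_name(0xf) returns "store/AMO page fault" and riscv_exception_name(2) returns "illegal instruction".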

@felicitia

Hello everyone,
I have been working on RISC-V recently, and was wondering if we could really advance on this feature. I would be happy to provide any RISC-V based simulations and testing.

Hello @thesaadmemon, we're actively working on this feature now. What simulation and testing environment are you referring to? Can you give us some more details? Thank you very much! :)

@jiangcuo

jiangcuo commented Jun 8, 2023

@felicitia Hello Yixue, I have two VisionFive2 (4c8g) boards and am currently building the riscv64 version of Proxmox VE.

@felicitia

> @felicitia Hello Yixue,I have two VisionFive2(4c8g) board and am currently building the riscv64 version of Proxmox VE.

That's great @jiangcuo! I actually found a way to test CRIU on riscv64 QEMU and wrote up the instructions, since simulation is a lot easier to get started with. CRIU needs to be cross-compiled for the riscv64 target for this, and I included instructions for cross-compiling too if you're interested :) @mihalicyn and I are actively working on riscv64 support for CRIU! 💪

@ancientmodern
Contributor

@felicitia Hi Yixue, I am quite interested in using CRIU on RISC-V and have attempted to follow your instructions to perform a sample compel test on my own riscv64 QEMU. However, I encountered an issue during the cross-compilation process while running cross-compile/build_criu.sh.

...
  DEP      soccr/soccr.d
  CC       soccr/soccr.o
  AR       soccr/libsoccr.a
make[1]: 'soccr/libsoccr.a' is up to date.
criu/Makefile.packages:36: Can not find some of the required libraries
criu/Makefile.packages:37: Make sure the following packages are installed
criu/Makefile.packages:38: RPM based distros: protobuf protobuf-c protobuf-c-devel protobuf-compiler protobuf-devel protobuf-python libnl3-devel libcap-devel python3-future
criu/Makefile.packages:39: DEB based distros: libprotobuf-dev libprotobuf-c-dev protobuf-c-compiler protobuf-compiler python3-protobuf python3-future libnl-3-dev libcap-dev
criu/Makefile.packages:40: To run tests the following packages are needed
criu/Makefile.packages:41: RPM based distros: libaio-devel python3-PyYAML
criu/Makefile.packages:42: DEB based distros: python3-yaml libaio-dev libaio-dev
criu/Makefile.packages:43: *** Compilation aborted.  Stop.
make[1]: *** [criu/Makefile.packages:49: check-packages] Error 2
make: *** [Makefile:263: criu] Error 2

Prior to this, I had successfully executed cross-compile/build_required_deps.sh, so all necessary dependencies should be located at /scratch/cross-compile-riscv64-artifacts/riscv64_pb_install. I am unsure of what could be causing this issue. Would you happen to know of any potential causes or perhaps a workaround? I noticed in your instructions that you have successfully built all artifacts. Could that possibly be a solution to this problem?

Below is my cross-compile/config.sh for your reference:

#!/bin/bash

TOOLCHAIN_ROOT="/opt/riscv64"

# the path that contains cross compiling toolchain binaries, e.g., riscv64-unknown-linux-gnu-gcc, riscv64-unknown-linux-gnu-ld
TOOLCHAIN_PATH="$TOOLCHAIN_ROOT/bin" 

TARGET_ARCH="riscv64"

# the root directory that contains CRIU's source code
CRIU_ROOT_DIR="/scratch/yixue-criu"

# the root directory that will contain the cross compiled artifacts (initially empty)
# e.g., the RISC-V binaries of protobuf (CRIU's required package)
BUILD_ROOT_DIR="/scratch/cross-compile-riscv64-artifacts"

mkdir -p $BUILD_ROOT_DIR

# no need to change it, unless you changed the build scripts for CRIU's dependencies (e.g., build_protobuf.sh)
# the path should be consistent with the prefix specified in the build scripts (e.g., build_protobuf.sh)
INCLUDE_DIR_CC="$BUILD_ROOT_DIR/riscv64_pb_install/include"
LIB_DIR_CC="$BUILD_ROOT_DIR/riscv64_pb_install/lib"
TOOLCHAIN_INCLUDE_DIR="$TOOLCHAIN_ROOT/sysroot/usr/include"

# the directory that contains the toolchain libraries, e.g., libpthread.so.0
TOOLCHAIN_LIB_DIR="$TOOLCHAIN_ROOT/sysroot/lib"

export PATH=$TOOLCHAIN_PATH:$PATH

I greatly appreciate your efforts and look forward to your response. Thank you :)

@ancientmodern
Contributor

Further investigation into errors when suffering from insomnia😭

By removing > /dev/null 2>&1 from the try-compile in scripts/nmk/scripts/utils.mk, I forced the Makefile to display the error messages. The key issue seemed to revolve around the -rpath option:

...
riscv64-unknown-linux-gnu-gcc: error: unrecognized command-line option '-rpath'
riscv64-unknown-linux-gnu-gcc: error: unrecognized command-line option '-rpath'
riscv64-unknown-linux-gnu-gcc: error: unrecognized command-line option '-rpath'
...

This -rpath option is defined in build_criu.sh as follows (it also appears in build_compel_tests.sh, but is never used there):

CFLAGS=$(pkg-config --cflags libprotobuf-c)
CFLAGS+=" -I$INCLUDE_DIR_CC -L$LIB_DIR_CC"


LDFLAGS=$(pkg-config --libs libprotobuf-c)
LDFLAGS+=" -rpath $TOOLCHAIN_LIB_DIR"

I tried a syntax fix suggested by ChatGPT, updating the last line to LDFLAGS+=" -Wl,-rpath,$(TOOLCHAIN_LIB_DIR)" (where -Wl passes the comma-separated options that follow to the linker). However, this led to the following error:

/rivos/riscv-gnu-toolchain/lib/gcc/riscv64-unknown-linux-gnu/12.1.0/../../../../riscv64-unknown-linux-gnu/bin/ld: cannot find -lselinux: No such file or directory
/rivos/riscv-gnu-toolchain/lib/gcc/riscv64-unknown-linux-gnu/12.1.0/../../../../riscv64-unknown-linux-gnu/bin/ld: cannot find -lgnutls: No such file or directory
collect2: error: ld returned 1 exit status
criu/Makefile.packages:36: Can not find some of the required libraries
...

Anyway, at least I'm able to cross-compile the infect and rsys tests and run them on my riscv64 VM. I'm learning how compel and the arch-specific code work, and hopefully I can make some contributions to this feature. Thank you so much :)

@felicitia

@ancientmodern Ah, sorry to hear about your insomnia, I hope it's not because of CRIU! 😂 What you saw is exactly what's expected, so great job!! 👏 This is great news, as you're the first one to follow my instructions, and it looks like everything was correct!

Now some explanations of the error you saw.
... criu/Makefile.packages:43: *** Compilation aborted. Stop. etc -- This is because CRIU checks whether the required libraries are installed and aborts if they are not. The check doesn't work in a cross-compiling environment (we'll need to update the script), so it simply fails and aborts. One quick fix to bypass the check is to comment out the abort here if you know you have already installed the required packages.

However, you'll hit other compile issues after fixing this as well, and this is a known issue. You probably already got a taste of how complicated the build process is when you did your further investigation, right? :) It wasn't easy to cross-compile CRIU that far, and I'm only focusing on the "compel" module, as @mihalicyn (one of the earliest CRIU developers!) suggested earlier in this issue. So don't worry about it for now: "compel" is already cross-compiled correctly and we're only working on "compel" for now. The short-term goal is to pass all 4 test cases under compel/test (for now, I only added the scripts to cross-compile the rsys and infect test cases).

So long story short, you have everything correctly set up already! If you're interested in using CRIU for RISC-V, it's still under development and you can just watch this issue. If you're interested in contributing to the project, you can find me on Gitter to sync up the current status (just so you're not chasing the issues that I already faced :)).

@ancientmodern
Contributor

@felicitia Thank you for your detailed explanation and assistance. I'm glad I was on the right track! One interesting thing I found is that, despite build_criu.sh consistently resulting in some sort of error, it needs to be run once before executing build_compel_test.sh in order to build the required libraries. 😂

@felicitia

> @felicitia Thank you for your detailed explanation and assistance. I'm glad I was on the right track! One interesting thing I found is that, despite build_criu.sh consistently resulting in some sort of error, it needs to be run once before executing build_compel_test.sh in order to build the required libraries. 😂

Yup, build_criu.sh cross-compiles the compel library and needs to be run first. build_compel_test.sh cross-compiles compel's test cases (rsys and infect, for now). If you modify the tests (e.g., add more debugging info), you need to run make clean first (e.g., if you're working on infect, run make clean under the "infect" folder), then run build_compel_test.sh again. If you modify compel's source code (e.g., compel/src/lib/infect.c), you need to run build_criu.sh again to re-cross-compile the compel library first, then cross-compile the test case.

@felicitia

felicitia commented Jun 15, 2023

Hello CRIU community, happy to report that compel is working and can pass infect, rsys, stack on RISC-V now!! 🥳

However, when testing with fdspy, it ends with a "read from pipe failed" error (see log below). I tested fdspy on aarch64 and x86 (natively, without cross-compiling) and both show the exact same error. Since checkpoint/restore works fine on aarch64 and x86, I assume this error might not matter? I just wanted to share this here in case anyone is interested in investigating it.

...
Done
Closing victim stdin
Waiting for victim to die
Checking the result
Check pipe ends are at hands
Check pipe ends are connected
read from pipe failed
...

New updates on compel: There's a non-deterministic bug that only happens if read() in victim.c has already started executing before compel_stop_task is called. After the victim is resumed, read fails with errno 512. But if compel_stop_task happens before read (e.g., by inserting a sleep(1) before read to force this order), everything is OK. @ancientmodern and I are looking into this. If anyone else is interested in contributing, you can find me on Gitter for easier communication.
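For context, 512 is the value of ERESTARTSYS in the kernel's include/linux/errno.h, one of a handful of kernel-internal restart codes that are normally translated before a syscall returns to user space. Below is a minimal sketch of a helper that a test like victim.c could use to recognize such a leaked code. The KERN_ macro names and the function name are illustrative assumptions (the constants are not exported by the userspace errno.h), and nothing here is part of compel.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Kernel-internal restart codes, copied from include/linux/errno.h.
 * They are redefined here under illustrative KERN_ names because the
 * userspace <errno.h> does not provide them.
 */
#define KERN_ERESTARTSYS           512
#define KERN_ERESTARTNOINTR        513
#define KERN_ERESTARTNOHAND        514
#define KERN_ERESTART_RESTARTBLOCK 516

/*
 * Returns true if err (a non-negative errno value) is one of the
 * restart codes that user space should never normally observe.
 */
static inline bool is_kernel_restart_code(int err)
{
	assert(err >= 0); /* errno values are never negative */

	switch (err) {
	case KERN_ERESTARTSYS:
	case KERN_ERESTARTNOINTR:
	case KERN_ERESTARTNOHAND:
	case KERN_ERESTART_RESTARTBLOCK:
		return true;
	default:
		return false;
	}
}
```

A debugging build of the victim could call is_kernel_restart_code(errno) after a failed read() to tell this bug apart from an ordinary I/O error.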

@ancientmodern
Contributor

ancientmodern commented Jun 25, 2023

> New updates on compel: There's a non-deterministic bug that only happens if read() in victim.c is already executed before compel_stop_task. After resuming the victim, read will fail and have errno 512. But if compel_stop_task happens before read (e.g., inserting a sleep(1) before read to force this order to happen), everything will be OK. @ancientmodern and I are looking into this. If anyone else is interested in contributing, you can find me on Gitter for easier communication.

Update:
I found another solution which minimally affects the kernel. Instead of relying on the kernel to restart the interrupted syscall, this can be handled within the compel_get_task_regs() function, similar to how other architectures do it.

int compel_get_task_regs(pid_t pid, user_regs_struct_t *regs, user_fpregs_struct_t *ext_regs, save_regs_t save,
			 void *arg, __maybe_unused unsigned long flags)
{
	// ......
	if (regs->a7) { /* Not sure if this is adequate for syscall identification */
		/* Restart the system call */
		switch (regs->a0) {
		case -ERESTARTNOHAND:
		case -ERESTARTSYS:
		case -ERESTARTNOINTR:
			regs->a0 = regs->orig_a0; /* The user_regs_struct structure needs to be altered */
			regs->pc -= 0x4;
			break;
		case -ERESTART_RESTARTBLOCK:
			regs->a0 = regs->orig_a0;
			regs->a7 = __NR_restart_syscall;
			regs->pc -= 0x4;
			break;
		}
	}

	ret = save(arg, regs, fpsimd);
	return ret;
}

We only require the kernel (and riscv-gnu-toolchain) to add orig_a0 to the RISC-V registers' uapi. This seems reasonable given that many other architectures follow the same procedure (e.g. loongarch has orig_a0). I have tested this approach on my RISC-V VM, and it is now able to pass all four compel tests :)
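To make the register fix-up easier to reason about in isolation, here is a self-contained sketch of the same rewind logic on a trimmed-down register set. The struct mini_regs type and the function name are hypothetical, the restart codes are copied from the kernel's include/linux/errno.h, and 128 is assumed to be the restart_syscall number from the generic syscall table used by riscv64. Unlike the snippet above, it does not check a7 first.

```c
#include <assert.h>
#include <stddef.h>

/* Kernel-internal restart codes from include/linux/errno.h. */
#define ERESTARTSYS           512
#define ERESTARTNOINTR        513
#define ERESTARTNOHAND        514
#define ERESTART_RESTARTBLOCK 516

/* restart_syscall number in the generic syscall table (assumed for riscv64). */
#define NR_RESTART_SYSCALL 128

/* Hypothetical register set: only the fields the fix-up touches. */
struct mini_regs {
	unsigned long pc;
	unsigned long a0;      /* first argument on entry, return value on exit */
	unsigned long a7;      /* syscall number */
	unsigned long orig_a0; /* a0 as it was on syscall entry */
};

/*
 * Rewind an interrupted syscall so that it is re-executed on resume.
 * Returns 1 if the registers were modified, 0 otherwise.
 */
static inline int rewind_interrupted_syscall(struct mini_regs *regs)
{
	assert(regs != NULL);

	switch ((long)regs->a0) {
	case -ERESTARTNOHAND:
	case -ERESTARTSYS:
	case -ERESTARTNOINTR:
		break;
	case -ERESTART_RESTARTBLOCK:
		regs->a7 = NR_RESTART_SYSCALL;
		break;
	default:
		return 0; /* normal return value, nothing to restart */
	}

	regs->a0 = regs->orig_a0; /* restore the clobbered first argument */
	regs->pc -= 4;            /* step back over the 4-byte ecall */
	return 1;
}
```

With the values from the failing run (syscall 63, a0 = -512, orig_a0 = 0), the function restores a0 to 0 and moves pc back by 4, so the read() is issued again once the victim resumes.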

Original reply:
@mihalicyn Hi Alex, we believe we've identified the root cause of the bug that prevents a blocking syscall from correctly restarting after an interruption. When compel uses ptrace_get_regs and ptrace_set_regs to checkpoint and restore registers, these functions only handle general purpose registers, not CSRs (control and status registers) in RISC-V. Consequently, the regs->cause register might not properly maintain EXC_SYSCALL in this kernel function (crucial for syscall restarting) after the victim process resumes:

// arch/riscv/kernel/signal.c
void arch_do_signal_or_restart(struct pt_regs *regs)
{
	struct ksignal ksig;

	if (get_signal(&ksig)) {
		/* Actually deliver the signal */
		handle_signal(&ksig, regs);
		return;
	}

	/* Did we come from a system call? */
	if (regs->cause == EXC_SYSCALL) {
		/* Avoid additional syscall restarting via ret_from_exception */
		regs->cause = -1UL;

		/* Restart the system call - no handlers present */
		switch (regs->a0) {
		case -ERESTARTNOHAND:
		case -ERESTARTSYS:
		case -ERESTARTNOINTR:
			regs->a0 = regs->orig_a0;
			regs->epc -= 0x4;
			break;
		case -ERESTART_RESTARTBLOCK:
			regs->a0 = regs->orig_a0;
			regs->a7 = __NR_restart_syscall;
			regs->epc -= 0x4;
			break;
		}
	}

	/*
	 * If there is no signal to deliver, we just put the saved
	 * sigmask back.
	 */
	restore_saved_sigmask();
}

Here is my log related to this issue, demonstrating that regs->cause of this read() syscall changes to EXC_BREAKPOINT (this is using a custom kernel based on v6.4-rc7):

[ 1176.127858] syscall_handler: after syscall 63: a0 = -512, orig_a0 = 0
[ 1176.128207] ~RISCV~ arch/riscv/kernel/signal.c:294:arch_do_signal_or_restart: Enter arch_do_signal_or_restart
[ 1176.128550] ~RISCV~ Before get_signal: cause = 8, a7 = 63, a0 = -512, a1 = 140737003747964
...
...
	LC3: Executing function: compel_cure_local in file: compel/src/lib/infect.c
	LC3: Executing function: compel_resume_task in file: compel/src/lib/infect.c
	LC3: Executing function: compel_resume_task_sig in file: compel/src/lib/infect.c
	LC4: Unseizing 107 into 1
[ 1176.313269] ~RISCV~ After get_signal: cause = 3, a7 = 63, a0 = -512, a1 = 140737003747964
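The two cause values in the log are EXC_SYSCALL (8) and EXC_BREAKPOINT (3) from the kernel's arch/riscv/include/asm/csr.h. The sketch below is a simplified model, not kernel code, of the gate in arch_do_signal_or_restart(): the function names are hypothetical and the restart codes are copied from include/linux/errno.h. It shows why the same a0 = -512 is rewound in one case and leaks to user space in the other.

```c
#include <assert.h>
#include <stdbool.h>

/* RISC-V exception cause codes, as named in the kernel's asm/csr.h. */
#define EXC_BREAKPOINT 3
#define EXC_SYSCALL    8

/* Kernel-internal restart codes from include/linux/errno.h. */
#define ERESTARTSYS           512
#define ERESTARTNOINTR        513
#define ERESTARTNOHAND        514
#define ERESTART_RESTARTBLOCK 516

/*
 * Simplified model of the check in arch_do_signal_or_restart():
 * the syscall is rewound only when the trap cause is still
 * EXC_SYSCALL and a0 holds one of the restart codes.
 */
static inline bool kernel_would_restart(unsigned long cause, long a0)
{
	if (cause != EXC_SYSCALL)
		return false;

	switch (a0) {
	case -ERESTARTNOHAND:
	case -ERESTARTSYS:
	case -ERESTARTNOINTR:
	case -ERESTART_RESTARTBLOCK:
		return true;
	default:
		return false;
	}
}

/* Replays the two (cause, a0) pairs from the log. */
static inline void replay_logged_values(void)
{
	assert(kernel_would_restart(8, -512));  /* before get_signal */
	assert(!kernel_would_restart(3, -512)); /* after get_signal */
}
```

In the second case nothing rewinds the syscall, which matches read() returning with errno 512 in the victim.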

Currently, the RISC-V ptrace interface (even in linux-next, where vector support has been added) doesn't allow setting CSRs from userspace. We could potentially submit a short patch to the Linux kernel for this, but it's uncertain whether exposing cause to userspace would be deemed appropriate. Do you think there might be a better approach, or a way to handle this purely in userspace as a workaround? Your input would be greatly appreciated! Thanks a lot :)

ancientmodern added a commit to ancientmodern/linux that referenced this issue Jul 18, 2023
Add orig_a0 to RISC-V ptrace's user register interface. This allows
user-space programs to access and manipulate orig_a0 through ptrace
regset APIs. In order to maintain the prefix relationship, the pt_regs
struct has been reordered accordingly. Some other archs have similar
practice, e.g. x86 exposes orig_eax to user space.

The need for this change arose during the porting of a user-space
checkpoint/restore tool to the RISC-V. The tool required access to
orig_a0 to restart interrupted syscalls, as a0 gets overwritten by the
syscall return value after execution. Detailed explanation and the
associated issue can be found at:
github.com/checkpoint-restore/criu/issues/1702#issuecomment-1605814529

Signed-off-by: Haorong Lu <haorong@rivosinc.com>
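The "prefix relationship" mentioned in the commit message can be illustrated with a small sketch: the user-visible register struct has to match the leading fields of the kernel's trap frame, so a new user-visible field must sit before the kernel-only ones. The two structs below are hypothetical, heavily trimmed stand-ins (the real ones carry all general-purpose registers), and user_view() is an illustrative helper, not a kernel function.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the register set exposed through ptrace. */
struct demo_user_regs {
	unsigned long pc;
	unsigned long a0;
	unsigned long orig_a0;
};

/* Hypothetical stand-in for the kernel-side trap frame. */
struct demo_pt_regs {
	unsigned long epc;
	unsigned long a0;
	unsigned long orig_a0; /* placed inside the user-visible prefix */
	unsigned long status;  /* kernel-only fields follow */
	unsigned long badaddr;
	unsigned long cause;
};

/* Compile-time checks that the user struct is a prefix of the trap frame. */
static_assert(offsetof(struct demo_user_regs, pc) ==
	      offsetof(struct demo_pt_regs, epc), "pc/epc offset mismatch");
static_assert(offsetof(struct demo_user_regs, a0) ==
	      offsetof(struct demo_pt_regs, a0), "a0 offset mismatch");
static_assert(offsetof(struct demo_user_regs, orig_a0) ==
	      offsetof(struct demo_pt_regs, orig_a0), "orig_a0 offset mismatch");
static_assert(sizeof(struct demo_user_regs) <= sizeof(struct demo_pt_regs),
	      "user regs must not be larger than the trap frame");

/* Copy the user-visible prefix out of a trap frame. */
static inline struct demo_user_regs user_view(const struct demo_pt_regs *regs)
{
	struct demo_user_regs out;

	memcpy(&out, regs, sizeof(out));
	return out;
}
```

If orig_a0 were left after cause in the trap frame, the third static_assert would fail at compile time, which is the kind of mismatch the reordering avoids.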