This is an experimental low-level translator from LLVM bitcode to JVM bytecode. Since existing tools can generate LLVM bitcode from functions written in some languages (e.g., Numba for Python, Clang for C/C++, Julia, Gollvm for Go, and Weld for cross-library optimization), it is useful to inject the bitcode into JVMs. An objective of this library is to provide not a full-fledge translator but a restricted one for easily injecting these functions into JVMs.
Note that the core component is refactored from lljvm (credit should go to the original author).
First, you need to install Numba
to generate LLVM bitcode from python functions:
$ pip install numba
You run code blow to get LLVM bitcode for a python function:
import math
from numba import cfunc
def pyfunc(x, y):
return math.log10(2 * x) + y
## Compiles the python function above and writes as LLVM bitcode
with open("pyfunc.bc", "wb") as out:
f = cfunc("float64(float64, float64)")(pyfunc)
out.write(f._library._final_module.as_bitcode())
Finally, you get a JVM class file for the python function pyfunc
:
$ ./bin/lljvm-translator ./pyfunc.bc
To check gen'd bytecode in the JVM class file, you can use javap
:
$ javap -c -s pyfunc.class
public final class GeneratedClass {
...
public static double _cfunc__ZN8__main__10pyfunc_241Edd(double, double);
descriptor: (DD)D
Code:
0: dconst_0
1: dstore 4
3: dconst_0
4: dstore 6
6: dconst_0
7: dstore 8
9: dload_0
10: ldc2_w #26 // double 2.0d
13: dmul
14: dstore 4
16: dload 4
18: invokestatic #8 // Method java/lang/Math.log10:(D)D
21: dstore 6
23: dload 6
25: dload_2
26: dadd
27: dstore 8
29: dload 8
31: dreturn
}
You can load this gen'd class file via Java Runtime Reflection and run in JVMs:
import io.github.maropu.lljvm.LLJVMClassLoader;
import io.github.maropu.lljvm.LLJVMUtils;
public class LLJVMTest {
public static void main(String[] args) {
try {
/**
* If you want to load a class from LLVM bitcode directly, you write a line below;
* Class<?> clazz = LLJVMClassLoader.currentClassLoader.loadClassFromBitcodeFile("pyfunc.bc");
*/
Class<?> clazz = LLJVMClassLoader.currentClassLoader.loadClassFromBytecodeFile("pyfunc.class");
System.out.println(LLJVMUtils.invoke(clazz, 3, 6));
} catch (Exception e) {
e.printStackTrace();
}
}
}
You can use clang
to get LLVM bitcode for C/C++ functions:
$ cat cfunc.c
#include <math.h>
double cfunc(double a, double b) {
return pow(3.0 * a, 2.0) + 4.0 * b;
}
$ clang -c -O2 -emit-llvm -o cfunc.bc cfunc.c
$ ./bin/lljvm-translator ./cfunc.bc
Then, you dump gen'd bytecode:
$ javap -c cfunc.class
public final class GeneratedClass {
...
public static double __Z5cfuncdd(double, double);
descriptor: (DD)D
Code:
0: dconst_0
1: dstore 4
3: dconst_0
4: dstore 6
6: dconst_0
7: dstore 8
9: dconst_0
10: dstore 10
12: dload_0
13: ldc2_w #8 // double 3.0d
16: dmul
17: dstore 4
19: dload 4
21: dload 4
23: dmul
24: dstore 6
26: dload_2
27: ldc2_w #13 // double 4.0d
30: dmul
31: dstore 8
33: dload 6
35: dload 8
37: dadd
38: dstore 10
40: dload 10
42: dreturn
}
You can generate LLVM bitcode as follows;
$ cat jlfunc.jl
function jlfunc(a, b)
return a * 2.0 + b
end
code_llvm(jlfunc, (Float64, Float64))
$ julia jlfunc.jl > jlfunc.ll
$ llvm-as jlfunc.ll
$ ./bin/lljvm-translator ./jlfunc.bc
Then, you dump gen'd bytecode:
$ javap -c jlfunc.class
public final class GeneratedClass {
...
public static double _julia_jlfunc_35021(double, double);
Code:
0: dconst_0
1: dstore 4
3: dconst_0
4: dstore 6
...
9: dload_0
10: ldc2_w #23 // double 2.0d
13: dmul
14: dstore 4
16: dload 4
18: dload_2
19: dadd
20: dstore 6
...
25: dload 6
27: dreturn
}
Let's say that you have a function below;
$ cat cfunc.c
#include <stdio.h>
long cfunc(long x[], int size) {
long sum = 0;
for (int i = 0; i < size; i++) {
sum += x[i];
}
return sum;
}
Then, you can handle the array in Java like;
import io.github.maropu.lljvm.LLJVMClassLoader;
import io.github.maropu.lljvm.LLJVMUtils;
import io.github.maropu.lljvm.util.ArrayUtils;
long[] javaArray = {1L, 2L, 3L};
System.out.println(LLJVMUtils.invoke(clazz, ArrayUtils.addressOf(javaArray), javaArray.length));
Most of python users possibly write functions with NumPy
, so it is useful to translate them for JVMs.
Let's say that you write a function below with NumPy
and
you get LLVM assembly code for the function via Numba
:
import numpy as np
def numpy_logistic_regression(Y, X, w, iterations):
for i in range(iterations):
w -= np.dot(((1.0 / (1.0 + np.exp(-Y * np.dot(X, w))) - 1.0) * Y), X)
return w
Then, you invoke the bitcode of the function as follows:
// Placeholders for Python arrays
PyArrayHolder Y = new PyArrayHolder();
PyArrayHolder X = new PyArrayHolder();
PyArrayHolder w = new PyArrayHolder();
// Loads LLVM bitcode and runs it
Class<?> clazz = LLJVMClassLoader.currentClassLoader.loadClassFromBitcodeFile("numpy_logistic_regression.bc");
LLJVMUtils.invoke(
clazz,
Y.with(double[] { 1.0, 1.0 }).addr(),
X.with(double[] { 1.0, 1.0, 1.0, 1.0 }).reshape(2, 2).addr(),
w.with(double[] { 1.0, 1.0 }).addr(),
1L);
As described above, our objective is to provide not a full-fledge translator but a restricted one for functions written in other languages.
So, it is important to verify that gen'd bytecode is correct and supported before execution.
The library does so when loading it in LLJVMClassLoader
and, if it detects illegal code, it throws LLJVMException
;
try {
Class<?> clazz = LLJVMClassLoader.currentClassLoader.loadClassFromBitcodeFile("func.bc");
...
} catch (LLJVMException e) {
// Writes fallback code here
...
}
For debugging uses, you can use helper functions to print generated code in a Python interactive shell. To print generated code for a Python function, you can follow directions bloew;
$ ./bin/python
Python 2.7.15 (default, Sep 12 2018, 20:19:07)
[GCC 4.2.1 Compatible Apple LLVM 9.0.0 (clang-900.0.39.2)] on darwin
__ __ ___ ____ ___
/ / / / __ / / | / / |/ /
/ /__/ /__/ // /| |/ / /|_/ /
/____/____/\___/ |___/_/ /_/ version 0.2.0-EXPERIMENTAL
LLJVM debugging helper functions imported:
- NumPy version: 1.15.2
- SciPy version: 1.1.0
- Numba version: 0.40.0
- LLVMLite version: 0.25.0
- LLVM version: 7.0.1
>>> def add(a, b):
... return a + b
...
// When generating code for a Python function, you need to explicitly specify types
// because Python is a dynamically typed language. Type names follow
// the Numba specification below:
// - https://numba.pydata.org/numba-doc/dev/reference/types.html#basic-types
>>> print_llvm(add, "int32(int32, int32)")
...
; Function Attrs: norecurse nounwind readnone
define i32 @"cfunc._ZN8__main__7add$241Eii"(i32 %.1, i32 %.2) local_unnamed_addr #1 {
entry:
%.16.i = add i32 %.2, %.1
ret i32 %.16.i
}
>>> print_jvm(add, "int32(int32, int32)")
...
.method public static _cfunc__ZN8__main__7add_242Eii(II)I
iconst_0
istore 2
begin_method:
invokestatic io/github/maropu/lljvm/runtime/VMemory/createStackFrame()V
label2:
iload_1 ; __2
iload_0 ; __1
iadd
istore_2 ; __16_i
invokestatic io/github/maropu/lljvm/runtime/VMemory/destroyStackFrame()V
iload_2 ; __16_i
ireturn
.limit stack 16
.limit locals 3
end_method:
.end method
- Supports OpenJDK 8 (64bit) only
- Bundles native binaries for Linux/x86_64 and Mac/x86_64
- For Linux, it is built by clang++ v3.6.2 (w/ glibc v2.17 and libstdc++ v4.8.5) on AWS Linux AMI (ami-0ad99772)
- For Mac, it is built by Apple clang++ v900.0.39.2 on macOS Sierra v10.12.1
- LLVM v7.0.1 used internally
<dependency>
<groupId>io.github.maropu</groupId>
<artifactId>lljvm-core</artifactId>
<version>0.2.0-EXPERIMENTAL</version>
<type>jar</type>
<scope>compile</scope>
</dependency>
See lljvm-example.
To comile it, you need to check requirements below;
- gcc-c++ v4.8.0+ / clang++ v3.1.0+
- cmake v3.4.3+
- python v2.7+
- zlib v1.2.3.4+
- ncurses v5.7+
Then, you run lines below;
$ git clone https://github.com/maropu/lljvm-translator.git
$ cd lljvm-translator/lib/lljvm-native
// Downloads/compiles LLVM, builds a native library based on the compiled LLVM,
// and then copys the library into a proper location.
//
// Or, you can use the pre-built LLVM in http://releases.llvm.org/download.html#7.0.1;
// $ wget http://releases.llvm.org/7.0.1/clang+llvm-7.0.1-<your platform>.tar.xz
// $ tar xvf clang+llvm-7.0.1-<your platform>.tar.xz
// $ LLVM_DIR=`pwd`/clang+llvm-7.0.1-<your platform> CXX=clang++ ./waf configure
// $ ./waf -v
$ ./build-lljvm.sh
// Moves the root and builds jar with the binary above
$ cd ../..
$ ./build/mvn clean package
$ ls target
lljvm-core_0.2.0-EXPERIMENTAL-with-dependencies.jar
lljvm-core_0.2.0-EXPERIMENTAL.jar
...
Python UDFs in Apache Spark have well-known overheads and the recent work of Vectorized UDFs in the community significantly improves the performance. But, Python UDFs still incur large performance gaps against Scala UDFs. If we could safely inject python UDFs into Spark gen'd code, we would make the Python UDF overheads close to zero. Here is a sample patch and a quick benchmark below shows that the injection could make it around 50x faster than the performance of the Vectorized UDFs:
- Adds more platform-dependent binaries in
src/main/resources/native
- Statically Links BSD libc++ for native binaries
- Supports Python v3.x
If you hit some bugs and requests, please leave some comments on Issues or Twitter(@maropu).