Preliminaries

File type

The assembler will automatically recognize files with the extension jbc (Java ByteCode) as files to parse and assemble. Files with the extension class will be disassembled to jbc files.

Notation

Tokens are defined in this document using the token := ... notation. Tokens are written in italics; literals use normal formatting.

Regex-like operators are also used in this document: ( and ) for groups, * for 0 or more occurrences, and ? for 0 or 1 occurrence.

Comments

Standard Java syntax comments are possible: // for single-line comments and /* */ for multi-line comments.

Basic token types

In essence, the Assembler distinguishes 4 different token types (based on the StreamTokenizer):

  • number: any sequence of the digits 0-9, optionally preceded by - for negative numbers, and containing at most a single . for non-integer numbers.
  • word: any sequence of -, ., 0-9, A-Z, a-z, and all characters with a value from 240 to 255 inclusive. A word must not start with a digit.
  • string: any sequence of characters surrounded by double quotes (")
    • string can contain escaped characters:
      • \a for the bell character
      • \b for the backspace character
      • \f for the new page character
      • \n for the new line character
      • \r for the carriage return character
      • \t for the horizontal tab character
      • \v for the vertical tab character
    • Additionally string can contain octal-escaped characters: \xxx where x is a 0-7 digit (up to \377).
  • character: a single character, surrounded by single quotes (')
    • character follows the same escape rules as string, e.g. '\n' and '\177' are valid characters.
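
As an illustration, the following fragment shows one token of each type, using only the rules listed above. It is a sketch of the lexical forms, not a complete file.

```
-12.5               // number: negative and non-integer
java.lang.String    // word: letters and . only, does not start with a digit
"tab\there\041"     // string: \t is a horizontal tab, \041 is the octal escape for !
'\177'              // character: octal escape for the delete character (127)
```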

Types

Types generally follow the Java syntax, albeit less restrictive: any word can be a type, and any type can be succeeded by [] to denote an array. Method arguments also follow Java, with the important distinction that no argument names are specified.

type := word
type := type []
methodArguments := ( )
methodArguments := ( (type ,)* type )
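
A few fragments that match these rules may help; they are illustrative forms rather than complete declarations.

```
int                              // type: any word
java.lang.String[][]             // type: a type followed by [] twice
()                               // methodArguments: no arguments
(int, java.lang.String[], long)  // methodArguments: types only, no argument names
```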

Access Flags

In most cases, any Java bytecode access flags can be combined, even if the combination would be meaningless or illegal for the JVM. The exceptions are some class access flags, which are more conveniently expressed as class types rather than as access flags.

classAccessFlag := public
classAccessFlag := private
classAccessFlag := protected
classAccessFlag := static
classAccessFlag := final
classAccessFlag := super
classAccessFlag := synchronized
classAccessFlag := volatile
classAccessFlag := transient
classAccessFlag := bridge
classAccessFlag := varargs
classAccessFlag := native
classAccessFlag := abstract
classAccessFlag := strictfp
classAccessFlag := synthetic
classAccessFlag := mandated
classAccessFlag := open
classAccessFlag := transitive
classAccessFlag := static_phase
accessFlag := classAccessFlag
accessFlag := module
accessFlag := enum
accessFlag := interface
accessFlag := annotation

Class files

classFile := import* version class
import := import type ;
version := version number ;

As in Java, it's possible to import classes at the top of the file. Only fully qualified class names are allowed; wildcard and static imports are not supported.

Every file should also declare the Java version to assemble for. Valid versions are: 1.0, 1.1, 1.2, 1.3, 1.4, 1.5, 5, 1.6, 6, 1.7, 7, 1.8, 8, 1.9, 9, 10, 11, 12, and 13.

Example:

import java.lang.String;
import java.lang.System;
import java.io.PrintStream;
version 12;
public class MyClass {
    public static void main(final String[] args) {
        getstatic System#PrintStream out
        ldc "Hello World!"
        invokevirtual PrintStream#void println(String)
        return
    }
}

Classes

class := classAccessFlag* classType word superClassSpecifier? classInterfacesSpecifier? attributes? classBody
class := classAccessFlag* interfaceType word interfaceInterfacesSpecifier? attributes? classBody

classType := class
classType := enum
classType := module
interfaceType := interface
interfaceType := @interface

superClassSpecifier := extends word
classInterfacesSpecifier := implements (word ,)* word
interfaceInterfacesSpecifier := extends (word ,)* word
classBody := ;
classBody := { classMember* }

Classes are defined analogously to Java: classes and enums can extend superclasses and implement interfaces. Even though the syntax also allows modules to extend a superclass and implement interfaces, this is illegal for the JVM. Interfaces and annotations (@interface) use the extends keyword as syntactic sugar for implementing interfaces. Additionally, classes extend java.lang.Object by default, enums extend java.lang.Enum by default, and annotations implement java.lang.annotation.Annotation by default.

Classes can optionally declare attributes and class members.

Examples:

public class MyException extends RuntimeException { // No attributes
    // No fields
    // No methods
}

public enum MyEnum; // Extends java.lang.Enum by default, no attributes, no fields, no methods

public @interface MyAnnotation [ // Implements java.lang.annotation.Annotation by default
    Synthetic;
    Deprecated;
]; // No fields, no methods

public module module-info [ // Does not extend java.lang.Object by default
    // No attributes
] {
    // No fields, no methods
}

public class MyClass; // Extends java.lang.Object by default, no attributes, no fields, no methods

Class members

classMember := field
classMember := method

field := accessFlag* type word (= fieldConstant)? attributes? ;

method := accessFlag* attributes? methodBody
method := accessFlag* type methodName methodArgumentsDefinition methodThrows? attributes? methodBody
methodBody := ;
methodBody := { instruction* }
methodName := word
methodName := <init>
methodName := <clinit>
methodThrows := throws (type ,)* type
methodArgumentsDefinition := ( )
methodArgumentsDefinition := ( (methodArgumentDefinition ,)* methodArgumentDefinition )
methodArgumentDefinition := accessFlag* type word?

Fields and methods are also defined analogously to Java.

Fields can be initialized using the equals sign (=), which will set the ConstantValue attribute. Although this syntax is always valid, the initialization is only legal for the JVM if the field has the static and final access flags. The constant type does not have to match the field type; some combinations are perfectly valid, for example a boolean field initialized using an intConstant. Fields can also optionally declare attributes.

Examples:

public static final int INT_FIELD = 0; // No attributes

public static final boolean BOOLEAN_FIELD = 1; // The loadable constant type does not have to match the field type.
java.lang.String myStringField [
    Deprecated;
    Synthetic;
];

protected transient char someChar = 'a' []; // This is illegal for the JVM, as the field is not static and final.

A method can be declared in class initializer format or regular Java method format. Method arguments can carry access flags and argument names, and the method can have a throws clause. Methods can also optionally have a method body, which will set the Code attribute, and declare other attributes.

Examples:

static { // No attributes
    return
} // Class initializer

public static final void main(synthetic final java.lang.String[] args) throws java.lang.Throwable [
    Deprecated;
    Synthetic;
] {
    new java.lang.Exception
    dup
    invokespecial java.lang.Exception#void <init>()
    athrow
}

public void <init>() [
    Code {
        aload_0
        invokespecial java.lang.Object#void <init>()
        return
    } // Explicit code attribute
]; // No method body

Constants

fieldReference := type # type word
fieldReference := # type word
methodReference := type # type word methodArguments
methodReference := # type word methodArguments
invokedynamicReference := number type word methodArguments

methodHandle := getstatic fieldReference
methodHandle := putstatic fieldReference
methodHandle := getfield fieldReference
methodHandle := putfield fieldReference
methodHandle := invokevirtual methodReference
methodHandle := invokestatic methodReference
methodHandle := invokespecial methodReference
methodHandle := newinvokespecial methodReference
methodHandle := invokeinterface methodReference

If the first type of fieldReference or methodReference is not supplied, the type will be the type of the current class being assembled. In other words, these notations are shorthand for accessing fields or invoking methods of the current class.
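
The sketch below contrasts the shorthand and the explicit notation. The class Counter, its field count, and the method increment are hypothetical names chosen for illustration; both references are intended to resolve to the same field.

```
version 8;
public class Counter {
    private int count;

    public int increment() {
        aload_0
        dup
        getfield #int count          // shorthand: field of the class being assembled
        iconst_1
        iadd
        putfield Counter#int count   // explicit class type, same field
        aload_0
        getfield #int count
        ireturn
    }
}
```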

invokedynamicReference has 4 arguments: the index in the BootstrapMethods attribute, the return type of the method, the name of the method, and the method arguments.

Because the Assembler has to know which kind of constant a value belongs to, there are multiple notations for most constants. Some constants have a distinctive format, such as true and false for booleanConstant. However, it is always possible to state the type of the constant explicitly, in Java 'cast' format.

boolean := true
boolean := false
doubleLiteralSuffix := D
doubleLiteralSuffix := d
floatLiteralSuffix := F
floatLiteralSuffix := f
longLiteralSuffix := L
longLiteralSuffix := l

booleanConstant := boolean
booleanConstant := (boolean) boolean
booleanConstant := (boolean) number
byteConstant := (byte) number
charConstant := character
charConstant := (char) character
charConstant := (char) number
doubleConstant := number doubleLiteralSuffix
doubleConstant := (double) number doubleLiteralSuffix
doubleConstant := (double) number
floatConstant := number floatLiteralSuffix
floatConstant := (float) number floatLiteralSuffix
floatConstant := (float) number
intConstant := number
intConstant := (int) number
longConstant := number longLiteralSuffix
longConstant := (long) number longLiteralSuffix
longConstant := (long) number
shortConstant := (short) number

stringConstant := string
stringConstant := (String) string
classConstant := type
classConstant := (Class) type
methodHandleConstant := (MethodHandle) methodHandle
methodTypeConstant := (MethodType) type methodArguments
dynamicConstant := (Dynamic) number type word

fieldConstant := booleanConstant
fieldConstant := byteConstant
fieldConstant := charConstant
fieldConstant := doubleConstant
fieldConstant := floatConstant
fieldConstant := intConstant
fieldConstant := longConstant
fieldConstant := shortConstant
fieldConstant := stringConstant

loadableConstant := fieldConstant
loadableConstant := classConstant
loadableConstant := methodHandleConstant
loadableConstant := methodTypeConstant
loadableConstant := dynamicConstant

booleanConstant, byteConstant, charConstant, intConstant, and shortConstant are all converted to integer constants by the Assembler. This means that, in most cases, those constants are indistinguishable in the compiled class file.

dynamicConstant has 3 arguments: the index in the BootstrapMethods attribute, the type of the constant and the name of the constant.
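
A minimal sketch of the different notations, derived from the rules above. The class, field, and method names are hypothetical. Per the grammar, the class constant could also be written without the (Class) cast.

```
version 8;
public class Constants {
    public static final long    BY_SUFFIX = 10L;          // longConstant: number longLiteralSuffix
    public static final long    BY_CAST   = (long) 10;    // longConstant: (long) number
    public static final float   RATIO     = (float) 0.5;  // floatConstant: (float) number
    public static final char    LETTER    = (char) 97;    // charConstant: (char) number, i.e. 'a'
    public static final boolean ENABLED   = (boolean) 1;  // booleanConstant: (boolean) number

    public static java.lang.Class stringClass() {
        ldc (Class) java.lang.String                       // classConstant with explicit cast
        areturn
    }

    public static long ten() {
        ldc2_w 10L                                         // longConstant loaded with ldc2_w
        lreturn
    }
}
```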

Attributes

attributes := [ attribute* ]

Some attributes are not explicitly parsed by the Assembler, but handled in a special way:

  • ConstantValue: assignment similar to Java (see section Class members)
  • MethodParameters: parameter access flags and names similar to Java (see section Class members)
  • Exceptions: methods throw exceptions similar to Java (see section Class members)
  • StackMap and StackMapTable: code is preverified by the ProGuard preverifier and these attributes are generated automatically.
  • LineNumberTable, LocalVariableTable, and LocalVariableTypeTable: using pseudo-instructions in the code (see subsection Code attribute)

These attributes cannot be defined explicitly, and will not be printed explicitly by the Disassembler.

BootstrapMethods attribute

attribute := BootstrapMethods { bootstrapMethod* }
bootstrapMethod := methodHandle { bootstrapMethodArgument* }
bootstrapMethodArgument := loadableConstant ;

Example:

BootstrapMethods {
    invokestatic java.lang.invoke.StringConcatFactory#java.lang.invoke.CallSite makeConcatWithConstants(java.lang.invoke.MethodHandles$Lookup, java.lang.String, java.lang.invoke.MethodType, java.lang.String, java.lang.Object[]) {
        "abc \001 def";
    }
}
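
To show how a bootstrap method is referenced, here is a sketch that pairs the attribute above with an invokedynamic instruction. The class Concat and the method wrap are hypothetical. The leading 0 is the index into the BootstrapMethods attribute, and the \001 in the recipe string is the StringConcatFactory placeholder for one call-site argument.

```
version 9;
public class Concat [
    BootstrapMethods {
        invokestatic java.lang.invoke.StringConcatFactory#java.lang.invoke.CallSite makeConcatWithConstants(java.lang.invoke.MethodHandles$Lookup, java.lang.String, java.lang.invoke.MethodType, java.lang.String, java.lang.Object[]) {
            "abc \001 def";
        }
    }
] {
    public static java.lang.String wrap(java.lang.String middle) {
        aload_0
        invokedynamic 0 java.lang.String makeConcatWithConstants(java.lang.String)
        areturn
    }
}
```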

SourceFile attribute

attribute := SourceFile string ;

Example: SourceFile "Assembler.java";

SourceDir attribute

attribute := SourceDir string ;

Example: SourceDir "My Source Directory";

InnerClasses attribute

attribute := InnerClasses { innerClass* }
innerClass := classAccessFlag* innerClassType innerName? outerClass? ;
innerClassType := classType
innerClassType := interfaceType
innerName := as word
outerClass := in type

Both innerName and outerClass are optional. Note that even though module is a valid class type, it has no valid meaning in inner classes in Java bytecode.

Example:

InnerClasses {
    public class InnerClass as InnerName in OuterClass;
    public static @interface InnerAnnotation as Annotation;
    public enum InnerEnum in EnclosingClass;
    private module InnerModule;
}

EnclosingMethod attribute

attribute := EnclosingMethod enclosingClass enclosingMethod? ;
enclosingClass := type
enclosingMethod := # type word methodArguments

Although the enclosing class always has to be specified, enclosingMethod is optional.

Example:

EnclosingMethod EnclosingClass # void enclosingMethod(java.lang.String, java.lang.Object);
EnclosingMethod AnotherEnclosingClass;

NestHost attribute

attribute := NestHost type ;

Example:

NestHost java.lang.Class;

NestMembers attribute

attribute := NestMembers { nestMember* }
nestMember := type ;

Example:

NestMembers {
    java.lang.Class;
    java.lang.String;
}

Deprecated attribute

attribute := Deprecated ;

Synthetic attribute

attribute := Synthetic ;

Signature attribute

attribute := Signature string ;

Example:

Signature "Ljava/lang/Enum<LType;>;";

Code attribute

attribute := Code { instruction* } attributes?

Instructions

instruction := nop
instruction := aconst_null
instruction := iconst_m1
instruction := iconst_0
instruction := iconst_1
instruction := iconst_2
instruction := iconst_3
instruction := iconst_4
instruction := iconst_5
instruction := lconst_0
instruction := lconst_1
instruction := fconst_0
instruction := fconst_1
instruction := fconst_2
instruction := dconst_0
instruction := dconst_1
instruction := bipush number
instruction := sipush number
instruction := ldc loadableConstant
instruction := ldc_w loadableConstant
instruction := ldc2_w loadableConstant
instruction := iload number
instruction := lload number
instruction := fload number
instruction := dload number
instruction := aload number
instruction := iload_0
instruction := iload_1
instruction := iload_2
instruction := iload_3
instruction := lload_0
instruction := lload_1
instruction := lload_2
instruction := lload_3
instruction := fload_0
instruction := fload_1
instruction := fload_2
instruction := fload_3
instruction := dload_0
instruction := dload_1
instruction := dload_2
instruction := dload_3
instruction := aload_0
instruction := aload_1
instruction := aload_2
instruction := aload_3
instruction := iaload
instruction := laload
instruction := faload
instruction := daload
instruction := aaload
instruction := baload
instruction := caload
instruction := saload
instruction := istore number
instruction := lstore number
instruction := fstore number
instruction := dstore number
instruction := astore number
instruction := istore_0
instruction := istore_1
instruction := istore_2
instruction := istore_3
instruction := lstore_0
instruction := lstore_1
instruction := lstore_2
instruction := lstore_3
instruction := fstore_0
instruction := fstore_1
instruction := fstore_2
instruction := fstore_3
instruction := dstore_0
instruction := dstore_1
instruction := dstore_2
instruction := dstore_3
instruction := astore_0
instruction := astore_1
instruction := astore_2
instruction := astore_3
instruction := iastore
instruction := lastore
instruction := fastore
instruction := dastore
instruction := aastore
instruction := bastore
instruction := castore
instruction := sastore
instruction := pop
instruction := pop2
instruction := dup
instruction := dup_x1
instruction := dup_x2
instruction := dup2
instruction := dup2_x1
instruction := dup2_x2
instruction := swap
instruction := iadd
instruction := ladd
instruction := fadd
instruction := dadd
instruction := isub
instruction := lsub
instruction := fsub
instruction := dsub
instruction := imul
instruction := lmul
instruction := fmul
instruction := dmul
instruction := idiv
instruction := ldiv
instruction := fdiv
instruction := ddiv
instruction := irem
instruction := lrem
instruction := frem
instruction := drem
instruction := ineg
instruction := lneg
instruction := fneg
instruction := dneg
instruction := ishl
instruction := lshl
instruction := ishr
instruction := lshr
instruction := iushr
instruction := lushr
instruction := iand
instruction := land
instruction := ior
instruction := lor
instruction := ixor
instruction := lxor
instruction := iinc number number
instruction := i2l
instruction := i2f
instruction := i2d
instruction := l2i
instruction := l2f
instruction := l2d
instruction := f2i
instruction := f2l
instruction := f2d
instruction := d2i
instruction := d2l
instruction := d2f
instruction := i2b
instruction := i2c
instruction := i2s
instruction := lcmp
instruction := fcmpl
instruction := fcmpg
instruction := dcmpl
instruction := dcmpg
instruction := ifeq label
instruction := ifne label
instruction := iflt label
instruction := ifge label
instruction := ifgt label
instruction := ifle label
instruction := if_icmpeq label
instruction := if_icmpne label
instruction := if_icmplt label
instruction := if_icmpge label
instruction := if_icmpgt label
instruction := if_icmple label
instruction := if_acmpeq label
instruction := if_acmpne label
instruction := goto label
instruction := jsr label
instruction := ret number
instruction := tableswitch { switchCase* }
instruction := lookupswitch { switchCase* }
instruction := ireturn
instruction := lreturn
instruction := freturn
instruction := dreturn
instruction := areturn
instruction := return
instruction := getstatic fieldReference
instruction := putstatic fieldReference
instruction := getfield fieldReference
instruction := putfield fieldReference
instruction := invokevirtual methodReference
instruction := invokespecial methodReference
instruction := invokestatic methodReference
instruction := invokeinterface methodReference
instruction := invokedynamic invokedynamicReference
instruction := new type
instruction := newarray type
instruction := anewarray type
instruction := arraylength
instruction := athrow
instruction := checkcast type
instruction := instanceof type
instruction := monitorenter
instruction := monitorexit
instruction := multianewarray type number
instruction := ifnull label
instruction := ifnonnull label
instruction := goto_w label
instruction := jsr_w label

switchCase := case number : label
switchCase := default : label
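
A sketch of a switch following these rules; the method name and label names are hypothetical, and the labels are defined with the label : pseudo-instruction described later in this section. A lookupswitch takes the same form, which suits sparse case values.

```
public static java.lang.String name(int value) {
    iload_0
    tableswitch {
        case 0: zero
        case 1: one
        default: other
    }
zero:
    ldc "zero"
    areturn
one:
    ldc "one"
    areturn
other:
    ldc "other"
    areturn
}
```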

Note that the wide instruction is not present; it is replaced by the following pseudo-instructions:

instruction := iload_w number
instruction := lload_w number
instruction := fload_w number
instruction := dload_w number
instruction := aload_w number
instruction := istore_w number
instruction := lstore_w number
instruction := fstore_w number
instruction := dstore_w number
instruction := astore_w number
instruction := iinc_w number number
instruction := ret_w number
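
A short fragment using these forms. In the class file format, the wide prefix extends the local variable index to 16 bits (and the iinc constant to a signed 16-bit value), so the _w forms are the natural choice for indices above 255. Whether the Assembler also accepts such indices on the plain forms is not stated here.

```
iinc_w 300 1000   // wide form of iinc on local variable 300
iload_w 300       // wide form of iload
ireturn
```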

Furthermore, pseudo-instructions exist for labels, try-catch blocks, local variables, local variable types, and line numbers:

instruction := label :
instruction := catch type label label
instruction := catch any label label
instruction := startlocalvar number type word
instruction := endlocalvar number
instruction := startlocalvartype number string word
instruction := endlocalvartype number
instruction := line number

label := word

A catch pseudo-instruction specifies an exception handler at the location of the pseudo-instruction. The catch type, start, end, and handler will be added to the exception table in the Code attribute.

startlocalvar and endlocalvar specify the start and end of a local variable; startlocalvartype and endlocalvartype do the same for a local variable type. These pseudo-instructions modify the LocalVariableTable or LocalVariableTypeTable attribute in the Code attribute, respectively. The number defines the index of the local variable or local variable type. Every startlocalvar or startlocalvartype must have an accompanying endlocalvar or endlocalvartype, placed after it in the instructions.

line specifies the line number at a position in the bytecode. The line number and bytecode offset will be stored in a LineNumberTable attribute.
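
The pseudo-instructions can be combined as in the sketch below. The class SafeParse, the labels, and the line numbers are hypothetical. It assumes the two labels of catch are the start and end of the protected range, in that order, with the handler beginning at the catch pseudo-instruction itself, which is how the description above reads.

```
version 8;
public class SafeParse {
    public static int parse(java.lang.String text) {
        startlocalvar 0 java.lang.String text
    tryStart:
        line 10
        aload_0
        invokestatic java.lang.Integer#int parseInt(java.lang.String)
        ireturn
    tryEnd:
        endlocalvar 0
        catch java.lang.NumberFormatException tryStart tryEnd
        line 12
        pop           // discard the caught exception
        iconst_m1
        ireturn
    }
}
```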

Annotations attributes

attribute := RuntimeVisibleAnnotations { annotation* }
attribute := RuntimeInvisibleAnnotations { annotation* }
attribute := RuntimeVisibleParameterAnnotations { parameterAnnotation* }
attribute := RuntimeInvisibleParameterAnnotations { parameterAnnotation* }
attribute := RuntimeVisibleTypeAnnotations { typeAnnotation* }
attribute := RuntimeInvisibleTypeAnnotations { typeAnnotation* }
attribute := AnnotationDefault elementValue

annotation := type { (word = elementValue)* }
parameterAnnotation := { annotation* }
typeAnnotation := annotation targetInfo { typePath* }

Examples:

RuntimeVisibleAnnotations {
    java.lang.Deprecated {
        since = "sinceVersion";
        forRemoval = true;
    }
}
RuntimeInvisibleAnnotations {
    java.lang.Deprecated {} // Empty values
}
RuntimeVisibleParameterAnnotations {
    {} // Empty annotations for parameter 0
    {
        java.lang.Deprecated {
            since = "sinceVersion";
            forRemoval = true;
        }
    }
}
RuntimeInvisibleParameterAnnotations {
    {
        java.lang.Deprecated {} // Empty values
    }
    {} // Empty annotations for parameter 1
    {} // Empty annotations for parameter 2
    {} // Empty annotations for parameter 3
}
RuntimeVisibleTypeAnnotations {
    java.lang.Deprecated {
        since = "sinceVersion";
        forRemoval = true;
    } local_variable {
        start0 end0 0;
        start10 end10 10;
    } {} // Empty type path
}
RuntimeVisibleTypeAnnotations {
    java.lang.Deprecated {} argument_generic_method_new newLabel 1 {
        array;
        type_argument 1;
    }
}
RuntimeInvisibleTypeAnnotations {
    java.lang.Deprecated {} field {} // Empty values, empty type path
}
AnnotationDefault {
    false; // Boolean element value
    true; // Boolean element value
    (byte) 1; // Byte element value
    '2'; // Char element value
    3.0D; // Double element value
    4F; // Float element value
    5; // Int element value
    6l; // Long element value
    (short) 7; // Short element value
    "string"; // String element value
    java.lang.Class; // Class element value
    Enum#Constant; // Enum constant element value
    @java.lang.Deprecated {} // Annotation element value
    {} // Array element value
} // Array element value

Element values

elementValue := booleanConstant ;
elementValue := byteConstant ;
elementValue := charConstant ;
elementValue := doubleConstant ;
elementValue := floatConstant ;
elementValue := intConstant ;
elementValue := longConstant ;
elementValue := shortConstant ;
elementValue := stringConstant ;
elementValue := classConstant ;

elementValue := (Enum) type # word ;
elementValue := type # word ;
elementValue := (Annotation) annotation
elementValue := @ annotation
elementValue := (Array) { elementValue* }
elementValue := { elementValue* }

Apart from the usual primitive constants, string constants, and class constants, element values can also denote enum constants (enum type + constant name), annotations and arrays. Note that annotation element values and array element values do not end with a ;, as they already (either implicitly or explicitly) end with a }.
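
For instance, enum constants and arrays could be combined as in this sketch, which uses the standard java.lang.annotation types purely as an illustration. The enum constants end with a ;, while the array and the annotations end with their closing }.

```
RuntimeVisibleAnnotations {
    java.lang.annotation.Retention {
        value = java.lang.annotation.RetentionPolicy#RUNTIME;
    }
    java.lang.annotation.Target {
        value = {
            java.lang.annotation.ElementType#METHOD;
            java.lang.annotation.ElementType#FIELD;
        }
    }
}
```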

Target infos

targetInfo := parameter_generic_class number
targetInfo := parameter_generic_method number
targetInfo := extends number
targetInfo := bound_generic_class number number
targetInfo := bound_generic_method number number
targetInfo := field
targetInfo := return
targetInfo := receiver
targetInfo := parameter number
targetInfo := throws number
targetInfo := local_variable { localVar* }
targetInfo := resource_variable { localVar* }
targetInfo := catch number
targetInfo := instance_of label
targetInfo := new label
targetInfo := method_reference_new label
targetInfo := method_reference label
targetInfo := cast label number
targetInfo := argument_generic_method_new label number
targetInfo := argument_generic_method label number
targetInfo := argument_generic_method_reference_new label number
targetInfo := argument_generic_method_reference label number

localVar := label label number ;

In general, the arguments of the target infos roughly match the ones specified in the Class File Format specification.

Type path

typePath := array number? ;
typePath := inner_type number? ;
typePath := wildcard number? ;
typePath := type_argument number? ;

Although every type path has an optional number argument, this argument only has meaning in combination with type_argument. In that case, the number denotes which type argument is annotated (see the Class File Format specification for more details).

Module attribute

attribute := Module accessFlag* word word? { moduleDirective* }

moduleDirective := requires accessFlag* word word? ;
moduleDirective := exports accessFlag* type exportsTo? ;
moduleDirective := opens accessFlag* type opensTo? ;
moduleDirective := uses type ;
moduleDirective := provides type providesWith? ;

exportsTo := to (word ,)* word
opensTo := to (word ,)* word
providesWith := with (type ,)* type

The Module attribute specifies the module access flags, the module name, and an optional module version. As the module version must be a word, it cannot start with a digit.

exports and opens take an optional to clause, and provides takes an optional with clause. These clauses use the same syntax as their Java counterparts.

Example:

Module open synthetic mandated ModuleName v1.0 {
    requires transitive   some.package.RequiredModule v1.0;
    requires static_phase some.package.OtherRequiredModule;
    requires synthetic    some.package.SyntheticRequiredModule alpha;
    requires mandated     some.package.MandatedRequiredModule beta;

    exports synthetic some.package.exportedpackage;
    exports mandated  some.package.mandated.exportedpackage to some.package.export.to.package, some.package.export.to.otherpackage, some.package.export.to.finalpackage;

    opens synthetic some.package.openedpackage;
    opens mandated  some.package.mandated.openedpackage to some.package.open.to.package, some.package.open.to.otherpackage, some.package.open.to.finalpackage;

    uses some.package.UsedClass;
    uses some.package.OtherUsedClass;
    uses some.package.MoreUsedClass;
    uses some.package.FinalUsedClass;

    provides some.package.ProvidedClass;
    provides some.package.OtherProvidedClass with some.package.OtherProvidedClassImpl, some.package.OtherProvidedClassImpl1;
    provides some.package.FinalProvidedClass;
}

ModuleMainClass attribute

attribute := ModuleMainClass type ;

Example:

ModuleMainClass some.package.ModuleMainClass;

ModulePackages attribute

attribute := ModulePackages { (type ;)* }

Example:

ModulePackages {
    some.package;
    some.other.package;
}