MachInst lowering logic: allow effectful instructions to merge.
This PR updates the "coloring" scheme that accounts for side-effects in
the MachInst lowering logic. As a result, the new backends will now be
able to merge effectful operations (such as memory loads) *into* other
operations; previously, only the other way (pure ops merged into
effectful ops) was possible. This will allow, for example, a load+ALU-op
combination, as is common on x86. It should even allow a load + ALU-op +
store sequence to merge into one lowered instruction.

The scheme arose from many fruitful discussions with @julian-seward1
(thanks!); significant credit is due to him for the insights here.

The first insight is that given the right basic conditions, i.e., that
the root instruction is the only use of an effectful instruction's
result, all we need is that the "color" of the effectful instruction is
*one less* than the color of the current instruction. It's easier to
think about colors on the program points between instructions: if the
color coming *out* of the first (effectful def) instruction and *in* to
the second (effectful or effect-free use) instruction are the same, then
they can merge. Basically the color denotes a version of global state;
if the same, then no other effectful ops happened in the meantime.
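
As a rough illustration of the program-point view, the sketch below assigns an entry and an exit color to each instruction and compares them. It is a toy model, not the Cranelift implementation: `Color`, `InstColors`, `assign_colors`, and `can_merge` are hypothetical names, and the single-use condition is assumed to be checked elsewhere.

```rust
/// A "color" names one version of the global (side-effect) state.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct Color(u32);

/// Hypothetical per-instruction record: the color at the program point
/// just before the instruction and the color just after it.
#[derive(Clone, Copy, Debug)]
struct InstColors {
    entry: Color,
    exit: Color,
}

/// Forward pass over a block. `effectful[i]` says whether instruction `i`
/// has a side effect; an effectful instruction bumps the color, a pure
/// instruction leaves it unchanged.
fn assign_colors(effectful: &[bool]) -> Vec<InstColors> {
    let mut cur = 0u32;
    effectful
        .iter()
        .map(|&eff| {
            let entry = Color(cur);
            if eff {
                cur += 1;
            }
            InstColors { entry, exit: Color(cur) }
        })
        .collect()
}

/// The def may merge into the user when the color coming out of the def
/// equals the color going into the user, i.e. no other effectful
/// instruction ran in between. (Assumes the user is the def's only use.)
fn can_merge(def: InstColors, user: InstColors) -> bool {
    def.exit == user.entry
}
```

For a block `[load, add, store, add]`, the load can merge into the first add or into the store under this check, but not into the final add, because the store changes the color in between.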

The second insight is that we can keep state as we scan, tracking the
"current color", and *update* this when we sink (merge) an op. Hence
when we sink a load into another op, we effectively *re-color* every
instruction it moved over; this may allow further sinks.

Consider the example (and assume that we consider loads effectful in
order to conservatively ensure a strong memory model; otherwise, replace
with other effectful value-producing insts):

```
  v0 = load x
  v1 = load y
  v2 = add v0, 1
  v3 = add v1, 1
```

Scanning from bottom to top, we first see the add producing `v3` and we
can sink the load producing `v1` into it, producing a load + ALU-op
machine instruction. This is legal because `v1` moves over only `v2`,
which is a pure instruction. Consider, though, `v2`: under a simple
scheme that has no other context, `v0` could not sink to `v2` because it
would move over `v1`, another load. But because we have already sunk `v1`
down to `v3`, we are free to sink `v0` to `v2`; the update of the
"current color" during the scan allows this.
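
The backward scan on this example can be sketched as a small simulation. This is a simplified model under stated assumptions, not the code in this PR: `Op` and `scan_and_sink` are hypothetical names, loads are the only effectful instructions, each load has exactly one use, and every operand refers to an earlier instruction.

```rust
#[derive(Clone, Copy, Debug, PartialEq)]
enum Op {
    /// Effectful, value-producing instruction.
    Load,
    /// Pure instruction; the payload is the index of the instruction
    /// that defines its operand.
    Add(usize),
}

/// Scans the block bottom to top and returns the `(load, user)` index
/// pairs that were merged, in the order the scan found them.
fn scan_and_sink(insts: &[Op]) -> Vec<(usize, usize)> {
    // Forward pass: record the entry color of every instruction.
    let mut entry = Vec::with_capacity(insts.len());
    let mut color = 0u32;
    for op in insts {
        entry.push(color);
        if matches!(op, Op::Load) {
            color += 1;
        }
    }

    // Backward pass: `cur` is the "current color" at the scan point.
    let mut cur = color;
    let mut sunk = vec![false; insts.len()];
    let mut merged = Vec::new();
    for i in (0..insts.len()).rev() {
        if sunk[i] {
            continue; // already merged into a later instruction
        }
        match insts[i] {
            // An unsunk load stays in place; the scan moves above it.
            Op::Load => cur = entry[i],
            Op::Add(src) => {
                // The load's exit color is its entry color plus one.
                if insts[src] == Op::Load && !sunk[src] && entry[src] + 1 == cur {
                    sunk[src] = true;
                    merged.push((src, i));
                    // Re-color: the scan point now sits above the load.
                    cur = entry[src];
                }
            }
        }
    }
    merged
}
```

With the four instructions above encoded as `[Load, Load, Add(0), Add(1)]`, the scan merges `v1` into `v3` first and then, because `cur` was lowered by that sink, `v0` into `v2`. If the second load is left unused and unsunk, the first load no longer qualifies.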

This PR also cleans up the `LowerCtx` interface a bit at the same time:
whereas previously it always gave some subset of (constant, mergeable
inst, register) directly from `LowerCtx::get_input()`, it now returns
zero or more of (constant, mergeable inst) from
`LowerCtx::maybe_get_input_as_source_or_const()`, and returns the
register only from `LowerCtx::put_input_in_reg()`. This removes the need
to explicitly denote uses of the register, so it's a little safer.
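
A backend helper written against the split interface might look like the sketch below. Only the two method names are taken from the description above; the trait `ToyLowerCtx`, the struct `SourceOrConst`, the `Operand` enum, and the mock context are hypothetical stand-ins with simplified types, not the real Cranelift definitions.

```rust
/// Simplified stand-in for the query result: note there is no register field.
#[derive(Clone, Copy, Debug, PartialEq)]
struct SourceOrConst {
    /// Producing instruction and output index, if it may be merged.
    inst: Option<(usize, usize)>,
    /// Constant value, if the input is a known constant.
    constant: Option<u64>,
}

/// Toy trait that mirrors only the two method names from the description.
trait ToyLowerCtx {
    fn maybe_get_input_as_source_or_const(&self, insn: usize, input: usize) -> SourceOrConst;
    fn put_input_in_reg(&mut self, insn: usize, input: usize) -> u32;
}

#[derive(Debug, PartialEq)]
enum Operand {
    Imm(u64),
    Reg(u32),
}

/// Prefer a constant; otherwise ask for the register. Asking for the
/// register is itself the "use", so no separate marking call is needed.
fn input_to_operand<C: ToyLowerCtx>(ctx: &mut C, insn: usize, input: usize) -> Operand {
    match ctx.maybe_get_input_as_source_or_const(insn, input).constant {
        Some(c) => Operand::Imm(c),
        None => Operand::Reg(ctx.put_input_in_reg(insn, input)),
    }
}

/// Minimal mock context used to exercise the helper.
struct MockCtx {
    constants: Vec<Option<u64>>,
    regs_used: Vec<usize>,
}

impl ToyLowerCtx for MockCtx {
    fn maybe_get_input_as_source_or_const(&self, _insn: usize, input: usize) -> SourceOrConst {
        SourceOrConst { inst: None, constant: self.constants[input] }
    }

    fn put_input_in_reg(&mut self, _insn: usize, input: usize) -> u32 {
        // Record the use; pretend the register number equals the input index.
        self.regs_used.push(input);
        input as u32
    }
}
```

In this model a register is recorded as used only when the helper actually falls back to `put_input_in_reg`, which is the safety property the interface change is after.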

Note that this PR does not actually make use of the new ability to merge
loads into other ops; that will come in future PRs, especially to
optimize the `x64` backend by using direct-memory operands.
cfallin committed Nov 11, 2020
1 parent 9ced345 commit d2c8f1b
Showing 6 changed files with 252 additions and 175 deletions.
17 changes: 8 additions & 9 deletions cranelift/codegen/src/isa/aarch64/lower.rs
@@ -111,7 +111,7 @@ pub(crate) enum ResultRegImmShift {

/// Lower an instruction input to a 64-bit constant, if possible.
pub(crate) fn input_to_const<C: LowerCtx<I = Inst>>(ctx: &mut C, input: InsnInput) -> Option<u64> {
let input = ctx.get_input(input.insn, input.input);
let input = ctx.maybe_get_input_as_source_or_const(input.insn, input.input);
input.constant
}

@@ -171,7 +171,7 @@ pub(crate) fn put_input_in_reg<C: LowerCtx<I = Inst>>(
debug!("put_input_in_reg: input {:?}", input);
let ty = ctx.input_ty(input.insn, input.input);
let from_bits = ty_bits(ty) as u8;
let inputs = ctx.get_input(input.insn, input.input);
let inputs = ctx.maybe_get_input_as_source_or_const(input.insn, input.input);
let in_reg = if let Some(c) = inputs.constant {
// Generate constants fresh at each use to minimize long-range register pressure.
let masked = if from_bits < 64 {
@@ -189,8 +189,7 @@ pub(crate) fn put_input_in_reg<C: LowerCtx<I = Inst>>(
}
to_reg.to_reg()
} else {
ctx.use_input_reg(inputs);
inputs.reg
ctx.put_input_in_reg(input.insn, input.input)
};

match (narrow_mode, from_bits) {
@@ -272,7 +271,7 @@ fn put_input_in_rs<C: LowerCtx<I = Inst>>(
input: InsnInput,
narrow_mode: NarrowValueMode,
) -> ResultRS {
let inputs = ctx.get_input(input.insn, input.input);
let inputs = ctx.maybe_get_input_as_source_or_const(input.insn, input.input);
if let Some((insn, 0)) = inputs.inst {
let op = ctx.data(insn).opcode();

@@ -305,7 +304,7 @@ fn put_input_in_rse<C: LowerCtx<I = Inst>>(
input: InsnInput,
narrow_mode: NarrowValueMode,
) -> ResultRSE {
let inputs = ctx.get_input(input.insn, input.input);
let inputs = ctx.maybe_get_input_as_source_or_const(input.insn, input.input);
if let Some((insn, 0)) = inputs.inst {
let op = ctx.data(insn).opcode();
let out_ty = ctx.output_ty(insn, 0);
@@ -1040,7 +1039,7 @@ pub(crate) fn maybe_input_insn<C: LowerCtx<I = Inst>>(
input: InsnInput,
op: Opcode,
) -> Option<IRInst> {
let inputs = c.get_input(input.insn, input.input);
let inputs = c.maybe_get_input_as_source_or_const(input.insn, input.input);
debug!(
"maybe_input_insn: input {:?} has options {:?}; looking for op {:?}",
input, inputs, op
@@ -1080,14 +1079,14 @@ pub(crate) fn maybe_input_insn_via_conv<C: LowerCtx<I = Inst>>(
op: Opcode,
conv: Opcode,
) -> Option<IRInst> {
let inputs = c.get_input(input.insn, input.input);
let inputs = c.maybe_get_input_as_source_or_const(input.insn, input.input);
if let Some((src_inst, _)) = inputs.inst {
let data = c.data(src_inst);
if data.opcode() == op {
return Some(src_inst);
}
if data.opcode() == conv {
let inputs = c.get_input(src_inst, 0);
let inputs = c.maybe_get_input_as_source_or_const(src_inst, 0);
if let Some((src_inst, _)) = inputs.inst {
let data = c.data(src_inst);
if data.opcode() == op {
9 changes: 5 additions & 4 deletions cranelift/codegen/src/isa/aarch64/lower_inst.rs
@@ -2935,10 +2935,11 @@ pub(crate) fn lower_insn_to_regs<C: LowerCtx<I = Inst>>(
// register. We simply use the variant of the add instruction that
// sets flags (`adds`) here.

// Ensure that the second output isn't directly called for: it
// should only be used by a flags-consuming op, which will directly
// understand this instruction and merge the comparison.
assert!(!ctx.is_reg_needed(insn, ctx.get_output(insn, 1).to_reg()));
// Note that the second output (the flags) need not be generated,
// because flags are never materialized into a register; the only
// instructions that can use a value of type `iflags` or `fflags`
// will look directly for the flags-producing instruction (which can
// always be found, by construction) and merge it.

// Now handle the iadd as above, except use an AddS opcode that sets
// flags.
5 changes: 2 additions & 3 deletions cranelift/codegen/src/isa/arm32/lower.rs
@@ -68,7 +68,7 @@ pub(crate) fn input_to_reg<C: LowerCtx<I = Inst>>(
) -> Reg {
let ty = ctx.input_ty(input.insn, input.input);
let from_bits = ty.bits() as u8;
let inputs = ctx.get_input(input.insn, input.input);
let inputs = ctx.maybe_get_input_as_source_or_const(input.insn, input.input);
let in_reg = if let Some(c) = inputs.constant {
let to_reg = ctx.alloc_tmp(Inst::rc_for_type(ty).unwrap(), ty);
for inst in Inst::gen_constant(to_reg, c, ty, |reg_class, ty| ctx.alloc_tmp(reg_class, ty))
@@ -78,8 +78,7 @@ pub(crate) fn input_to_reg<C: LowerCtx<I = Inst>>(
}
to_reg.to_reg()
} else {
ctx.use_input_reg(inputs);
inputs.reg
ctx.put_input_in_reg(input.insn, input.input)
};

match (narrow_mode, from_bits) {
6 changes: 3 additions & 3 deletions cranelift/codegen/src/isa/arm32/lower_inst.rs
@@ -316,7 +316,7 @@ pub(crate) fn lower_insn_to_regs<C: LowerCtx<I = Inst>>(
}
Opcode::Trueif => {
let cmp_insn = ctx
.get_input(inputs[0].insn, inputs[0].input)
.maybe_get_input_as_source_or_const(inputs[0].insn, inputs[0].input)
.inst
.unwrap()
.0;
@@ -344,7 +344,7 @@ pub(crate) fn lower_insn_to_regs<C: LowerCtx<I = Inst>>(
} else {
// Verification ensures that the input is always a single-def ifcmp.
let cmp_insn = ctx
.get_input(inputs[0].insn, inputs[0].input)
.maybe_get_input_as_source_or_const(inputs[0].insn, inputs[0].input)
.inst
.unwrap()
.0;
@@ -471,7 +471,7 @@ pub(crate) fn lower_insn_to_regs<C: LowerCtx<I = Inst>>(
}
Opcode::Trapif => {
let cmp_insn = ctx
.get_input(inputs[0].insn, inputs[0].input)
.maybe_get_input_as_source_or_const(inputs[0].insn, inputs[0].input)
.inst
.unwrap()
.0;
54 changes: 22 additions & 32 deletions cranelift/codegen/src/isa/x64/lower.rs
@@ -60,7 +60,7 @@ fn matches_input<C: LowerCtx<I = Inst>>(
input: InsnInput,
op: Opcode,
) -> Option<IRInst> {
let inputs = ctx.get_input(input.insn, input.input);
let inputs = ctx.maybe_get_input_as_source_or_const(input.insn, input.input);
inputs.inst.and_then(|(src_inst, _)| {
let data = ctx.data(src_inst);
if data.opcode() == op {
@@ -77,7 +77,7 @@ fn matches_input_any<C: LowerCtx<I = Inst>>(
input: InsnInput,
ops: &[Opcode],
) -> Option<IRInst> {
let inputs = ctx.get_input(input.insn, input.input);
let inputs = ctx.maybe_get_input_as_source_or_const(input.insn, input.input);
inputs.inst.and_then(|(src_inst, _)| {
let data = ctx.data(src_inst);
for &op in ops {
@@ -89,14 +89,9 @@ fn matches_input_any<C: LowerCtx<I = Inst>>(
})
}

fn lowerinput_to_reg(ctx: Ctx, input: LowerInput) -> Reg {
ctx.use_input_reg(input);
input.reg
}

/// Put the given input into a register, and mark it as used (side-effect).
fn put_input_in_reg(ctx: Ctx, spec: InsnInput) -> Reg {
let input = ctx.get_input(spec.insn, spec.input);
let input = ctx.maybe_get_input_as_source_or_const(spec.insn, spec.input);

if let Some(c) = input.constant {
// Generate constants fresh at each use to minimize long-range register pressure.
@@ -118,7 +113,7 @@ fn put_input_in_reg(ctx: Ctx, spec: InsnInput) -> Reg {
}
cst_copy.to_reg()
} else {
lowerinput_to_reg(ctx, input)
ctx.put_input_in_reg(spec.insn, spec.input)
}
}

@@ -165,16 +160,11 @@ fn extend_input_to_reg(ctx: Ctx, spec: InsnInput, ext_spec: ExtSpec) -> Reg {
dst.to_reg()
}

fn lowerinput_to_reg_mem(ctx: Ctx, input: LowerInput) -> RegMem {
// TODO handle memory.
RegMem::reg(lowerinput_to_reg(ctx, input))
}

/// Put the given input into a register or a memory operand.
/// Effectful: may mark the given input as used, when returning the register form.
fn input_to_reg_mem(ctx: Ctx, spec: InsnInput) -> RegMem {
let input = ctx.get_input(spec.insn, spec.input);
lowerinput_to_reg_mem(ctx, input)
// TODO handle memory; merge a load directly, if possible.
RegMem::reg(ctx.put_input_in_reg(spec.insn, spec.input))
}

/// Returns whether the given input is an immediate that can be properly sign-extended, without any
@@ -193,23 +183,24 @@ fn lowerinput_to_sext_imm(input: LowerInput, input_ty: Type) -> Option<u32> {
}

fn input_to_sext_imm(ctx: Ctx, spec: InsnInput) -> Option<u32> {
let input = ctx.get_input(spec.insn, spec.input);
let input = ctx.maybe_get_input_as_source_or_const(spec.insn, spec.input);
let input_ty = ctx.input_ty(spec.insn, spec.input);
lowerinput_to_sext_imm(input, input_ty)
}

fn input_to_imm(ctx: Ctx, spec: InsnInput) -> Option<u64> {
ctx.get_input(spec.insn, spec.input).constant
ctx.maybe_get_input_as_source_or_const(spec.insn, spec.input)
.constant
}

/// Put the given input into an immediate, a register or a memory operand.
/// Effectful: may mark the given input as used, when returning the register form.
fn input_to_reg_mem_imm(ctx: Ctx, spec: InsnInput) -> RegMemImm {
let input = ctx.get_input(spec.insn, spec.input);
let input = ctx.maybe_get_input_as_source_or_const(spec.insn, spec.input);
let input_ty = ctx.input_ty(spec.insn, spec.input);
match lowerinput_to_sext_imm(input, input_ty) {
Some(x) => RegMemImm::imm(x),
None => match lowerinput_to_reg_mem(ctx, input) {
None => match input_to_reg_mem(ctx, spec) {
RegMem::Reg { reg } => RegMemImm::reg(reg),
RegMem::Mem { addr } => RegMemImm::mem(addr),
},
@@ -495,8 +486,6 @@ fn lower_to_amode<C: LowerCtx<I = Inst>>(ctx: &mut C, spec: InsnInput, offset: i
)
} else {
for i in 0..=1 {
let input = ctx.get_input(add, i);

// Try to pierce through uextend.
if let Some(uextend) = matches_input(
ctx,
@@ -506,7 +495,7 @@ fn lower_to_amode<C: LowerCtx<I = Inst>>(ctx: &mut C, spec: InsnInput, offset: i
},
Opcode::Uextend,
) {
if let Some(cst) = ctx.get_input(uextend, 0).constant {
if let Some(cst) = ctx.maybe_get_input_as_source_or_const(uextend, 0).constant {
// Zero the upper bits.
let input_size = ctx.input_ty(uextend, 0).bits() as u64;
let shift: u64 = 64 - input_size;
@@ -521,7 +510,7 @@ fn lower_to_amode<C: LowerCtx<I = Inst>>(ctx: &mut C, spec: InsnInput, offset: i
}

// If it's a constant, add it directly!
if let Some(cst) = input.constant {
if let Some(cst) = ctx.maybe_get_input_as_source_or_const(add, i).constant {
let final_offset = (offset as i64).wrapping_add(cst as i64);
if low32_will_sign_extend_to_64(final_offset as u64) {
let base = put_input_in_reg(ctx, add_inputs[1 - i]);
@@ -950,13 +939,14 @@ fn lower_insn_to_regs<C: LowerCtx<I = Inst>>(
_ => unreachable!("unhandled output type for shift/rotates: {}", dst_ty),
};

let (count, rhs) = if let Some(cst) = ctx.get_input(insn, 1).constant {
// Mask count, according to Cranelift's semantics.
let cst = (cst as u8) & (dst_ty.bits() as u8 - 1);
(Some(cst), None)
} else {
(None, Some(put_input_in_reg(ctx, inputs[1])))
};
let (count, rhs) =
if let Some(cst) = ctx.maybe_get_input_as_source_or_const(insn, 1).constant {
// Mask count, according to Cranelift's semantics.
let cst = (cst as u8) & (dst_ty.bits() as u8 - 1);
(Some(cst), None)
} else {
(None, Some(put_input_in_reg(ctx, inputs[1])))
};

let dst = get_output_reg(ctx, outputs[0]);

@@ -3012,7 +3002,7 @@ fn lower_insn_to_regs<C: LowerCtx<I = Inst>>(

// Verification ensures that the input is always a single-def ifcmp.
let cmp_insn = ctx
.get_input(inputs[0].insn, inputs[0].input)
.maybe_get_input_as_source_or_const(inputs[0].insn, inputs[0].input)
.inst
.unwrap()
.0;