2.5 Instructions

In this report, the term condition codes denotes conditions used by conditional branches, including the overflow condition.

On the x86/x86_64 architectures, operations such as add set the overflow condition iff the signed add yields an arithmetic overflow, and clear it otherwise. To achieve the same on PPC64, one must use the following technique:

<<clear XER>>
addo. Dest,Src1,Src2

which first clears the XER register (see below); addo. then sets the SO flag of the XER if the addition overflows. In either case, the overflow condition is set to the resulting SO flag, and so reflects the outcome of the operation. The SO flag can then be used for conditional branching and the like.

Clearing (the SO-bit of) the XER register can be achieved in many ways. We will clear the entire XER register, using the sequence:

li 0,0
mtxer 0

which first clears R0 and then moves that into the XER register (see footnote 1).
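The XER must be cleared first because the SO flag is sticky: once set, it stays set until it is explicitly cleared. The following Python sketch models only that behaviour. The names Xer, addo and to_signed64 are illustrative and do not come from the report, and only the OV and SO bits are modelled.

```python
MASK64 = (1 << 64) - 1

def to_signed64(x):
    """Interpret the low 64 bits of x as a two's-complement integer."""
    x &= MASK64
    return x - (1 << 64) if x >> 63 else x

class Xer:
    """Toy model of the XER register: only the OV and SO bits."""
    def __init__(self):
        self.ov = False
        self.so = False

    def clear(self):
        # Models the sequence: li 0,0 ; mtxer 0
        self.ov = False
        self.so = False

def addo(xer, a, b):
    """Model of addo.: return the wrapped sum and update OV and SO."""
    exact = to_signed64(a) + to_signed64(b)
    result = to_signed64(exact)
    xer.ov = (result != exact)
    xer.so = xer.so or xer.ov   # SO is sticky
    return result

xer = Xer()
addo(xer, 2**63 - 1, 1)   # overflows and sets SO
addo(xer, 1, 2)           # no overflow, but SO is still set
stale = xer.so
xer.clear()
addo(xer, 1, 2)
fresh = xer.so
```

Without the clear, the second addition would appear to overflow (stale is true); after the clear, the flag reflects only the last operation (fresh is false).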

Static branch prediction has not been exploited in this report, but should be.

The following table shows the correspondence between IR condition codes and conditional branch instructions.

IR     x86/x86_64   PPC64
gu     ja           bgt
geu    jae          bge
lu     jb           blt
leu    jbe          ble
g      jg           bgt
ge     jge          bge
l      jl           blt
le     jle          ble
e      je           beq
ne     jne          bne
o      jo           bso
no     jno          bns
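As a rough illustration, the table above can be held as a lookup structure. The Python names COND_TABLE, NEGATION and branch_mnemonic are hypothetical and are not taken from the implementation. The negation map corresponds to the NCond used by several translations later in this section. Note that PPC64 uses the same branch mnemonic for the signed and unsigned variants, since the signedness is decided by the preceding compare (cmpd versus cmpld).

```python
# IR condition code -> (x86/x86_64 mnemonic, PPC64 mnemonic), as in the table.
COND_TABLE = {
    "gu":  ("ja",  "bgt"),
    "geu": ("jae", "bge"),
    "lu":  ("jb",  "blt"),
    "leu": ("jbe", "ble"),
    "g":   ("jg",  "bgt"),
    "ge":  ("jge", "bge"),
    "l":   ("jl",  "blt"),
    "le":  ("jle", "ble"),
    "e":   ("je",  "beq"),
    "ne":  ("jne", "bne"),
    "o":   ("jo",  "bso"),
    "no":  ("jno", "bns"),
}

# Logical negation of each IR condition code (NCond in the text).
NEGATION = {
    "gu": "leu", "leu": "gu", "geu": "lu", "lu": "geu",
    "g": "le", "le": "g", "ge": "l", "l": "ge",
    "e": "ne", "ne": "e", "o": "no", "no": "o",
}

def branch_mnemonic(cond, backend):
    """Return the conditional branch mnemonic for cond on the given back end."""
    x86, ppc64 = COND_TABLE[cond]
    return ppc64 if backend == "ppc64" else x86
```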

We now list each IR instruction with its purpose and back-end specific translation.

2.5.1 move(Src,Dest)

Purpose

To copy the value of source Src into destination Dest.

Condition Codes

Undefined.

x86
x86_64

If the operands are identical, then

/* nothing */

Else if Src is the constant 0 and Dest is a register,

xor Dest,Dest

Else, for x86_64, if Src is a local label and Dest is a register, then

lea OFFSET(%rip),Dest

Else if Src is a floating-point register and Dest is in memory, then

// if x86
fstpl Dest
// else if x86_64
movsd Src,Dest

Else if Src is in memory and Dest is a floating-point register, then

// if x86
fldl Src
// else if x86_64
movsd Src,Dest

Else if one operand is a register and the other one is a register or in memory, then

mov Src,Dest

Else if both operands are in memory, then

mov Src,%rax
mov %rax,Dest

Else if Src is a 32-bit signed integer, then

mov $Src,Dest

Else if Dest is a register, then

movabs $Src,Dest

Else let r be %rdx if Dest uses %rax and %rax otherwise, and

movabs $Src,r
mov r,Dest
PPC64

[PERM: Note: std and ld treat base register R0 as zero, so this must be forbidden here.]

If Src is a floating-point register and Dest is in memory, then

stfd Src,Dest

Else if Src is in memory and Dest is a floating-point register, then

lfd Dest,Src

Else if Src is in a register and Dest is in memory, then

std Src,Dest

Else if Src is in memory and Dest is in a register, then

ld Dest,Src

Else if Dest is in memory, then reduce to [PERM: FIXME: arg0..arg2 must be preserved, use something else.]

move(Src,arg1)
std arg1,Dest

Else if Src is a register, then

mr Dest,Src

Else if Src is a signed 16-bit integer SI, then

li Dest,SI

Else if Src equals (HI<<16)+LO, where HI is a signed 16-bit integer and LO is an unsigned 16-bit integer, then

lis Dest,HI
ori Dest,Dest,LO // omit if LO = 0

Else if Src is a local label at offset OFF from the TOC, then reduce to [PERM: This could (and naturally will) be done for any (32bit-aligned) immediate that happens to have the value toc+OFF, with OFF a signed, multiple-of-4, 16-bit integer.]

[PERM: This may clobber arg0. Can Dest be arg0?].

add(toc,OFF,Dest)

Else, Src must be preallocated at offset OFF in the TOC, and [PERM: Discuss TOC allocation and toc-register handling, somewhere.]
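For the integer-immediate cases of the PPC64 translation above (li, and lis followed by ori), the case selection can be sketched in Python as follows. The function name ppc64_load_immediate is hypothetical. It returns None when neither case applies, in which case one of the remaining cases (TOC-relative label or preallocated TOC entry) would have to be used.

```python
def ppc64_load_immediate(dest, value):
    """Instruction list for loading an integer via li or lis/ori, else None."""
    # Case: Src is a signed 16-bit integer SI.
    if -0x8000 <= value <= 0x7FFF:
        return [f"li {dest},{value}"]
    # Case: Src equals (HI<<16)+LO, HI signed 16-bit, LO unsigned 16-bit.
    lo = value & 0xFFFF
    hi = (value - lo) >> 16
    if -0x8000 <= hi <= 0x7FFF:
        insns = [f"lis {dest},{hi}"]
        if lo != 0:                      # the ori is omitted if LO = 0
            insns.append(f"ori {dest},{dest},{lo}")
        return insns
    return None
```

For example, the value -65535 splits into HI = -1 and LO = 1, because lis sign-extends HI<<16 and ori then fills in the low half.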

2.5.2 cmps(Dest,Src)

Purpose

To compare Dest and Src as signed values. Dest must be a general purpose register or in memory.

Condition Codes

Overflow is undefined, the others are set.

x86
x86_64

If both operands are in memory, then reduce to

move(Src,val)
cmps(Dest,val)

Else if Src is an immediate and Dest is of the form cp(0), then

cmpw $Src,(%rcx)

Else if one operand is a register and the other one is a register or in memory, then

cmp Src,Dest

Else if Src is a 32-bit signed integer, then

cmp $Src,Dest

Else, for x86_64

movabs $Src,%r11
cmp %r11,Dest
PPC64

If Dest is of the form cp(0), then reduce to [PERM: FIXME: arg0..arg2 must be preserved, use something else.]

lwz arg0,Dest
cmps(arg0,Src)

Else if Dest is in memory, then reduce to [PERM: FIXME: arg0..arg2 must be preserved, use something else.]

ld arg0,Dest
cmps(arg0,Src)

Else if Src is in memory, then reduce to [PERM: FIXME: arg0..arg2 must be preserved, use something else.]

ld arg1,Src
cmps(Dest,arg1)

Else if Src is a register, then

cmpd Dest,Src

Else if Src is a signed 16-bit integer SI, then

cmpdi Dest,SI

Else, reduce to [PERM: FIXME: arg0..arg2 must be preserved, use something else.]

move(Src,arg1)
cmpd Dest,arg1

2.5.3 cmpu(Dest,Src)

Purpose

To compare Dest and Src as unsigned values. Dest must be a general purpose register or in memory.

Condition Codes

Overflow is undefined, the others are set.

x86
x86_64

If both operands are in memory, then reduce to

move(Src,val)
cmpu(Dest,val)

Else if Src is an immediate and Dest is of the form cp(0), then

cmpw $Src,(%rcx)

Else if one operand is a register and the other one is a register or in memory, then

cmp Src,Dest

Else if Src is a 32-bit signed integer, then

cmp $Src,Dest

Else, for x86_64

movabs $Src,%r11
cmp %r11,Dest
PPC64

If Dest is of the form cp(0), then reduce to [PERM: FIXME: arg0..arg2 must be preserved, use something else.]

lwz arg0,Dest
cmpu(arg0,Src)

Else if Dest is in memory, then reduce to [PERM: FIXME: arg0..arg2 must be preserved, use something else.]

ld arg0,Dest
cmpu(arg0,Src)

Else if Src is in memory, then reduce to [PERM: FIXME: arg0..arg2 must be preserved, use something else.]

ld arg1,Src
cmpu(Dest,arg1)

Else if Src is a register, then

cmpld Dest,Src

Else if Src is an unsigned 16-bit integer UI, then

cmpldi Dest,UI

Else, reduce to [PERM: FIXME: arg0..arg2 must be preserved, use something else.]

move(Src,arg1)
cmpld Dest,arg1

2.5.4 test(Dest,Src)

Purpose

Compute (Dest /\ Src). Src must be an immediate or ac1.

Condition Codes

Set e if the result is zero, and ne otherwise. Other condition codes are undefined.

x86
x86_64

If Dest is a register and Src is an 8-bit unsigned integer, then

testb $Src,Dest

Else if Dest translates to a memory operand r(OFFSET) and Src can be obtained by shifting an 8-bit unsigned integer c left by 8*n bits, then

testb $c,(OFFSET+n)(r)

Else [PERM: This is incorrect if Src is ac1 (i.e. in memory)]

test $Src,Dest
PPC64

If Dest is in memory, then reduce to [PERM: FIXME: arg0..arg2 must be preserved, use something else.]

ld arg0,Dest
test(arg0,Src)

Else if Src is a register (i.e. ac1), then [PERM: FIXME: arg0..arg2 must be preserved, use something else.]

and. arg0,Dest,Src

Else if Src is a 16-bit unsigned integer, then [PERM: FIXME: arg0..arg2 must be preserved, use something else.]

andi. arg0,Dest,Src

Else if Src is a 16-bit unsigned integer UI shifted 16 bits, then [PERM: FIXME: arg0..arg2 must be preserved, use something else.]

andis. arg0,Dest,UI

Else if Src is a stretch of N 1-bits followed by M least significant 0-bits, then [PERM: FIXME: arg0..arg2 must be preserved, use something else.]

rldicr. arg0,Dest,64-N-M,N-1

Else, reduce to [PERM: FIXME: arg0..arg2 must be preserved, use something else.]

move(Src,arg1)
and. arg0,Dest,arg1

2.5.5 jump(Target)

Purpose

To transfer program control to Target.

Condition Codes

Undefined.

x86
x86_64

If Target is of the form cp(OFF), then

lea OFF(cp), %rax
jmp *%rax

Else for x86_64, if Target is not reachable with a 32-bit offset

jmp Trampoline
[...]
Trampoline: jmp *0(%rsi)
.quad Target

Else

jmp Target
PPC64

If Target is of the form cp(OFF), then OFF is nonzero, and the transfer must use the CTR register:

addi 0, cp, OFF
mtctr 0
bctr

Else if Target is a local label, then

b Target

Else, reduce to: [PERM: Do we need to use the CTR register here (e.g. can the callee be relying on CTR being set?)]

b Trampoline
[...]
Trampoline: 
move(Target,0)
mtctr 0
bctr

2.5.6 call(Target)

[PERM: FIXME: Must document plcall()]

Purpose

To transfer program control to Target, with the return address pushed on the stack or saved in a register.

Condition Codes

Undefined.

x86
x86_64

For x86_64, if Target is not reachable with a 32-bit offset

call Trampoline
[...]
Trampoline: jmp *0(%rsi)
.quad Target

Else

call Target
PPC64

There is a bug here: IR instruction: call(label(G))

Power code: bl 0x3fffb035c8e8

Problem 1: Callee expects CTR initialized.

Problem 2: Callee can escape to native_nonjit, which will access TOC[arg5].

Conclusion: call(label(_)) must emit the same sequence as call(native_entry(_))!

If Target is a local label, then

bl Target

[PERM: NOTE: using bl is sub-optimal if we will not return (via the link register) to the following instruction. See p. 36 “Use Branch instructions for which LK=1 only as subroutine calls”]

Else, reduce to the following, where the transfer must use the CTR register.

bl Trampoline
[...]
Trampoline: 
move(Target,0)
mtctr 0
bctr

[PERM: NOTE: this move must be encoded in a way that CALLEE_TOC_OFFSET in ppc64le_kernel.s4 understands! Document the requirements! We could simplify the initial implementation by always putting the TOC offset in a fixed register the_reg (e.g. arg5), so that CALLEE_TOC_OFFSET can just patch TOC+the_reg. We can optimize this later. Question: Presumably Target will be an immediate in these cases?]

[PERM: NOTE: the jitter must not blindly re-use same-valued TOC entries, since some entries may be changed, post-jit, by CALLEE_TOC_OFFSET users.]

2.5.7 ccall(Cond,Target)

Purpose

If Cond is true, then transfer program control to Target, with the return address pushed on the stack or saved in a register. Cond is most likely false.

Condition Codes

Undefined.

x86
x86_64

Let NCond be the negation of Cond, and

jcc NCond,1f
call(Target)
1:
PPC64

[PERM: BUG: Does this have the same problem as call to local label? (must go via CTR+TOC)]

If Target is a local label, then

bcl Cond,Target

[PERM: Is it true here, as for the call instruction, that “the transfer must use the CTR register.” (and the CALLEE_TOC_OFFSET issues)?]

Else if Trampoline is within 32764 bytes, reduce to:

bcl Cond,Trampoline
[...]
Trampoline: 
move(Target,0)
mtctr 0
bctr

Else, let NCond be the negation of Cond, and reduce to:

bc NCond, 1f
bl Trampoline
1: [...]
Trampoline: 
move(Target,0)
mtctr 0
bctr

2.5.8 branch(Cond,Target)

Purpose

To conditionally transfer program control to Target.

Condition Codes

Must preserve all condition codes except overflow, which is left undefined.

x86
x86_64

For x86_64, if Target is not reachable with a 32-bit offset

jcc Cond,Trampoline
[...]
Trampoline: jmp *0(%rsi)
.quad Target

Else

jcc Cond,Target
PPC64

Let NCond be the negation of Cond. If Target is a local label, then

// if Target is within 32764 bytes
bc Cond,Target
// else Target is not within 32764 bytes
bc NCond, 1f
b Target
1:

Else the explicit branch instruction must [PERM: is a trampoline really strictly necessary, or just desirable?] go via a trampoline:

// if Trampoline is within 32764 bytes
bc Cond,Trampoline 
// else Trampoline is not within 32764 bytes
bc NCond, 1f
b Trampoline
1: [...]
Trampoline: 
move(Target,0)
mtctr 0
bctr
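The choice between the short and the long form for a local label can be sketched as below. This is a minimal Python illustration that takes the 32764-byte limit from the text at face value and treats it as symmetric, which is an assumption. The function names and the use of byte addresses pc and target are hypothetical.

```python
BC_RANGE = 32764  # byte limit quoted in the text for a conditional branch

def within_bc_range(pc, target):
    """True if target is reachable from pc with a single bc instruction."""
    return abs(target - pc) <= BC_RANGE

def emit_branch(cond, ncond, pc, target):
    """Instruction list for branch(Cond,Target) when Target is a local label."""
    if within_bc_range(pc, target):
        return [f"bc {cond},Target"]
    # Out of range: branch around an unconditional b, using the negated condition.
    return [f"bc {ncond},1f", "b Target", "1:"]
```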

2.5.9 cmove(Cond,Src,Dest)

Purpose

To conditionally copy the value of source Src into destination Dest.

Condition Codes

Undefined.

x86
x86_64

If both operands are in registers, then

cmove Cond,Src,Dest

Else, let NCond be the negation of Cond, and

jcc NCond,1f
move(Src,Dest)
1:
PPC64

If both operands are in registers, then note that neither Src nor Dest can be R0 (which would be treated as constant zero), and:

// if Cond is l or lu
isel Dest,Src,Dest,0
// else if Cond is g or gu
isel Dest,Src,Dest,1
// else if Cond is e
isel Dest,Src,Dest,2
// else if Cond is o
isel Dest,Src,Dest,3
// else if Cond is le or leu
isel Dest,Dest,Src,1
// else if Cond is ge or geu
isel Dest,Dest,Src,0
// else if Cond is ne
isel Dest,Dest,Src,2
// else if Cond is no
isel Dest,Dest,Src,3

Else, let NCond be the negation of Cond, and

bc NCond,1f
move(Src,Dest)
1:
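The isel cases above follow one pattern: pick the CR0 bit (0 = LT, 1 = GT, 2 = EQ, 3 = SO) and, for the negated conditions, swap the two source operands. A minimal Python sketch of that table follows. The names ISEL_TABLE, isel_for and isel are illustrative, and isel models only the selection, not the special treatment of R0.

```python
# Cond -> (CR0 bit, True if Src is selected when the bit is set).
ISEL_TABLE = {
    "l":  (0, True),  "lu":  (0, True),
    "g":  (1, True),  "gu":  (1, True),
    "e":  (2, True),  "o":   (3, True),
    "le": (1, False), "leu": (1, False),
    "ge": (0, False), "geu": (0, False),
    "ne": (2, False), "no":  (3, False),
}

def isel_for(cond, src, dest):
    """Return the isel instruction implementing cmove(cond, src, dest)."""
    bit, on_set = ISEL_TABLE[cond]
    ra, rb = (src, dest) if on_set else (dest, src)
    return f"isel {dest},{ra},{rb},{bit}"

def isel(cr0, ra, rb, bit):
    """Semantics of isel: ra if the given CR0 bit is set, else rb."""
    return ra if cr0[bit] else rb
```

For example, isel_for("le", "4", "5") gives "isel 5,5,4,1": the destination keeps its value when GT is set and receives Src otherwise.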

2.5.10 add(Src1,Src2,Dest)

Purpose

To store the value of the expression (Src1+Src2) in Dest.

Condition Codes

Undefined.

x86
x86_64

If Src1 and Dest are the same memory operand and Src2 is the constant 0, then

/* nothing */

Else if Src2 is the constant 0, then the instruction reduces to

move(Src1,Dest)

Else if Src1 and Dest are the same memory operand and Src2 is a 32-bit signed integer, then

add $Src2,Dest

Else if Src1 is a register, Src2 is the 32-bit signed integer OFFSET, and Dest is a register, then

lea OFFSET(Src1),Dest

Else for x86_64, if Src1 and Dest are the same memory operand and Src2 is not a 32-bit signed integer, then

movabs $Src2,%r11
add %r11,Dest

Else if Dest is in memory, the instruction reduces to

add(Src1,Src2,val)
move(val,Dest)

Else, the instruction reduces to

move(Src1,Dest)
add(Dest,Src2,Dest)
PPC64

[PERM: An unstated assumption seems to be that Src1 is a register or in memory.]

If Dest is in memory, then reduce to [PERM: FIXME: arg0..arg2 must be preserved, use something else.]

add(Src1,Src2,arg0)
std arg0,Dest

Else if Src1 is in memory, then reduce to [PERM: FIXME: arg0..arg2 must be preserved, use something else.]

ld arg1,Src1
add(arg1,Src2,Dest)

Else if Src2 is in memory, then reduce to [PERM: FIXME: arg0..arg2 must be preserved, use something else.]

ld arg2,Src2
add(Src1,arg2,Dest)

Else if Src2 is a signed 16-bit integer SI, then note that Src1 cannot be R0, which would mean the constant zero, and

addi Dest,Src1,SI

Else if Src2 equals (HI<<16)+LO, where HI is a signed 16-bit integer and LO is an unsigned 16-bit integer, then note that neither register operand can be R0, which would mean the constant zero, and

addis Dest,Src1,HI
ori Dest,Dest,LO // omit if LO = 0  [PERM: NO! this is wrong for addition]

Else, reduce to [PERM: FIXME: arg0..arg2 must be preserved, use something else.]

move(Src2,arg2)
add Dest,Src1,arg2
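The reviewer note on the addis/ori sequence can be illustrated with a small Python sketch: or-ing in the low half is only equivalent to adding it when the low 16 bits of the intermediate result are zero. The second split shown, a high-adjusted half plus a signed low half suitable for addis followed by addi, is one conventional alternative and is not stated in the report. The function names are hypothetical.

```python
def split_or(value):
    """(HI, LO) with value == (HI << 16) | LO, LO unsigned (suits lis/ori)."""
    lo = value & 0xFFFF
    return (value - lo) >> 16, lo

def split_add(value):
    """(HA, L) with value == (HA << 16) + L, L a signed 16-bit value."""
    lo = ((value & 0xFFFF) ^ 0x8000) - 0x8000   # sign-extend the low half
    return (value - lo) >> 16, lo

src1 = 0x00010001
imm = 0x00020003

hi, lo = split_or(imm)
wrong = (src1 + (hi << 16)) | lo      # addis followed by ori

ha, l = split_add(imm)
right = (src1 + (ha << 16)) + l       # addis followed by addi
```

Here wrong is 0x30003 while the true sum is 0x30004. Note that HA can fall outside the signed 16-bit range for values just below 2^31 (for example 0x7FFF8000), so such values would still need the general fallback.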

2.5.11 addo(Src1,Src2,Dest)

Purpose

To store the value of the expression (Src1+Src2) in Dest.

Condition Codes

Overflow set iff the signed add yields an arithmetic overflow, and cleared otherwise. Other condition codes undefined.

x86
x86_64

Src2 is an immediate.

If Src1 and Dest are the same memory operand and Src2 is a 32-bit signed integer, then

add $Src2,Dest

Else for x86_64, if Src1 and Dest are the same memory operand and Src2 is not a 32-bit signed integer, then

movabs $Src2,%r11
add %r11,Dest

Else if Dest is in memory, the instruction reduces to

addo(Src1,Src2,val)
mov val,Dest

Else, the instruction reduces to

move(Src1,Dest)
addo(Dest,Src2,Dest)
PPC64

Src1 is a register and Src2 is an immediate (see footnote 2).

[PERM: BUG: the arguments can be, e.g. addo(ac0, ac1, ac0), i.e. Src2 may not be an immediate.]

Reduce to [PERM: FIXME: arg0..arg2 must be preserved, use something else.]

li 0,0
mtxer 0
move(Src2,arg2) [PERM: Wrong. Move does not preserve condition codes (so could clobber XER SO-bit).]
addo. Dest,Src1,arg2

2.5.12 sub(Src1,Src2,Dest)

Purpose

To store the value of the expression (Src1-Src2) in Dest.

Condition Codes

Undefined.

x86
x86_64

If Src1 and Dest are the same memory operand and Src2 is a 32-bit signed integer, then

sub $Src2,Dest

Else if Src1 is a register, Src2 is the 32-bit signed integer OFFSET, and Dest is a register, then

lea -OFFSET(Src1),Dest

Else for x86_64, if Src1 and Dest are the same memory operand and Src2 is not a 32-bit signed integer, then

movabs $Src2,%r11
sub %r11,Dest

Else if Dest is in memory, the instruction reduces to

sub(Src1,Src2,val)
move(val,Dest)

Else, the instruction reduces to

move(Src1,Dest)
sub(Dest,Src2,Dest)
PPC64

If Dest is in memory, then reduce to [PERM: FIXME: arg0..arg2 must be preserved, use something else.]

sub(Src1,Src2,arg0)
std arg0,Dest

Else if Src1 is in memory, then reduce to [PERM: FIXME: arg0..arg2 must be preserved, use something else.]

ld arg1,Src1
sub(arg1,Src2,Dest)

Else if Src2 is in memory, then reduce to [PERM: FIXME: arg0..arg2 must be preserved, use something else.]

ld arg2,Src2
sub(Src1,arg2,Dest)

Else if -Src2 is a signed 32-bit integer, then reduce to

add(Src1, -Src2, Dest)

Else, reduce to [PERM: FIXME: arg0..arg2 must be preserved, use something else.]

move(Src2,arg2)
subf Dest,arg2,Src1

2.5.13 subo(Src1,Src2,Dest)

Purpose

To store the value of the expression (Src1-Src2) in Dest. Src2 need not be an immediate.

Condition Codes

Overflow set if the signed subtract yields an arithmetic overflow, and cleared otherwise. Other condition codes undefined.

x86
x86_64

If Src2 and Dest are the same and Src1 is the constant 0, then

neg Dest

Else if Src1 and Dest are the same memory operand and Src2 is a 32-bit signed integer, then

sub $Src2,Dest

Else for x86_64, if Src1 and Dest are the same memory operand and Src2 is not a 32-bit signed integer, then

movabs $Src2,%r11
sub %r11,Dest

Else if Dest is in memory, the instruction reduces to

subo(Src1,Src2,val)
mov val, Dest

Else, the instruction reduces to

move(Src1,Dest)
subo(Dest,Src2,Dest)
PPC64

No operand can be in memory. [PERM: Does not the same hold for the operands also for x86/x86_64? If not, why? Because registers are scarce on x86/x86_64, operands can be in memory there. –Mats]

[PERM: BUG: the arguments can be, e.g. subo(val,y(1),val), i.e. operands can be in memory.]

If Src1 is 0, then

li 0,0
mtxer 0
nego. Dest,Src2

Else if Src1 and Src2 are in registers, then

li 0,0 [PERM: What if Src2 or Src1 is R0? xref addo.]
mtxer 0
subfo. Dest,Src2,Src1

Else, reduce to the following (to be completed) [PERM: What if Src2 is the most negative value, will overflow condition be set correctly?]

addo(Src1,-Src2,Dest)
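The reviewer question about the most negative value can be answered with a small Python model, assuming that -Src2 is computed with 64-bit wraparound. In that case the negation of the most negative value is itself, and the overflow reported by the addition differs from the overflow of the original subtraction. All names below are illustrative.

```python
MIN64, MAX64 = -(1 << 63), (1 << 63) - 1

def wrap64(x):
    """Reduce x to a signed 64-bit two's-complement value."""
    x &= (1 << 64) - 1
    return x - (1 << 64) if x >> 63 else x

def sub_overflows(a, b):
    """True iff the signed 64-bit subtraction a - b overflows."""
    return not (MIN64 <= a - b <= MAX64)

def add_overflows(a, b):
    """True iff the signed 64-bit addition a + b overflows."""
    return not (MIN64 <= a + b <= MAX64)

def subo_via_addo_overflows(a, b):
    """Overflow as reported by addo(a, -b), with -b wrapped to 64 bits."""
    return add_overflows(a, wrap64(-b))
```

With Src1 = -1 and Src2 the most negative value, the subtraction does not overflow, but the reduction reports an overflow. With Src1 = 0 the error goes the other way.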

2.5.14 mulo(Src1,Src2,Dest)

Purpose

To store the value of the expression (Src1*Src2) in Dest. Dest must be a register and Src2 must be an immediate.

[PERM: BUG: the arguments can be, e.g. mulo(ac0,ac1,val), i.e. Src2 may not be an immediate.]

Condition Codes

Overflow set if the signed multiply yields an arithmetic overflow, and cleared otherwise. Other condition codes are undefined.

x86
x86_64

For x86_64, if Src2 is not a 32-bit signed integer, then

mov Src1,Dest
movabs $Src2,%r11
mul %r11,Dest

Else

mov Src1,Dest
mul $Src2,Dest
PPC64

[PERM: FIXME: arg0..arg2 must be preserved, use something else.]

li 0,0
mtxer 0
move(Src2,arg2) [PERM: Wrong. Move does not preserve condition codes (so could clobber XER SO-bit).]
mulldo. Dest,Src1,arg2

2.5.15 sh(Src1,Src2,Dest)

Purpose

To store the value of the expression (Src1<<Src2) in Dest. Dest must be a register and Src2 must be an immediate in the range [-4,4].

Condition Codes

Undefined.

x86
x86_64

If Src1 is different from Dest, then reduce to

mov Src1,Dest
sh(Dest,Src2,Dest)

Else if Src2 > 0 then

shl $Src2,Dest

Else

shr $-Src2,Dest
PPC64

If Src1 is in memory, then reduce to [PERM: FIXME: arg0..arg2 must be preserved, use something else.]

ld arg1,Src1
sh(arg1,Src2,Dest)

Else if Src2 > 0 then

sldi Dest,Src1,Src2

Else

srdi Dest,Src1,-Src2

2.5.16 and(Src1,Src2,Dest)

Purpose

To store the value of the expression (Src1/\Src2) in Dest. Src1 and Dest must be the same operand and Src2 must be an immediate.

[PERM: BUG: the arguments can be, e.g. and(x(3),x(2),val), i.e. Src1 and Dest may differ.] [PERM: BUG: the arguments can be, e.g. and(ac0,ac1,ac0), i.e. Src2 may not be an immediate.]

Condition Codes

Undefined.

x86
x86_64

For x86_64, if Src2 is not a 32-bit signed integer, then

movabs $Src2,%r11
and %r11,Dest

Else

and $Src2,Dest
PPC64

If Src2 is a 16-bit unsigned integer UI, then

andi. Dest,Src1,UI

Else if Src2 equals (HI<<16), where HI is an unsigned 16-bit integer, then

andis. Dest,Src1,HI

Else if Src2 is a stretch of N 1-bits, extending through the least significant bit, then

rldicl Dest,Src1,0,64-N

Else if Src2 is a stretch of N 1-bits, extending through the most significant bit, then

rldicr Dest,Src1,0,N-1

Else, reduce to [PERM: FIXME: arg0..arg2 must be preserved, use something else.]

move(Src2,arg2)
and Dest,Src1,arg2

2.5.17 or(Src1,Src2,Dest)

Purpose

To store the value of the expression (Src1\/Src2) in Dest. Src1 and Dest must be the same operand and Src2 must be an immediate.

[PERM: BUG: the arguments can be, e.g. or(val,11,x(3,0)), i.e. Src1 and Dest may differ.] [PERM: BUG: the arguments can be, e.g. or(val,y(6),val) and or(ac0,ac1,ac0), i.e. Src2 may not be an immediate.]

Condition Codes

Undefined.

x86
x86_64

For x86_64, if Src2 is not a 32-bit signed integer, then

movabs $Src2,%r11
or %r11,Dest

Else

or $Src2,Dest
PPC64

If Src2 is a 16-bit unsigned integer UI, then

ori Dest,Src1,UI

Else if Src2 equals (HI<<16)+LO, where HI is an unsigned 16-bit integer and LO is an unsigned 16-bit integer, then

oris Dest,Src1,HI
ori  Dest,Dest,LO // omit if LO = 0

Else, reduce to [PERM: FIXME: arg0..arg2 must be preserved, use something else.]

move(Src2,arg2)
or Dest,Src1,arg2

2.5.18 xor(Src1,Src2,Dest)

Purpose

To store the value of the expression (Src1 \ Src2) in Dest. Src1 and Dest must be the same operand and Src2 must be an immediate.

[PERM: BUG: the arguments can be, e.g. xor(val,-5,arg1) or xor(ac0,ac1,ac0), i.e. Src1 and Dest may differ.]

Condition Codes

Undefined.

x86
x86_64

For x86_64, if Src2 is not a 32-bit signed integer, then

movabs $Src2,%r11
xor %r11,Dest

Else

xor $Src2,Dest
PPC64

If Src2 is a 16-bit unsigned integer UI, then

xori Dest,Src1,UI

Else if Src2 equals (HI<<16)+LO, where HI is an unsigned 16-bit integer and LO is an unsigned 16-bit integer, then

xoris Dest,Src1,HI
xori  Dest,Dest,LO // omit if LO = 0

Else, reduce to [PERM: FIXME: arg0..arg2 must be preserved, use something else.]

move(Src2,arg2)
xor Dest,Src1,arg2

2.5.19 int2cp(Src,Dest)

Purpose

To convert a tagged integer to a choicepoint pointer. Dest must be val.

Condition Codes

Undefined.

x86
mov Src,%eax
sar $1,%eax
dec %eax
add w_choice_start,%eax

note that val is %eax on x86.

x86_64
mov Src,%rax
sub $3,%rax
add w_choice_start,%rax

note that val is %rax on x86_64.

PPC64

If Src is in memory, then reduce to [PERM: FIXME: arg0..arg2 must be preserved, use something else.]

ld arg1,Src
int2cp(arg1,Dest)

Else,

ld val,w_choice_start
addi val,val,-3
add val,Src,val

2.5.20 cp2int(Src,Dest)

Purpose

To convert a choicepoint pointer to a tagged integer. Dest cannot be val.

Condition Codes

Undefined.

x86
mov Src,%eax
sub w_choice_start,%eax
lea 3(,%eax,2),%eax
mov %eax,Dest [PERM: Can do better if Dest is a register]
x86_64
mov Src,%rax
sub w_choice_start,%rax
add $3,%rax
mov %rax,Dest [PERM: Can do better if Dest is a register]
PPC64

If Src is in memory, then reduce to [PERM: FIXME: arg0..arg2 must be preserved, use something else.]

ld arg1,Src
cp2int(arg1,Dest)

Else if Dest is in memory, then reduce to [PERM: FIXME: arg0..arg2 must be preserved, use something else.]

cp2int(Src,arg0)
std arg0,Dest

Else,

ld Dest,w_choice_start
subf Dest,Dest,Src
addi Dest,Dest,3
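The arithmetic behind int2cp and cp2int, as read off the instruction sequences above, can be sketched in Python as follows. This ignores register width and wraparound, and the function names are hypothetical. The point of the sketch is that the two conversions are inverses of each other on each back end.

```python
def int2cp_64(src, w_choice_start):
    """x86_64 and PPC64: subtract 3, then add w_choice_start."""
    return src - 3 + w_choice_start

def cp2int_64(src, w_choice_start):
    """x86_64 and PPC64: subtract w_choice_start, then add 3."""
    return src - w_choice_start + 3

def int2cp_x86(src, w_choice_start):
    """x86: sar $1 ; dec ; add w_choice_start."""
    return (src >> 1) - 1 + w_choice_start

def cp2int_x86(src, w_choice_start):
    """x86: sub w_choice_start ; lea 3(,%eax,2)."""
    return 3 + 2 * (src - w_choice_start)
```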

2.5.21 init(Dest1,Dest2)

Purpose

To create a brand new variable in the first destination, making the second destination a variable bound to the first. Dest1 must be in memory.

Condition Codes

Undefined.

x86
x86_64

If Dest1 is of the form r(0), then

mov r,Dest1
mov r,Dest2

Else if Dest2 is the register r, then

lea Dest1,r
mov r,Dest1

Else

lea Dest1,%rax
mov %rax,Dest1
mov %rax,Dest2
PPC64

Neither Dest1 nor Dest2 may be based on R0 (which would mean zero in the instructions la and std).

If Dest1 is of the form r(0), then

std r,Dest1
std r,Dest2

Else if Dest2 is the register r, then

la r,Dest1
std r,Dest1 [PERM: // saner as ''std r,r'' I think]

Else [PERM: FIXME: arg0..arg2 must be preserved, use something else.]

la arg0,Dest1
std arg0,Dest1 [PERM: // saner as ''std arg0,arg0'' I think (since arg0 contains the address)]
std arg0,Dest2

2.5.22 pop

Purpose

To discard the top of the stack.

Condition Codes

Undefined.

x86
x86_64
pop %rax
PPC64
/* nothing */

2.5.23 context(Target)

Target is a local label.

Purpose

To refresh the TOC pointer.

Condition Codes

Undefined.

x86
x86_64
/* nothing */
PPC64

The CTR is assumed to contain the address of the local label (this is ensured by the caller, typically by jumping to the label using bctr or the like).

Let OFF be the offset to the TOC from Target. Reduce to

mfctr toc
add(toc,OFF,toc)

2.5.24 half(Constant)

Purpose

To lay out an aligned constant occupying half a machine word.

Condition Codes

Undefined.

x86
[possible padding]
.value Constant
x86_64 [PERM: No padding for x86_64? jit.c does padding for all Intel]
PPC64
.long Constant

2.5.25 word(Constant)

Purpose

To lay out an aligned constant occupying one machine word.

Condition Codes

Undefined.

x86
[possible padding]
.long Constant
x86_64
PPC64
[possible padding]
.quad Constant

2.5.26 label(L)

Purpose

A label indicating a code point that can be referred to by other instructions. L is of the form '$VAR'(Int).

Condition Codes

Undefined.

2.5.27 align(Int)

Purpose

To enforce code alignment. Let pc16 denote “program counter modulo 16”.

Condition Codes

Undefined.

x86

Depending on Int:

0

Bump pc until pc16 in {0,8,12}. [PERM: Verified x86.]

1

Bump pc until pc16 in {2,10,14}. [PERM: Verified x86.]

2

If pc16 in [9,15], bump pc until pc16=0. [PERM: Verified x86.]

3

Bump pc until pc16=12. [PERM: Verified x86.]

4

Bump pc until pc16 in {0,4,8,12}. [PERM: Verified x86.]

x86_64

Depending on Int:

0

Bump pc until pc16 in {0,8}. [PERM: Verified x64.]

1

Bump pc until pc16 in {4,12}. [PERM: Verified x64.]

2

If pc16 in [9,15], bump pc until pc16=0. [PERM: Verified x64.]

3

Bump pc until pc16=8. [PERM: Verified x64.]

4

Bump pc until pc16 in {0,8}. [PERM: Verified x64.]

PPC64

Depending on Int:

1

Bump pc until pc16 in {4,12}. [PERM: Verified PPC.]

2

No extra alignment needed (since 4-byte alignment is always assumed). [PERM: Verified PPC.]

0
3
4

Bump pc until pc16 in {0,8}. [PERM: Verified PPC.]
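The per-back-end rules listed above can be summarised in a short Python sketch. The table ALIGN_RULES and the function align_pc are hypothetical names, and the sketch bumps pc one byte at a time for simplicity, whereas a real implementation would emit padding.

```python
# Allowed values of pc16 (pc modulo 16) per back end and align argument.
# align(2) is handled separately because it is conditional.
ALIGN_RULES = {
    "x86":    {0: {0, 8, 12}, 1: {2, 10, 14}, 3: {12},   4: {0, 4, 8, 12}},
    "x86_64": {0: {0, 8},     1: {4, 12},     3: {8},    4: {0, 8}},
    "ppc64":  {0: {0, 8},     1: {4, 12},     3: {0, 8}, 4: {0, 8}},
}

def align_pc(backend, arg, pc):
    """Return pc after applying align(arg) on the given back end."""
    if arg == 2:
        if backend == "ppc64":
            return pc                    # 4-byte alignment is always assumed
        if pc % 16 >= 9:
            return pc + (16 - pc % 16)   # bump to the next 16-byte boundary
        return pc
    allowed = ALIGN_RULES[backend][arg]
    while pc % 16 not in allowed:
        pc += 1
    return pc
```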

2.5.27.1 The meaning of align instruction arguments

[PERM: This information is reverse engineered and it should be verified that I have understood things correctly.]

All the alignment instructions correspond to non-executable code, i.e. any code before the alignment instruction does not fall through into the alignment instruction. This means that the padding, if any, need not be executable, and for debuggability it is good if it is explicitly non-executable (e.g. ud2 on Intel).

In the following, let ws stand for the size of a word (4 or 8 bytes). Let hs stand for the size of a half word (2 or 4 bytes).

The meanings of the alignment instruction arguments are as follows:

align(0)

Used after a plcall instruction, to ensure suitable padding for the following data. The data is word or half-word, so alignment should be suitable for either, i.e. for a word. The plcall instruction is like a jump, but passes the address of the pc following the jump instruction (i.e. the address corresponding to the start of the align(0)).

On some architectures, e.g. x86/x64, plcall uses the “trick” of using an ordinary machine code call instruction that sets up the return address, i.e. the address of the align(0) that follows, for free. The callee can then use the return address (on x86/x64 this corresponds to popping the return address from the stack) to obtain the address of the data that follows the plcall instruction (e.g., on x86 this corresponds to aligning the popped return address in a way consistent with align(0)).

So, minimum alignment would require ((pc mod hs) == 0) and ((pc mod ws) == 0), and in the “ordinary” case (see below) would additionally require that ((pc + hs+hs+ws) mod 16) is code aligned.

On 64-bit this means pc16 in {0,8} (this is the same regardless of whether code should be 64-bit aligned or 32-bit aligned). [PERM: Verified x64, PPC.]

On 32-bit this means pc16 in {0,4,8,12} (which is always stronger than code alignment on x86), but this is not what is used on x86, see below.

Note: On the 32-bit x86, data alignment would correspond to (pc mod 16) in {0,4,8,12}, but for some reason this is not exactly what is used. Instead (pc mod 16) must be in {0,8,12}, i.e. it must not be 4. Presumably it is to avoid putting the code label at the last 4 bytes of a 16 byte block. See align(1) below for a discussion. [PERM: Verified x86.]

Used in two situations:

align(1)

This alignment is used at the start of a disjunct “pseudo-predicate”, before a native_op continuation.

It is always followed by a half-word data, a word, and then a code label, in the following way:

comment(disjunct),
align(1),
label(L1),
half(native_op),
word(L),
label(Entry), // code label
context(Entry),
label(L2),
…

The alignment should ensure that, when used as in the above example, the half-word is half-word-aligned and that the word is word-aligned and that the code-label has “good” alignment for code. I.e. that ((pc mod hs) == 0) and (((pc+hs) mod ws) == 0) and that (pc+hs+ws) is good for code.

On 64-bit this means pc16 in {4,12} (this is the same regardless of whether code should be 64-bit aligned or 32-bit aligned). [PERM: Verified x64, PPC.]

On 32-bit this means pc16 in {2,6,10,14} (assuming code should be 32-bit aligned; {2,10} if code should be 64-bit aligned), but this is not what is used on x86, see below. [PERM: Verified x86.]

Note: On the 32-bit x86, alignment would correspond to (pc mod 16) in {2,6,10,14}, but for some reason this is not exactly what is used. Instead (pc mod 16) must be in {2,10,14}, i.e. it must not be 6. The reason is unknown, perhaps it is to avoid the case when the code label points at the last (4 byte) word of a 16-byte memory block. Note that this is exactly the same code alignment avoided by the special x86 rule for align(0). [PERM: Verified x86.]

Note that the native_op continuation is also used as a continuation after plcall instructions, but in those cases a different align instruction is used.
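The sets of admissible pc16 values quoted for align(1) follow directly from the three conditions stated above. A small Python sketch that enumerates them is given below. The function name align1_candidates and the parameter code_align (the required code alignment in bytes) are illustrative.

```python
def align1_candidates(hs, ws, code_align):
    """pc16 values where the half-word, the word and the code label are aligned."""
    return [pc for pc in range(16)
            if pc % hs == 0                        # half-word is aligned
            and (pc + hs) % ws == 0                # following word is aligned
            and (pc + hs + ws) % code_align == 0]  # code label is aligned

align1_64bit = align1_candidates(4, 8, 4)          # hs = 4, ws = 8
align1_32bit = align1_candidates(2, 4, 4)          # hs = 2, ws = 4
align1_32bit_code64 = align1_candidates(2, 4, 8)   # code 64-bit aligned
```

This reproduces {4,12} for 64-bit and {2,6,10,14} (or {2,10} with 64-bit code alignment) for 32-bit. The x86 exclusion of 6 is an extra rule that these conditions do not explain.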

align(2)

The intent is to align the following code label in a “good way”, considering code alignment etc, while still avoiding excessive code bloat.

On 64-bit PPC, with 32-bit instruction size, this corresponds to pc16 in {0,4,8,12}, but see below. An experiment with only {0,8} (SP_JIT_ALIGN2) apparently slowed things down. [PERM: Verified PPC.]

On 32-bit x86 and 64-bit x64, with no hard alignment restrictions, no alignment would be needed, but this is not what is used. Instead a pc16 less than or equal to 8 is left unchanged, whereas a larger modulus causes the pc to be bumped to the next 16-byte boundary. Presumably this is to ensure that the code contains at least half a 16 byte memory block. [PERM: Verified x86, x64.]

The only hard requirement is that a label following this alignment is valid as a jump destination. On some architectures, like PPC64, it is assumed that all other instructions preserve this invariant, so on such architectures this instruction can safely be treated as a no-op.

align(3)

Used for ensuring “good” alignment for a code label that comes after a word after the alignment, e.g. like:

…
align(3),
word(native_entry(Pred)),
label(Entry), // code label
…

This is used before the predicate entry label and similar cases.

On 64-bit PPC, with 32-bit instruction size, this corresponds to pc16 in {0,4,8,12}, but this is not what is used. Instead only {0,8} is used, presumably for better cache behavior. [PERM: Verified PPC.]

On 64-bit x64, with no hard alignment restrictions, no alignment would be needed, but this is not what is used. Instead pc is bumped until pc16 is 8, corresponding to 16-byte aligned code label, presumably to ensure the code label is at the start of a 16 byte memory block. [PERM: Verified x64.]

On 32-bit x86, with no hard alignment restrictions, no alignment would be needed, but this is not what is used. Instead pc is bumped until pc16 is 12, corresponding to 16-byte aligned code label, presumably to ensure the code label is at the start of a 16 byte memory block. [PERM: Verified x86.]

The only hard requirement is that a label following this alignment (after a word) is valid as a jump destination. [PERM: Verify this, e.g. are there special requirements in the lead-in?]

align(4)

This is used for aligning data in a “good way”, like try chains and switches, after an unconditional jump.

On 64-bit, with 8-byte data, this corresponds to pc16 in {0,8}. [PERM: Verified x64, PPC.]

On 32-bit, with 4-byte data, this corresponds to pc16 in {0,4,8,12}. [PERM: Verified x86.]

The only hard requirement is that a label following this alignment is suitably aligned as data, i.e is word-aligned.

jump(…),
align(4),
label(Try), // data label
try_chain(Tail,AlignedArity).
…

2.5.28 try_chain(list of (Label-Alternative),Arity)

Purpose

To lay out a data structure for backtracking purposes.

Condition Codes

Undefined.

x86
x86_64
PPC64

Every element of the list of pairs corresponds to a block of three machine words followed by two half machine words, laid out as follows, where b+o denotes an address at o machine words after the start of the block:

b+0  : Pointer to the next block, or NULL if it is the last block.
b+1  : Label, i.e. code address.
b+2  : Alternative, i.e. struct try_node pointer.
b+3  : offsetof(struct node,term[Arity])
b+3.5: Wmode(TRY)
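The block layout above can be sketched with Python's struct module. This assumes a little-endian target and unsigned fields, both of which are assumptions made for illustration only; the function names and the sample field values used with them are hypothetical.

```python
import struct

def try_block_format(word_bytes):
    """struct format for one block: three words followed by two half words."""
    word, half = {4: ("I", "H"), 8: ("Q", "I")}[word_bytes]
    return "<" + word * 3 + half * 2    # "<" means little-endian, no padding

def pack_try_block(word_bytes, next_block, label, alternative,
                   term_offset, wmode_try):
    """Lay out one try_chain block as bytes."""
    return struct.pack(try_block_format(word_bytes),
                       next_block, label, alternative,
                       term_offset, wmode_try)
```

With 8-byte words a block occupies 32 bytes, and the field at b+3.5 starts at byte offset 28. With 4-byte words a block occupies 16 bytes.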

2.5.29 switch(list of (Key-Target),Default)

Purpose

To perform a switch on the principal functor of register x0. Target is the jump target when x0 matches Key. Default is the default jump target.

Condition Codes

Undefined.

x86
x86_64
PPC64

This is laid out as a regular struct sw_on_key, machine-word aligned.

2.5.30 trampolines(Base)

Base is a local label that must be the preceding instruction.

The trampolines, if any, are emitted here.

2.5.31 toc(Base)

Base is a local label that must be the preceding instruction.

The TOC entries, if any, are emitted here.


Footnotes

(1)

The mcrxr 0 instruction would be shorter, but it is not available on server class Power CPUs.

(2)

Unlike the case for x86/x86_64


