Bitmanip opcodes

These are bit manipulation opcodes that, if provided, augment SimpleV for the purposes of efficiently accelerating Vector Processing, 3D Graphics and Video Processing.

The justification for their inclusion in BitManip is identical to the significant justification that went into their inclusion in the RISC-V Vector Extension (under the "Predicate Mask" opcodes section)

See https://github.com/riscv/riscv-v-spec/blob/master/v-spec.adoc#vector-mask-instructions for details.

Predicate Masks

SV uses standard integer scalar registers as a predicate bitmask. Therefore, the majority of RISC-V RV32I / RV64I bit-level instructions are perfectly adequate. Some exceptions however present themselves from RVV.

logical bit-wise instructions

These are the available bitwise instructions in RVV:

vmand.mm vd, vs2, vs1     # vd[i] =   vs2[i].LSB &&  vs1[i].LSB
vmnand.mm vd, vs2, vs1    # vd[i] = !(vs2[i].LSB &&  vs1[i].LSB)
vmandnot.mm vd, vs2, vs1  # vd[i] =   vs2[i].LSB && !vs1[i].LSB
vmxor.mm  vd, vs2, vs1    # vd[i] =   vs2[i].LSB ^^  vs1[i].LSB
vmor.mm  vd, vs2, vs1     # vd[i] =   vs2[i].LSB ||  vs1[i].LSB
vmnor.mm  vd, vs2, vs1    # vd[i] = !(vs2[i[.LSB ||  vs1[i].LSB)
vmornot.mm  vd, vs2, vs1  # vd[i] =   vs2[i].LSB || !vs1[i].LSB
vmxnor.mm vd, vs2, vs1    # vd[i] = !(vs2[i].LSB ^^  vs1[i].LSB)

The ones that exist in scalar RISC-V are:

AND rd, rs1, rs2          # rd = rs1 & rs2
OR  rd, rs1, rs2          # rd = rs1 | rs2
XOR rd, rs1, rs2          # rd = rs1 ^ rs2

The ones in Bitmanip are:

ANDN rd, rs1, rs2         # rd = rs1 & ~rs2
ORN  rd, rs1, rs2         # rd = rs1 | ~rs2
XORN rd, rs1, rs2         # rd = rs1 ^ ~rs2

This leaves:

NOR
NAND

These are currently listed as "pseudo-ops" in BitManip-Draft (0.91) They need to be actual opcodes.

TODO: there is an extensive table in RVV of bit-level operations:

output instruction pseudoinstruction

0 1 2 3 instruction pseudoinstruction
0 0 0 0 vmxor.mm vd, vd, vd vmclr.m vd
1 0 0 0 vmnor.mm vd, src1, src2
0 1 0 0 vmandnot.mm vd, src2, src1
1 1 0 0 vmnand.mm vd, src1, src1 vmnot.m vd, src1
0 0 1 0 vmandnot.mm vd, src1, src2
1 0 1 0 vmnand.mm vd, src2, src2 vmnot.m vd, src2
0 1 1 0 vmxor.mm vd, src1, src2
1 1 1 0 vmnand.mm vd, src1, src2
0 0 0 1 vmand.mm vd, src1, src2
1 0 0 1 vmxnor.mm vd, src1, src2
0 1 0 1 vmand.mm vd, src2, src2 vmcpy.m vd, src2
1 1 0 1 vmornot.mm vd, src2, src1
0 0 1 1 vmand.mm vd, src1, src1 vmcpy.m vd, src1
1 0 1 1 vmornot.mm vd, src1, src2
1 1 1 1 vmxnor.mm vd, vd, vd vmset.m vd

pcnt - population count

population-count.

Pseudocode:

unsigned int v; // count the number of bits set in v
unsigned int c; // c accumulates the total bits set in v
for (c = 0; v; c++)
{
  v &= v - 1; // clear the least significant bit set
}

This instruction is present in BitManip.

ffirst - find first bit

finds the first bit set as an index.

Pseudocode:

uint_xlen_t clz(uint_xlen_t rs1)
{
    for (int count = 0; count < XLEN; count++)
        if ((rs1 << count) >> (XLEN - 1))
            return count;
    return XLEN; // -1
}

This is similar but not identical to BitManip "CLZ". CLZ returns XLEN when no bits are set, whereas RVV returns -1.

sbf - set before first bit

Sets all LSBs leading up to (excluding) where an LSB in the src is set, and sets zeros including and following the src bit found. If the second operand is non-zero, this process continues the search (in the same LSB to MSB order) beginning each time (including the first time) from where 1s are set in the second operand.

A side-effect of the search is that when src is zero, the output is all ones. If the second operand is non-zero and the src is zero, the output is a copy of the second operand.

 # Example

 7 6 5 4 3 2 1 0   Bit number

 1 0 0 1 0 1 0 0   a3 contents
                   sbf a2, a3, x0
 0 0 0 0 0 0 1 1   a2 contents

 1 0 0 1 0 1 0 1   a3 contents
                   sbf a2, a3, x0
 0 0 0 0 0 0 0 0   a2

 0 0 0 0 0 0 0 0   a3 contents
                   sbf a2, a3, x0
 1 1 1 1 1 1 1 1   a2

 1 1 0 0 0 0 1 1   a0 vcontents
 1 0 0 1 0 1 0 0   a3 contents
                   sbf a2, a3, a0
 0 1 0 0 0 0 1 1   a2 contents

Pseudo-code:

def sof(rd, rs1, rs2):
    rd = 0
    setting_mode = rs2 == x0 or (regs[rs2] & 1)

    while i < XLEN:
        bit = 1<<i

        # only reenable when predicate in use, and bit valid
        if !setting_mode && rs2 != x0:
            if (regs[rs2] & bit):
                # back into "setting" mode
                setting_mode = True

        # skipping mode
        if !setting_mode:
            # skip any more 1s
            if regs[rs1] & bit == 1:
                i += 1
                continue

        # setting mode, search for 1
        if regs[rs1] & bit: # found a bit in rs1:
            setting_mode = False
            # next loop starts skipping
        else:
            regs[rd] |= bit # always set except when search succeeds

        i += 1

def sbf(rd, rs1, rs2):
    rd = 0
    # start setting if no predicate or if 1st predicate bit set
    setting_mode = rs2 == x0 or (regs[rs2] & 1)
    while i < XLEN:
        bit = 1<<i
        if rs2 != x0 and (regs[rs2] & bit):
            # reset searching
            setting_mode = False
        if setting_mode:
            if regs[rs1] & bit: # found a bit in rs1: stop setting rd
                setting_mode = False
            else:
                regs[rd] |= bit
        else if rs2 != x0: # searching mode
            if (regs[rs2] & bit):
                setting_mode = True # back into "setting" mode
        i += 1

sif - set including first bit

Similar to sbf except including the bit which ends a run. i.e: Sets all LSBs leading up to and including where an LSB in the src is set, and sets zeros following the point where the src bit is found.

The side-effect of when the src is zero is also the same as for sbf: output is all 1s if src2 is zero, and output is equal to src2 if src2 is non-zero.

 # Example

 7 6 5 4 3 2 1 0   Element number

 1 0 0 1 0 1 0 0   a3 contents
                   sif a2, a3
 0 0 0 0 0 1 1 1   a2 contents

 1 0 0 1 0 1 0 1   a3 contents
                   sif a2, a3
 0 0 0 0 0 0 0 1   a2

 1 1 0 0 0 0 1 1   a0 vcontents
 1 0 0 1 0 1 0 0   a3 contents
                   sif a2, a3, a0
 1 1 x x x x 1 1   a2 contents

Pseudo-code:

def sif(rd, rs1, rs2):
    rd = 0
    setting_mode = rs2 == x0 or (regs[rs2] & 1)

    while i < XLEN:
        bit = 1<<i

        # only reenable when predicate in use, and bit valid
        if !setting_mode && rs2 != x0:
            if (regs[rs2] & bit):
                # back into "setting" mode
                setting_mode = True

        # skipping mode
        if !setting_mode:
            # skip any more 1s
            if regs[rs1] & bit == 1:
                i += 1
                continue

        # setting mode, search for 1
        regs[rd] |= bit # always set during search
        if regs[rs1] & bit: # found a bit in rs1:
            setting_mode = False
            # next loop starts skipping

        i += 1

sof - set only first bit

Similar to sbf and sif except only set the bit which ends a run.

Unlike sbf and sif however, if the src is zero then the output is also guaranteed to be zero, irrespective of src2's contents.

 # Example

 7 6 5 4 3 2 1 0   Element number

 1 0 0 1 0 1 0 0   a3 contents
                   sof a2, a3
 0 0 0 0 0 1 0 0   a2 contents

 1 0 0 1 0 1 0 1   a3 contents
                   sof a2, a3
 0 0 0 0 0 0 0 1   a2

 1 1 0 0 0 0 1 1   a0 vcontents
 1 1 0 1 0 1 0 0   a3 contents
                   sof a2, a3, a0
 0 1 x x x x 0 0   a2 contents

Pseudo-code:

def sof(rd, rs1, rs2):
    rd = 0
    setting_mode = rs2 == x0 or (regs[rs2] & 1)

    while i < XLEN:
        bit = 1<<i

        # only reenable when predicate in use, and bit valid
        if !setting_mode && rs2 != x0:
            if (regs[rs2] & bit):
                # back into "setting" mode
                setting_mode = True

        # skipping mode
        if !setting_mode:
            # skip any more 1s
            if regs[rs1] & bit == 1:
                i += 1
                continue

        # setting mode, search for 1
        if regs[rs1] & bit: # found a bit in rs1:
            regs[rd] |= bit # only set when search succeeds
            setting_mode = False
            # next loop starts skipping

        i += 1