Major Opcode Allocation

SimpleV Prefix, 16-bit Compressed, and SV VBLOCK all require considerable opcode space. As with the OpenPOWER v3.1 "prefixes", new encoding space has to be found; the key driving difference here is the aim of reducing overall instruction size, which reduces the I-Cache size needed and in turn power consumption.

Consequently, rather than settle for a v3.1-style 32-bit prefix, 8 major opcodes are taken up and given new meanings. Two options here involve either:

Taking 8 arbitrary unused major opcodes as-is
Moving anything in the range 0-7 elsewhere

This applies only in "LibreSOC Mode". Candidates for moving elsewhere include mulli, twi and tdi.

2 opcodes for 16-bit Compressed instructions with 11 bits available
2 opcodes are required in order to give SV-P48 the 11 bits needed for prefixing
2 opcodes are likewise required for SV-P64 to have 27 bits available
2 opcodes for SV-C32 and SV-C48 (32 bit versions of P48 and P64)
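
The bit counts above can be checked with a little arithmetic. The following is a minimal sketch, assuming that a pair of major opcodes leaves one bit of the 6-bit primary opcode field free for use as data, and that SV-P48 and SV-P64 each wrap a full 32-bit instruction. The function name `free_bits` and its parameters are hypothetical, chosen for illustration only.

```python
MAJOR_OPCODE_BITS = 6  # Power ISA primary opcode field (bits 0:5)

def free_bits(total_bits, payload_bits, n_major_opcodes=2):
    """Bits left for new encoding after the payload and the fixed opcode bits.

    Assumption: a group of 2**k major opcodes frees k bits of the opcode
    field, so a pair of opcodes consumes 6 - 1 = 5 fixed bits.
    """
    freed = n_major_opcodes.bit_length() - 1   # log2 for powers of two
    fixed = MAJOR_OPCODE_BITS - freed
    return total_bits - payload_bits - fixed

print(free_bits(16, 0))    # 16-bit Compressed: 11
print(free_bits(48, 32))   # SV-P48 prefix:     11
print(free_bits(64, 32))   # SV-P64 prefix:     27
```

With a single major opcode instead of a pair, `free_bits(16, 0, n_major_opcodes=1)` gives 10, which is where a "10 bit" Compressed encoding would come from.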

With only 11 bits for 16-bit Compressed, it may be better to use the opportunity to switch into "16 bit mode". Interestingly SV-C32 could likewise switch into the same.

VBLOCK can be added later by using further VSX dedicated major opcodes (EXT62, EXT60)

EXT00 - unused (one instruction: attn)
EXT01 - v3.1B prefix
EXT02 - tdi
EXT03 - twi
EXT04 - vector/bcd
EXT05 - unused
EXT06 - vector
EXT07 - mulli
EXT09 - reserved
EXT17 - unused (2 instructions: sc, scv)
EXT22 - reserved sandbox
EXT46 - lmw
EXT47 - stmw
EXT56 - lq
EXT57 - vector ld
EXT58 - ld (leave ok)
EXT59 - FP (leave ok)
EXT60 - vector
EXT61 - st (leave ok)
EXT62 - vector st
EXT63 - FP (leave ok)

Potential allocations:

| hword 0 | hword 1 | hword 2 | hword 3 |
EXT00/01 - C 10bit -> 16bit
EXT60/62 - VBLOCK
EXT09/17 - SV-C32 and other SV-C
EXT06/07 - SV-C32-Swizzle and other SV-C-Swizzle
EXT02/03 - SV-P48                
EXT04/05 - SV-P64
EXT56/57 - Predicated-SV-P48
EXT46/47 - Predicated SV-P64
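
As an illustration of how a pre-decoder might use this list, here is a minimal sketch that maps a major opcode to its proposed meaning. The pairings are copied from the potential allocations above and are not final; the names `POTENTIAL_ALLOCATIONS` and `classify` are hypothetical.

```python
# Proposed (not final) pairings, copied from the "Potential allocations" list.
POTENTIAL_ALLOCATIONS = {
    (0, 1): "C 10bit -> 16bit",
    (60, 62): "VBLOCK",
    (9, 17): "SV-C32 and other SV-C",
    (6, 7): "SV-C32-Swizzle and other SV-C-Swizzle",
    (2, 3): "SV-P48",
    (4, 5): "SV-P64",
    (56, 57): "Predicated-SV-P48",
    (46, 47): "Predicated SV-P64",
}

def classify(major_opcode):
    """Return the proposed meaning of a major opcode, or "Standard"."""
    if not 0 <= major_opcode <= 63:
        raise ValueError("major opcode is a 6-bit field")
    for pair, meaning in POTENTIAL_ALLOCATIONS.items():
        if major_opcode in pair:
            return meaning
    return "Standard"  # anything not reallocated keeps its existing meaning

print(classify(2))    # SV-P48
print(classify(22))   # Standard (EXT22 is listed as spare)
```

A table of pairs keeps the lookup easy to audit against the list: each opcode appears at most once, and a quick check that all values are distinct catches accidental double allocation.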

Spare:

EXT22

C10/16 FSM

if EXT == 00/01:
    start @ 10bit
if state==10bit:
    if bit15:
        next = 16bit
    else:
        next = Standard
if state==16bit:
    if bit0 & bit15:
        insn = C.immediate
    if ~bit15:
        if ~bit0:
            next = Standard
        else:
            next = Standard.then.16bit
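
The C10/16 state machine can be sketched as a small next-state function. This is a hedged illustration rather than a definitive decoder, and it fills in points the pseudocode leaves open: bits are numbered MSB0 within the 16-bit halfword (Power convention), an instruction with bit15 set in 16bit mode stays in 16bit mode, and "Standard.then.16bit" returns to 16bit mode after exactly one Standard instruction. The function and constant names are hypothetical.

```python
STANDARD = "Standard"
C10 = "10bit"
C16 = "16bit"
STD_THEN_16 = "Standard.then.16bit"

def bit(hword, n):
    """Bit n of a 16-bit halfword, MSB0 numbering (assumption)."""
    return (hword >> (15 - n)) & 1

def next_state(state, hword):
    """Mode for the following instruction, given the current mode and the
    first halfword of the current instruction."""
    if state == STANDARD:
        if (hword >> 10) not in (0, 1):   # major opcode is not EXT00/01
            return STANDARD
        state = C10                       # this instruction is itself 10bit
    if state == C10:
        return C16 if bit(hword, 15) else STANDARD
    if state == C16:
        if bit(hword, 15):
            return C16                    # assumption: remain in 16bit mode
        return STD_THEN_16 if bit(hword, 0) else STANDARD
    if state == STD_THEN_16:
        return C16                        # assumption: one Standard insn, then 16bit
    raise ValueError(f"unknown state {state!r}")

state = STANDARD
for hword in (0x0001, 0x8001, 0x8000, 0x7C00, 0x0000):
    state = next_state(state, hword)
    print(state)
```

Tracing the loop: the EXT00 halfword with bit15 set enters 16bit mode, the next halfword stays there, the third requests one Standard instruction before returning to 16bit, and the final halfword (bit0 and bit15 both clear) drops back to Standard.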

SV-Compressed FSM

if EXT == 09/17:
    if bit0:
         SV.mode =

Major opcode map

Table 9: Primary Opcode Map (opcode bits 0:5)

    | 000    | 001   | 010   | 011   | 100    | 101    | 110   | 111   |
000 |        |       | tdi   | twi   | EXT04  |        |       | mulli | 000
001 | subfic |       | cmpli | cmpi  | addic  | addic. | addi  | addis | 001
010 | bc/l/a | EXT17 | b/l/a | EXT19 | rlwimi | rlwinm |       | rlwnm | 010
011 | ori    | oris  | xori  | xoris | andi.  | andis. | EXT30 | EXT31 | 011
100 | lwz    | lwzu  | lbz   | lbzu  | stw    | stwu   | stb   | stbu  | 100
101 | lhz    | lhzu  | lha   | lhau  | sth    | sthu   | lmw   | stmw  | 101
110 | lfs    | lfsu  | lfd   | lfdu  | stfs   | stfsu  | stfd  | stfdu | 110
111 | lq     | EXT57 | EXT58 | EXT59 | EXT60  | EXT61  | EXT62 | EXT63 | 111
    | 000    | 001   | 010   | 011   | 100    | 101    | 110   | 111   |
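
To locate an EXTnn number in the map, split the 6-bit opcode into its upper three bits (the row) and lower three bits (the column). A minimal sketch, with the helper name `table_position` being hypothetical:

```python
def table_position(major_opcode):
    """Return (row, column) labels of a major opcode in the primary opcode map."""
    if not 0 <= major_opcode <= 63:
        raise ValueError("major opcode is a 6-bit field")
    row = major_opcode >> 3        # opcode bits 0:2
    col = major_opcode & 0b111     # opcode bits 3:5
    return format(row, "03b"), format(col, "03b")

print(table_position(7))    # ('000', '111') -> mulli
print(table_position(17))   # ('010', '001') -> EXT17
print(table_position(46))   # ('101', '110') -> lmw
```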

LE/BE complications.

See https://bugs.libre-soc.org/show_bug.cgi?id=529 for discussion

With the Major Opcode being at the opposite end of the sequential byte order when read from memory in LE mode, a solution which allows 16 and 48 bit instructions to co-exist with 32 bit ones is to look at bytes 2 and 3 before looking at 0 and 1.

Option 1:

A 16 bit instruction would therefore be in bytes 2 and 3, removed from the instruction stream ahead of bytes 0 and 1, which would remain where they were. The next instruction would repeat the analysis, starting now instead at the new byte 2-3.

A 48 bit instruction would again use bytes 2 and 3, read the major opcode, and extract bytes 0 through 5 from the stream. However the 48 bit instruction would be constructed from bytes 2,3,0,1,4,5. Again: after these 6 bytes were extracted from the stream the analysis would begin again for the next instruction at bytes 2 and 3.
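
The byte shuffling in Option 1 can be sketched as follows. This is an illustration only: how the instruction length is derived from the major opcode is not settled, so the length is passed in as a parameter, and the helper names are hypothetical. The one fixed fact used is that in LE mode the 6-bit major opcode sits in the top bits of byte 3 of a 32-bit word.

```python
def major_opcode_le(stream):
    """Major opcode as seen in LE byte order: the top 6 bits of byte 3."""
    return stream[3] >> 2

def extract(stream, length):
    """Remove one instruction of `length` bytes from the front of `stream`.

    Returns (instruction_bytes, remaining_stream).
    """
    if length == 2:
        # 16 bit: taken from bytes 2 and 3; bytes 0 and 1 stay in the stream
        return stream[2:4], stream[0:2] + stream[4:]
    if length == 4:
        # standard 32 bit: taken as-is
        return stream[0:4], stream[4:]
    if length == 6:
        # 48 bit: constructed from bytes 2,3,0,1,4,5
        return stream[2:4] + stream[0:2] + stream[4:6], stream[6:]
    raise ValueError("only 16, 32 and 48 bit instructions are sketched here")

stream = bytes(range(8))
insn, rest = extract(stream, 2)
print(insn.hex(), rest.hex())   # 0203 000104050607
insn, rest = extract(stream, 6)
print(insn.hex(), rest.hex())   # 020300010405 0607
```

Note how, after a 16 bit extraction, the old bytes 0 and 1 are still at the front of the stream, so the next analysis looks at what were originally bytes 4 and 5.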

Option 2:

When reading from memory, before handing to the instruction decoder, bytes 0 and 1 are swapped unconditionally with bytes 2 and 3. Effectively this is near-identical to LE/BE byte-level swapping on a 32-bit block except this time it is half-word (16 bit) swapping on a 32-bit block.

With the Major Opcode then always being in the 1st 2 bytes it becomes much simpler for the pre-analysis phase to determine instruction length, regardless of what that length is (16/32/48/64/VBLOCK).
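
Option 2 amounts to a half-word swap within each aligned 32-bit block. A minimal sketch, assuming the fetched data is a whole number of 32-bit blocks; the function name `swap_halfwords` is hypothetical:

```python
def swap_halfwords(data):
    """Swap bytes 0,1 with bytes 2,3 in every 32-bit block of `data`."""
    if len(data) % 4:
        raise ValueError("expected a whole number of 32-bit blocks")
    out = bytearray()
    for i in range(0, len(data), 4):
        out += data[i + 2:i + 4] + data[i:i + 2]
    return bytes(out)

fetched = bytes(range(8))
print(swap_halfwords(fetched).hex())   # 0203000106070405
```

The swap is its own inverse, so applying it twice restores the original byte order.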

Option 3:

Just as in VLE, require instructions to be in BE order. Data, which has nothing to do with instruction order, may optionally remain in LE order.

Why does VLE use a separate 64k page?

VLE requires that the memory page be marked as VLE-encoded. It also requires that the instructions be in BE order even when 32 bit standard opcodes are mixed in.

Questions:

What would happen without the page being marked, when attempting to call ppc64le ABI code?
How would ppc64le code in the same page be distinguished from SVPrefix code?

The answers are that it is either impossible or that it requires a special mode-switching instruction to be called on entry and exit from functions, transitioning to and from ppc64le mode.

This transition may be achieved very simply by marking the 64k page.

16 bit Compressed

See 16 bit compressed