From: Ken Hagan on
On Tue, 08 Jun 2010 02:44:51 +0100, Skybuck Flying
<IntoTheFuture(a)hotmail.com> wrote:

> Ok, so implicit operands which I would call the hardcoded operands can be
> many...
>
> This leaves the question how many explicit/flexible operands can
> instructions have ?
>
> For now my guess would be 2 for general x86 instructions... for sse I am
> not so sure...

There's no need to guess, since Intel publish more documentation than you
could possibly want on the subject. See
http://www.intel.com/products/processor/manuals/index.htm.

There are instructions that modify many registers, but the registers
involved are always implicit. Generally there are just one or two explicit
arguments, and at most one of those can use an addressing mode that
involves multiple registers. This follows from the instruction format.

x86 instructions are a 1 or 2-byte "op-code" (more recently, but rarely, 3
bytes), which may be followed by a "modrm" byte, which may in turn be
followed by a "sib" byte. After those you might have 1, 2 or 4 bytes of
displacement and then 1, 2 or 4 bytes of immediate data. In front of the
whole instruction you might have one or more prefix bytes.

That's it. (Since you asked the question, I'm guessing that you were
expecting something a little less constrained.) The MMX and SSE
instructions aren't any different from the traditional ones in terms of
encoding. They do work on different registers, but that's implicit in the
op-code. You can't freely mix MMX and SSE registers with normal
instructions. It's no accident that there are 8 general purpose registers,
8 FPU registers, 8 MMX registers and 8 SSE registers. They are all
shoe-horned into the same instruction format.
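
A minimal sketch of why each register class ends up with 8 members: the
ModRM and SIB bytes select registers through 3-bit fields. The code is
Python, the helper names (split_modrm, split_sib, has_sib) are invented for
this illustration, and it only splits the two bytes into fields under 32-bit
addressing; it is not a full instruction decoder.

```python
from typing import NamedTuple


class ModRM(NamedTuple):
    mod: int  # 2 bits: 3 means register-direct, 0..2 select memory forms
    reg: int  # 3 bits: register operand (or an opcode extension)
    rm: int   # 3 bits: register or memory operand


class SIB(NamedTuple):
    scale: int  # 2 bits: the index register is multiplied by 1 << scale
    index: int  # 3 bits: index register
    base: int   # 3 bits: base register


def split_modrm(byte: int) -> ModRM:
    """Split a ModRM byte (0..255) into its three fields."""
    return ModRM(byte >> 6, (byte >> 3) & 7, byte & 7)


def split_sib(byte: int) -> SIB:
    """Split a SIB byte (0..255) into its three fields."""
    return SIB(byte >> 6, (byte >> 3) & 7, byte & 7)


def has_sib(m: ModRM) -> bool:
    """With 32-bit addressing, rm == 4 in a memory form means a SIB byte follows."""
    return m.mod != 3 and m.rm == 4


REG32 = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]

# 89 D8 encodes "mov eax, ebx": opcode 89 /r followed by ModRM byte D8.
m = split_modrm(0xD8)
print(m, "destination:", REG32[m.rm], "source:", REG32[m.reg])
```

Three bits per field gives exactly 8 selectable registers, whichever register
file the op-code implies.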
From: BGB / cr88192 on

"Torben Ægidius Mogensen" <torbenm(a)diku.dk> wrote in message
news:7z1vcidkpw.fsf(a)ask.diku.dk...
> "Skybuck Flying" <IntoTheFuture(a)hotmail.com> writes:
>
>> Suppose I want to design a new instruction for the x86 instruction set or
>> perhaps the newer version/extension the x64 instruction set... I have a
>> question about that...
>>
>> The question is:
>>
>> What's the maximum number of operands at my disposal for the design ?
>
> As many as you like. Since x86 uses variable-length instructions, there
> is no fixed upper limit.
>

well, the architecture limits the max instruction length to 15 bytes, so this
is a limit...


> If you want to add your new instructions by a minimal modification to an
> existing implementation of the ISA, the answer depends on which
> implementation you choose to modify. If the implementation translates
> an x86 instruction into micro instructions internally, the answer is
> still that you can pretty much use as many operands as you like, as long
> as your complex instruction can be expressed in terms of the existing
> micro instructions. If not, it gets more complicated. If your
> instruction actually requires a modification of the ALU, it gets even
> more hairy. Adding an extra operand to the ALU is probably
> prohibitively expensive, as it is likely to slow everything else down.
>


well, x86 does have opcode encoding rules...

conventional arguments (allowed by typical ModRM bytes):
imm
reg
mem
reg, imm
mem, imm
reg, mem
mem, reg
reg, mem, imm
mem, reg, imm

XOP and AVX allow a few more register arguments.


this is excluding the possibility of using an otherwise unusual or a virtual
opcode encoding.
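
As a rough illustration of the operand forms listed above, here is a small
Python sketch that classifies the operands of an assembly line and checks
them against that list. The function names are hypothetical, the parser only
understands a tiny NASM-like subset, and it assumes that a "mem" slot is the
ModRM r/m operand, which may also hold a register.

```python
import re

REGS = {"eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"}

# The nine conventional forms from the list above.
ALLOWED_FORMS = {
    ("imm",), ("reg",), ("mem",),
    ("reg", "imm"), ("mem", "imm"),
    ("reg", "mem"), ("mem", "reg"),
    ("reg", "mem", "imm"), ("mem", "reg", "imm"),
}


def operand_kind(op: str) -> str:
    """Classify one operand as 'mem', 'reg' or 'imm'."""
    op = op.strip().lower()
    if "[" in op:
        return "mem"
    if op in REGS:
        return "reg"
    if re.fullmatch(r"-?(0x[0-9a-f]+|\d+)", op):
        return "imm"
    raise ValueError(f"unrecognised operand: {op!r}")


def operand_form(asm: str) -> tuple:
    """Return the operand kinds of a line such as 'add eax, [ebx]'."""
    _mnemonic, _, rest = asm.strip().partition(" ")
    if not rest.strip():
        return ()
    return tuple(operand_kind(o) for o in rest.split(","))


def is_conventional(asm: str) -> bool:
    """True if the operands fit a listed form (a register may fill a 'mem' slot)."""
    kinds = operand_form(asm)
    return any(
        len(form) == len(kinds)
        and all(s == k or (s == "mem" and k == "reg") for s, k in zip(form, kinds))
        for form in ALLOWED_FORMS
    )


for line in ["push 42", "add dword [ebx], 1", "imul eax, [ebx], 10",
             "shld [ebx], eax, 4", "add [eax], [ebx]"]:
    print(f"{line:24} {operand_form(line)} conventional={is_conventional(line)}")
```

The last line is rejected because a single ModRM byte can describe only one
memory operand.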

virtual-opcode:
use an existing opcode as a trigger, and switch to custom decoding for the
rest of the opcode.
this could be compared to a more extreme case of XOP.


this could be done within an OS for customized virtual opcodes:
a special magic opcode triggers a #UD.
the OS #UD handler examines the opcode, and decides whether to interpret it
as a special ISA extension.

however, I don't know of any which do this, since usually system-calls are
used instead.
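
A minimal sketch of that #UD idea, simulated in Python rather than written as
a real kernel handler. UD2 (0F 0B) is a real opcode that always raises #UD;
everything else here is an assumption made up for illustration: the 4-byte
layout (UD2, an extension byte, an operand byte), the VIRTUAL_OPS table, and
the two example operations.

```python
UD2 = bytes([0x0F, 0x0B])  # real x86 opcode that always raises #UD

REG32 = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]


def vop_add3(regs: dict, dst: str, src: str) -> None:
    """Hypothetical three-input add: dst = dst + src + ecx (ecx is implicit)."""
    regs[dst] = (regs[dst] + regs[src] + regs["ecx"]) & 0xFFFFFFFF


def vop_swap(regs: dict, dst: str, src: str) -> None:
    """Hypothetical register swap."""
    regs[dst], regs[src] = regs[src], regs[dst]


# Hypothetical extension table: the byte after UD2 selects the virtual opcode.
VIRTUAL_OPS = {0x01: vop_add3, 0x02: vop_swap}


def handle_ud(memory: bytes, regs: dict) -> bool:
    """Emulate a virtual opcode at regs['eip'].

    Returns True if the fault was handled (eip is advanced past the
    instruction), False if it should be reported as a genuine #UD.
    """
    ip = regs["eip"]
    insn = memory[ip:ip + 4]
    if len(insn) < 4 or insn[:2] != UD2:
        return False
    op = VIRTUAL_OPS.get(insn[2])
    if op is None:
        return False
    # Operand byte: bits 5..3 pick the destination, bits 2..0 the source.
    dst = REG32[(insn[3] >> 3) & 7]
    src = REG32[insn[3] & 7]
    op(regs, dst, src)
    regs["eip"] = ip + 4
    return True


regs = dict.fromkeys(REG32, 0)
regs.update(eax=1, ebx=2, ecx=3, eip=0)
code = bytes([0x0F, 0x0B, 0x01, 0x03])  # virtual "add3 eax, ebx"
print(handle_ud(code, regs), regs["eax"], regs["eip"])
```

Every such instruction costs a full fault round trip, which is presumably one
reason a system call is the usual choice.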


another different hack I have used (in my projects), is to make some fake
opcodes (at the ASM level) which are converted into a function call (which
encodes all of the arguments into the function name), but this call uses a
special calling convention (namely, it preserves all registers/... except
those which it is specifically allowed to modify...).

the example would be like the big ugly URL's used for CGI scripts/... just
as function names.

typically, this strategy is used in combination with link-time and run-time
code generation...
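
To make the "arguments encoded in the function name" idea concrete, here is a
small Python sketch of one possible mangling scheme. It is not the scheme
used in the projects mentioned above, which the post does not spell out; the
"vop__" prefix, the "_XX" hex escapes and the function names are all
assumptions chosen for illustration.

```python
import re

PREFIX = "vop__"  # hypothetical marker for a fake-opcode helper symbol


def _encode(text: str) -> str:
    """Keep ASCII letters and digits, escape everything else as _XX (hex)."""
    out = []
    for ch in text:
        if ch.isascii() and ch.isalnum():
            out.append(ch)
        elif ord(ch) < 128:
            out.append("_%02X" % ord(ch))
        else:
            raise ValueError("this sketch handles ASCII only")
    return "".join(out)


def _decode(text: str) -> str:
    """Reverse _encode by turning each _XX escape back into its character."""
    return re.sub(r"_([0-9A-F]{2})", lambda m: chr(int(m.group(1), 16)), text)


def mangle(op: str, operands: list) -> str:
    """Build a linker-friendly symbol name from a fake opcode and its operands."""
    return PREFIX + "__".join(_encode(part) for part in [op, *operands])


def demangle(symbol: str) -> tuple:
    """Recover (op, operands) from a symbol produced by mangle()."""
    if not symbol.startswith(PREFIX):
        raise ValueError("not a fake-opcode symbol")
    op, *operands = [_decode(p) for p in symbol[len(PREFIX):].split("__")]
    return op, operands


sym = mangle("add3", ["eax", "[ebx+8]"])
print(sym)
print(demangle(sym))
```

A link-time or run-time code generator could then parse the symbol and
synthesise a helper that touches only the named operands, which is what the
special calling convention requires.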


or such...


From: Alexei A. Frounze on
On Jun 10, 11:56 am, "BGB / cr88192" <cr88...(a)hotmail.com> wrote:
> "Torben Ægidius Mogensen" <torb...(a)diku.dk> wrote in message news:7z1vcidkpw.fsf(a)ask.diku.dk...
>
> > "Skybuck Flying" <IntoTheFut...(a)hotmail.com> writes:
>
> >> Suppose I want to design a new instruction for the x86 instruction set or
> >> perhaps the newer version/extension the x64 instruction set... I have a
> >> question about that...
>
> >> The question is:
>
> >> What's the maximum number of operands at my disposal for the design ?
>
> > As many as you like.  Since x86 uses variable-length instructions, there
> > is no fixed upper limit.
>
> well, the architecture limits the max instruction length to 15 bytes, so this
> is a limit...
>
> > If you want to add your new instructions by a minimal modification to an
> > existing implementation of the ISA, the answer depends on which
> > implementation you choose to modify.  If the implementation translates
> > an x86 instruction into micro instructions internally, the answer is
> > still that you can pretty much use as many operands as you like, as long
> > as your complex instruction can be expressed in terms of the existing
> > micro instructions.  If not, it gets more complicated.  If your
> > instruction actually requires a modification of the ALU, it gets even
> > more hairy.  Adding an extra operand to the ALU is probably
> > prohibitively expensive, as it is likely to slow everything else down.
>
> well, x86 does have opcode encoding rules...
>
> conventional arguments (allowed by typical ModRM bytes):
> imm
> reg
> mem
> reg, imm
> mem, imm
> reg, mem
> mem, reg
> reg, mem, imm
> mem, reg, imm
>
> XOP and AVX allow a few more register arguments.
>
> this is excluding the possibility of using an otherwise unusual or a virtual
> opcode encoding.
>
> virtual-opcode:
> use an existing opcode as a trigger, and switch to custom decoding for the
> rest of the opcode.
> this could be compared to a more extreme case of XOP.
>
> this could be done within an OS for customized virtual opcodes:
> a special magic opcode triggers a #UD.
> the OS #UD handler examines the opcode, and decides whether to interpret it
> as a special ISA extension.
>
> however, I don't know of any which do this, since usually system-calls are
> used instead.
>
> another different hack I have used (in my projects), is to make some fake
> opcodes (at the ASM level) which are converted into a function call (which
> encodes all of the arguments into the function name), but this call uses a
> special calling convention (namely, it preserves all registers/... except
> those which it is specifically allowed to modify...).
>
> the example would be like the big ugly URL's used for CGI scripts/... just
> as function names.
>
> typically, this strategy is used in combination with link-time and run-time
> code generation...

Apparently, both Windows and ReactOS patch the code containing the
prefetchnta instruction, e.g. RtlPrefetchMemoryNonTemporal(),
depending on whether or not the instruction is supported by the CPU:
http://www.computer.org/portal/web/csdl/doi/10.1109/HICSS.2010.182
(click on the PDF link, search for this fxn)
http://www.koders.com/c/fid4D23409E3EA6032D618D125732B4AC17A3E773DA.aspx
(search the code for this fxn)
There's a similar thing in Linux with the option of just skipping the
instruction (see handle_prefetch()):
http://lwn.net/Articles/8634/

From my experience, Microsoft's C/C++ compiler seems to generate
prefetchnta in x64 code unconditionally, leaving it up to the system
to work around the instruction when the CPU does not support it.
The system then either patches the code or emulates/skips the
instruction.

A number of virtualization products introduce otherwise illegal/non-
existent instructions to communicate between the guest OS and the host
and "emulate" those instructions: http://www.symantec.com/avcenter/reference/Virtual_Machine_Threats.pdf
(see sections starting with VI)

So, these "hacks" are there in the wild, for good and bad.

Alex
From: Torben Ægidius Mogensen on
"BGB / cr88192" <cr88192(a)hotmail.com> writes:


> well, x86 does have opcode encoding rules...
>
> conventional arguments (allowed by typical ModRM bytes):
> imm
> reg
> mem
> reg, imm
> mem, imm
> reg, mem
> mem, reg
> reg, mem, imm
> mem, reg, imm
>
> XOP and AVX allow a few more register arguments.

Even within these limitations, you can have any number of implied
register operands (well, up to the number of registers the x86 has).
Additionally, you can split up an instruction into one or more
instructions that set up operands and one that performs the operation.
It is, of course, debatable if this is one instruction or two or more
instructions, but if they are required to be adjacent, I would still
count them as one.

Torben
From: Terje Mathisen on
Alexei A. Frounze wrote:
> Apparently, both Windows and ReactOS patch the code containing the
> prefetchnta instruction, e.g. RtlPrefetchMemoryNonTemporal(),
> depending on whether or not the instruction is supported by the CPU:
> http://www.computer.org/portal/web/csdl/doi/10.1109/HICSS.2010.182
> (click on the PDF link, search for this fxn)
> http://www.koders.com/c/fid4D23409E3EA6032D618D125732B4AC17A3E773DA.aspx
> (search the code for this fxn)
> There's a similar thing in Linux with the option of just skipping the
> instruction (see handle_prefetch()):
> http://lwn.net/Articles/8634/

I checked the relevant link and code, and I really don't think the author
understands just how hard it will be to get it right.

He does note that there are multiple problem areas related to SMP
systems, and that these make the solution much more complicated than it
would be for a single-core setup.

Anyway, his SMP hack to allow fixup of large instructions is to make
sure that all the opcodes used, that could fault, will do so based on
the first 4 opcode bytes only, i.e. independently of any following bytes.

With this restriction he can first use simple store instructions to
overwrite the tail, if any, and then use a locked update to fix the
first four bytes.

(He doesn't state it explicitly, but I assume he fixes 1-3 byte opcodes
by always writing 4 bytes, rewriting the current values into the
following bytes.)
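
The write ordering described above can be sketched as follows. This is a
Python simulation on a bytearray, not the code from the linked patch:
patch_instruction is a hypothetical name, the single slice assignment merely
stands in for the locked 4-byte update, and nothing here is actually atomic.

```python
def patch_instruction(code: bytearray, offset: int, new: bytes) -> None:
    """Overwrite the instruction at `offset` with `new`, tail first.

    Assumes at least max(4, len(new)) bytes are available at `offset`.
    """
    # Replacements shorter than 4 bytes are padded with the bytes already
    # in place, so the head update always covers exactly 4 bytes.
    if len(new) < 4:
        new = new + bytes(code[offset + len(new):offset + 4])

    # Step 1: plain stores for everything after the first four bytes.
    # Another CPU can still only see the old, faulting head at this point.
    code[offset + 4:offset + len(new)] = new[4:]

    # Step 2: one 4-byte update of the head (the locked write in the scheme).
    code[offset:offset + 4] = new[:4]


# prefetchnta [eax] is 0F 18 00; replace it with three 1-byte NOPs (90).
code = bytearray(b"\x0f\x18\x00\xc3\xcc\xcc")
patch_instruction(code, 0, b"\x90\x90\x90")
print(code.hex(" "))
```

The fourth byte (C3 here) is rewritten with its own value, which is the
padding trick assumed in the parenthetical above.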

The potential problem I noted here is that, afaik, many systems only
guarantee the atomicity of locked writes if they are properly aligned,
and 75% of all opcodes will not start on a 4-byte boundary.
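
The 75% figure follows if instruction start addresses are assumed to be
spread evenly over the four possible offsets within a 4-byte word. A short
Python check of that arithmetic, with a helper for the stricter case of a
write that straddles a cache line; the 64-byte line size and the function
names are assumptions for illustration, and whether a misaligned locked write
stays atomic is platform-specific.

```python
def is_aligned(addr: int, size: int = 4) -> bool:
    """True if `addr` is a multiple of `size`."""
    return addr % size == 0


def crosses_boundary(addr: int, size: int = 4, boundary: int = 64) -> bool:
    """True if a `size`-byte write at `addr` spans two `boundary`-byte blocks."""
    return addr // boundary != (addr + size - 1) // boundary


addresses = range(64)  # one assumed 64-byte cache line of start addresses
misaligned = sum(not is_aligned(a) for a in addresses)
straddling = [a for a in addresses if crosses_boundary(a)]

print(f"misaligned 4-byte writes: {misaligned}/64 = {misaligned / 64:.0%}")
print(f"writes crossing into the next line start at offsets {straddling}")
```

Under these assumptions 48 of the 64 start offsets are misaligned, and three
of them would also spill into the next cache line.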

Terje

--
- <Terje.Mathisen at tmsw.no>
"almost all programming can be viewed as an exercise in caching"