From: Ken Hagan on 8 Jun 2010 05:19

On Tue, 08 Jun 2010 02:44:51 +0100, Skybuck Flying
<IntoTheFuture(a)hotmail.com> wrote:

> Ok, so implicit operands which I would call the hardcoded operands can be
> many...
>
> This leaves the question how many explicit/flexible operands can
> instructions have ?
>
> For now my guess would be 2 for general x86 instructions... for sse I am
> not so sure...

There's no need to guess, since Intel publish more documentation than you
could possibly want on the subject. See
http://www.intel.com/products/processor/manuals/index.htm.

There are instructions that modify many registers, but the registers
involved are always implicit. Generally there are just one or two explicit
arguments, and at most one of those can use an addressing mode that
involves multiple registers.

This follows from the instruction format. x86 instructions are a 1 or
2-byte "op-code" (more recently, but rarely, 3 bytes), which may be
followed by a "modrm" byte, which may in turn be followed by a "sib" byte.
After those you might have 1, 2 or 4 bytes of immediate data. In front of
the whole instruction you might have one or more prefix bytes. That's it.
(Since you asked the question, I'm guessing that you were expecting
something a little less constrained.)

The MMX and SSE instructions aren't any different from the traditional
ones in terms of encoding. They do work on different registers, but that's
implicit in the op-code. You can't freely mix MMX and SSE registers with
normal instructions. It's no accident that there are 8 general purpose
registers, 8 FPU registers, 8 MMX registers and 8 SSE registers. They are
all shoe-horned into the same instruction format.
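
As an illustration of the format described above, here is a minimal Python
sketch that splits a ModRM byte into its 2-3-3 bit fields. The helper names
(split_modrm, describe_reg_reg) and the GPR32 table are made up for this
example and do not come from any library. Each register field is only 3 bits
wide, which is where the figure of 8 registers per register class comes from.

```python
def split_modrm(byte):
    """Split a ModRM byte into its (mod, reg, rm) bit fields.

    A SIB byte has the same 2-3-3 layout (scale, index, base).
    """
    if not 0 <= byte <= 0xFF:
        raise ValueError("expected a single byte")
    mod = (byte >> 6) & 0b11    # addressing mode selector
    reg = (byte >> 3) & 0b111   # register operand or opcode extension
    rm = byte & 0b111           # register or memory operand
    return mod, reg, rm


# Legacy 32-bit general purpose registers, indexed by the 3-bit field value.
GPR32 = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]


def describe_reg_reg(byte):
    """Name the two registers of a register-to-register ModRM byte."""
    mod, reg, rm = split_modrm(byte)
    if mod != 0b11:
        raise ValueError("mod != 11 selects a memory operand; not handled here")
    return GPR32[reg], GPR32[rm]
```

For example, "mov eax, ebx" assembles to the bytes 89 D8. Splitting D8 gives
mod=3, reg=3 (ebx) and rm=0 (eax), so the one ModRM byte carries both explicit
operands.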
From: BGB / cr88192 on 10 Jun 2010 14:56

"Torben Ægidius Mogensen" <torbenm(a)diku.dk> wrote in message
news:7z1vcidkpw.fsf(a)ask.diku.dk...
> "Skybuck Flying" <IntoTheFuture(a)hotmail.com> writes:
>
>> Suppose I want to design a new instruction for the x86 instruction set or
>> perhaps the newer version/extension the x64 instruction set... I have a
>> question about that...
>>
>> The question is:
>>
>> What's the maximum number of operands at my disposal for the design ?
>
> As many as you like. Since x86 uses variable-length instructions, there
> is no fixed upper limit.
>

well, most things limit max opcode length to 16-bytes, so this is a limit...

> If you want to add your new instructions by a minimal modification to an
> existing implementation of the ISA, the answer depends on which
> implementation you choose to modify. If the implementation translates
> an x86 instruction into micro instructions internally, the answer is
> still that you can pretty much use as many operands as you like, as long
> as your complex instruction can be expressed in terms of the existing
> micro instructions. If not, it gets more complicated. If your
> instruction actually requires a modification of the ALU, it gets even
> more hairy. Adding an extra operand to the ALU is probably
> prohibitively expensive, as it is likely to slow everything else down.
>

well, x86 does have opcode encoding rules...

conventional arguments (allowed by typical ModRM bytes):
imm
reg
mem
reg, imm
mem, imm
reg, mem
mem, reg
reg, mem, imm
mem, reg, imm

XOP and AVX allow a few more register arguments.

this is excluding the possibility of using an otherwise unusual or a virtual
opcode encoding.

virtual-opcode:
use an existing opcode as a trigger, and switch to custom decoding for the
rest of the opcode.
this could be compared to a more extreme case of XOP.

this could be done within an OS for customized virtual opcodes:
a special magic opcode triggers a #UD.
the OS #UD handler examines the opcode, and decides whether to interpret it
as a special ISA extension.

however, I don't know of any which do this, since usually system-calls are
used instead.

another different hack I have used (in my projects), is to make some fake
opcodes (at the ASM level) which are converted into a function call (which
encodes all of the arguments into the function name), but this call uses a
special calling convention (namely, it preserves all registers/... except
those which it is specifically allowed to modify...).

the example would be like the big ugly URL's used for CGI scripts/... just
as function names.

typically, this strategy is used in combination with link-time and run-time
code generation...

or such...
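
The #UD-based "virtual opcode" idea above could be sketched roughly as
follows, in Python for readability. Only UD2 (bytes 0F 0B) is a real x86
opcode here. The extension ids, the names vmadd and vswap, and the layout of
one byte per register operand are all invented for illustration. A real
handler would run in the kernel's fault path and would also have to advance
the saved instruction pointer by the returned length.

```python
UD2 = bytes([0x0F, 0x0B])  # real x86 opcode that always raises #UD

# Hypothetical extension table: extension id -> (name, register operand count)
VIRTUAL_OPS = {
    0x01: ("vmadd", 3),
    0x02: ("vswap", 2),
}

REGS = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]


def decode_virtual(code, offset=0):
    """Decode a virtual opcode at offset.

    Returns (name, operands, length), or None if the bytes are not one of
    ours, in which case the fault should be treated as a genuine #UD.
    """
    if code[offset:offset + 2] != UD2:
        return None
    if len(code) < offset + 3:
        return None
    entry = VIRTUAL_OPS.get(code[offset + 2])
    if entry is None:
        return None
    name, count = entry
    operand_bytes = code[offset + 3:offset + 3 + count]
    if len(operand_bytes) != count:
        return None  # truncated instruction
    # The low 3 bits of each operand byte select a register.
    operands = [REGS[b & 0b111] for b in operand_bytes]
    return name, operands, 3 + count
```

With this made-up layout, the bytes 0F 0B 01 00 03 01 would decode as vmadd
with operands eax, ebx, ecx and a total length of 6 bytes, so the operand
count is bounded only by how many bytes the handler chooses to consume.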
From: Alexei A. Frounze on 11 Jun 2010 03:42

On Jun 10, 11:56 am, "BGB / cr88192" <cr88...(a)hotmail.com> wrote:
> "Torben Ægidius Mogensen" <torb...(a)diku.dk> wrote in message
> news:7z1vcidkpw.fsf(a)ask.diku.dk...
>
> > "Skybuck Flying" <IntoTheFut...(a)hotmail.com> writes:
>
> >> Suppose I want to design a new instruction for the x86 instruction set or
> >> perhaps the newer version/extension the x64 instruction set... I have a
> >> question about that...
>
> >> The question is:
>
> >> What's the maximum number of operands at my disposal for the design ?
>
> > As many as you like. Since x86 uses variable-length instructions, there
> > is no fixed upper limit.
>
> well, most things limit max opcode length to 16-bytes, so this is a limit....
>
> > If you want to add your new instructions by a minimal modification to an
> > existing implementation of the ISA, the answer depends on which
> > implementation you choose to modify. If the implementation translates
> > an x86 instruction into micro instructions internally, the answer is
> > still that you can pretty much use as many operands as you like, as long
> > as your complex instruction can be expressed in terms of the existing
> > micro instructions. If not, it gets more complicated. If your
> > instruction actually requires a modification of the ALU, it gets even
> > more hairy. Adding an extra operand to the ALU is probably
> > prohibitively expensive, as it is likely to slow everything else down.
>
> well, x86 does have opcode encoding rules...
>
> conventional arguments (allowed by typical ModRM bytes):
> imm
> reg
> mem
> reg, imm
> mem, imm
> reg, mem
> mem, reg
> reg, mem, imm
> mem, reg, imm
>
> XOP and AVX allow a few more register arguments.
>
> this is excluding the possibility of using an otherwise unusual or a virtual
> opcode encoding.
>
> virtual-opcode:
> use an existing opcode as a trigger, and switch to custom decoding for the
> rest of the opcode.
> this could be compared to a more extreme case of XOP.
>
> this could be done within an OS for customized virtual opcodes:
> a special magic opcode triggers a #UD.
> the OS #UD handler examines the opcode, and decides whether to interpret it
> as a special ISA extension.
>
> however, I don't know of any which do this, since usually system-calls are
> used instead.
>
> another different hack I have used (in my projects), is to make some fake
> opcodes (at the ASM level) which are converted into a function call (which
> encodes all of the arguments into the function name), but this call uses a
> special calling convention (namely, it preserves all registers/... except
> those which it is specifically allowed to modify...).
>
> the example would be like the big ugly URL's used for CGI scripts/... just
> as function names.
>
> typically, this strategy is used in combination with link-time and run-time
> code generation...

Apparently, both Windows and ReactOS patch the code containing the
prefetchnta instruction, e.g. RtlPrefetchMemoryNonTemporal(),
depending on whether or not the instruction is supported by the CPU:
http://www.computer.org/portal/web/csdl/doi/10.1109/HICSS.2010.182
(click on the PDF link, search for this fxn)
http://www.koders.com/c/fid4D23409E3EA6032D618D125732B4AC17A3E773DA.aspx
(search the code for this fxn)

There's a similar thing in Linux with the option of just skipping the
instruction (see handle_prefetch()):
http://lwn.net/Articles/8634/

From my experience, Microsoft's C/C++ compiler seems to generate
prefetchnta in x64 code unconditionally, thereby leaving it up to the
system to figure out a way around an instruction unsupported by the CPU.
And the system either patches the code or emulates/skips the
instruction.

A number of virtualization products introduce otherwise illegal/non-existent
instructions to communicate between the guest OS and the host
and "emulate" those instructions:
http://www.symantec.com/avcenter/reference/Virtual_Machine_Threats.pdf
(see sections starting with VI)

So, these "hacks" are there in the wild, for good and bad.

Alex
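
To make the "patch the code" option concrete, here is a rough Python sketch
of overwriting a prefetchnta with NOPs in a byte buffer. It is not how
Windows, ReactOS or Linux actually do it. It assumes the encoding 0F 18 /0
with a memory operand and 32/64-bit addressing, ignores prefix bytes (REX,
segment and address-size overrides), and works on a plain bytearray rather
than on live code. The function names are made up for this example.

```python
NOP = 0x90  # single-byte NOP


def modrm_tail_length(code, pos):
    """Length of ModRM + optional SIB + displacement starting at pos."""
    modrm = code[pos]
    mod, rm = modrm >> 6, modrm & 0b111
    length = 1
    if mod == 0b11:
        return length                # register operand: nothing follows
    if rm == 0b100:                  # a SIB byte follows the ModRM byte
        length += 1
        if mod == 0b00 and (code[pos + 1] & 0b111) == 0b101:
            length += 4              # SIB with no base register: disp32
    if mod == 0b01:
        length += 1                  # disp8
    elif mod == 0b10:
        length += 4                  # disp32
    elif rm == 0b101:
        length += 4                  # mod=00, rm=101: disp32 only
    return length


def patch_prefetchnta(code, offset):
    """Overwrite a prefetchnta at offset with NOPs.

    Returns the number of bytes patched, or 0 if no prefetchnta is there.
    """
    if code[offset:offset + 2] != b"\x0f\x18" or len(code) < offset + 3:
        return 0
    modrm = code[offset + 2]
    # /0 in the reg field selects prefetchnta; mod=11 is not a memory form.
    if (modrm >> 3) & 0b111 != 0 or modrm >> 6 == 0b11:
        return 0
    length = 2 + modrm_tail_length(code, offset + 2)
    code[offset:offset + length] = bytes([NOP]) * length
    return length
```

For instance, patching a buffer holding 0F 18 00 C3 (prefetchnta [eax]
followed by ret) replaces the first three bytes with 90 90 90 and leaves the
ret alone. Doing the same to code that another CPU may be executing at that
moment is a much harder problem.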
From: Torben Ægidius Mogensen on 11 Jun 2010 04:31

"BGB / cr88192" <cr88192(a)hotmail.com> writes:

> well, x86 does have opcode encoding rules...
>
> conventional arguments (allowed by typical ModRM bytes):
> imm
> reg
> mem
> reg, imm
> mem, imm
> reg, mem
> mem, reg
> reg, mem, imm
> mem, reg, imm
>
> XOP and AVX allow a few more register arguments.

Even within these limitations, you can have any number of implied
register operands (well, up to the number of registers the x86 has).

Additionally, you can split up an instruction into one or more
instructions that set up operands and one that performs the operation.
It is, of course, debatable if this is one instruction or two or more
instructions, but if they are required to be adjacent, I would still
count them as one.

Torben
From: Terje Mathisen on 11 Jun 2010 07:11

Alexei A. Frounze wrote:
> Apparently, both Windows and ReactOS patch the code containing the
> prefetchnta instruction, e.g. RtlPrefetchMemoryNonTemporal(),
> depending on whether or not the instruction is supported by the CPU:
> http://www.computer.org/portal/web/csdl/doi/10.1109/HICSS.2010.182
> (click on the PDF link, search for this fxn)
> http://www.koders.com/c/fid4D23409E3EA6032D618D125732B4AC17A3E773DA.aspx
> (search the code for this fxn)
> There's a similar thing in Linux with the option of just skipping the
> instruction (see handle_prefetch()):
> http://lwn.net/Articles/8634/

I checked the relevant link and code, and I really don't think the author
understands just how hard it will be to get it right.

He does note that there are multiple problem areas related to SMP
systems, and that these make the solution much more complicated than it
would be for a single-core setup.

Anyway, his SMP hack to allow fixup of large instructions is to make sure
that all the opcodes used, that could fault, will do so based on the
first 4 opcode bytes only, i.e. independently of any following bytes.

With this restriction he can first use simple store instructions to
overwrite the tail, if any, and then use a locked update to fix the first
four bytes. (He doesn't state it explicitly, but I assume he fixes 1-3
byte opcodes by always writing 4 bytes, rewriting the current values into
the following bytes.)

The potential problem I noted here is that afaik, many systems only
guarantee the atomicity of locked writes if they are properly aligned,
and 75% of all opcodes will not start on a 4-byte boundary.

Terje

--
- <Terje.Mathisen at tmsw.no>
"almost all programming can be viewed as an exercise in caching"
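
The alignment concern above can be checked with a few lines of Python. This
is only a sketch: the 75% figure assumes instruction start addresses are
spread evenly over the four possible residues modulo 4, and the 64-byte
boundary used below is an assumed typical cache line size, not something
stated in the thread. Which boundary actually matters for atomicity depends
on the processor. All function names are made up for this example.

```python
def is_naturally_aligned(address, size=4):
    """True if a size-byte write at address starts on a size-byte boundary."""
    return address % size == 0


def crosses_boundary(address, size=4, boundary=64):
    """True if the bytes [address, address + size) span two boundary-sized blocks."""
    return address // boundary != (address + size - 1) // boundary


def misaligned_fraction(size=4):
    """Fraction of start offsets (mod size) that are not naturally aligned."""
    misaligned = sum(not is_naturally_aligned(a, size) for a in range(size))
    return misaligned / size
```

Only one start offset in four (offset 0 modulo 4) is aligned, so
misaligned_fraction(4) gives 0.75, which matches the figure quoted above. A
4-byte write starting at address 62 would additionally straddle a 64-byte
boundary, while one starting at address 60 would not.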