From: Frank Kotler on 6 Jul 2010 09:07

Nimai wrote:
> I'm learning to program in straight machine code, and I just finished
> reading the Intel manuals.
>
> I have a burning question that the books haven't answered; maybe I'm
> just stupid and I missed it.
>
> If I do a JMP to a bunch of garbled data, how does the prefetching
> process know where the "instruction boundaries" are? Where will EIP
> be when the inevitable invalid opcode exception is triggered?
>
> In other words, if the instructions are garbage, how much garbage is
> taken in? What are the rules?
>
> My guess is, each possible opcode byte has something like a lookup
> table entry, and after parsing a byte, the prefetcher either adds
> another byte to the instruction, adds a modr/m byte to the instruction
> and grabs displacement and immediate bytes, or ends the instruction
> and sends it to the pipeline. This is entirely based on inference; I
> can't find anything in the manuals to confirm or deny it.
>
> Whatever process it uses, it MUST be entirely deterministic, or code
> can't be. So where is it documented?

I haven't a clue. I'm with Bob Masta - try it and see! ("One test is
worth a thousand expert opinions.") But I observe that the guys who
design the chips hang out on comp.arch, so I'll cross-post this there,
in hopes that it may get you a definitive answer (which may be "it's
proprietary, we can't tell ya").

Good luck!

Best,
Frank
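Frank's "try it and see" advice is easy to act on. What follows is a
minimal throwaway harness - a sketch assuming Linux on x86-64 and GCC,
nothing taken from the thread itself - that JMPs into a page of
arbitrary bytes, catches the resulting signal, and reports how far into
the "garbage" the instruction pointer got. The 0xFF page fill is an
arbitrary choice: a pair of FF bytes happens to decode as FF /7, an
invalid encoding, so this particular run should report SIGILL at
offset 0.

/* "Try it and see" harness. Assumes Linux/x86-64 and GCC; REG_RIP
 * needs _GNU_SOURCE (use REG_EIP for a 32-bit build). */
#define _GNU_SOURCE
#include <stdio.h>
#include <string.h>
#include <signal.h>
#include <unistd.h>
#include <ucontext.h>
#include <sys/mman.h>

static unsigned char *page;

static void on_fault(int sig, siginfo_t *si, void *ucv)
{
    (void)si;
    ucontext_t *uc = ucv;
    long long rip = uc->uc_mcontext.gregs[REG_RIP];
    printf("%s with EIP/RIP at garbage + %lld\n",
           sig == SIGILL ? "SIGILL" : "SIGSEGV",
           rip - (long long)(unsigned long)page);
    _exit(0);
}

int main(void)
{
    struct sigaction sa = { .sa_sigaction = on_fault,
                            .sa_flags = SA_SIGINFO };
    sigemptyset(&sa.sa_mask);
    sigaction(SIGILL, &sa, NULL);
    sigaction(SIGSEGV, &sa, NULL);

    page = mmap(NULL, 4096, PROT_READ | PROT_WRITE | PROT_EXEC,
                MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (page == MAP_FAILED)
        return 1;
    memset(page, 0xFF, 4096);   /* the "garbled data" - put any bytes
                                   you want to test here             */
    ((void (*)(void))page)();   /* JMP to it and see                 */
    return 0;
}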
From: Joe Pfeiffer on 6 Jul 2010 09:25

> Nimai wrote:
>> I'm learning to program in straight machine code, and I just finished
>> reading the Intel manuals.
>>
>> I have a burning question that the books haven't answered, maybe I'm
>> just stupid and I missed it.
>>
>> If I do a JMP to a bunch of garbled data, how does the prefetching
>> process know where the "instruction boundaries" are? Where will EIP
>> be when the inevitable invalid opcode exception is triggered?
>>
>> In other words, if the instructions are garbage, how much garbage is
>> taken in? What are the rules?
>>
>> My guess is, each possible opcode byte has something like a lookup
>> table entry, and after parsing a byte, the prefetcher either adds
>> another byte to the instruction, adds a modr/m byte to the instruction
>> and grabs displacement and immediate bytes, or ends the instruction
>> and sends it to the pipeline. This is entirely based on inference, I
>> can't find anything in the manuals to confirm or deny this.
>>
>> Whatever process it uses, it MUST be entirely deterministic, or code
>> can't be. So where is it documented?

Why should it be documented? What you've described is conceptually how
it works; all that's left that matters to the programmer is how many
instructions of what type can be decoded simultaneously (since that can
affect optimization).

As for when you get a fault, that depends on just what the garbling is.

NX bit set? Immediately.

Bad opcode? Immediately.

Ends up trying to read/write data at an invalid address? Immediately,
but it'll be a protection fault on the data address.

Made it past the first "instruction"? On to the second...

--
As we enjoy great advantages from the inventions of others, we should
be glad of an opportunity to serve others by any invention of ours; and
this we should do freely and generously.  (Benjamin Franklin)
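Joe's third case - a perfectly decodable instruction whose data access
faults - can be checked with a variation on the same experiment. A
minimal sketch, again assuming Linux on x86-64 and GCC, with the
hand-assembled bytes 8B 04 25 00 00 00 00 (MOV EAX, [0]) followed by
C3 (RET): the handler should report si_addr = 0, the bad data address,
while the instruction pointer still sits at the start of the page.

/* "Protection fault on the data address": the faulting address and
 * the instruction pointer are two different things. Assumes
 * Linux/x86-64 and GCC, as in the harness upthread. */
#define _GNU_SOURCE
#include <stdio.h>
#include <string.h>
#include <signal.h>
#include <unistd.h>
#include <ucontext.h>
#include <sys/mman.h>

static unsigned char *page;

static void on_segv(int sig, siginfo_t *si, void *ucv)
{
    (void)sig;
    ucontext_t *uc = ucv;
    long long rip = uc->uc_mcontext.gregs[REG_RIP];
    printf("SIGSEGV: bad data address = %p\n", si->si_addr);
    printf("         RIP = page + %lld\n",
           rip - (long long)(unsigned long)page);
    _exit(0);
}

int main(void)
{
    struct sigaction sa = { .sa_sigaction = on_segv,
                            .sa_flags = SA_SIGINFO };
    sigemptyset(&sa.sa_mask);
    sigaction(SIGSEGV, &sa, NULL);

    page = mmap(NULL, 4096, PROT_READ | PROT_WRITE | PROT_EXEC,
                MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (page == MAP_FAILED)
        return 1;
    /* MOV EAX, [0] ; RET */
    memcpy(page, "\x8B\x04\x25\x00\x00\x00\x00\xC3", 8);
    ((void (*)(void))page)();
    return 0;
}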
From: nedbrek on 6 Jul 2010 20:07

Hello,

Welcome, comp.lang.asm.x86!

"Joe Pfeiffer" <pfeiffer(a)nospicedham.cs.nmsu.edu> wrote in message
news:1br5jg214l.fsf(a)snowball.wb.pfeifferfamily.net...
>> Nimai wrote:
>>> If I do a JMP to a bunch of garbled data, how does the prefetching
>>> process know where the "instruction boundaries" are? Where will EIP
>>> be when the inevitable invalid opcode exception is triggered?
>>>
>>> In other words, if the instructions are garbage, how much garbage is
>>> taken in? What are the rules?
>>>
>>> My guess is, each possible opcode byte has something like a lookup
>>> table entry, and after parsing a byte, the prefetcher either adds
>>> another byte to the instruction, adds a modr/m byte to the instruction
>>> and grabs displacement and immediate bytes, or ends the instruction
>>> and sends it to the pipeline. This is entirely based on inference, I
>>> can't find anything in the manuals to confirm or deny this.
>>>
>>> Whatever process it uses, it MUST be entirely deterministic, or code
>>> can't be. So where is it documented?
>
> As for when you get a fault, that depends on just what the garbling is.
> NX bit set? Immediately.
>
> Bad opcode? Immediately.
>
> Ends up trying to read/write data at an invalid address? Immediately,
> but it'll be a protection fault on the data address.
>
> Made it past the first "instruction"? On to the second...

That about sums it up! There are two aspects: architectural (what
software sees) and hardware (what actually happens).

The hardware is just going to shovel bits into the execution engine. An
advanced machine doesn't even look at the bits at first; hardware
further down the line interprets the bits into instructions. This part
of the machine is very speculative, so it can never be sure that a
mispredicted branch somewhere earlier won't redirect fetch and make
everything right again. The machine won't flag any bad decode until it
is sure that the architectural path really goes that way.

Any machine has to come to the same result as a simple,
one-instruction-at-a-time machine would (maintaining the architectural
illusion). There are all sorts of nifty tricks to make this happen, but
rest assured the fault will be deterministic.

However, architecturally, there is only one guaranteed invalid opcode
instruction (ud2, 0f 0b), so anything else might run for a while. Also,
new instructions get added - so what happens to be invalid today might
be a real instruction tomorrow. You might even manage to fall into an
infinite loop! (eb fe, a two-byte jump to itself) Hope your environment
has preemptive multitasking!

Hope that helps!
Ned
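Ned's two points - that UD2 (0f 0b) is the one encoding guaranteed to
stay invalid, and that bytes which merely look like garbage may run for
a while - can be demonstrated the same way. Under the same
Linux/x86-64/GCC assumptions as the harness upthread, the buffer below
starts with two odd-looking but valid instructions, F3 90 (PAUSE) and
48 90 (XCHG RAX,RAX), before the UD2, so SIGILL should land exactly at
offset 4.

/* Valid-looking "garbage" executes until the decoder hits the UD2;
 * the reported instruction pointer marks the exact boundary. */
#define _GNU_SOURCE
#include <stdio.h>
#include <string.h>
#include <signal.h>
#include <unistd.h>
#include <ucontext.h>
#include <sys/mman.h>

static unsigned char *page;

static void on_ill(int sig, siginfo_t *si, void *ucv)
{
    (void)sig; (void)si;
    ucontext_t *uc = ucv;
    printf("SIGILL at page + %lld (should be 4, the UD2)\n",
           (long long)uc->uc_mcontext.gregs[REG_RIP]
               - (long long)(unsigned long)page);
    _exit(0);
}

int main(void)
{
    struct sigaction sa = { .sa_sigaction = on_ill,
                            .sa_flags = SA_SIGINFO };
    sigemptyset(&sa.sa_mask);
    sigaction(SIGILL, &sa, NULL);

    page = mmap(NULL, 4096, PROT_READ | PROT_WRITE | PROT_EXEC,
                MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (page == MAP_FAILED)
        return 1;
    /* PAUSE ; XCHG RAX,RAX ; UD2 */
    memcpy(page, "\xF3\x90\x48\x90\x0F\x0B", 6);
    ((void (*)(void))page)();
    return 0;
}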
From: Andy 'Krazy' Glew on 6 Jul 2010 19:39

On 7/6/2010 6:07 AM, Frank Kotler wrote:
> Nimai wrote:
>> I'm learning to program in straight machine code, and I just finished
>> reading the Intel manuals.
>>
>> I have a burning question that the books haven't answered, maybe I'm
>> just stupid and I missed it.
>>
>> If I do a JMP to a bunch of garbled data, how does the prefetching
>> process know where the "instruction boundaries" are? Where will EIP
>> be when the inevitable invalid opcode exception is triggered?
>>
>> In other words, if the instructions are garbage, how much garbage is
>> taken in? What are the rules?
>>
>> My guess is, each possible opcode byte has something like a lookup
>> table entry, and after parsing a byte, the prefetcher either adds
>> another byte to the instruction, adds a modr/m byte to the instruction
>> and grabs displacement and immediate bytes, or ends the instruction
>> and sends it to the pipeline. This is entirely based on inference, I
>> can't find anything in the manuals to confirm or deny this.
>>
>> Whatever process it uses, it MUST be entirely deterministic, or code
>> can't be. So where is it documented?

Nimai's guess is a fairly accurate description of what is treated as
the de facto architectural definition.

The actual hardware is more like:

  fetch 1 or 2 blocks of instructions (typically 16-byte aligned)
  containing the branch target

  decode several instructions in those blocks in parallel, starting
  at the branch target

i.e. it is done in parallel. Although there have been machines that
could decode only one instruction at a time if the code had never been
seen before: typically those machines keep instruction predecode bits
in the instruction cache, maybe even in the L2, and have rather poor
performance on code they haven't seen before.

But most modern machines can at least decode the multiple bytes of a
given instruction within a cycle. Typically via

  Option 1:
    assume the first byte is an opcode byte
    assume the second is a modrm
    assume the 3rd-6th are an offset

  Option 2:
    assume the first byte is a REX prefix or some other prefix
    assume the second byte is an opcode byte
    assume the third is a modrm
    assume the 4th-7th are an offset

... and so on, in parallel, using whichever option matches.

But the semantics are as if the bytes were looked at one at a time.
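Nimai's lookup-table guess is concrete enough to sketch in a few dozen
lines. The toy below is NOT a real x86 decoder - it covers a
hand-picked handful of one-byte opcodes, no prefixes, no two-byte
opcodes, and only partial modr/m and SIB rules - but it shows the shape
of the de facto definition Andy describes: a table entry per opcode
byte saying what must follow, walked until the table says "invalid",
which is exactly where EIP would point when #UD is raised.

/* Toy byte-at-a-time length decoder for a tiny, illustrative subset
 * of 32-bit x86. The table contents are the "lookup table entry per
 * opcode byte" from Nimai's guess. */
#include <stdio.h>

struct entry {
    unsigned char valid;      /* is this byte a known opcode?      */
    unsigned char has_modrm;  /* must a modr/m byte follow?        */
    unsigned char imm;        /* immediate/offset bytes at the end */
};

static struct entry tab[256];

static void init_table(void)
{
    tab[0x90] = (struct entry){1, 0, 0};   /* NOP             */
    tab[0xC3] = (struct entry){1, 0, 0};   /* RET             */
    tab[0xB8] = (struct entry){1, 0, 4};   /* MOV EAX, imm32  */
    tab[0x89] = (struct entry){1, 1, 0};   /* MOV r/m32, r32  */
    tab[0xEB] = (struct entry){1, 0, 1};   /* JMP rel8        */
}

/* Decode one instruction at p: return its length, or 0 when the table
 * says "invalid" - the point where a #UD would be raised. */
static int decode_one(const unsigned char *p)
{
    struct entry e = tab[p[0]];
    if (!e.valid)
        return 0;
    int len = 1;
    if (e.has_modrm) {
        unsigned char m = p[len++];
        int mod = m >> 6, rm = m & 7;
        if (rm == 4 && mod != 3)
            len++;                                  /* SIB byte */
        if (mod == 1)
            len += 1;                               /* disp8    */
        else if (mod == 2 || (mod == 0 && rm == 5))
            len += 4;                               /* disp32   */
    }
    return len + e.imm;
}

int main(void)
{
    init_table();
    /* "Garbage" that happens to begin with valid encodings. */
    unsigned char buf[] = { 0x90,                         /* NOP */
                            0xB8, 0xDE, 0xAD, 0xBE, 0xEF, /* MOV EAX, imm32 */
                            0x89, 0x45, 0xFC,             /* MOV [EBP-4], EAX */
                            0x0F, 0x0B };                 /* not in the table */
    int off = 0, len;
    while ((len = decode_one(buf + off)) != 0) {
        printf("instruction at +%d, %d byte(s)\n", off, len);
        off += len;
    }
    printf("undecodable byte at +%d - EIP would point here\n", off);
    return 0;
}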
From: MitchAlsup on 6 Jul 2010 19:48

On Jul 6, 8:07 am, Frank Kotler <fbkot...(a)nospicedham.myfairpoint.net>
wrote:
> Nimai wrote:
> > If I do a JMP to a bunch of garbled data, how does the prefetching
> > process know where the "instruction boundaries" are? Where will EIP
> > be when the inevitable invalid opcode exception is triggered?

The EIP will point to the first instruction that has detectable
garbage. The key word here is "detectable", as very many byte sequences
are legal (if not very usable) opcodes.

> > In other words, if the instructions are garbage, how much garbage is
> > taken in? What are the rules?

It is wise to assume that at least 3 cache lines of garbage are fetched
before garbage is decoded.

> > My guess is, each possible opcode byte has something like a lookup
> > table entry, and after parsing a byte, the prefetcher either adds
> > another byte to the instruction, adds a modr/m byte to the instruction
> > and grabs displacement and immediate bytes, or ends the instruction
> > and sends it to the pipeline. This is entirely based on inference, I
> > can't find anything in the manuals to confirm or deny this.
> >
> > Whatever process it uses, it MUST be entirely deterministic, or code
> > can't be. So where is it documented?

It ends up different on different architectures. But your logic is
sound; you are just not thinking in parallel.

What generally happens is that at least 4 bytes are fully decoded into
256 signals per byte. Then various logic condenses the 256 signals
(times the number of bytes) down to 50-ish, especially ferreting out
prefixes (with respect to operating mode). Then another layer of logic
identifies the major opcode byte. And the rest is simply a cascade of
multiplexers. One end result of all this multiplexing is the start
pointer for the next instruction.

The major opcode byte specifies whether there are opcode bits in the
minor opcode byte (if present) and in the modr/m and SIB encodings.
Knowing whether a minor opcode, modr/m, or SIB byte is present, and
whether an immediate is present, gives you all that is necessary (prior
to SSE4) to determine the subsequent instruction boundary.

Bad opcodes are generally detected about another whole pipe stage
further down the pipe from instruction parsing. There is no reason to
clutter up a hard problem with an intractable problem in a gate-limited
and fan-limited pipestage. You still have at least 5 pipe stages before
any damage is done to machine state - plenty of time to stumble across
the myriad of subtle invalid opcodes due to improper use of modr/m or
SIB bytes or prefix violations, and NO reason to get cute and try to
detect them earlier. {I happen to know how to do this in 12 gate delays
from raw bytes, and 3 instructions at a time in 8 gates with
end-pointer bits.}

All of this is also dependent on some sequencing decisions made in the
pipeline.

Mitch
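Mitch's parallel scheme can be modeled in software too, with the caveat
that the byte classes and opcode subset below are invented for
illustration. In hardware, step 1 fans each byte out to 256 one-hot
wires and step 2 condenses them through OR trees into class signals
(50-ish in Mitch's description); in C both steps collapse into one
table lookup, but the dataflow is the same: a candidate length is
computed at every byte offset simultaneously, as if each byte were an
instruction start, and a cascade then picks out the real boundaries and
each next-instruction start pointer.

/* Software model of parallel length decode. The class list and the
 * bytes it covers are made up for this sketch; no real design's
 * signal set is being reproduced here. */
#include <stdio.h>
#include <stdint.h>

enum klass { PFX, OP_PLAIN, OP_MODRM, OP_IMM32, BAD };

/* Hardware: byte -> 256 one-hot wires -> OR trees -> class signals.
 * In C, the two steps collapse into this one lookup. */
static enum klass classify(uint8_t b)
{
    switch (b) {
    case 0x66: case 0x67:
    case 0xF2: case 0xF3: return PFX;       /* a few prefixes      */
    case 0x90: case 0xC3: return OP_PLAIN;  /* NOP, RET            */
    case 0x89: case 0x8B: return OP_MODRM;  /* MOV r/m,r and r,r/m */
    case 0x05: case 0xB8: return OP_IMM32;  /* ADD/MOV EAX, imm32  */
    default:              return BAD;
    }
}

/* Length IF an instruction started at this byte (toy rules: at most
 * one prefix, no displacement or SIB handling). */
static int len_if_start(const uint8_t *p)
{
    int n = (classify(p[0]) == PFX);  /* fold one prefix in */
    switch (classify(p[n])) {
    case OP_PLAIN: return n + 1;
    case OP_MODRM: return n + 2;
    case OP_IMM32: return n + 5;
    default:       return 0;          /* detectable garbage */
    }
}

int main(void)
{
    uint8_t window[16] = { 0x66, 0x90,        /* prefixed NOP      */
                           0xB8, 1, 0, 0, 0,  /* MOV EAX, 1        */
                           0x0F, 0x0B };      /* not in the subset */
    int len[16] = {0};

    /* "Thinking in parallel": hardware computes a candidate length
     * at EVERY byte offset simultaneously...                       */
    for (int i = 0; i + 1 < 16; i++)
        len[i] = len_if_start(window + i);

    /* ...and a cascade of multiplexers chains them into the actual
     * boundaries, yielding each next-instruction start pointer.    */
    for (int s = 0; s < 15 && len[s] > 0; s += len[s])
        printf("instruction at +%d, length %d\n", s, len[s]);
    return 0;
}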