From: nmm1 on 21 Jul 2010 11:38

Arising out of a course I am writing, I want to find out roughly how Intel and AMD handle I/O transfers to and from memory at the hardware level. So far, my searching has led nowhere beyond what I know, such as:

The I/O controller uses the HyperTransport or QuickPath link to talk to the memory controller on the CPU that owns the memory. Well, I assume that, because anything else would be silly.

But:

Do they do those in a cache-coherent fashion, or is that independent of the cache?

Indeed, do they update the cache (as some systems used to) and, if so, up to which level?

Is there any public documentation on this? I am not looking for details, but enough information to be able to write reliable notes on advanced tuning.

Any help appreciated.

Regards,
Nick Maclaren.
From: MitchAlsup on 21 Jul 2010 12:53

On Jul 21, 10:38 am, n...(a)cam.ac.uk wrote:
> Arising out of a course I am writing, I want to find out roughly
> how Intel and AMD handle I/O transfers to and from memory at the
> hardware level. So far, my searching has led nowhere beyond what
> I know, such as:
>
> The I/O controller uses the HyperTransport or QuickPath link
> to talk to the memory controller on the CPU that owns the memory.
> Well, I assume that, because anything else would be silly.
>
> But:
>
> Do they do those in a cache-coherent fashion, or is that
> independent of the cache?

If I remember correctly:

The I/O DMA devices use the non-coherent parts of the link transfer protocol.

An nC-READ goes to the memory controller; the memory controller back-snoops the processor caches, and the I/O device gets up-to-date data. Cached data remains in whatever state it was in (e.g. Exclusive or Modified) and is not downgraded to Shared.

An nC-WRITE goes to the memory controller; the memory controller back-invalidates the processor caches, and memory gets the up-to-date data. Cached data becomes invalid.

I/O Reads and Writes performed by the processor use parts of the physical address space that are mapped nonCacheable (in the page tables and/or MTRRs). These requests are routed by address through the fabric and perform register-sized Writes or Reads; expect 1-5 microseconds round trip.

To guarantee that a pending Write request has arrived at a device, a nonCacheable Read to the same device is used to push the Write all the way to the device over <whatever connection structure there is>.

> Indeed, do they update the cache (as some systems used to) and,
> if so, up to which level?

At least for AMD systems (and I would assume Intel also), the whole cache hierarchy remains coherent.

> Is there any public documentation on this? I am not looking
> for details, but enough information to be able to write reliable
> notes on advanced tuning.

I learned this while employed. Some of it is hard to come by, some of it requires NDA-level signatures, and some of it comes your way when various BIOS bugs are being discussed.

Mitch
From: Tim McCaffrey on 21 Jul 2010 13:43

In article <i274ab$rg6$1(a)smaug.linux.pwf.cam.ac.uk>, nmm1(a)cam.ac.uk says...
>
>Arising out of a course I am writing, I want to find out roughly
>how Intel and AMD handle I/O transfers to and from memory at the
>hardware level. So far, my searching has led nowhere beyond what
>I know, such as:
>
> The I/O controller uses the HyperTransport or QuickPath link
>to talk to the memory controller on the CPU that owns the memory.
>Well, I assume that, because anything else would be silly.
>
>But:
>
> Do they do those in a cache-coherent fashion, or is that
>independent of the cache?
>
> Indeed, do they update the cache (as some systems used to) and,
>if so, up to which level?
>
> Is there any public documentation on this? I am not looking
>for details, but enough information to be able to write reliable
>notes on advanced tuning.
>
>Any help appreciated.
>

Some of this information can be inferred from the AMD BIOS writer's guide. I learned some of this the hard way recently, but as Mitch suggests, much of it is under NDA.

In any case, things like "do a read to push the writes" are required by PCI, so everything else up and down the chain usually requires it (or takes advantage of it) as well.

Think of it this way: for HT (QPI is probably roughly the same), you have a device that is PCI, PCIe, SATA, IDE, USB or HT based. For PCI, PCIe, SATA and IDE, the request needs to be translated to HT (for SATA/IDE/USB it may first need to be translated to something in between, etc.). HT requests are very different from those of the other protocols. When an HT request reaches the processor socket, it is translated into an internal request, which is routed and flow-controlled, and has various interactions with caches and memory controllers. Those interactions depend on what the HT packet contains (sequence numbers, tags, flags, etc.), and for best performance it matters a lot how they are set.

So the RDMA environment includes not just what is on the processor, but also how every bridge between the processor and the actual device handles the I/O request.

Good luck with this. Tracking down performance bugs in this area can give you headaches for months.

- Tim
From: David L. Craig on 21 Jul 2010 13:55

On Jul 21, 11:38 am, n...(a)cam.ac.uk wrote:
> Is there any public documentation on this? I am not looking
> for details, but enough information to be able to write reliable
> notes on advanced tuning.

Well, when the 80386 was released I bought two paperbacks published by Intel, the "80386 Hardware Reference Manual" and the "80386 Programmer's Reference Manual", which are still very valuable to me. But now documentation of this depth and breadth seems to be freely available as PDFs; e.g., http://www.intel.com/products/embedded/processors/corei3/desktop/technicaldocuments.htm so find the product(s) you're interested in and download what looks promising. Happy hunting.
From: David L. Craig on 22 Jul 2010 11:29
On Jul 21, 1:55 pm, "David L. Craig" <dlc....(a)gmail.com> wrote:
> Well, I bought two paperback's published by Intel when the
> 80386 was released entitled "80386 Hardware Reference Manual"
> and "80386 Programmer's Reference Manual" that still have
> very good value for me. But now PDFs containing this depth
> and breadth of documentation seem to be freely available in
> PDF format; e.g., http://www.intel.com/products/embedded/processors/corei3/desktop/tech...
> so find the product(s) you're interested in and download what
> looks promising. Happy hunting.

Wow! What used to be a few hundred pages now seems to be a few thousand or more. At least they're downloadable PDF files. I'm having fun scanning through it all.

Is the RDMA being discussed here the TCP/IP protocol or something else? Is there a connection to pertinent information via PerfMon?