From: Alberto on 27 Jan 2009 15:05

Actually, we run them both on the same machine. We usually run this system with a Vp1000 (the older card) and a Vp2000 (the newer one) side by side. But the difference, now that things are beginning to clear up in my mind, is that the Vp1000 runs on the PCI bus while the Vp2000 runs on a PCI Express bus. This is what I found out. I don't know if this is what's happening in my case, but I fired off a question to Dell and I'm waiting on an answer.

You can start by taking a look at http://www.microsoft.com/whdc/system/platform/server/PAE/PAEdrv.mspx — this is about PAE, but it fits my case. Microsoft states that there are chipsets that do not support the "Dual Address Cycle", or DAC, which prevents the bus from accessing more than 32 bits' worth of address space. They state that a DAC-capable adapter, such as our Vp2000, must run on a DAC-capable bus. The DDK DMA machinery checks for DAC capability at boot time, and if it finds a DAC-capable adapter running on a non-DAC-capable bus, it resorts to a double-buffering scheme to prevent bus accesses above 4Gb.

Now, my 64-bit code doesn't care about 4Gb boundaries; it blindly assumes that a 64-bit-OS-capable machine will have a 64-bit-capable PCI Express bus. The problem is, maybe the T5400 bus isn't 64-bit-address capable, and that's the answer I'm waiting for from Dell. This would explain why the frequency of bugcheck 101 hits decreased significantly once I forced my physical buffer allocations to stay below the 4Gb line. There may still be some dark corner of the driver that generates physical addresses above 4Gb, or maybe some other driver is causing the issue by not being careful enough with its bus accesses. In either case, maybe the answer is simply not to run 64-bit Windows on such platforms. I intend to make absolutely sure I'm not allocating physical memory above the 4Gb line; that should absolve my driver. But even then, I'm not so sure I can run on such platforms.
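[Editor's note: a minimal sketch of what "forcing physical buffer allocations to stay below the 4Gb line" can look like in a WDM driver. MmAllocateContiguousMemorySpecifyCache lets the caller cap HighestAcceptableAddress at 4Gb-1 so the memory manager never returns frames above the 32-bit line. This illustrates the general technique only, not Alberto's actual driver; the buffer size and cache type are assumptions.]

```c
/* Sketch (WDK, kernel mode): constrain a DMA buffer to physical
 * addresses below 4 GB.  BUF_SIZE is illustrative, not from the thread. */
#include <ntddk.h>

#define BUF_SIZE (4 * 1024 * 1024)   /* 4 MB command-stream buffer (assumed) */

PVOID AllocateBufferBelow4Gb(VOID)
{
    PHYSICAL_ADDRESS lowest, highest, boundary;

    lowest.QuadPart   = 0;
    highest.QuadPart  = 0xFFFFFFFF;   /* never hand back pages above 4 GB */
    boundary.QuadPart = 0;            /* no extra alignment boundary */

    /* The memory manager only returns pages whose physical addresses
     * fall within [lowest, highest]. */
    return MmAllocateContiguousMemorySpecifyCache(BUF_SIZE,
                                                  lowest,
                                                  highest,
                                                  boundary,
                                                  MmCached);
}
```

(Kernel-mode only; it cannot run outside a driver, so no user-mode test applies.)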
My question then is: how do I test for the "DACness" of a bus? Do any of you out there know? Or am I barking up the wrong tree? Thanks for any help you can provide!

Alberto.

On Jan 26, 3:55 pm, "Maxim S. Shatskih" <ma...(a)storagecraft.com.no.spam> wrote:
> >The funky thing is, this chip's predecessor uses the same dma scheme,
> >and the same hw queue implementation, and it has been working fine for
> >years now.
>
> On the same surrounding hardware?
>
> --
> Maxim S. Shatskih
> Windows DDK MVP
> ma...(a)storagecraft.com
> http://www.storagecraft.com
From: Maxim S. Shatskih on 27 Jan 2009 15:19

> Now, my 64-bit code doesn't care about 4Gb boundaries, blindly
> assuming

That's why it is a good idea to use IoGetDmaAdapter, since in this case you can get the check for the non-conforming root complex for free.

> not that sure I can run on such platforms. My question then is, how do
> I test for the "DACness" of a bus ?

Use IoGetDmaAdapter; this can possibly do this for you for free (in MS's pci.sys).

--
Maxim S. Shatskih
Windows DDK MVP
maxim(a)storagecraft.com
http://www.storagecraft.com
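[Editor's note: for readers unfamiliar with the suggestion, here is a minimal sketch of probing 64-bit DMA capability through IoGetDmaAdapter. The driver requests Dma64BitAddresses in the DEVICE_DESCRIPTION; if the HAL or pci.sys decides the path to the device cannot generate 64-bit (DAC) cycles, the returned adapter object transparently double-buffers through map registers below 4Gb. The field values below are assumptions for illustration, not from the thread.]

```c
/* Sketch (WDK, kernel mode): let the OS decide whether the bus path
 * supports 64-bit (DAC) addressing, instead of assuming it does. */
#include <ntddk.h>

PDMA_ADAPTER GetAdapterFor64Bit(PDEVICE_OBJECT Pdo)
{
    DEVICE_DESCRIPTION dd;
    ULONG nMapRegs = 0;

    RtlZeroMemory(&dd, sizeof(dd));
    dd.Version           = DEVICE_DESCRIPTION_VERSION2;
    dd.Master            = TRUE;            /* bus-master device   */
    dd.ScatterGather     = TRUE;            /* hardware SG lists   */
    dd.Dma64BitAddresses = TRUE;            /* ask for DAC/64-bit  */
    dd.InterfaceType     = PCIBus;
    dd.MaximumLength     = 4 * 1024 * 1024; /* assumed max transfer */

    /* If the root complex or bridge cannot do 64-bit addressing,
     * the HAL hands back an adapter that double-buffers below 4Gb. */
    return IoGetDmaAdapter(Pdo, &dd, &nMapRegs);
}
```

(Kernel-mode only; it cannot run outside a driver, so no user-mode test applies.)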
From: Alberto on 27 Jan 2009 17:12

The problem is, this is not standard DMA; I don't know how well I could fit it within the DDK model.

The chip fetches dma/render/synchronization/interrupt driver-generated command streams and scatter-gather lists from queues in host memory, asynchronously and in parallel. There are no map registers, scatter-gather lists are dynamic, and one single dma transaction could include hundreds of scatter-gather list items; command streams of 4Mb or more aren't uncommon, and the data moved by the subsequent dma's may be multiple gigabytes at each throw. One of the queue commands - and we use it a lot - is a call/return, which redirects the queue, on the fly, to fetch its command stream from a command buffer in host memory. So, an engine would (1) fetch from the queue, (2) fetch from the buffer, (3) start a dma between yet another buffer and board memory, and while that goes on, the other queue is fetching and running a render command stream from yet another host buffer. When the dma completes, the queue automatically writes a number of state registers to slots in the device extension, so that the ISR knows which are the current dma and render transactions.

You can see that at any one time the chip can be transacting with several host buffers in parallel. The chip can also handle several dma, render, and internal message-passing transactions before it interrupts; it can internally track and stack multiple dma and render operations and completions, including their mutual synchronization. The driver keeps the queues busy by continuously and asynchronously enqueuing multiple transactions in what looks a lot like batch mode. Plus, the chip bugs force us to play games with the chip command streams, including the scatter-gather lists, that might be hard to duplicate within the DDK model!

Alberto.
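[Editor's note: given a chip that walks driver-built scatter-gather lists with no map registers, one cheap safety net is to validate every SG element before handing the list to hardware. Below is a minimal, user-mode-testable sketch of that check; the SgEntry layout is invented for illustration and is not the actual Vp2000 format.]

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical scatter-gather element: physical address plus byte count.
 * (Illustrative layout only; not the real hardware format.) */
typedef struct {
    uint64_t phys;   /* physical start address */
    uint32_t bytes;  /* transfer length        */
} SgEntry;

#define FOUR_GB ((uint64_t)1 << 32)

/* Return 1 if every element lies entirely below the 4 GB line,
 * i.e. is safe on a bus that cannot generate DAC (64-bit) cycles. */
int sg_list_below_4gb(const SgEntry *sg, size_t count)
{
    for (size_t i = 0; i < count; i++) {
        uint64_t end = sg[i].phys + sg[i].bytes;  /* one past the last byte */
        if (end > FOUR_GB || end < sg[i].phys)    /* crosses line, or wraps */
            return 0;
    }
    return 1;
}
```

A driver could run such a check in a debug build each time a list is queued, catching any "dark corner" that generates physical addresses above 4Gb before the hardware ever sees them.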
From: Alberto on 4 Feb 2009 18:43

I want to thank all of you who contributed to this thread. For the sake of closure, I will report my final findings on this bugcheck 101, if nothing else because it seems to be quite hard to find documentation on this bugcheck anywhere, Internet included!

I saw the problem happen on Dell T5400 and Dell 490 machines running Vista 64 or XP 64. It does not happen on other Dell machines we use, for example, the 670 or the 2900. It doesn't happen on our HP systems either. The machine must have more than 4Gb of memory for the problem to show up. The problem does not happen on 32-bit Windows, although I did not try running with PAE enabled; by the looks of it, chances are that I might bump into the problem on PAE-enabled machines.

We traced the problem to an intermittent faulty PCI Express 64-bit bus access when the address is higher than 4Gb. We solved the problem by forcing all chip/bus traffic to use physical addresses lower than 4Gb. Once we completed this implementation, the problem went away and it cannot be duplicated no matter how much traffic we throw at the bus.

At this point I don't know if the problem is the bus implementation, the bridge, the BIOS, the OS bus driver, or our own hardware board. There's a faint chance that this might be an issue with our driver, but at this point in time I doubt it very much. I am ceasing to investigate this problem because it's far too late in the game to fix an eventual chip issue, and anything else is beyond our control. Hence, the path of least resistance was: do everything from the lower 4Gb.

Again, thanks to all who contributed to this thread!

Alberto.