Prev: [PATCH 0/3] HID: make raw output callback more flexible
Next: kprobes: Disable booster when CONFIG_PREEMPT=y
From: Benjamin Herrenschmidt on 26 Feb 2010 17:10 On Fri, 2010-02-26 at 16:00 +0000, Catalin Marinas wrote: > > I'm surprised that usb-storage has an issue here. It shouldn't > afaik, > > since it's just a SCSI driver (or not anymore ?) and the BIO or > > filesystems handle things there no ? I haven't seen a single call to > > flush_dcache_page() in any of drivers/scsi, drivers/ata or > drivers/ide > > when I looked... > > The BIO or filesystem code don't call flush_dcache_page() either (well > some do like cramfs or jffs but they decompress the data received from > the block device). That's weird... that would mean that all existing PIO IDE or SCSI is broken etc... Including I$/D$ cache coherency on powerpc and more. That surprises me :-) On an older kernel tree here: $ grep -r flush_dcache_page fs | wc -l 118 So maybe that's where things need fixing ? Cheers, Ben. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo(a)vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
From: Benjamin Herrenschmidt on 27 Feb 2010 19:20 On Fri, 2010-02-26 at 21:00 +0000, Russell King - ARM Linux wrote: > On Fri, Feb 26, 2010 at 04:25:21PM +0000, Catalin Marinas wrote: > > For mmap'ed pages (and present in the page cache), is it guaranteed that > > the HCD driver won't write to it once it has been mapped into user > > space? If that's the case, it may solve the problem by just reversing > > the meaning of PG_arch_1 on ARM and assume that a newly allocated page > > has dirty D-cache by default. > > I guess we could also set PG_arch_1 in the DMA API as well, to avoid the > unnecessary D cache flushing when clean pages get mapped into userspace. That's an interesting thought for us too. When doing I$/D$ coherency, we have to fist flush the D$ and then invalidate the I$. If we could keep track of D$ and I$ separately, we could avoid the first step in many cases, including the DMA API trick you mentioned. I wonder if it's time to get a PG_arch_2 :-) Cheers, Ben. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo(a)vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
From: Benjamin Herrenschmidt on 27 Feb 2010 19:30 On Fri, 2010-02-26 at 21:49 +0000, Russell King - ARM Linux wrote: > On Sat, Feb 27, 2010 at 08:40:29AM +1100, Benjamin Herrenschmidt wrote: > > Hrm, the DMA API certainly doesn't handle the I$/D$ coherency on > > powerpc.. I'm afraid that whole cache handling stuff is totally > > inconsistent since different archs have different expectations here. > > It doesn't on ARM either. Ok, pfiew :-) So far, my understanding with I$/D$ is that we only care in a few cases which is executing of an mmap'ed piece of executable that is -not- being written to, and swap. I -think- that in both cases, the page cache always pops up a new page with PG_arch_1 clear before the driver gets to either DMA or PIO to it when faulted the first time around, before any PTE is inserted. So the current approach on powerpc with I$/D$ should work fine, and it -might- make sense to use a similar one on PIPT ARM, provided we don't have expectations of the I$/D$ coherency being maintained on -subsequent- writes (PIO or DMA either) to such a page by the same program transparently by the kernel. There's two potential problems with the approach, and maybe more that I have missed though. One is the case of a networked filesystem where the executable pages are modified remotely. However, I would expect such a program to invalidate the PTE mappings before making the change visible, so we -do- get a chance to re-flush provided something clears PG_arch_1. Then, there's In the case of a multithread app, where one thread does the cache flush and another thread then executes, the earlier ARMs without broadcast ops have a potential problem there. In fact, some variant of PowerPC 440 have the same problem and some people are (ab)using those for SMP setups I'm being told. For that case, I see two options. One is a big hammer but would make existing code work to "most" extent: Don't allow a page to be both writable and executable. Ping-pong the page permission lazily and flush when transitioning from write to exec. That means using a spare bit for Linux _PAGE_RW separate from your real RW bit I suppose, since you have HW loaded PTEs (on 440 it's easier since we SW load, we can do the fixup there, though it has a perf impact obviously). Another option would be to make some syscall mandatory to "sync" caches which could then do IPIs or whatever else is needed. But that would require changing existing userspace code. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo(a)vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
From: Benjamin Herrenschmidt on 27 Feb 2010 19:40 On Fri, 2010-02-26 at 22:03 +0000, Russell King - ARM Linux wrote: > On Sat, Feb 27, 2010 at 08:49:40AM +1100, Benjamin Herrenschmidt wrote: > > It will deadlock if you use normal IRQs. I don't see a good way around > > that other than using a higher-level type of IRQs. I though ARM has > > something like that (FIQs ?). Can you use those guys for IPIs ? > > If the hardware did support using FIQs for IPIs, this would not be > desirable because then it takes it away from the SoC folk to do what > they will with it. > > In the past, it's been used as a fast CPU-driven "DMA" interface - > some SoCs have been wired up in such a way that's the only use > available for the FIQ. This is an issue indeed. > The other problem we'd encounter using FIQs for IPIs is that some IPIs > need to take locks - and in order to make that safe, we'd either need > another class of locks which disable IRQs and FIQs together, or we'd > need to disable FIQs everywhere we disable IRQs - at which point FIQs > become utterly pointless. That's solvable easily :-) I mentioned having potentially to deal with a similar problem with people using PowerPC 440 for SMP (doesn't broadcast cache ops either). 440 has critical interrupts, which are akin to FIQs. The trick here is that you don't use -only- critical interrupts for IPIs. You use normal interrupts for all the current IPI types. You -add- a fast one using critical interrupts specifically for cache ops, with a very fast asm only path. This works for us because masking interrupts doesn't mask critical interrupts (it's a separate mask bit in our MSR). If that isn't the case with FIQs then the whole idea is moot. > (There only differences between FIQ and IRQ are: > - on simultaneous raising of both, the FIQ will be called before the IRQ. > - each has its own (single) vector. > - invocation of FIQ masks IRQ. > > What I'm saying is that what gives FIQ an advantage for SoC people is > that it's bare bones light weight and therefore extremely fast - as soon > as you load it up with additional complexity, it becomes less useful.) I understand. Then Catalin idea of tricking the cache with load and stores would work for the D$ side of thing. The I$ side of thing probably still needs IPIs though, and you might need to use non-blocking async SMP call function for that if you're going to do it from set_pte_at() instead of update_mmu_cache() since the later is racy. In any case, it's a lot less of a deadlock nest than the D$ side which needs to be dealt with in the DMA ops, called below layers of driver and subsystem locks. Note: Somebody at ARM needs to be severely beaten up for coming up with that SMP scheme without broadcast cache ops and not also mandating some kind FIQ IPI scheme that isn't masked with normal interrupts :-) Cheers, Ben. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo(a)vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
From: James Bottomley on 28 Feb 2010 00:10
On Sun, 2010-02-28 at 11:14 +1100, Benjamin Herrenschmidt wrote: > On Fri, 2010-02-26 at 21:00 +0000, Russell King - ARM Linux wrote: > > On Fri, Feb 26, 2010 at 04:25:21PM +0000, Catalin Marinas wrote: > > > For mmap'ed pages (and present in the page cache), is it guaranteed that > > > the HCD driver won't write to it once it has been mapped into user > > > space? If that's the case, it may solve the problem by just reversing > > > the meaning of PG_arch_1 on ARM and assume that a newly allocated page > > > has dirty D-cache by default. > > > > I guess we could also set PG_arch_1 in the DMA API as well, to avoid the > > unnecessary D cache flushing when clean pages get mapped into userspace. > > That's an interesting thought for us too. When doing I$/D$ coherency, we > have to fist flush the D$ and then invalidate the I$. If we could keep > track of D$ and I$ separately, we could avoid the first step in many > cases, including the DMA API trick you mentioned. > > I wonder if it's time to get a PG_arch_2 :-) Sorry to be a bit late to the party (on holiday), but I/D coherency is supposed to be taken care of using flush_cache_page in the memory mapping routines. On parisc, at least, we don't use any PG_arch flags to help. The way it's supposed to work is that I is invalidated on mapping or remapping, so the I/O code only needs to worry about flushing D. The guarantee we pass to userland is that any page we do I/O to has a clean D cache before it goes back to userspace. Thus if userspace executes the page, the I cache gets its first movein there. There is an underlying assumption to all of this: The CPU won't speculatively move in I cache until the page is executed, so we can rely on the flush_cache_page in the mapping to keep the I cache invalidated until we're ready to execute. The other fundamental assumption is that if userspace needs to modify an executable region (say for dynamic linking) it has to take care of reinvalidating the I cache itself ... although it can do this by remapping the region to alter the flags (i.e W no X then X no W). But the point of all of this is that I cache invalidation doesn't appear anywhere in the I/O path ... so if we're getting I/D incoherency, there's some problem in the mm code (or there's a missing arch assumption ... like I cache gets moved in more aggressively than we expect). Parisc is very sensitive to I/D incoherency, so we'd notice if there were a serious generic problem here. James -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo(a)vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/ |