From: Benjamin Herrenschmidt on 6 Apr 2010 02:10

On Tue, 2010-04-06 at 14:52 +0900, KOSAKI Motohiro wrote:

(Adding linux-arch)

> This check was introduced by the following commit. Yes, now we don't
> consider arch specific PROT_xx flags, but I don't think it is odd.
>
> Yeah, I can imagine at least embedded people certainly need arch
> specific PROT_xx flags and they hope to change it, but I don't
> think mprotect() fits your usage. I mean mprotect() is widely
> used internally by glibc. So if mprotect() can change such flags,
> glibc might turn off such flags implicitly.
>
> So, why can't we add a proper new syscall? It has no regression risk.

I don't care much personally whether we use mprotect() or a new syscall, but at this stage we already have PROT_SAO going that way for powerpc, so that would be an ABI change.

However, the main issue isn't really there. The main issue is that right now, everything we do in mmap.c, mprotect.c, ... revolves around having everything translated into the single vm_flags field. VMA merging decisions, construction of vm_page_prot, etc... everything is there.

However, this is a 32-bit field on 32-bit archs, and we already use all possible bits in there. It's also a field entirely defined in generic code with no provision for arch specific bits.

The question here thus boils down to which direction we want to go in if we want to untangle that and provide the ability to expose mapping "attributes", basically.

In fact, I suspect even x86 might have good use of that to create things like relaxed ordering mappings, no ?

This boils down, so far, to a few facts/questions to be resolved:

- Do we want to use the existing PROT_ argument to mmap, mprotect,... ? There's plenty of bit space, and we already have at least one example of an arch adding something to it (powerpc with PROT_SAO - aka Strong Access Ordering - aka Make It Look Like An x86 :-)

- If not, while a separate syscall would be fine with me for setting attributes after the fact, it makes it harder to pass them via mmap, is that a big deal ? I.e. it means one -always- has to call it after mmap to change the attributes. That means for example that mmap will potentially create a VMA merged with another one, just to be re-split due to the attribute change. A bit gross...

- Do we want to keep the current "funnel everything into vm_flags" approach ? That leaves no option that I can see but to extend it into a u64 so it grows on 32-bit archs.

- If not, I see two approaches here: either having a separate / new "attribute" field in the VMA, or going straight for the vm_page_prot (i.e. the pgprot). In both cases, things like vma_merge() need to grow a new argument since obviously we can't merge things with different attributes.

- ... Unless we just replace VM_SAO with VM_CANT_MERGE and set that whenever a VMA has non-zero attributes. Sad but simpler.

Any other / better idea ?

Cheers,
Ben.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
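
The merge concern raised in this message can be sketched in plain userspace C. This is only an illustration under stated assumptions: `struct toy_vma`, the `TOY_*` constants and both helper functions are hypothetical stand-ins, not the kernel's `vm_area_struct` or `vma_merge()`. It shows why a separate attribute field would force the merge test to compare one more thing.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, heavily simplified model of a VMA (not the kernel struct). */
struct toy_vma {
    uintptr_t start;   /* first address covered */
    uintptr_t end;     /* one past the last address covered */
    uint32_t  flags;   /* models a 32-bit vm_flags word */
    uint32_t  attrs;   /* hypothetical separate arch attribute word */
};

#define TOY_VM_READ   0x1u  /* illustrative values only */
#define TOY_VM_WRITE  0x2u
#define TOY_ATTR_SAO  0x1u  /* illustrative "strong access ordering" attribute */

/* Merge test that only looks at adjacency and the flags word. */
static bool toy_can_merge_flags_only(const struct toy_vma *a,
                                     const struct toy_vma *b)
{
    return a->end == b->start && a->flags == b->flags;
}

/* Merge test extended with the attribute word: mappings with
 * different attributes must stay separate. */
static bool toy_can_merge(const struct toy_vma *a, const struct toy_vma *b)
{
    return toy_can_merge_flags_only(a, b) && a->attrs == b->attrs;
}
```

With two adjacent read-only regions where only one carries `TOY_ATTR_SAO`, the flags-only test would merge them while the extended test would not, which is the extra argument to the merge path that the message refers to.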
From: Benjamin Herrenschmidt on 6 Apr 2010 03:40

On Tue, 2010-04-06 at 15:24 +0900, KOSAKI Motohiro wrote:

> I guess you haven't caught my intention. I didn't say we have to remove
> PROT_SAO and VM_SAO.
> I mean mmap(PROT_SAO) is ok; it only appends a new flag, it doesn't change
> the meaning of existing flags. I'm only against mprotect(PROT_NONE) turning
> off PROT_SAO implicitly.
>
> IOW I recommend we use three syscalls:
>   mmap()        create new mappings
>   mprotect()    change the protection of a mapping (as the name says)
>   mattribute()  (or similar name)
>                 change an attribute of a mapping (e.g. PROT_SAO or
>                 other arch specific flags)
>
> I'm not against changing mm/mprotect.c for PROT_SAO.

Ok, I see. No biggie. The main deal remains how we want to do that inside the kernel :-) I think the less horrible options here are to either extend vm_flags to always be 64-bit, or add a separate vm_map_attributes flag, and add the necessary bits and pieces to prevent merges across VMAs with different attributes.

The more I try to hack it into vm_page_prot, the more I hate that option.

Cheers,
Ben.
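
The split between "protection" and "attribute" changes proposed in the quoted text can be sketched as bit arithmetic. This is a minimal sketch, not kernel code: every `TOY_*` constant and `toy_*` function is hypothetical, the bit values are illustrative, and `mattribute()` is only the name floated in the thread, not an existing syscall.

```c
#include <assert.h>

/* Illustrative values; the real PROT_* constants live in <sys/mman.h>. */
#define TOY_PROT_NONE   0x0
#define TOY_PROT_READ   0x1
#define TOY_PROT_WRITE  0x2
#define TOY_PROT_EXEC   0x4
#define TOY_PROT_SAO    0x10  /* stand-in for an arch specific attribute bit */

#define TOY_ACCESS_MASK (TOY_PROT_READ | TOY_PROT_WRITE | TOY_PROT_EXEC)

/* Models an mprotect() that replaces every bit: a caller passing
 * TOY_PROT_NONE silently drops the attribute as well. */
static int toy_mprotect_replace_all(int cur, int req)
{
    (void)cur;
    return req;
}

/* Models an mprotect() restricted to access bits: attributes survive. */
static int toy_mprotect_keep_attrs(int cur, int req)
{
    return (cur & ~TOY_ACCESS_MASK) | (req & TOY_ACCESS_MASK);
}

/* Models the hypothetical mattribute(): only non-access bits change. */
static int toy_mattribute(int cur, int attrs)
{
    return (cur & TOY_ACCESS_MASK) | (attrs & ~TOY_ACCESS_MASK);
}
```

Starting from `TOY_PROT_READ | TOY_PROT_SAO`, the first function returns 0 for a `TOY_PROT_NONE` request, while the second leaves `TOY_PROT_SAO` set. That difference is the implicit clearing the quoted text objects to.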
From: Benjamin Herrenschmidt on 6 Apr 2010 18:20

On Tue, 2010-04-06 at 19:26 +0900, KOSAKI Motohiro wrote:

> > Ok, I see. No biggie. The main deal remains how we want to do that
> > inside the kernel :-) I think the less horrible options here are
> > to either extend vm_flags to always be 64-bit, or add a separate
> > vm_map_attributes flag, and add the necessary bits and pieces to
> > prevent merge across different attribute vma's.
>
> vma->vm_flags already has VM_SAO. Why do we need more flags?
> At least, I dislike adding a separate flags member to the vma.
> It might introduce unnecessary mess into the vma merge code.

Well, we did shove SAO in there, and used up the very last vm_flag for it a while back. Now I need another one, for little endian mappings. So I'm stuck.

But the problem goes further, I believe. Archs nowadays have quite an interesting set of MMU attributes that it would be useful to expose to some extent.

Some powerpcs also provide storage keys, for example, and I think ARM has something along those lines. There are interesting cacheability attributes too, on x86 as well. Being able to use such attributes to request, for example, a relaxed ordering mapping on x86 might be useful.

I think it basically boils down to either extending vm_flags to always be 64-bit, which seems to be Nick's preferred approach, or introducing a vm_attributes field with all the necessary changes to the merge code to take it into account (not -that- hard tho, there's only half a page of results in grep for these things :-)

Cheers,
Ben.
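
The "out of bits" problem and the 64-bit option can be sketched as follows. This is a sketch under assumptions: `toy_vm_flags_t` and the `TOY_VM_*` names are hypothetical, and the bit positions are chosen only to illustrate a flag that no longer fits in a 32-bit word; they are not the kernel's actual flag values.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical widened flags type: always 64-bit, even on 32-bit archs. */
typedef uint64_t toy_vm_flags_t;

/* Illustrative positions: the last bit of a 32-bit word, and the first
 * bit that only exists once the word is widened. */
#define TOY_VM_LAST_32BIT_FLAG  ((toy_vm_flags_t)1 << 31)
#define TOY_VM_LITTLE_ENDIAN    ((toy_vm_flags_t)1 << 32)

/* Returns nonzero if every set bit would survive in a 32-bit flags word. */
static int toy_fits_in_32(toy_vm_flags_t flags)
{
    return (flags >> 32) == 0;
}
```

A flag at bit 32 is lost when truncated to `uint32_t`, so any code that stores the flags in a 32-bit variable would need auditing if the field were widened.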