From: H. Peter Anvin on 15 Mar 2010 15:50

On 03/15/2010 12:00 PM, David Miller wrote:
> From: Ulrich Drepper <drepper(a)redhat.com>
> Date: Mon, 15 Mar 2010 09:00:55 -0700
>
>> On 03/15/2010 08:13 AM, H. Peter Anvin wrote:
>>> One option would be to do a libkernel.so,
>>
>> No need. Put it in the vdso. And name it something other than syscall.
>> The syscall() API is fixed, you cannot change it.
>>
>> All this only if it makes sense for ALL archs. If it cannot work for
>> just one arch then it's not worth it at all.
>
> There are many archs that still lack VDSO.

Putting it into the vdso is also rather annoyingly heavyweight for what
is nothing other than an ordinary shared library. Just making it an
ordinary shared library seems a lot saner.

I don't see why syscall() can't change the type for its first argument
-- it seems to be exactly what symbol versioning is for.

Doesn't change the fact that it is fundamentally broken, of course.

-hpa

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

From: Benjamin Herrenschmidt on 15 Mar 2010 16:30

On Mon, 2010-03-15 at 14:44 +0100, Ralf Baechle wrote:
> Syscall is most often used for new syscalls that have no syscall stub in
> glibc yet, so the user of syscall() encodes this ABI knowledge. If at a
> later stage syscall() is changed to have this sort of knowledge we break
> the API. This is something only the kernel can get right.

Well, no. The change I propose would not break the ABI on powerpc and
would auto-magically fix those cases :-)

But again, you don't have to do the same thing on MIPS or sparc, it's
definitely arch specific.

I.e., what you are saying is that a syscall defined in the kernel as:

    sys_foo(u64 arg);

To be called from userspace would require something like:

    u64 arg = 0x123456789abcdef0;

    #if defined(__powerpc__) && WORDSIZE == 32
        syscall(SYS_foo, (u32)(arg >> 32), (u32)arg);
    #else
        syscall(SYS_foo, arg);
    #endif

While with the trick of making syscall a macro wrapping an underlying
__syscall that has an added dummy argument, the register alignment is
"corrected" and thus -both- forms above suddenly work for me.

That might actually work for you too.

Cheers,
Ben.

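As an illustration of the manual split in the message above, here is a minimal C sketch of breaking a 64-bit argument into its high and low 32-bit words. The names `u64_halves` and `split_u64` are hypothetical and come from neither glibc nor the kernel. The order in which the two halves are then passed (high word first on big-endian 32-bit powerpc, as in the `#if` branch above) stays arch-specific and is not handled here.

```c
#include <stdint.h>

/* Hypothetical helper type: the two 32-bit words of a 64-bit argument. */
struct u64_halves {
    uint32_t hi; /* upper 32 bits */
    uint32_t lo; /* lower 32 bits */
};

/* Split a 64-bit value the same way as (u32)(arg >> 32), (u32)arg. */
static struct u64_halves split_u64(uint64_t arg)
{
    struct u64_halves h;

    h.hi = (uint32_t)(arg >> 32);
    h.lo = (uint32_t)arg;
    return h;
}
```

A caller on a big-endian 32-bit target would pass `h.hi` followed by `h.lo`. A little-endian target would typically need the opposite order, which is the portability problem raised in this thread.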
From: Benjamin Herrenschmidt on 15 Mar 2010 16:30

On Sun, 2010-03-14 at 22:54 -0700, David Miller wrote:
> From: Benjamin Herrenschmidt <benh(a)kernel.crashing.org>
> Date: Mon, 15 Mar 2010 16:18:33 +1100
>
> > Or is there any good reason -not- to do that in glibc ?
>
> The whole point of syscall() is to handle cases where the C library
> doesn't know about the system call yet.
>
> I think it's therefore very much "buyer beware".
>
> On sparc it'll never work to use the workaround you're proposing since
> we pass everything in via registers.
>
> So arch knowledge will always need to be present in these situations.

I'm not sure I follow. We also pass via register on powerpc, but the
offset introduced by the sysno argument breaks register pair alignment
which cannot be fixed up inside syscall().

However, if I change glibc's syscall to be something like

    #define syscall(sysno, args...) __syscall(0 /* dummy */, sysno, args)

And make __syscall then do something like:

    mr r0, r4
    mr r3, r5
    mr r4, r6
    mr r5, r7
    mr r6, r8
    .../...
    sc
    blr

Then at least all that class of syscalls will be fixed.

Of course this has to be in glibc arch code. I was merely asking if that
was something our glibc folks would consider and whether somebody could
think of a better solution :-)

Cheers,
Ben.

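The wrapper-macro idea in the message above can be tried out in portable C. This is only a sketch of the forwarding mechanism: `demo_syscall` and `demo_syscall_impl` are hypothetical names chosen so as not to collide with the real `syscall()`, and the stand-in function does no system call at all. It assumes exactly one `long` argument follows the syscall number. The register-alignment effect itself depends on the ppc32 calling convention and cannot be observed this way.

```c
#include <stdarg.h>

/*
 * Hypothetical stand-in for the proposed __syscall(). It ignores the
 * dummy argument and combines its inputs so the forwarding can be
 * checked. A real implementation would be arch-specific assembly.
 */
static long demo_syscall_impl(int dummy, long sysno, ...)
{
    va_list ap;
    long first;

    (void)dummy;              /* present only to shift the argument layout */
    va_start(ap, sysno);
    first = va_arg(ap, long); /* assumption: one long argument follows */
    va_end(ap);
    return sysno * 1000 + first;
}

/* Wrapper in the style proposed above: prepend a dummy 0 and forward. */
#define demo_syscall(sysno, ...) demo_syscall_impl(0, (sysno), __VA_ARGS__)
```

With these definitions, `demo_syscall(42L, 7L)` expands to `demo_syscall_impl(0, (42L), 7L)`. Note that this standard C99 form needs at least one argument after the syscall number. The GNU named-variadic form used in the message has the same limitation unless `##` is used to swallow the trailing comma.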
From: Benjamin Herrenschmidt on 15 Mar 2010 16:40

On Mon, 2010-03-15 at 12:41 -0700, H. Peter Anvin wrote:
> I don't see why syscall() can't change the type for its first argument
> -- it seems to be exactly what symbol versioning is for.
>
> Doesn't change the fact that it is fundamentally broken, of course.

No need to change the type of the first arg and go for symbol versioning.
If you do something like I proposed earlier, there will be no conflict
between syscall() and __syscall() and both variants can exist.

Cheers,
Ben.

From: Benjamin Herrenschmidt on 15 Mar 2010 16:40

> The powerpc implementation of syscall is:
>
>
> ENTRY (syscall)
>     mr r0,r3
>     mr r3,r4
>     mr r4,r5
>     mr r5,r6
>     mr r6,r7
>     mr r7,r8
>     mr r8,r9
>     sc
>     PSEUDO_RET
> PSEUDO_END (syscall)

And my proposal is to make it instead:

    #define syscall(__sysno, __args...) __syscall(0,__sysno,__args)

    ENTRY (__syscall)
        mr r0,r4
        mr r3,r5
        mr r4,r6
        mr r5,r7
        mr r6,r8
        mr r7,r9
        mr r8,r10
        sc
        PSEUDO_RET
    PSEUDO_END (__syscall)

> The ABI says:
>
> "Long long arguments are considered to have 8-byte size and alignment.
> The same 8-byte arguments that must go in aligned pairs or registers are
> 8-byte aligned on the stack."

Right, that's what I'm explaining too.

> This implies that the SYS_fallocate call will skip a register to get the
> required alignment in the parameter save area.
>
> for ppc32 on entry
>
> r3 == SYS_fallocate
> r4 == fd
> r5 == mode
> r6 == not used
> r7, r8 == offset
> r9 == len

len is 64-bit too afaik but let's ignore that for now

> This gets shifted to:
>
> r0 == SYS_fallocate
> r3 == fd
> r4 == mode
> r5 == not used
> r6, r7 == offset
> r8 == len

Which is not correct, as the kernel expects:

    r0 == SYS_fallocate
    r3 == fd
    r4 == mode
    r5, r6 == offset
    r7, r8 == len

> For syscall the vararg parms will be mirrored to the parameter save area
> but will not be used. The ABI does not talk to LE for this case.

Right, but the fact that we shift all args by -1- register means that we
break the 64-bit register pair alignment compared to the real syscall
which uses r0 instead for the syscall number.

Hence my proposal to add a dummy argument to restore that alignment.

As it is there is userspace code that does:

    syscall(SYS_fallocate, fd, mode, offset, len);

Which works on x86 but is broken on ppc32 unless we do that change.

Cheers,
Ben.

> Ryan does the new ABI doc cover this?
>
> > This will break because the first argument to syscall now shifts
> > everything by one register, which breaks the register pair alignment
> > (and I suppose archs with stack based calling convention can have
> > similar alignment issues even if x86 doesn't).
> >
> > Ulrich, Steven, shouldn't we have glibc's syscall() take a long long as
> > its first argument to correct that ? Either that or making it some kind
> > of macro wrapper around a __syscall(int dummy, int sysno, ...) ?
> >
> > As it is, any 32-bit app using syscall() on any of the syscalls that
> > takes 64-bit arguments will be broken, unless the app itself breaks up
> > the argument, but the order of the hi and lo part is different
> > between BE and LE architectures ;-)
> >
> > So is there a more "correct" solution than another here ? Should powerpc
> > glibc be fixed at least so that syscall() keeps the alignment ?
> >
> > Cheers,
> > Ben.
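The register tables in the message above can be reproduced with a small model. The sketch below is a simplification, not ABI-exact code: it assumes arguments are assigned from r3 upward, that a 64-bit argument must start on an odd-numbered register (r3, r5, r7, r9), and it ignores spilling to the stack once r10 is exhausted. All function names are hypothetical.

```c
#include <stddef.h>

/*
 * Assign a starting register to each argument. sizes[i] is 1 for a
 * 32-bit word and 2 for a 64-bit value. Assumption: registers start
 * at r3 and a 64-bit value must begin on an odd-numbered register.
 */
static void assign_regs(const int *sizes, size_t n, int *first_reg)
{
    int next = 3;

    for (size_t i = 0; i < n; i++) {
        if (sizes[i] == 2 && (next % 2) == 0)
            next++;            /* skip a register to align the pair */
        first_reg[i] = next;
        next += sizes[i];
    }
}

/* Register where the kernel expects `offset` for fallocate itself. */
static int offset_reg_kernel(void)
{
    const int sizes[] = {1, 1, 2, 2};       /* fd, mode, offset, len */
    int regs[4];

    assign_regs(sizes, 4, regs);
    return regs[2];
}

/* Existing stub: syscall(sysno, ...) then shift every register down by 1. */
static int offset_reg_current(void)
{
    const int sizes[] = {1, 1, 1, 2, 2};    /* sysno, fd, mode, offset, len */
    int regs[5];

    assign_regs(sizes, 5, regs);
    return regs[3] - 1;
}

/* Proposed stub: __syscall(dummy, sysno, ...) then shift down by 2. */
static int offset_reg_proposed(void)
{
    const int sizes[] = {1, 1, 1, 1, 2, 2}; /* dummy, sysno, fd, mode, offset, len */
    int regs[6];

    assign_regs(sizes, 6, regs);
    return regs[4] - 2;
}
```

Under these assumptions the model gives r5 for the kernel's view, r6 for the existing stub and r5 for the proposed one, which matches the tables quoted in the message: shifting by an even number of registers preserves pair alignment, while shifting by one does not.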