From: Steven Munroe on 15 Mar 2010 11:00

On Mon, 2010-03-15 at 15:48 +1100, Benjamin Herrenschmidt wrote:
> Hoy there !
>
> This may have been discussed earlier (I have some vague memories...) but
> I just hit a problem with that again (Mark: hint, it's in hdparm's
> fallocate) so I'd like a bit of a refresh here on what is the "right
> thing" to do...
>
> So some syscalls want a 64-bit argument. Let's take fallocate() as our
> example. So we already know that we have to be extra careful since some
> 32-bit arch will pass this into 2 registers (or stack slots) which need
> to be aligned, and so we tend to already take care of making sure that
> the said 64-bit argument is either defined as 2x32-bit arguments, or
> defined as 1x64 bit argument aligned to 2x32-bit in the argument list.
>
> So far so good...
>
> The problem is when user space tries to use the same trick for calling
> those functions using glibc-provided syscall() function. In this
> example, hdparm does:
>
>     err = syscall(SYS_fallocate, fd, mode, offset, len);
>
> With "offset" being a 64-bit argument.
>

The powerpc implementation of syscall is:

    ENTRY (syscall)
        mr r0,r3
        mr r3,r4
        mr r4,r5
        mr r5,r6
        mr r6,r7
        mr r7,r8
        mr r8,r9
        sc
        PSEUDO_RET
    PSEUDO_END (syscall)

The ABI says:

"Long long arguments are considered to have 8-byte size and alignment.
The same 8-byte arguments that must go in aligned pairs or registers are
8-byte aligned on the stack."

This implies that the SYS_fallocate call will skip a register to get the
required alignment in the parameter save area.

for ppc32 on entry

    r3 == SYS_fallocate
    r4 == fd
    r5 == mode
    r6 == not used
    r7, r8 == offset
    r9 == len

This gets shifted to:

    r0 == SYS_fallocate
    r3 == fd
    r4 == mode
    r5 == not used
    r6, r7 == offset
    r8 == len

For syscall the vararg parms will be mirrored to the parameter save area
but will not be used.

The ABI does not talk to LE for this case. Ryan does the new ABI doc
cover this?

> This will break because the first argument to syscall now shifts
> everything by one register, which breaks the register pair alignment
> (and I suppose archs with stack based calling convention can have
> similar alignment issues even if x86 doesn't).
>
> Ulrich, Steven, shouldn't we have glibc's syscall() take a long long as
> its first argument to correct that ? Either that or making it some kind
> of macro wrapper around a __syscall(int dummy, int sysno, ...) ?
>
> As it is, any 32-bit app using syscall() on any of the syscalls that
> takes 64-bit arguments will be broken, unless the app itself breaks up
> the argument, but the order of the hi and lo part is different
> between BE and LE architectures ;-)
>
> So is there a more "correct" solution than another here ? Should powerpc
> glibc be fixed at least so that syscall() keeps the alignment ?
>
> Cheers,
> Ben.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
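
The register shift described in the message above can be reproduced with a small simulation. This is only a sketch: assign_regs() and after_syscall_shift() are hypothetical helper names, and the allocation rule is the one quoted from the ABI (an 8-byte argument starts at an odd register, r3/r5/r7/r9, skipping one register if needed). It also assumes that the kernel entry point expects the layout an ordinary C call to fallocate(fd, mode, offset, len) would produce.

```c
#include <assert.h>

#define FIRST_REG 3   /* r3 carries the first integer argument */
#define LAST_REG  10  /* r10 carries the last one */

/*
 * Hypothetical helper: assign argument registers following the rule quoted
 * from the 32-bit PowerPC ABI. A 4-byte argument takes the next free
 * register; an 8-byte argument takes the next pair that starts at an odd
 * register, skipping one register when necessary.
 * sizes[i] must be 4 or 8. first_reg[i] receives the number of the register
 * holding the start of argument i. Returns 0 on success, -1 if the
 * arguments do not fit in r3..r10.
 */
int assign_regs(const int *sizes, int n, int *first_reg)
{
    int next = FIRST_REG;

    for (int i = 0; i < n; i++) {
        assert(sizes[i] == 4 || sizes[i] == 8);
        if (sizes[i] == 8 && (next % 2) == 0)
            next++;                 /* skip a register to align the pair */
        int need = sizes[i] / 4;    /* registers used by this argument */
        if (next + need - 1 > LAST_REG)
            return -1;
        first_reg[i] = next;
        next += need;
    }
    return 0;
}

/*
 * Hypothetical helper: the syscall() stub shown above moves r4..r9 down
 * by one register, so an argument that started in rN ends up in r(N-1).
 */
int after_syscall_shift(int reg)
{
    return reg - 1;
}
```

With sizes {4, 4, 8, 8} (fd, mode, offset, len) assign_regs() places offset at r5. With sizes {4, 4, 4, 8, 8} (sysno, fd, mode, offset, len) it places offset at r7, which after_syscall_shift() turns into r6. The one-register difference between r5 and r6 is the broken pair alignment under discussion.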

From: Jamie Lokier on 15 Mar 2010 11:10

Benjamin Herrenschmidt wrote:
>     err = syscall(SYS_fallocate, fd, mode, offset, len);
>
> With "offset" being a 64-bit argument.
>
> This will break because the first argument to syscall now shifts
> everything by one register, which breaks the register pair alignment
> (and I suppose archs with stack based calling convention can have
> similar alignment issues even if x86 doesn't).
>
> Ulrich, Steven, shouldn't we have glibc's syscall() take a long long as
> its first argument to correct that ? Either that or making it some kind
> of macro wrapper around a __syscall(int dummy, int sysno, ...) ?
>
> As it is, any 32-bit app using syscall() on any of the syscalls that
> takes 64-bit arguments will be broken, unless the app itself breaks up
> the argument, but the order of the hi and lo part is different
> between BE and LE architectures ;-)
>
> So is there a more "correct" solution than another here ? Should powerpc
> glibc be fixed at least so that syscall() keeps the alignment ?

There are several problems with syscall(), not just this - because a
number of system calls in section 2 of the manual don't map directly
to kernel syscalls with the same function prototype. Even fork() has
become something complicated in Glibc that doesn't use the fork
syscall :-(

So anything using syscall() has to be careful on Linux already.
Changing the 64-bit alignment won't fix the other differences.

-- Jamie
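
The quoted remark about an application breaking up the 64-bit argument itself, with the hi/lo order depending on endianness, can be sketched as follows. The names reg_pair, split_u64() and is_big_endian() are hypothetical, and the sketch assumes the common convention that a register pair mirrors memory order (high word first on big-endian, low word first on little-endian). It only addresses the ordering of the halves, not the other syscall() pitfalls mentioned in the thread.

```c
#include <assert.h>
#include <stdint.h>

/* Two 32-bit halves in the order they would be passed as arguments. */
struct reg_pair {
    uint32_t first;
    uint32_t second;
};

/* Detect byte order at run time by looking at the first byte of a known value. */
int is_big_endian(void)
{
    const uint16_t probe = 0x0102;
    return *(const unsigned char *)&probe == 0x01;
}

/*
 * Split a 64-bit value into two 32-bit halves.
 * Assumption: big-endian targets pass the high word first, little-endian
 * targets pass the low word first.
 */
struct reg_pair split_u64(uint64_t value, int big_endian)
{
    uint32_t hi = (uint32_t)(value >> 32);
    uint32_t lo = (uint32_t)(value & 0xffffffffu);
    struct reg_pair p;

    assert(big_endian == 0 || big_endian == 1);
    if (big_endian) {
        p.first = hi;
        p.second = lo;
    } else {
        p.first = lo;
        p.second = hi;
    }
    return p;
}
```

A caller would compute split_u64(offset, is_big_endian()) and pass first and second as two separate 32-bit arguments. Whether that matches what a given architecture's kernel entry point expects still has to be checked per architecture.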

From: H. Peter Anvin on 15 Mar 2010 11:30

On 03/15/2010 06:44 AM, Ralf Baechle wrote:
>
> Syscall is most often used for new syscalls that have no syscall stub in
> glibc yet, so the user of syscall() encodes this ABI knowledge. If at a
> later stage syscall() is changed to have this sort of knowledge we break
> the API. This is something only the kernel can get right.
>

One option would be to do a libkernel.so, with auto-generated stubs out
of the kernel build tree. As already discussed in #kernel this morning,
there are a number of sticky points with types and namespaces for this,
but those aren't any worse than the equivalent problems for syscall(3).

	-hpa

--
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel. I don't speak on their behalf.

From: Ulrich Drepper on 15 Mar 2010 12:10

On 03/15/2010 08:13 AM, H. Peter Anvin wrote:
> One option would be to do a libkernel.so,

No need. Put it in the vdso. And name it something other than syscall.
The syscall() API is fixed, you cannot change it.

All this only if it makes sense for ALL archs. If it cannot work for
just one arch then it's not worth it at all.

--
➧ Ulrich Drepper ➧ Red Hat, Inc. ➧ 444 Castro St ➧ Mountain View, CA ❖

From: David Miller on 15 Mar 2010 15:00

From: Ulrich Drepper <drepper(a)redhat.com>
Date: Mon, 15 Mar 2010 09:00:55 -0700

> On 03/15/2010 08:13 AM, H. Peter Anvin wrote:
>> One option would be to do a libkernel.so,
>
> No need. Put it in the vdso. And name it something other than syscall.
> The syscall() API is fixed, you cannot change it.
>
> All this only if it makes sense for ALL archs. If it cannot work for
> just one arch then it's not worth it at all.

There are many archs that still lack VDSO.