From: Brian Bloniarz on 24 May 2010 11:00

On 05/24/2010 03:28 AM, Michael Kerrisk wrote:
> Actually, SO_*BUF is pretty weird. It returns double what was
> supplied. It's not simply a matter of rounding up: it always doubles
> what was supplied.

Rationale in net/core/sock.c:

    set_rcvbuf:
            sk->sk_userlocks |= SOCK_RCVBUF_LOCK;
            /*
             * We double it on the way in to account for
             * "struct sk_buff" etc. overhead. Applications
             * assume that the SO_RCVBUF setting they make will
             * allow that much actual data to be received on that
             * socket.
             *
             * Applications are unaware that "struct sk_buff" and
             * other overheads allocate from the receive buffer
             * during socket buffer allocation.
             *
             * And after considering the possible alternatives,
             * returning the value we actually used in getsockopt
             * is the most desirable behavior.
             */
            if ((val * 2) < SOCK_MIN_RCVBUF)
                    sk->sk_rcvbuf = SOCK_MIN_RCVBUF;
            else
                    sk->sk_rcvbuf = val * 2;
            break;

I'm guessing pipes don't have this kind of wrinkle.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
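The doubling described above can be observed from userspace. The following is a minimal sketch, assuming a Linux host; `rcvbuf_roundtrip` is a hypothetical helper name, not something from the thread, and the AF_UNIX datagram socket is chosen only because it needs no network setup.

```c
#include <sys/socket.h>
#include <unistd.h>

/*
 * Hypothetical helper: set SO_RCVBUF to `requested` bytes on a fresh
 * AF_UNIX datagram socket, then return the value getsockopt() reports.
 * Returns -1 if any system call fails.
 */
int rcvbuf_roundtrip(int requested)
{
    int fd = socket(AF_UNIX, SOCK_DGRAM, 0);
    if (fd < 0)
        return -1;

    int reported = -1;
    socklen_t len = sizeof(reported);

    /* Set the size, then read back what the kernel actually stored. */
    if (setsockopt(fd, SOL_SOCKET, SO_RCVBUF, &requested, sizeof(requested)) < 0 ||
        getsockopt(fd, SOL_SOCKET, SO_RCVBUF, &reported, &len) < 0)
        reported = -1;

    close(fd);
    return reported;
}
```

On Linux, `rcvbuf_roundtrip(16384)` would be expected to report twice the requested value, provided the request is within the system's receive-buffer limit. Other systems may report the value unchanged.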
From: Michael Kerrisk on 24 May 2010 11:50

On Mon, May 24, 2010 at 4:51 PM, Brian Bloniarz <bmb(a)athenacr.com> wrote:
> On 05/24/2010 03:28 AM, Michael Kerrisk wrote:
>> Actually, SO_*BUF is pretty weird. It returns double what was
>> supplied. It's not simply a matter of rounding up: it always doubles
>> what was supplied.
>
> Rationale in net/core/sock.c:
>
>     set_rcvbuf:
>             sk->sk_userlocks |= SOCK_RCVBUF_LOCK;
>             /*
>              * We double it on the way in to account for
>              * "struct sk_buff" etc. overhead. Applications
>              * assume that the SO_RCVBUF setting they make will
>              * allow that much actual data to be received on that
>              * socket.
>              *
>              * Applications are unaware that "struct sk_buff" and
>              * other overheads allocate from the receive buffer
>              * during socket buffer allocation.
>              *
>              * And after considering the possible alternatives,
>              * returning the value we actually used in getsockopt
>              * is the most desirable behavior.
>              */
>             if ((val * 2) < SOCK_MIN_RCVBUF)
>                     sk->sk_rcvbuf = SOCK_MIN_RCVBUF;
>             else
>                     sk->sk_rcvbuf = val * 2;
>             break;
>
> I'm guessing pipes don't have this kind of wrinkle.

Yes, all of the above is understood. It's exposing these details to
userspace that's weird...

--
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Author of "The Linux Programming Interface" http://blog.man7.org/
From: Jens Axboe on 24 May 2010 13:20

On Mon, May 24 2010, OGAWA Hirofumi wrote:
> Jens Axboe <jens.axboe(a)oracle.com> writes:
>
> >> >> I'd recommend this: Pass it in and out in bytes. Don't round to a
> >> >> power of 2. Require the user to know what they are doing. Give an
> >> >> error if the user doesn't supply a power-of-2 * page-size for
> >> >> F_SETPIPE_SZ. (Again, consider the case of architectures with
> >> >> switchable page sizes.)
> >> >
> >> > But is there much point in erroring on an incorrect size? If the
> >> > application says "I need at least 120kb of space in there", kernel
> >> > returns "OK, you got 128kb". Would returning -1/EINVAL for that case
> >> > really make a better API? Doesn't seem like it to me.
> >>
> >> FWIW, my first impression of this was setsockopt(SO_RCV/SNDBUF) of unix
> >> socket. Well, API itself wouldn't say "at least this size" or "exactly
> >> this size", so, in here, important thing is consistency of interfaces, I
> >> think. (And the both is sane API at least for me if those had
> >> consistency in the system.)
> >>
> >> Well, so how about set/get in bytes, and kernel will set "at least
> >> specified size" actually like setsockopt(SO_RCV/SNDBUF)?
> >
> > Isn't that pretty much what I described?
>
> Yes, probably. Well, 120kb was still multiple of page size. :)

It is, but 120KB/page_size is not (which is the power-of-2 of interest
here).

--
Jens Axboe
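The arithmetic behind the 120kb to 128kb example can be sketched as follows. This is an illustration of rounding a byte count up to a power-of-two number of pages, not the kernel's actual implementation; `round_pipe_size` is a hypothetical name, and the 4096-byte page size mentioned afterwards is an assumption.

```c
#include <stddef.h>

/*
 * Hypothetical helper: round `bytes` up to a whole number of pages,
 * then round that page count up to the next power of two.
 * Returns the resulting size in bytes.
 */
size_t round_pipe_size(size_t bytes, size_t page_size)
{
    /* Number of pages needed to hold `bytes` (rounded up). */
    size_t pages = (bytes + page_size - 1) / page_size;

    /* Smallest power of two that is >= pages (at least one page). */
    size_t pow2 = 1;
    while (pow2 < pages)
        pow2 <<= 1;

    return pow2 * page_size;
}
```

With 4096-byte pages, a request of 120 * 1024 bytes is exactly 30 pages. The byte count is a multiple of the page size, but 30 is not a power of two, so the page count is rounded up to 32 and the result is 128 * 1024 bytes.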
From: Jens Axboe on 24 May 2010 13:40

On Mon, May 24 2010, Michael Kerrisk wrote:
> > Right, that looks like a thinko.
> >
> > I'll submit a patch changing it to bytes and the agreed API and fix this
> > -Eerror. Thanks for your comments and suggestions!
>
> Thanks. And of course you are welcome. (Please CC linux-api(a)vger on
> this patche (and all patches that change the API/ABI.)

The first change is this:

http://git.kernel.dk/?p=linux-2.6-block.git;a=commit;h=0191f8697bbdfefcd36e7b8dc3eeddfe82893e4b

and the one dealing with the pages vs bytes API is this:

http://git.kernel.dk/?p=linux-2.6-block.git;a=commit;h=b9598db3401282bb27b4aef77e3eee12015f7f29

Not tested yet, will do so before sending in of course.

--
Jens Axboe
From: Michael Kerrisk on 24 May 2010 14:00

On Mon, May 24, 2010 at 7:35 PM, Jens Axboe <jens.axboe(a)oracle.com> wrote:
> On Mon, May 24 2010, Michael Kerrisk wrote:
>> > Right, that looks like a thinko.
>> >
>> > I'll submit a patch changing it to bytes and the agreed API and fix this
>> > -Eerror. Thanks for your comments and suggestions!
>>
>> Thanks. And of course you are welcome. (Please CC linux-api(a)vger on
>> this patche (and all patches that change the API/ABI.)
>
> The first change is this:
>
> http://git.kernel.dk/?p=linux-2.6-block.git;a=commit;h=0191f8697bbdfefcd36e7b8dc3eeddfe82893e4b
>
> and the one dealing with the pages vs bytes API is this:
>
> http://git.kernel.dk/?p=linux-2.6-block.git;a=commit;h=b9598db3401282bb27b4aef77e3eee12015f7f29
>
> Not tested yet, will do so before sending in of course.

Eyeballing it quickly, these changes look right. Do you have some test
programs you can make available?

Thanks,

Michael

--
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Author of "The Linux Programming Interface" http://blog.man7.org/