From: Eric Paris on 14 Sep 2009 15:10

Long ago I implemented fanotify as basically a /dev interface using ioctls(). Alan suggested I use a socket protocol and could then make use of get/setsockopt(), which, although still not great, is light years better than ioctl. Currently the fanotify interface as I want to push it to Linus, and as I've been requesting comments on for the last 1.25 years, is just that. It really makes no use of the networking system other than bind() and setsockopt(), and everyone tends to agree the things I want to do can't reasonably be done using network hooks and a 'real' socket protocol. I like this interface; setsockopt() makes it so easy to add new functionality as we flesh out other users.

fanotify as it stands today has a number of groups who will port to it, has a number of advantages over inotify, and I have been told privately meets the needs of the original group of people who paid me to work on it (two very large anti-malware companies who currently unprotect and hack the syscall table of their users).

Just this week I got another request to look at syscalls. So I did. I haven't prototyped it, but I can do it with syscalls; they would look like this:

    int fanotify_init(int flags, int f_flags, __u64 mask, unsigned int priority);

    int fanotify_add_mark(int fanotify_fd, char *path, __u64 mask, __u64 ignored_mask);
    int fanotify_add_mark_fd(int fanotify_fd, int fd, __u64 mask, __u64 ignored_mask);
    int fanotify_rm_mark(int fanotify_fd, char *path, __u64 mask);
    int fanotify_rm_mark_fd(int fanotify_fd, int fd, __u64 mask);

Those above 4 could probably be squashed into 2 syscalls with an extra flags field.

    int fanotify_clear_marks(int fanotify_fd);
    int fanotify_perm_response(int fanotify_fd, __u64 cookie, int response);

    int fanotify_ignore_sb(int fanotify_fd, long f_type);
    int fanotify_ignore_fsid(int fanotify_fd, fsid_t f_fsid);

These 2 are the most questionable; they would honestly only be used for things that wanted system wide notification, and I can't imagine that being many things other than AV vendors. But they really need a way to exclude notification when people open/close/read/write to /proc (which is the point of the ignore_sb).

Since I don't have a solution to subtree notification I don't know if it will work in this syscall framework. I know people want subtree notification and I'm willing to take a stab at it after the fscking all notification is accepted. That's one of the main reasons I like setsockopt over tons of syscalls. I can add a new one very easily. I also can easily expand arguments by just creating a new sockopt. No userspace headaches.

Are there demands that I convert to syscalls? Do I really gain anything using 9 new inextensible syscalls over socket(), bind(), and 8 setsockopt() calls? I'd like to send these patches along, so a ruling from on high would be great....

-Eric

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
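The remark in Eric's message that the four add/rm mark calls "could probably be squashed into 2 syscalls with an extra flags field" can be illustrated with a small user-space mock. This is only a sketch of the idea, not the interface from the patches: `modify_mark`, `struct mark`, and the `FAN_MARK_*_HYP` flag names are hypothetical, and the mark state is a plain struct standing in for whatever the kernel would track per object.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flag bits; these names do not come from the thread. */
#define FAN_MARK_ADD_HYP    0x1u
#define FAN_MARK_REMOVE_HYP 0x2u

/* Toy per-object mark state standing in for what the kernel would keep. */
struct mark {
    uint64_t mask;         /* events the listener wants to receive */
    uint64_t ignored_mask; /* events the listener wants suppressed */
};

/*
 * One entry point replacing an add/rm pair: the operation is selected by
 * flags instead of by the function name. Returns 0 on success, or -1 if
 * flags does not name exactly one known operation.
 */
static int modify_mark(struct mark *m, unsigned int flags,
                       uint64_t mask, uint64_t ignored_mask)
{
    assert(m != NULL);

    switch (flags) {
    case FAN_MARK_ADD_HYP:
        m->mask |= mask;
        m->ignored_mask |= ignored_mask;
        return 0;
    case FAN_MARK_REMOVE_HYP:
        /* ignored_mask is unused here, mirroring the rm prototypes above,
         * which take only a mask. */
        m->mask &= ~mask;
        return 0;
    default:
        return -1;
    }
}
```

With this shape, one call taking a path and one taking an fd would cover all four original prototypes, and a later operation would need only a new flag bit rather than a new syscall number.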
From: Evgeniy Polyakov on 15 Sep 2009 16:20

On Mon, Sep 14, 2009 at 03:08:15PM -0400, Eric Paris (eparis(a)redhat.com) wrote:
> Just this week I got another request to look at syscalls. So I did, I
> haven't prototyped it, but I can do it with syscalls, they would look
> like this:
>
> int fanotify_init(int flags, int f_flags, __u64 mask, unsigned int priority);

....

> Are there demands that I convert to syscalls? Do I really gain anything
> using 9 new inextensible syscalls over socket(), bind(), and 8
> setsockopt() calls?

In my personal opinion sockets are way much simpler and obvious than syscalls. Also one should not edit hundreds of arch-dependent headers conflicting with other pity syscallers.

But implementing af_fanotify opens a door for zillions of other af_something which can be implemented using existing infrastructure, namely netlink, which will likely solve 99% of potential usage cases.

And frankly I did not find it way too convincing that using netlink is impossible in your scenario if some things will be simplified, namely event merging. Moreover you can implement a pool of working threads and postpone all the work to them and appropriate event queues, which will allow to use rlimits for the listeners and open files 'kind of' on behalf of those processes.

But it is quite different from the approach you selected and which is more obvious indeed. So if you ask a question whether fanotify should use sockets or syscalls, I would prefer sockets, but I still recommend to rethink your decision to move away from netlink to be 100% sure that it will not solve your needs.

--
Evgeniy Polyakov
From: Eric Paris on 15 Sep 2009 18:00

On Wed, 2009-09-16 at 00:16 +0400, Evgeniy Polyakov wrote:
> On Mon, Sep 14, 2009 at 03:08:15PM -0400, Eric Paris (eparis(a)redhat.com) wrote:
> > Just this week I got another request to look at syscalls. So I did, I
> > haven't prototyped it, but I can do it with syscalls, they would look
> > like this:
> >
> > int fanotify_init(int flags, int f_flags, __u64 mask, unsigned int priority);
>
> ...
>
> > Are there demands that I convert to syscalls? Do I really gain anything
> > using 9 new inextensible syscalls over socket(), bind(), and 8
> > setsockopt() calls?
>
> In my personal opinion sockets are way much simpler and obvious than
> syscalls. Also one should not edit hundreds of arch-dependent headers
> conflicting with other pity syscallers.
>
> But implementing af_fanotify opens a door for zillions of other
> af_something which can be implemented using existing infrastructure
> namely netlink will solve likely 99% of potential usage cases.
>
> And frankly I did not find it way too convincing that using netlink is
> impossible in your scenario if some things will be simplified, namely
> event merging.

Nothing's impossible, but is netlink a square peg for this round hole? One of the great benefits of netlink, the attribute matching and filtering, although possibly useful, isn't some panacea, as we have to do that well before netlink to have anything like decent performance. Imagine every single fs event creating an skb and sending it with netlink only to have most of them dropped.

The only other benefit to netlink that I know of is the reasonable, easy, and clean addition of information later in time with backwards compatibility as needed. That's really cool, I admit, but with the limited amount of additional info that users have wanted out of inotify I think my data type extensibility should be enough.

> Moreover you can implement a pool of working threads and
> postpone all the work to them and appropriate event queues, which will
> allow to use rlimits for the listeners and open files 'kind of' on
> behalf of those processes.

I'm sorry, I don't understand. I don't see how worker threads help anything here. Can you explain what you are thinking?

> But it is quite different from the approach you selected and which is
> more obvious indeed. So if you ask a question whether fanotify should
> use sockets or syscalls, I would prefer sockets

I've heard someone else off list say this as well. I'm not certain why. I actually spent the day yesterday and have fanotify working over 5 new syscalls (good thing I wrote the code with separate back and front ends for just this purpose). And I really don't hate it. I think 3 might be enough.

fanotify_init() ---- very much like inotify_init
fanotify_modify_mark_at() --- like inotify_add_watch and rm_watch
fanotify_modify_mark_fd() --- same but with an fd instead of a path
fanotify_response() --- userspace tells the kernel what to do if requested/allowed (could probably be done using write() to the fanotify fd)
fanotify_exclude() --- a horrid syscall which might be better as an ioctl since it isn't strongly typed....

If there is a strong need for syscalls I'm ready and willing. I'd love to hear Linus say socket+setsockopt() is a merge blocker and I have to go to syscalls if he sees it that way. If davem and friends say I'm not networky enough to use socket()+setsockopt() I guess I have to look at netlink (which I'm still not convinced buys us anything vs the crazy complexity) or go with syscalls.

I'm perfectly willing to admit this is a /dev+ioctl type interface only implemented on socket+setsockopt(). If that's a no go, someone say it now please.

> but I still recommend
> to rethink your decision to move away from netlink to be 100% sure that
> it will not solve your needs.

I don't see what's gained using netlink. I am already reusing the fsnotify code to do all my queuing. Someone help me understand the benefit of netlink and help me understand how we can reasonably meet the needs and I'll try to prototype it.

1) fd's must be opened in the recv process
2) reliability: if events are lost, the send side must know
From: Linus Torvalds on 15 Sep 2009 20:00

On Tue, 15 Sep 2009, Eric Paris wrote:
>
> I don't see what's gained using netlink.

I'm personally not a big believer in netlink. What's the point, really? If you are sending datagrams back-and-forth, go wild. But if it's more structured than that, netlink has no actual upsides as far as I can tell.

Same goes for sockets in this case, actually. What's the upside?

I'll throw out a couple of upsides of actual system calls, people can feel free to comment:

 - things like 'strace' _work_ and the traces make sense, and you generally see what the app is trying to do from the traces (sure, it takes some time for strace to learn new system calls, but even when it only gives a system call number, it's never any worse than some "made-up packet interface").

 - if you have a system call definition, it tends to be a much stricter interface than "let's send some packets around with a network interface".

 - No unnecessary infrastructure.

That said, maybe the netlink/socket people can argue for their standpoints.

(And btw, I still want to know what's so wonderful about fanotify that we would actually want yet-another-filesystem-notification-interface. So I'm not saying that I'll take a system call interface. I just don't think that hiding interfaces behind some random packet interface is at all any better)

Linus
From: Eric Paris on 15 Sep 2009 21:30

On Tue, 2009-09-15 at 16:49 -0700, Linus Torvalds wrote:
> And btw, I still want to know what's so wonderful about fanotify that we
> would actually want yet-another-filesystem-notification-interface. So I'm
> not saying that I'll take a system call interface.

The real thing that fanotify provides is an open fd with the event rather than some arbitrary 'watch descriptor' that userspace must somehow magically map back to data on disk. This means that it could be used to provide subtree notification, which inotify is completely incapable of doing. And it can be used to provide system wide notification. We all know who wants that.

It provides an extensible data format which allows growth impossible in inotify. I don't know if anyone remembers the inotify patches which wanted to overload the inotify cookie field for some other information, but inotify information extension is not reasonable or backwards compatible.

fanotify also allows userspace to make access decisions and cache those in the kernel. Useful for integrity checkers (anti-malware people) and for hierarchical storage management people.

I've got private commitments from two very large anti-malware companies, both of which unprotect and hack syscall tables in their customers' kernels, that they would like to move to an fanotify interface. Both Red Hat and Suse have expressed interest in these patches and have contributed to the patch set.

The patch set is actually rather small (the entire set of about 20 patches is 1800 lines) as it builds on the fsnotify work already in 2.6.31 to reuse code from inotify rather than reimplement the same things over and over (like we previously had with inotify and dnotify).

Don't know what else to say.....

-Eric
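
The point about receiving an open fd instead of a watch descriptor can be made concrete: given any open descriptor, a Linux process can ask the kernel which file it refers to, with no userspace table to maintain. The sketch below assumes a Linux system with /proc mounted; `path_from_fd` is a hypothetical helper name, and nothing here is taken from the fanotify patches themselves.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Resolve an open file descriptor to a path by reading the symlink
 * /proc/self/fd/<fd> (Linux-specific). Writes a NUL-terminated path into
 * buf and returns its length, or returns -1 on error or if the path may
 * have been truncated to fit buf.
 */
static ssize_t path_from_fd(int fd, char *buf, size_t len)
{
    char link[64];
    ssize_t n;

    assert(buf != NULL && len > 1);

    snprintf(link, sizeof(link), "/proc/self/fd/%d", fd);

    /* readlink() does not NUL-terminate, so leave room for the terminator. */
    n = readlink(link, buf, len - 1);
    if (n < 0 || (size_t)n >= len - 1)
        return -1;

    buf[n] = '\0';
    return n;
}
```

A listener handed an fd with each event could call a helper like this to log the path, or simply read() the fd to inspect the contents directly. With a watch descriptor, the listener would instead have to keep its own descriptor-to-path map and keep it current across renames.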