From: Rob Donovan on 8 Aug 2010 15:30

Hi,

We use CISAM files a lot in our application, which uses the fcntl() system call for record locking. After a lot of work with the systemtap tracing tool, I've noticed a possible problem with fcntl.

The problem is, when you have lots of F_RDLCK locks being created and released, any F_WRLCK lock taken with F_SETLKW slows down massively, because the read locks seem to 'drown out' the write locks. Our system is large (700-800 users, so lots of activity) and does far more reads than writes, which makes the writes very slow.

This is because (I think), if I have say 15 processes taking read locks and 1 process taking blocking write locks, then when the writer tries to get its lock it can't, because process 1 holds a read lock, so it sleeps. I think how it works is that when the read lock gets released, it wakes up any waiters (i.e. the writer) so they can retry. The problem is: process 1 creates a read lock, the write process tries to get its lock and can't, so it sleeps; then process 2 gets a read lock (which it can at this point); then process 1 releases its lock and wakes the write process, but because process 2 got its read lock, the write process still can't get its lock, so it sleeps again. This goes on for quite some time until eventually the write process gets lucky and actually grabs the lock. (I think the writer actually sits in the 'for' loop in do_lock_file_wait() in fs/locks.c, waiting for the lock to be freed.)

Obviously, this slows down the write locks a lot. I can show this by running some code (not the actual application code, just a test example that makes it happen a lot). Touch a file 'control.dat' in your current dir, run test_read (code example below) in the background in 15 sessions, and then run test_write once. test_write will hardly ever get a write lock (seen via systemtap or strace) and will just wait.
It's not that bad in our application, but the writes slow down massively (to 0.03ms compared to 0.00003 normally, and sometimes 3-6 seconds for just one write lock).

Is there anything that can possibly be done in the kernel to help this? I would have thought it could cause problems for other people too. One possible solution would be that when the write lock tries to get a lock and can't, it puts its lock in a queue of some kind, so that other reads that are about to start can see that, and they 'queue' and wait for the write lock first. I'm obviously not a kernel coder, so I have no idea of the effects of something like that, hence this post.

I've tried this on various versions, and it seems to be the same on Fedora 2.6.33.6-147.2.4.fc13.i686, RHEL5 and RHEL6 Beta.

Thanks for any input or help,
Rob.

test_read.c:

#include <fcntl.h>
#include <unistd.h>

int main(void)
{
    int myfd;
    char buffer[5000];
    struct flock myflock;

    myfd = open("control.dat", O_RDWR);
    while (1) {
        myflock.l_type = F_RDLCK;
        myflock.l_whence = SEEK_SET;
        myflock.l_start = 0;
        myflock.l_len = 1073741823;
        myflock.l_pid = getpid();
        fcntl(myfd, F_SETLKW, &myflock);

        lseek(myfd, 0, SEEK_SET);
        read(myfd, buffer, 200);

        myflock.l_type = F_UNLCK;
        fcntl(myfd, F_SETLKW, &myflock);
    }
}

test_write.c:

#include <fcntl.h>
#include <time.h>
#include <unistd.h>

int main(void)
{
    struct timespec mytime;
    struct flock myflock;
    int myfd;
    char buffer[5000];

    myfd = open("control.dat", O_RDWR);
    while (1) {
        myflock.l_type = F_WRLCK;
        myflock.l_whence = SEEK_SET;
        myflock.l_start = 0;
        myflock.l_len = 1;
        myflock.l_pid = getpid();
        fcntl(myfd, F_SETLKW, &myflock);

        lseek(myfd, 0, SEEK_SET);
        read(myfd, buffer, 200);

        myflock.l_type = F_UNLCK;
        fcntl(myfd, F_SETLKW, &myflock);

        mytime.tv_sec = 0;
        mytime.tv_nsec = 10000;
        nanosleep(&mytime, NULL);
    }
}

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at
http://www.tux.org/lkml/
From: Chris Friesen on 9 Aug 2010 17:50

On 08/08/2010 01:26 PM, Rob Donovan wrote:
> The problem is, when you have lots of F_RDLCK locks being created and
> released, then it slows down any F_WRLCK with F_SETLKW locks massively.
>
> Is there anything that can possibly be done in the kernel to help this, as I
> would have thought this could cause problems with other people?
>
> One possible solution would be that when the write lock tries to get a lock
> and cant, its actually puts its lock in a queue of some kind, so that the
> other reads that are about to start can see that, and they 'queue' and wait
> for the write lock first.. I'm obviously not a kernel coder, so I have no
> idea of the effects of something like that, hence this post.

What you're seeing is classical "reader priority" behaviour. The alternative is "writer priority". I don't think POSIX specifies which behaviour to use, so it's up to the various implementations.

If you really need writer priority, how about building your own lock object in userspace on top of fcntl locks?

--
Chris Friesen
Software Developer
GENBAND
chris.friesen(a)genband.com
www.genband.com
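A minimal sketch of the kind of userspace writer-priority object Chris suggests, assuming every process goes through this wrapper on a dedicated lock file. The two-byte layout and the names (range_lock, read_lock, write_lock, rw_unlock) are illustrative, not part of any existing API:

```c
/* Writer-priority r/w lock built on fcntl byte-range locks.
 * Byte 0 is a "gate": a waiting writer holds it exclusively, so new
 * readers queue behind the writer instead of starving it.
 * Byte 1 is the real lock that readers share.
 */
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

int range_lock(int fd, short type, off_t start)
{
    struct flock fl;
    memset(&fl, 0, sizeof fl);
    fl.l_type = type;            /* F_RDLCK, F_WRLCK or F_UNLCK */
    fl.l_whence = SEEK_SET;
    fl.l_start = start;
    fl.l_len = 1;
    return fcntl(fd, F_SETLKW, &fl);
}

void read_lock(int fd)
{
    range_lock(fd, F_RDLCK, 0);  /* blocks here if a writer is waiting */
    range_lock(fd, F_UNLCK, 0);
    range_lock(fd, F_RDLCK, 1);  /* share the real lock with readers   */
}

void write_lock(int fd)
{
    range_lock(fd, F_WRLCK, 0);  /* announce intent: stops new readers */
    range_lock(fd, F_WRLCK, 1);  /* wait for current readers to drain  */
    range_lock(fd, F_UNLCK, 0);  /* reopen the gate                    */
}

void rw_unlock(int fd)
{
    range_lock(fd, F_UNLCK, 1);
}
```

The point of the gate byte is that a blocked writer changes the state visible to later readers, which is exactly what plain fcntl() waiters don't do.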
From: Rob Donovan on 11 Aug 2010 12:30

Hi,

I'm not sure it's about read or write 'priority' so much, is it? I wouldn't want to particularly favour writes over reads either, or it will just make the problem happen for reads instead, won't it? And to make it favour writes, I presume it would have to be coded into the kernel; there isn't any 'switch' for me to try?

Could we not have it 'fairly' process locks? So that if a read lock comes along, and there is a write lock waiting for another read lock to unlock, then the second read has to wait for the write lock. Not particularly because the write lock has priority, but because it was requested after the write lock was.

In my example, if you run 15 of the read processes, the write process never gets the chance to lock, ever, as it is continually blocked by one or more of the reads. Running 15 of the read processes is much more load than our real system gets, so we don't get writes blocked totally like that, but they can block for 10 or more seconds sometimes, which is quite excessive for one write.

To me, it seems like there needs to be something in the fcntl() routines so that when a lock is requested with F_SETLKW and gets blocked, its 'request' is put in some kind of queue, so that if any more reads come along, they know there is already a lock waiting before them and queue up behind it. Or would that kind of checking/queuing slow down the calls too much, maybe?

Example of what is happening in my test:

Process 1 creates a read lock.
Process 2 tries to create a blocking write lock, but can't because of process 1, so it sleeps.
Process 3 creates a read lock (since nothing is blocking it).
Process 1 unlocks and wakes up any waiting locks, i.e. the writer, process 2.
Process 2 wakes up and tries to lock, but can't because of process 3's read lock, so it sleeps again.
Process 4 creates a read lock (since nothing is blocking it).
Process 3 unlocks and wakes up any waiting locks, i.e. the writer, process 2.
Process 2 wakes up and tries to lock, but can't because of process 4's read lock, so it sleeps again.
Process 5 creates a read lock...

This can go on and on until the write lock becomes 'lucky' enough to be woken up just as the last read lock is released and before another read lock starts. Then it can get its lock.

We moved to RHEL5 recently (from Tru64) and had massive problems with fcntl calls because of the way RHEL5 uses the BKL: the more read fcntl calls the system got, the slower the fcntl syscall became, globally. We're now testing RHEL6 beta, which changes the BKL handling (spin-locks vs semaphores, I believe), and now the read fcntl calls are much quicker and don't affect each other so much, which I think has exposed this 'write' lock problem for us, because now we get lots more fcntl read locks as they are quicker. (However, I'm still testing this with systemtap to 'prove' it.)

I don't think I can start writing my own lock objects :) ... We are using CISAM from IBM, and don't actually have control of the fcntl calls.

Rob.

-----Original Message-----
From: linux-kernel-owner(a)vger.kernel.org [mailto:linux-kernel-owner(a)vger.kernel.org] On Behalf Of Chris Friesen
Sent: 09 August 2010 22:41
To: Rob Donovan
Cc: linux-kernel(a)vger.kernel.org
Subject: Re: FCNTL Performance problem

On 08/08/2010 01:26 PM, Rob Donovan wrote:
> The problem is, when you have lots of F_RDLCK locks being created and
> released, then it slows down any F_WRLCK with F_SETLKW locks massively.
>
> Is there anything that can possibly be done in the kernel to help this, as I
> would have thought this could cause problems with other people?
>
> One possible solution would be that when the write lock tries to get a lock
> and cant, its actually puts its lock in a queue of some kind, so that the
> other reads that are about to start can see that, and they 'queue' and wait
> for the write lock first.. I'm obviously not a kernel coder, so I have no
> idea of the effects of something like that, hence this post.

What you're seeing is classical "reader priority" behaviour. The alternative is "writer priority". I don't think POSIX specifies which behaviour to use, so it's up to the various implementations.

If you really need writer priority, how about building your own lock object in userspace on top of fcntl locks?

--
Chris Friesen
Software Developer
GENBAND
chris.friesen(a)genband.com
www.genband.com
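The arrival-order ("fair") queueing Rob describes can be approximated in userspace, though as he notes, not when the fcntl() calls are buried inside CISAM. A sketch under that caveat, with illustrative names: every process, reader or writer, briefly takes an exclusive lock on a separate gate byte before acquiring the real lock, so waiters line up in roughly the order they arrived:

```c
/* Fairness sketch on top of fcntl byte-range locks.
 * Byte 0 is a "gate" that serialises lock *acquisition*; byte 1 is
 * the real lock.  A waiter blocked on byte 1 still holds the gate,
 * so later arrivals (readers included) queue behind it, FIFO-ish.
 */
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

int byte_lock(int fd, short type, off_t start)
{
    struct flock fl;
    memset(&fl, 0, sizeof fl);
    fl.l_type = type;
    fl.l_whence = SEEK_SET;
    fl.l_start = start;
    fl.l_len = 1;
    return fcntl(fd, F_SETLKW, &fl);
}

/* type is F_RDLCK or F_WRLCK */
void fair_lock(int fd, short type)
{
    byte_lock(fd, F_WRLCK, 0);   /* join the queue (exclusive gate)  */
    byte_lock(fd, type, 1);      /* wait for the real lock           */
    byte_lock(fd, F_UNLCK, 0);   /* let the next arrival queue up    */
}

void fair_unlock(int fd)
{
    byte_lock(fd, F_UNLCK, 1);
}
```

The trade-off is that readers now enter one at a time through the gate, giving up some read concurrency in exchange for ending the wake-up lottery described above.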
From: Chris Friesen on 11 Aug 2010 13:10

On 08/11/2010 10:19 AM, Rob Donovan wrote:
> Hi,
>
> Not sure it's about read or write 'priority' so much is it?
>
> I wouldn't want to particularly favour writes over reads either, or it will
> just make the problem happen for reads wouldn't it?

No, because readers can always share the lock with other readers if there is no writer waiting.

If you have one or more readers already holding the lock, with a writer waiting, you have two choices: 1) let the new reader in under the assumption that they'll be quick and won't extend the current "read" usage by much, or 2) block the new reader until after any waiting writers get a chance to get in. The first is called reader priority, the second is writer priority.

> And to do this, and make it favour writes, I presume it would have to be
> coded into the kernel to do this, there isn't any 'switch' for me to try?

The locks are written by glibc and the kernel. I haven't looked at fcntl locking so I'm not sure where the bulk of the code is. I'd suspect in the kernel.

> Could we not have it 'fairly' process locks? So that if a read lock comes
> along, and there is a write lock waiting for another read lock to unlock,
> then the 2nd read has to wait for the write lock. Not particularly because
> the write lock has priority, but because it was requested after the write
> lock was.

The behaviour you describe is called "writer priority".

> To me, it seems like there needs to be something in the fcntl() routines so
> that when a lock is called with F_SETLKW, if it gets blocked then it needs
> to put its 'request' in some kind of queue, so that if any more reads come
> along, they know there is already a lock waiting to get the lock before it,
> so they queue up behind it.

Again, this would be implementing writer priority. POSIX doesn't guarantee either form, so if you need a writer-priority lock then fcntl() isn't a good choice.
In fact in most cases I suspect you'll find that read/write locks are implemented as reader priority, since the expectation is that writes are infrequent.

Chris

--
Chris Friesen
Software Developer
GENBAND
chris.friesen(a)genband.com
www.genband.com
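On the 'switch' question: fcntl() locks have none, but where the locking code can be changed, glibc does expose a writer-preference knob on its rwlocks, which can be made process-shared in shared memory instead of using fcntl() at all. A sketch, assuming cooperating processes created by fork() (MAP_ANONYMOUS memory is only shared with children); make_shared_rwlock is an illustrative name:

```c
/* Process-shared rwlock with glibc's writer-preferring kind, as an
 * alternative to fcntl() record locks for read-mostly workloads.
 * Requires _GNU_SOURCE for pthread_rwlockattr_setkind_np().
 */
#define _GNU_SOURCE
#include <pthread.h>
#include <stdlib.h>
#include <sys/mman.h>

pthread_rwlock_t *make_shared_rwlock(void)
{
    /* Anonymous shared mapping: visible to this process and any
     * children forked after this call. */
    pthread_rwlock_t *rw = mmap(NULL, sizeof *rw,
                                PROT_READ | PROT_WRITE,
                                MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (rw == MAP_FAILED)
        return NULL;

    pthread_rwlockattr_t attr;
    pthread_rwlockattr_init(&attr);
    pthread_rwlockattr_setpshared(&attr, PTHREAD_PROCESS_SHARED);
    /* glibc-specific: new readers block while a writer is queued,
     * avoiding exactly the starvation discussed in this thread. */
    pthread_rwlockattr_setkind_np(&attr,
        PTHREAD_RWLOCK_PREFER_WRITER_NONRECURSIVE_NP);
    pthread_rwlock_init(rw, &attr);
    pthread_rwlockattr_destroy(&attr);
    return rw;
}
```

Unlike fcntl() locks, though, these are not released automatically if a holder dies, which matters for a 700-800 user system.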