From: T on 12 Feb 2010 09:13

Greetings:

We have two Red Hat Linux boxes. One is a 5.2 system with GFS1 for a file system. The other is a 5.4 system with EXT3 for a file system. I've set up the 5.2 system as an rsync server (that's where the source data is), and it has between 700 and 800 Gbytes of data and hundreds of thousands of files. I'm using the following command line from the client:

time rsync --inplace --progress --stats -aPh rsync://root@src_server/accurev_d1/mnt/accurev_d1

I should also note I'm using rsync 3.0.7:

# rsync --version
rsync version 3.0.7 protocol version 30

It takes more than 8 hours to sync this file system, but only 7G of data is transferred. Here are the stats:

Number of files: 3349997
Number of files transferred: 32815
Total file size: 760.00G bytes
Total transferred file size: 61.58G bytes
Literal data: 6.96G bytes
Matched data: 54.74G bytes
File list size: 56.84M
File list generation time: 0.233 seconds
File list transfer time: 0.000 seconds
Total bytes sent: 7.43M
Total bytes received: 7.02G
sent 7.43M bytes received 7.02G bytes 222.83K bytes/sec
total size is 760.00G speedup is 108.16
rsync warning: some files vanished before they could be transferred (code 24) at main.c(1508) [generator=3.0.7]

real 525m33.328s
user 8m25.662s
sys 1m38.190s

The question: Is there any way to speed this up? Am I doing something wrong? Have I somehow misconfigured the server? Is this what should be expected?

Thanks for any help in advance.

Tom
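As a rough sanity check on the figures in the post, the run time can be divided by the file count. This is only a sketch: every number is copied from the --stats and time output above, and the variable names are illustrative.

```python
# Figures copied from the --stats output and the `time` result in the post.
num_files = 3_349_997
files_transferred = 32_815
wall_seconds = 525 * 60 + 33.328                      # real 525m33.328s
cpu_seconds = (8 * 60 + 25.662) + (1 * 60 + 38.190)   # user + sys

# Average wall-clock time spent per entry in the file list.
ms_per_file = wall_seconds / num_files * 1000

# Fraction of the run during which the local rsync was actually on the CPU.
cpu_share = cpu_seconds / wall_seconds

# Fraction of the listed files that needed any transfer at all.
transferred_share = files_transferred / num_files

print(f"wall-clock time:   {wall_seconds / 3600:.2f} hours")
print(f"average per file:  {ms_per_file:.2f} ms")
print(f"local CPU share:   {cpu_share:.1%}")
print(f"files transferred: {transferred_share:.1%}")
```

With these figures the run averages roughly 9.4 ms of wall-clock time per listed file, about 1% of the files were transferred, and the local rsync was on the CPU for about 2% of the run. That pattern suggests the time is mostly spent waiting (on the network, the disks, or the remote side) rather than computing, although the numbers alone cannot say which.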
From: Kevin Collins on 12 Feb 2010 16:12
On 2010-02-12, T <g4173c(a)motorola.com> wrote:
> Greetings:
>
> We have two Red Hat Linux boxes. One is a 5.2 system with GFS1 for a
> file system. The other is a 5.4 system with EXT3 for a file system. I've
> set up the 5.2 system as an rsync server (that's where the
> source data is), and it has between 700 and 800 Gbytes of data and
> hundreds of thousands of files. I'm using the following command line
> from the client:
>
> time rsync --inplace --progress --stats -aPh rsync://root@src_server/accurev_d1/mnt/accurev_d1
>
> I should also note I'm using rsync 3.0.7:
>
> # rsync --version
> rsync version 3.0.7 protocol version 30
>
> It takes more than 8 hours to sync this file system, but only 7G of
> data is transferred. Here are the stats:
>
> Number of files: 3349997
> Number of files transferred: 32815
> Total file size: 760.00G bytes
> Total transferred file size: 61.58G bytes
> Literal data: 6.96G bytes
> Matched data: 54.74G bytes
> File list size: 56.84M
> File list generation time: 0.233 seconds
> File list transfer time: 0.000 seconds
> Total bytes sent: 7.43M
> Total bytes received: 7.02G
> sent 7.43M bytes received 7.02G bytes 222.83K bytes/sec
> total size is 760.00G speedup is 108.16
> rsync warning: some files vanished before they could be transferred
> (code 24) at main.c(1508) [generator=3.0.7]
>
> real 525m33.328s
> user 8m25.662s
> sys 1m38.190s
>
> The question: Is there any way to speed this up? Am I doing something
> wrong? Have I somehow misconfigured the server? Is this what should
> be expected?

rsync will need to (at minimum) do a stat() of each file. This requires GFS locking to happen for each one, and you have 3.3 million! My (limited) experience, plus research I have done, shows that this can be really slow. GFS takes out a lock on each file (even for stat).

How many nodes are in the GFS cluster? There is communication and lock checking done for EACH node before the lock is granted.

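One way to test whether per-file stat() cost is the bottleneck is to time the calls directly on the GFS mount and compare against a local file system. The following is a minimal sketch, not a benchmark tool: the function name `time_lstat` and the sample limit are made up for illustration, and the result will vary a lot with caching, so a second run over the same tree may look much faster than the first.

```python
import os
import sys
import time


def time_lstat(root, limit=10_000):
    """Walk `root`, lstat() up to `limit` entries, return (count, seconds)."""
    count = 0
    elapsed = 0.0
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            start = time.perf_counter()
            try:
                os.lstat(path)
            except OSError:
                continue  # entry vanished or is unreadable; skip it
            # Only the lstat() call itself is timed, not the directory walk.
            elapsed += time.perf_counter() - start
            count += 1
            if count >= limit:
                return count, elapsed
    return count, elapsed


if __name__ == "__main__":
    root = sys.argv[1] if len(sys.argv) > 1 else "."
    count, elapsed = time_lstat(root)
    if count:
        print(f"{count} lstat() calls in {elapsed:.3f}s "
              f"({elapsed / count * 1000:.3f} ms each)")
    else:
        print("no entries found")
```

Running it once against a directory on the GFS volume and once against a comparable tree on EXT3 gives a per-call figure that can be multiplied by the 3.3 million files to estimate how much of the run is stat() overhead.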
Additionally, given the way the rsync protocol works (a kind of block-based checksumming), I would think it could be quite slow to rsync large files from GFS...

Hope this helps.

Kevin
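The block-based checksumming mentioned above relies on a weak checksum that can be "rolled" along a file one byte at a time. The sketch below follows the formula from the rsync algorithm description rather than rsync's actual C code, and the function names are illustrative. In the real protocol a weak match is then confirmed with a stronger hash, which is not shown here.

```python
M = 1 << 16  # both halves of the weak checksum are kept modulo 2^16


def weak_checksum(block):
    """Compute the two 16-bit sums (a, b) for a block of bytes from scratch."""
    n = len(block)
    a = sum(block) % M
    # The first byte is weighted by n, the last byte by 1.
    b = sum((n - i) * byte for i, byte in enumerate(block)) % M
    return a, b


def roll(a, b, out_byte, in_byte, block_len):
    """Slide the window one byte: drop out_byte at the front, add in_byte."""
    a = (a - out_byte + in_byte) % M
    b = (b - block_len * out_byte + a) % M
    return a, b


# Check that rolling gives the same result as recomputing each window.
data = b"the quick brown fox jumps over the lazy dog"
n = 8
a, b = weak_checksum(data[:n])
for k in range(1, len(data) - n + 1):
    a, b = roll(a, b, data[k - 1], data[k + n - 1], n)
    assert (a, b) == weak_checksum(data[k:k + n])
print("rolling checksum matches full recomputation")
```

The rolling update is cheap in CPU terms, but it still requires reading every byte of each file being compared, so for files that changed the cost is dominated by how fast the file system can deliver the data.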