From: Aragorn on 21 Jan 2010 07:05

On Wednesday 20 January 2010 23:59 in comp.os.linux.misc, somebody
identifying as David Brown wrote...

> Aragorn wrote:
>
>> On Wednesday 20 January 2010 15:48 in comp.os.linux.misc, somebody
>> identifying as David Brown wrote...
>
> <snip to save a little space>

Yeah, these posts themselves are getting quite long, but at least it's
one of those rare threads in which the conversation stays on-topic. :-)

Quite honestly, I'm enjoying this thread, because I get to hear
interesting feedback - and I think you do too, from your point of view
- and I have a feeling that Rahul, the OP, is sitting there enjoying
himself over all the valid arguments being discussed here in the
debate over the various RAID types. ;-)

This is a good thread, and I recommend that any lurking newbies save
these posts for later reference, in the event that they are ever faced
with the decision on whether and how to implement RAID on one of their
machines. Newbies, heads up! :p

>>> The zeroth rule, which is often forgotten (until you learn the
>>> hard way!), is "thou shalt make a plan for restoring from backups,
>>> test that plan, document that plan, and find a way to ensure that
>>> all backups are tested and restorable in this way". /Then/ you
>>> can start making your actual backups!
>>
>> Well, so far I've always used the tried and tested approach of
>> tar'ing in conjunction with bzip2. Can't get any cleaner than
>> that. ;-)
>
> rsync copying is even cleaner - the backup copy is directly
> accessible. And when combined with hard link copies in some way
> (such as rsnapshot) you can get snapshots.

I have seen this method being discussed before, but to be honest I've
never even looked into "rsnapshot". I do intend to explore it in the
future, since the ability to make incremental backups seems very
interesting.

So far I have always made either data backups only - and on occasion,
backups of important directories such as "/etc" - or complete
filesystem backups, but never incremental backups. For IRC logs - I
run an IRC server (which is currently inactive - see farther down) and
I log the channels I'm in - I normally use "zip" every month and then
erase the logs themselves. This is not an incremental approach, of
course.

My reason for using "zip" rather than "tar" for the IRC logs is that
my colleagues run Windoze, so their options are limited. ;-)

> Of course, .tar.bz2 is good too - /if/ you have it automated so that
> it is actually done (or you are one of these rare people that can
> regularly follow a manual procedure).

To be honest, so far I've been doing that manually, but like I said,
my approach is rather amateurish, in the sense that it's not a
systematic one. But then again, so far the risk was rather limited,
because I only needed to save my own files.

On the hosting server we used - which is now no longer operational as
such - the hosting software itself made regular backups of the
domains, using the ".tar.bz2" approach. I'm not sure whether there
was anything incremental about those backups, as it was my colleague
who occupied himself with the management of that machine - it was
located at his home.

> It also needs to be saved in a safe and reliable place - many people
> have had regular backups saved to tape, only to find later that the
> tapes were unreadable.

That is always a risk, just as it was with the old music cassette
tapes. Magnetic storage is actually not advised for backups.
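For any newbies bookmarking this thread, a minimal sketch of the two
backup styles discussed above - all paths and names here are
assumptions, to be adjusted to taste:

    # 1) the classic full-archive approach: one dated .tar.bz2 per run
    tar -cjf /backup/etc-$(date +%F).tar.bz2 /etc

    # 2) the rsync approach: the backup is a directly browsable mirror
    rsync -a --delete /etc/ /backup/etc-mirror/

Either one can be dropped into cron, which takes care of the "only
works if you actually remember to run it" problem.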
> And of course it needs to be saved again, in a different place and
> stored at a different site.

That would indeed be the best approach. Like I said in my previous
post, I use Iomega REV disks for backups to which I want to have
immediate access, but I forgot to mention that I also back up stuff to
DVDs. I use DVD+RW media for that, since they tend to be of higher
quality than DVD-R/+R - likewise I prefer CD-RW over CD-R - and the
advantage of optical storage is that it is the better choice in the
event of magnetic corruption, which you *can* and eventually *do* get
on tape.

Hard disks are relatively cheap these days - at least if we're talking
about consumer-grade SATA disks - and they are magnetically better
than tapes, in the sense that the magnetic coating on the platters is
more time-resilient than that on tape. On the other hand, hard disks
contain lots of moving components, and if a hard disk fails - and here
we go again - you lose all your data, unless you have a RAID set-up.
So one can use hard disks for backups - it's fast, reasonably
affordable and reasonably reliable - but it's not the final solution.
If one stores one's backups on hard disks, then one needs to make
backups of those backups on another kind of media.

My advice would therefore be to make redundant backups on different
types of media. Optical media are ideal in that they are not
susceptible to electromagnetic interference, but they may in turn have
other issues - especially older CDs and DVDs - since storage there is
in fact mechanical, i.e. the data is stored via physical indentations
in a kind of resin, made by a fairly high-powered laser. And some
readers will not accept media that were burned using other CD/DVD
writers. This is becoming rarer these days, but the problem still
exists.

> I know I'm preaching to the choir here, as you said before - but
> there may be others in the congregation.

Indeed, and people tend "not to care" until they burn their fingers.
So we can't stress this enough.

>>> And the second rule is "thou shalt make backups of your backups",
>>> followed by "thou shalt have backups of critical hardware".
>>> (That's another bonus of software raid - if your hardware raid
>>> card dies, you may have to replace it with exactly the same type
>>> of card to get your raid working again - with mdadm raid, you can
>>> use any PC.)
>>
>> Well, considering that my Big Machine has drained my piggy bank for
>> about 17'000 Euros worth of hardware, having a duplicate machine is
>> not really an option. The piggy bank's on a diet now. :-)
>
> You don't need a duplicate machine - you just need duplicates of any
> parts that are important, specific, and may not always be easily
> available.

Well, just about everything in that machine is very expensive. On the
other hand, I did have another server here - which was malfunctioning
but has since been repaired - so I might as well put that one to use
as a back-up machine in the event that my main machine should somehow
fail - something I am not looking forward to, of course! ;-)

I also can't use the Xen live migration approach, because I intend to
set up my main machine with 64-bit software, while the other server is
a strictly 32-bit machine. But redundancy - i.e. a duplicate set-up
of the main servers - should be adequate for my purposes.
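To illustrate David's point above about mdadm arrays being portable to
any PC, a minimal sketch - the device names are assumptions:

    # create a RAID 5 across three disks, with one hot spare
    mdadm --create /dev/md0 --level=5 --raid-devices=3 \
          --spare-devices=1 /dev/sdb /dev/sdc /dev/sdd /dev/sde

    # the array metadata lives on the member disks themselves, so any
    # Linux box with mdadm can find and reassemble the set:
    mdadm --assemble --scan

    # watch array state, spares and rebuilds
    cat /proc/mdstat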
The other machine uses Ultra 320 SCSI drives, and I have a small stack
of those lying around, as well as a couple of Ultra 160s, which can
also be hooked up to the same RAID card.

> There is no need to buy a new machine, but as soon as your
> particular choice of hardware raid card starts going out of fashion,
> buy a spare. Better still, buy a spare /now/, before the
> manufacturer decides to update the firmware in new versions of the
> card and they become incompatible with your raid drives. Of course,
> you can always restore from backup in an emergency if the worst
> happens.

Well, considering that this is an entirely private project and that
there is no real risk involved in downtime - not that I don't care
about downtime - I think I've got it all sufficiently covered.

>> I'm not sure on the one on my SAS RAID adapter, but I think it's an
>> Intel RISC processor. It's not a MIPS or an Alpha, that much I am
>> certain of.
>
> Intel haven't made RISC processors for many years (discounting the
> Itanium, which is an unlikely choice for a raid processor).

The Itanium is not a RISC processor, though; it's an EPIC (VLIW)
design. It's just not an x86. ;-)

> They used to have StrongARMs, and long, long ago they had a few
> other designs, but I'm pretty certain you don't have an Intel RISC
> processor on the card. It also will not be an Alpha - they have not
> been made for years either (they were very nice chips until DEC,
> then HP+Compaq, totally screwed them up, with plenty of
> encouragement from Intel). Realistic cores include MIPS in many
> flavours, PPC, and for more recent designs perhaps an ARM of some
> kind. If the heavy lifting is being done by ASIC logic rather than
> the processor core, there is a wider choice of possible cores.

Apparently it's an Intel 80333 processor, clocked at 800 MHz. Hmm, I
don't know whether that's a RISC processor; I've never heard of it
before, actually. (Looking it up, the 80333 appears to be one of
Intel's IOP-series I/O processors, built around an XScale - i.e.
ARM-derived - core, so it would seem to be a RISC chip after all.)
This is my RAID adapter card...

http://www.adaptec.com/en-US/products/Controllers/Hardware/sas/value/SAS-31205/

>>>>>> This is quite a common distinction, mind you. There is even a
>>>>>> "live spare" solution, but to my knowledge this is specific to
>>>>>> Adaptec - they call it RAID 5E.
>>>>>>
>>>>>> In a "live spare" scenario, the spare disk is not used as such
>>>>>> but is part of the live array, and both data and parity blocks
>>>>>> are being written to it, but with the distinction that each
>>>>>> disk in the array will also have empty blocks for the total
>>>>>> capacity of a standard spare disk. These empty blocks are thus
>>>>>> distributed across all disks in the array and are used for
>>>>>> array reconstruction in the event of a disk failure.
>>>>>
>>>>> Is there any real advantage of such a setup compared to using
>>>>> raid 6 (in which case, the "empty" blocks are second parity
>>>>> blocks)? There would be a slightly greater write overhead
>>>>> (especially for small writes), but that would not be seen by the
>>>>> host if there is enough cache on the controller.
>>>>
>>>> Well, the advantage of this set-up is that you don't need to
>>>> replace a failing disk, since there is already sufficient
>>>> diskspace left blank on all disks in the array, and so the array
>>>> can recreate itself using that extra blank diskspace. This is of
>>>> course all nice in theory, but in practice one would eventually
>>>> replace the disk anyway.
>>>
>>> The same is true of raid6 - if one disk dies, the degraded raid6
>>> is very similar to raid5 until you replace the disk.
>>>
>>> And I still don't see any significant advantage of spreading the
>>> holes around the drives rather than having them all on the one
>>> drive (i.e., a normal hot spare). The rebuild still has to do as
>>> many reads and writes, and takes as long. The rebuild writes will
>>> be spread over all the disks rather than just on the one disk, but
>>> I can't see any advantage in that.
>>
>> Well, the idea is simply to give the spare disk some exercise, i.e.
>> to use it as part of the live array while still offering the extra
>> redundancy of a spare. So in the event of a failure, the array can
>> be fully rebuilt without the need to replace the broken drive, as
>> opposed to the array staying in degraded mode until the broken
>> drive is replaced.
>
> The array will be in degraded mode while the rebuild is being done,
> just like if it were raid5 with a hot spare - and it will be equally
> slow during the rebuild. So no points there.

Well, it's not really something that - at least, in my impression - is
advertised as "a particular RAID solution", but rather as "a nice
extension to RAID 5".

> In fact, according to Wikipedia, the controller will "compact" the
> degraded raid set into a normal raid5, and when you replace the
> broken drive it will "uncompact" it into the raid 5E arrangement
> again. The "compact" and "uncompact" operations take much longer
> than a standard raid5 rebuild.
>
> So all you get here is a marginal increase in the parallelisation of
> multiple simultaneous small reads, which you could get anyway with
> raid6 rather than raid5 with a spare.

Well, yes, but the idea of RAID 5E is merely that you can have a RAID
5 with the extra disk being part of the array so as to spread the
wear. I know it's not of much use, but we began speaking of this with
regard to the terms "standby spare", "hot spare" and "live spare". ;-)

>>> If you want more redundancy, you can use double mirrors for 33%
>>> disk space and still have full speed.
>>
>> Yes, but that's a set-up which, due to understandable financial
>> considerations, would be reserved only for the corporate world.
>> Many people already consider me certifiably insane for having spent
>> that much money - 17'000 Euros, as I wrote higher up - on a
>> privately owned computer system. But then again, for the intended
>> purposes, I need fast and reliable hardware and a lot of
>> horsepower. :-)
>
> I'm curious - what is the intended purpose? I think I would have a
> hard job spending more than about three or four thousand Euros on a
> single system.

Well, okay, here goes...

It's intended to be a kind of "mainframe" - which is what I call it on
occasion when referring to that machine among the other machines I
own. I have had this machine over at my place for two years already,
but I still needed a few extra hardware components - I want things
pristine before I begin my set-up, so as to exclude nasty surprises
with changes to the hardware afterwards - and the person who was
supposed to deliver this hardware to me pulled a no-show. At first he
kept stonewalling me - and, oh the irony, I've been there before with
another hardware vendor - and eventually he wouldn't even return my
phone calls (to his voicemail) or my e-mails. So eventually I
directly contacted the people who had actually built the machine, and
for whom the other person was the mediator.
These people also needed a lot of time to get all the extra
components, but eventually they did, and the machine was delivered at
my home again two days ago, so I can begin the installation over the
weekend.

As for the hardware, it's a Tyan Thunder n6650W (S2915) motherboard -
the original one, not the revised one - which is a twin-socket ccNUMA
board for AMD Opterons. There are two Opteron 2218 HE processors
installed - dual-core, 68 Watt, 2.6 GHz. The motherboard has eight
DIMM sockets (as two nodes of four DIMM sockets each), all of which
are populated with ATP 4 GB ECC registered DDR2 PC2-5300 modules,
making for a total of 32 GB of RAM, or if you will, two 16 GB ccNUMA
nodes.

I've already shown you which RAID adapter card is installed, and this
adapter connects to eight hard disks: four of them are 147 GB 15k
Hitachi disks, mounted in a "hidden" drive cage and to be used for the
main system, and the four others are 1 TB 7.2k Western Digital RAID
Edition SATA-2 disks, mounted in an IcyDock hot-swap backplane drive
cage. There is a Plextor PX810-SA SATA double-layer DVD writer and no
floppy drive. The motherboard also has a non-RAID on-board SAS
controller (which I've disabled in the BIOS) and a FireWire
controller.

The original PSU was a CoolerMaster EPS12V 800 Watt, but considering
the extra drives and certain negative reviews of that CoolerMaster PSU
under heavy load, I have had it replaced with a Zippy 1200 Watt EPS12V
PSU. The chassis is a CoolerMaster CM832 Stacker, which is not the
more commonly known Stacker but a model that nowadays only exists as
the black-and-green "nVidia Edition". Mine is completely black,
however.

There are two video cards installed. One is an older GeCube PCI
Radeon 9250 card (with 256 MB), connected to the second channel on one
of my two SGI 21" CRT monitors. The other one is an Asus PCIe GeForce
8800 GTS (with 640 MB), connected to the first channel on both SGI
monitors. There are also two keyboards and one mouse. One keyboard
is connected via PS/2, the other one (and the mouse) via USB.

So far the hardware. ;-)

Now, as for my intended purposes, I am going to set up this machine
with Xen, as I have mentioned earlier. There will be three primary
XenLinux virtual machines running on this system, all of which will be
Gentoo installations. The three main virtual machines will be set up
as follows:

(1) The Xen dom0 virtual machine. For those not familiar with Xen, it
is a hypervisor that normally runs on the bare metal (although it can
be nested if the hardware has virtualization extensions). Unlike the
more familiar virtual machine monitors such as VMware
Workstation/Player or VirtualBox, which are commonly used on desktops
and laptops, Xen does not have a "host" system. Instead, Xen has a
"privileged guest", called "dom0" or "domain 0". This virtual machine
is privileged because it is from there that one starts and stops the
other Xen guests. It is also the system that has direct access to the
hardware - i.e. "the driver domain". On my machine, this is the
virtual machine that will be using the PCI Radeon card for video
output and the PS/2 keyboard for input. It will however not have full
access to all the hardware, because - and Xen allows this - the PCIe
GeForce card, the sound chip on the motherboard and all USB hubs will
be hidden from Xen and from dom0.

(2) A workstation virtual machine. This is an unprivileged guest -
which in a Xen context is called "domU" - but it will also be a driver
domain, i.e.
it will have direct access to the GeForce, the sound chip and the USB
hubs. It'll boot up to runlevel 3, but it'll have KDE 4.x installed,
along with loads of applications. As it has direct access to the USB
hubs, it'll also be running a CUPS server for my USB-connected
multifunction device, a Brother MFC-9880. It'll also be running an
NFS server for multimedia files.

(3) A server virtual machine, which I intend to set up - if possible -
with an OpenVZ kernel. Again for those who are not familiar with it,
OpenVZ is a modified Linux kernel which offers operating-system-level
virtualization. This means that you have one common kernel running
multiple virtualized userspaces, each with their own filesystems and
user accounts, and their own "init" set-up. I am not sure yet whether
I will be hiding the second Gbit Ethernet adapter on the motherboard
from dom0 and have this server domU access it directly, or whether I
will simply have this domU connect to dom0's Ethernet bridge.

The OpenVZ system will be running several isolated userspaces - OpenVZ
itself calls them "containers" or VEs; they are the analog of the
"zones" in (Open)Solaris - one of which I intend to set up as the sole
system from which /ssh/ login from the internet is allowed, and doing
nothing else. The idea is that access to any other machine in the
network - physical or virtual - must pass through this one virtual
machine, making it harder for a potential black hat to do any damage.
(A rough sketch of how such a "zone" is created follows a bit farther
down.) Then there will also be a generic "zone" for a DNS server and
one, possibly two, websites, and one, possibly two, mailservers.
Lastly, another "zone" will be running an IRC server and an IRC
services package, possibly also with a few eggdrops.

Systems (1) and (2) will be installed on the SAS disks, which are
currently set up as a RAID 5 but which I am now going to set up as a
RAID 10. System (3) will also be installed on that same array, as far
as the privileged userspace and the "ssh honeypot" are concerned. The
other "zones" will be installed on the SATA-2 array - currently also
set up as RAID 5, but likewise to be converted to RAID 10 - together
with the NFS share exported by system (2) and an additional volume for
backups. These backups will then be backed up themselves to the other
physical server - i.e. the 32-bit dual Xeon machine - as well as to
DVDs and REV disks.

As for the IRC part, I'll try to cut a very, very long story short...

A number of years ago - in July 2002, to be precise - I was part of a
group of people who started a new and small IRC network. Actually, it
all started when we decided to take over an existing but dying IRC
network in order to save it, but that's a whole other story. Over the
years, people came and went in our team - and as our team was quite
large, there were also a number of intrigues and hidden agendas going
on, resulting in some people getting fired on the spot - and we also
experienced a number of difficulties with hosting solutions -
primarily, having to pay too much money for too poor a service - and
so a little over three years ago, the remaining team members decided
jointly that it would be more cost-effective if we started
self-hosting our domain. We obtained a few second-hand servers and
regular consumer-grade PCs via eBay, and some SCSI disks, and we set
the whole thing up on an entry-level professional ADSL connection, all
housed at the home of one guy on our team, who was and still is living
at his parents' house.
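To make the "zones" idea a bit more concrete before the rest of the
story: a rough sketch of creating such an isolated userspace with
OpenVZ's vzctl - the container ID, template name and address are
assumptions:

    # create a container from an OS template
    vzctl create 101 --ostemplate gentoo-amd64

    # give it a hostname and an IP address
    vzctl set 101 --hostname sshgate --ipadd 10.0.0.101 --save

    # start it and get a shell inside the isolated userspace
    vzctl start 101
    vzctl enter 101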
We also drew up a contract that each of us would pay a monthly
contribution for the rent of the ADSL connection and the electricity,
with a small margin for unexpected expenses. So far so good, but
right from the beginning one of us squirmed his way out of having to
pay the monthly contributions, and then some ego clashes occurred
within the team - both the guy at whose home the servers were set up
and another team member who was his best buddy are what you could
consider "socially dysfunctional" - resulting in the loss of virtually
all our users.

To cut /that/ story short as well: the guy who was running the servers
at his parents' home set up a shadow network behind my back (on a
machine of his own, to which I had no /ssh/ access) and moved all our
users over to that other domain. I only found out about it because
one of our users was confused over the two different domains and came
to ask me why we had two IRC networks which were not linked to one
another.

The guy who set up that shadow network did however stay true to the
contract and kept the servers up and running, contributed financially
to the costs for the domain, and even still offered some technical
support for when things went bad - it's old hardware, and every once
in a while something breaks down and needs to be replaced. He also
meticulously kept the accounting up to date in terms of contributions
and expenses.

Then, as our contract was drawn up for an effective term of three
years - since that was the minimum rental term for the
"business-grade" ADSL connection - and as this contract was about to
end (on November 1st, 2009), the guy sent an e-mail to our mailing
list - sufficiently in advance - saying that he had decided to step
out of the IRC team at the expiry date of the contract, but that he
would help those who were still interested in moving the domain over,
and that he would keep the servers running until that day. So far
he's still keeping the IRC server up until I've set everything up
myself, but the mail- and webservers are down.

So at present, the IRC network that we had jointly started in 2002 is
in suspended animation, with only one or two users (apart from the
other guy and myself) still regularly connecting, and a bunch of
people who seek to leech MP3s and pr0n - neither of which are to be
found on our network, because for legal reasons we have decided to ban
public filesharing. The fines for copyright infringement or illegal
"warez" distribution over here are quite high, and I'm not prepared to
go to jail over something that stupid.

I'm not sure how I am going to revive the IRC network again - and it
will be a network again (as opposed to a single server), because one
of our old users and a girl who was on my team have both offered to
set up a server and link it to my new server - but I feel that it
would be a shame to give up on something that I co-founded eight years
ago and of which I have been the chairman all that time. (I was
elected chairman from the start, and when someone challenged my
position and demanded re-elections one year later - as he wanted a
shot at the position - I was, with the exception of that one person,
unanimously re-elected as chairman.)

So there will eventually be three servers on the new network (plus the
IRC services, which are considered a separate server by the IRCd
software). My now ex-colleague, at whose place the main server is at
present still running, did however overdo it a bit in terms of the
required set-up, hardware-wise.
As I wrote higher up, it was an entry-level business-grade ADSL
connection with eight public IP addresses. Way too much, but the
guy's an IT maniac, even more so than I am. He's also a lot younger
and still lacks some wisdom in terms of spending.

So I am simply going to convert my residential cable internet
connection to what they call an "Office Line" over here, i.e. a single
static IP address via cable, requiring no extra hardware (as the cable
modem can handle the higher speeds) and with a larger traffic-volume
allowance, at (non-guaranteed) down/up speeds of 20 Mb/sec and 10
Mb/sec respectively. I have a simple Linksys WRT54GL router now with
the standard firmware - which is Linux, by the way ;-) - and it'll do
well enough for port forwarding to the respective virtual machines.
Additional firewalling can be done via /iptables/ on the respective
virtual machines.

So there you have it. Not quite as short a description as I had
announced higher up, but then again, you wanted to know. :-)

>> In the case of the OP, on the other hand, 45 SAS disks of 300 GB
>> each and three SAS RAID storage enclosures also doesn't seem like
>> quite an affordable buy, so I take it he intends to use it for a
>> business.
>
> It also does not strike me as a high value-for-money system - I
> can't help feeling that this is way more bandwidth than you could
> actually make use of in the rest of the system, so it would be
> better to have fewer, larger drives and fewer layers, to reduce the
> latencies. Spend the cash saved on even more RAM :-)

Well, what I personally find overkill in this is that he intends to
use the entire array only for the "/home" filesystem. That seems like
an awful waste of some great resources that I personally would put to
use more efficiently - e.g. you could have the entire "/var" tree on
it, and an additional "/srv" tree. Of course, a lot depends on the
software. As I have come to experience myself, lots of hosting
software parks all the domains under "/home" instead of under "/var"
or "/srv".

In fact, one could say that, on a general note, the implementation of
"/srv" in just about every GNU/Linux distribution is abominable. Some
distros create a "/srv" directory at install time, but that's about as
far as it goes. All the packages are still configured to use "/var"
for websites and FTP repositories - which I suppose you could
circumvent through symlinks - but like I said, most hosting software
typically parks everything under "/home".

> 45 disks at a throughput of say 75 MBps each gives about 3.3 GBps -
> say 3 GBps, since some are hot spares. Ultimately, being a server,
> this is going to be pumped out on Ethernet links. That's a lot of
> bandwidth - it would effectively saturate four 10 Gbit links.

Well, since he talks of a high-performance computing set-up, I would
imagine that he has plenty of 10 Gbit links at his disposal, or
possibly something a lot faster still. ;-)

> I have absolutely no real-world experience with these sorts of
> systems, and could therefore be totally wrong, but my gut feeling is
> that the theoretical numbers will not scale with so many drives -
> something like 15 1 TB SATA drives would be similar in speed in
> practice.

No real-world experience with that sort of thing here either, but like
I said, in my opinion using 45 disks - or perhaps 42, if he keeps
three hot spares - for a single "/home" filesystem does seem like
overkill to me, and yes, there is the bandwidth issue too.

>> I have been looking into NexentaOS (i.e.
>> GNU/kOpenSolaris) for a while, which uses ZFS, albeit that ZFS was
>> not my reason for being interested in the project. I was more
>> interested in the fact that it supports both Solaris Zones - of
>> which the Linux equivalents are OpenVZ and VServer - and running
>> paravirtualized on top of Xen.
>>
>> [...]
>>
>> The big problem with NexentaOS however is that it's based on Ubuntu
>> and that it uses binary .deb packages, whereas I would rather have
>> a Gentoo approach, where you can build the whole thing from sources
>> without having to go "the LFS way".
>
> Why is it always so hard to get /everything/ you want when building
> a system :-(

True... Putting a binary, "one size fits all"-optimized distribution
on an unimportant PC or laptop is okay by me, but for a system as
specialized and geared for performance as the one I have, I want
everything to be optimized for the underlying hardware, and I also
don't need or want all those typical "Windoze-style desktop
optimizations" most distribution vendors now build into their systems.

Gentoo is far from ideal - there are some issues over at the Gentoo
Foundation itself, the developers seem mostly occupied with discussing
how cool they think they are rather than actually doing something
sensible, and they've also started to implement a few defaults which
they themselves admit are not the best choices, merely the ones they
think most users will opt for - but at least the basic premise is
still there: you do build it from sources, and as such you have more
control over how the resulting system will be set up, both in terms of
hardware optimizations and in terms of software interoperability.

--
*Aragorn* (registered GNU/Linux user #223157)
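For what it's worth, the "optimized for the underlying hardware" part
of such a Gentoo build mostly boils down to a few lines in
/etc/make.conf - a minimal sketch, where the exact flags are
assumptions to be tuned per machine:

    # /etc/make.conf (excerpt)
    CFLAGS="-O2 -march=native -pipe"   # let GCC target the CPUs in this box
    CXXFLAGS="${CFLAGS}"
    MAKEOPTS="-j5"                     # parallel build jobs, roughly cores + 1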
From: unruh on 21 Jan 2010 12:32

On 2010-01-21, Aragorn <aragorn(a)chatfactory.invalid> wrote:

> On Wednesday 20 January 2010 23:59 in comp.os.linux.misc, somebody
> identifying as David Brown wrote...
>
>> rsync copying is even cleaner - the backup copy is directly
>> accessible. And when combined with hard link copies in some way
>> (such as rsnapshot) you can get snapshots.
>
> I have seen this method being discussed before, but to be honest
> I've never even looked into "rsnapshot". I do intend to explore it
> in the future, since the ability to make incremental backups seems
> very interesting.

It is actually far better than that. Each of the backups is a
complete backup. I.e., you do not have to restore sequentially (the
full backup and then each of the incrementals to get you back). What
rsnapshot (rsync) does is use hard links to store the stuff that does
not change, so there is only one copy of those files, and then put in
new copies of the stuff which has changed. It thus has the advantage
of an incremental backup, but also the advantage of a full backup, in
that each backup really is a full backup. It of course has the
disadvantage of an incremental backup in that there is only one copy
of the files which have not changed, and if something alters that
copy, all copies are altered.

> So far I have always made either data backups only - and on
> occasion, backups of important directories such as "/etc" - or
> complete filesystem backups, but never incremental backups. For IRC
> logs - I run an IRC server (which is currently inactive - see
> farther down) and I log the channels I'm in - I normally use "zip"
> every month and then erase the logs themselves. This is not an
> incremental approach, of course.
>
> My reason for using "zip" rather than "tar" for the IRC logs is that
> my colleagues run Windoze, so their options are limited. ;-)
>
>> Of course, .tar.bz2 is good too - /if/ you have it automated so
>> that it is actually done (or you are one of these rare people that
>> can regularly follow a manual procedure).
>
> To be honest, so far I've been doing that manually, but like I said,
> my approach is rather amateurish, in the sense that it's not a
> systematic one. But then again, so far the risk was rather limited,
> because I only needed to save my own files.

rsnapshot is easily automated. You ALWAYS find that if something goes
wrong, it does so when you forgot to make backups for 3 months.
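The hard-link trick unruh describes can be seen in isolation in the
classic two-command idiom that rsnapshot automates - the paths here
are assumptions:

    # "rotate": daily.1 becomes a farm of hard links to daily.0's
    # files (cheap and quick, since no file data is copied)
    cp -al /backup/daily.0 /backup/daily.1

    # then sync changes into daily.0: rsync replaces changed files
    # with fresh copies (so daily.1 keeps the old version via its
    # hard link), while unchanged files stay shared between snapshots
    rsync -a --delete /home/ /backup/daily.0/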
From: Aragorn on 21 Jan 2010 12:47

On Thursday 21 January 2010 18:32 in comp.os.linux.misc, somebody
identifying as unruh wrote...

> On 2010-01-21, Aragorn <aragorn(a)chatfactory.invalid> wrote:
>
>> On Wednesday 20 January 2010 23:59 in comp.os.linux.misc, somebody
>> identifying as David Brown wrote...
>>
>>> rsync copying is even cleaner - the backup copy is directly
>>> accessible. And when combined with hard link copies in some way
>>> (such as rsnapshot) you can get snapshots.
>>
>> I have seen this method being discussed before, but to be honest
>> I've never even looked into "rsnapshot". I do intend to explore it
>> in the future, since the ability to make incremental backups seems
>> very interesting.
>
> It is actually far better than that. Each of the backups is a
> complete backup. I.e., you do not have to restore sequentially (the
> full backup and then each of the incrementals to get you back).
> What rsnapshot (rsync) does is use hard links to store the stuff
> that does not change, so there is only one copy of those files, and
> then put in new copies of the stuff which has changed.

But I am curious... What exactly happens at the filesystem level if a
hard link is used for any given file when the filesystem holding the
backups is a different filesystem than the source? Wouldn't it make a
copy then anyway?

> It thus has the advantage of an incremental backup, but also the
> advantage of a full backup, in that each backup really is a full
> backup. It of course has the disadvantage of an incremental backup
> in that there is only one copy of the files which have not changed,
> and if something alters that copy, all copies are altered.

Ah yes, I can see how that would of course not be desirable. Imagine
you terribly screw up a file - then its copies in the backups would be
screwed up as well. Hmm...

>>> Of course, .tar.bz2 is good too - /if/ you have it automated so
>>> that it is actually done (or you are one of these rare people that
>>> can regularly follow a manual procedure).
>>
>> To be honest, so far I've been doing that manually, but like I
>> said, my approach is rather amateurish, in the sense that it's not
>> a systematic one. But then again, so far the risk was rather
>> limited, because I only needed to save my own files.
>
> rsnapshot is easily automated. You ALWAYS find that if something
> goes wrong, it does so when you forgot to make backups for 3 months.

Well, I'll be looking into it more closely once I have my new machine
set up... which is a *lot* of work... 8-)

--
*Aragorn* (registered GNU/Linux user #223157)
From: unruh on 21 Jan 2010 13:36

On 2010-01-21, Aragorn <aragorn(a)chatfactory.invalid> wrote:

> On Thursday 21 January 2010 18:32 in comp.os.linux.misc, somebody
> identifying as unruh wrote...
>
>> On 2010-01-21, Aragorn <aragorn(a)chatfactory.invalid> wrote:
>>
>>> On Wednesday 20 January 2010 23:59 in comp.os.linux.misc, somebody
>>> identifying as David Brown wrote...
>>>
>>>> rsync copying is even cleaner - the backup copy is directly
>>>> accessible. And when combined with hard link copies in some way
>>>> (such as rsnapshot) you can get snapshots.
>>>
>>> I have seen this method being discussed before, but to be honest
>>> I've never even looked into "rsnapshot". I do intend to explore
>>> it in the future, since the ability to make incremental backups
>>> seems very interesting.
>>
>> It is actually far better than that. Each of the backups is a
>> complete backup. I.e., you do not have to restore sequentially
>> (the full backup and then each of the incrementals to get you
>> back). What rsnapshot (rsync) does is use hard links to store the
>> stuff that does not change, so there is only one copy of those
>> files, and then put in new copies of the stuff which has changed.
>
> But I am curious... What exactly happens at the filesystem level if
> a hard link is used for any given file when the filesystem holding
> the backups is a different filesystem than the source? Wouldn't it
> make a copy then anyway?

The hard links are on the backups, not between the backups and the
source - that would not be a backup at all. I.e., suppose you have
three backups with files f1, f2 and f3; file f2 changed between backup
1 and 2 and then stayed the same, and f3 was added after backup 2.

B1 will have f1 and f2. B2 will have f1 hard linked with the B1
version of f1, and will have the different (new) f2'. B3 will have f1
and f2' hard linked to the B2 versions, and will have the additional
file f3. I.e., each backup has the full complement of files for its
backup, and you can restore them completely just from those backups.

    rsync -av B2/ /home/tipple/

will restore all of the files to tipple that were there when backup B2
was made, and no reference to B1 or B3 need be made. In fact, if you
do "rm -r B1", this will make no difference to B2. However, the
storage space requirement will be that for only one copy of f1, two
different copies of f2 (the old B1 version and the new B2 version) and
one copy of f3, instead of three copies of f1, three of f2 and one of
f3. You will of course at all times have a separate copy of the files
in the original.

>> It thus has the advantage of an incremental backup, but also the
>> advantage of a full backup, in that each backup really is a full
>> backup. It of course has the disadvantage of an incremental backup
>> in that there is only one copy of the files which have not changed,
>> and if something alters that copy, all copies are altered.
>
> Ah yes, I can see how that would of course not be desirable.
> Imagine you terribly screw up a file - then its copies in the
> backups would be screwed up as well. Hmm...

NONONONONO. There is no link from the files to the backups, only
amongst the backups. If you screw up the original, you just replace
it. The problem comes if you happen, for some reason, to go into say
B2 and edit the file f1 in B2 (what the hell you are doing editing a
backup file I do not know - maybe you are a prime minister trying to
alter the e-mails you sent to your mistress), then all three backups
B1/f1, B2/f1 and B3/f1 will be altered.
However, if you erase say B1/f1, then you will still have the complete
B2/f1 and B3/f1. (That is how hard links work.)

>>>> Of course, .tar.bz2 is good too - /if/ you have it automated so
>>>> that it is actually done (or you are one of these rare people
>>>> that can regularly follow a manual procedure).
>>>
>>> To be honest, so far I've been doing that manually, but like I
>>> said, my approach is rather amateurish, in the sense that it's not
>>> a systematic one. But then again, so far the risk was rather
>>> limited, because I only needed to save my own files.
>>
>> rsnapshot is easily automated. You ALWAYS find that if something
>> goes wrong, it does so when you forgot to make backups for 3
>> months.
>
> Well, I'll be looking into it more closely once I have my new
> machine set up... which is a *lot* of work... 8-)

So reduce the work of making backups.
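A quick way to convince yourself of these semantics, assuming three
such backup directories exist:

    # same inode number and a link count of 3: one physical copy
    ls -li B1/f1 B2/f1 B3/f1

    # appending through any one name alters the shared inode,
    # i.e. "all three backups" at once
    echo oops >> B2/f1

    # removing one name only drops the link count; the data survives
    rm B1/f1
    stat -c '%h' B2/f1    # prints 2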
From: Aragorn on 21 Jan 2010 14:08
On Thursday 21 January 2010 19:36 in comp.os.linux.misc, somebody
identifying as unruh wrote...

> On 2010-01-21, Aragorn <aragorn(a)chatfactory.invalid> wrote:
>
>> On Thursday 21 January 2010 18:32 in comp.os.linux.misc, somebody
>> identifying as unruh wrote...
>>
>>> On 2010-01-21, Aragorn <aragorn(a)chatfactory.invalid> wrote:
>>>
>>>> On Wednesday 20 January 2010 23:59 in comp.os.linux.misc,
>>>> somebody identifying as David Brown wrote...
>>>>
>>>>> rsync copying is even cleaner - the backup copy is directly
>>>>> accessible. And when combined with hard link copies in some way
>>>>> (such as rsnapshot) you can get snapshots.
>>>>
>>>> I have seen this method being discussed before, but to be honest
>>>> I've never even looked into "rsnapshot". I do intend to explore
>>>> it in the future, since the ability to make incremental backups
>>>> seems very interesting.
>>>
>>> It is actually far better than that. Each of the backups is a
>>> complete backup. I.e., you do not have to restore sequentially
>>> (the full backup and then each of the incrementals to get you
>>> back). What rsnapshot (rsync) does is use hard links to store the
>>> stuff that does not change, so there is only one copy of those
>>> files, and then put in new copies of the stuff which has changed.
>>
>> But I am curious... What exactly happens at the filesystem level
>> if a hard link is used for any given file when the filesystem
>> holding the backups is a different filesystem than the source?
>> Wouldn't it make a copy then anyway?
>
> The hard links are on the backups, not between the backups and the
> source - that would not be a backup at all. I.e., suppose you have
> three backups with files f1, f2 and f3; file f2 changed between
> backup 1 and 2 and then stayed the same, and f3 was added after
> backup 2.
>
> B1 will have f1 and f2. B2 will have f1 hard linked with the B1
> version of f1, and will have the different (new) f2'. B3 will have
> f1 and f2' hard linked to the B2 versions, and will have the
> additional file f3.

Ahh, okay, I get it now. ;-)

>>> It thus has the advantage of an incremental backup, but also the
>>> advantage of a full backup, in that each backup really is a full
>>> backup. It of course has the disadvantage of an incremental
>>> backup in that there is only one copy of the files which have not
>>> changed, and if something alters that copy, all copies are
>>> altered.
>>
>> Ah yes, I can see how that would of course not be desirable.
>> Imagine you terribly screw up a file - then its copies in the
>> backups would be screwed up as well. Hmm...
>
> NONONONONO. There is no link from the files to the backups, only
> amongst the backups. If you screw up the original, you just replace
> it.

Okay, I see what you mean. ;-)

> The problem comes if you happen, for some reason, to go into say B2
> and edit the file f1 in B2 (what the hell you are doing editing a
> backup file I do not know - maybe you are a prime minister trying to
> alter the e-mails you sent to your mistress), [...

Well, that seems like a fashionable thing to do these days, but then
again, I've never been that fashionable. :p

> ...] then all three backups B1/f1, B2/f1 and B3/f1 will be altered.
> However, if you erase say B1/f1, then you will still have the
> complete B2/f1 and B3/f1. (That is how hard links work.)

Yes, I know what a hard link is. It just seemed weird to me - in my
original misunderstanding of your explanation - that one could create
a hard link across filesystem boundaries. ;-)

>>> rsnapshot is easily automated.
>>> You ALWAYS find that if something goes wrong, it does so when you
>>> forgot to make backups for 3 months.
>>
>> Well, I'll be looking into it more closely once I have my new
>> machine set up... which is a *lot* of work... 8-)
>
> So reduce the work of making backups.

Well, you've convinced me in favor of /rsnapshot/, so I will give that
a closer look. ;-)

--
*Aragorn* (registered GNU/Linux user #223157)
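Since the thread ends on rsnapshot: a minimal /etc/rsnapshot.conf
sketch to start from - the values are assumptions, and mind that the
configuration file requires TABs between fields, not spaces:

    # /etc/rsnapshot.conf (excerpt; fields must be TAB-separated)
    snapshot_root   /backup/snapshots/
    interval        daily   7
    interval        weekly  4
    backup          /etc/   localhost/
    backup          /home/  localhost/

With that in place, a root crontab entry does the rest, e.g. for the
daily rotation:

    0 3 * * *   /usr/bin/rsnapshot daily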