From: Aragorn on
On Wednesday 20 January 2010 23:59 in comp.os.linux.misc, somebody
identifying as David Brown wrote...

> Aragorn wrote:
>
>> On Wednesday 20 January 2010 15:48 in comp.os.linux.misc, somebody
>> identifying as David Brown wrote...
>
> <snip to save a little space>

Yeah, these posts themselves are getting quite long, but at least it's
one of those rare threads in which the conversation stays
on-topic. :-)

Quite honestly, I'm enjoying this thread, because I get to hear
interesting feedback - and I think you do too, from your point of view -
and I have a feeling that Rahul, the OP, is sitting there enjoying
himself over all the valid arguments being discussed here in the debate
over various RAID types. ;-)

This is a good thread, and I recommend that any lurking newbies save
these posts for later reference, in case they are ever faced with the
decision of whether - and how - to implement RAID on one of their
machines. Newbies, heads up! :p

>>> The zeroth rule, which is often forgotten (until you learn the hard
>>> way!), is "thou shalt make a plan for restoring from backups, test
>>> that plan, document that plan, and find a way to ensure that all
>>> backups are tested and restorable in this way". /Then/ you can
>>> start making your actual backups!
>>
>> Well, so far I've always used the tried and tested approach of
>> tar'ing in conjunction with bzip2. Can't get any cleaner than
>> that. ;-)
>
> rsync copying is even cleaner - the backup copy is directly
> accessible. And when combined with hard link copies in some way (such
> as rsnapshot) you can get snapshots.

I have seen this method being discussed before, but to be honest I've
never even looked into "rsnapshot". I do intend to explore it for the
future, since the ability to make incremental backups seems very
interesting.
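
From what I understand of the hard-link trick, a single rsync
invocation can already produce such a snapshot - a minimal sketch,
untested, and the paths are only examples:

    # unchanged files become hard links into yesterday's snapshot;
    # --link-dest is resolved relative to the destination directory
    rsync -a --delete --link-dest=../daily.1/ /home/ /backup/daily.0/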

So far I have always made either data backups only - and on occasion,
backups of important directories such as "/etc" - or complete
filesystem backups, but never incremental backups. For IRC logs - I
run an IRC server (which is currently inactive - see farther down) and
I log the channels I'm in - I normally use "zip" every month, and then
erase the logs themselves. This is not an incremental approach, of
course.

My reason for using "zip" rather than "tar" for IRC logs is that my
colleagues run Windoze and so their options are limited. ;-)
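
For what it's worth, that monthly routine is a one-liner - a sketch,
with a hypothetical log directory, using "-m" so that zip deletes the
originals once they are safely in the archive:

    # archive last month's logs, then remove the originals
    zip -rm irclogs-$(date +%Y-%m).zip ~/irclogs/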

> Of course, .tar.bz2 is good too - /if/ you have it automated so that
> it is actually done (or you are one of these rare people that can
> regularly follow a manual procedure).

To be honest, so far I've been doing that manually, but like I said, my
approach is rather amateurish, in the sense that it's not a
systematic approach. But then again, so far the risk was rather
limited because I only needed to save my own files.
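
If I do get around to automating it, I imagine it will be little more
than this - a rough sketch with made-up paths, to be run from cron:

    #!/bin/sh
    # nightly .tar.bz2 backups of /home and /etc (destination is an example)
    DEST=/mnt/backup
    DATE=$(date +%Y%m%d)
    tar -cjf "$DEST/home-$DATE.tar.bz2" /home
    tar -cjf "$DEST/etc-$DATE.tar.bz2" /etc

A line such as "30 3 * * * root /usr/local/sbin/backup.sh" in
/etc/crontab would then take care of the "actually done" part.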

On the hosting server we used - which is now no longer operational as
such - the hosting software itself made regular backups of the domains,
but using the ".tar.bz2" approach. I'm not sure whether there was
anything incremental about the backups as it was my colleague who
occupied himself with the management of that machine - it was located
at his home.

> It also needs to be saved in a safe and reliable place - many people
> have had regular backups saved to tape only to find later that the
> tapes were unreadable.

That is always a risk, just as it was with the old music cassette tapes.
Magnetic storage is actually not advisable for long-term backups.

> And of course it needs to be saved again, in a different place and
> stored at a different site.

That would indeed be the best approach. Like I said in my previous
post, I use Iomega REV disks for backups to which I want to have
immediate access, but I forgot to mention that I also back up stuff to
DVDs, and I use DVD+RW media for that, since they tend to be of higher
quality than DVD-/+R - likewise I prefer CD-RW over CD-R - and the
advantage of optical storage is that it is the better choice in the
event of magnetic corruption, which you *can* and eventually *do* get
with tape media.

Hard disks are relatively cheap these days - at least, if we're talking
about consumer-grade SATA disks - and they hold up better magnetically
than tapes, in the sense that the magnetic coating on the platters is
more time-resilient than that of tape. On the other hand, hard disks
contain lots of moving components and if a hard disk fails - and here
we go again - you lose all your data, unless you have a RAID set-up.

So one can use hard disks for backups - it's fast, reasonably affordable
and reasonably reliable, but it's not the final solution. If one
stores one's backups on hard disks, then one needs to make backups of
those backups on another kind of media.

My advice would therefore be to make redundant backups on different
types of media. Optical media are ideal in that they are not
susceptible to electromagnetic interference, but they may in turn have
other issues - especially older CDs and DVDs. Storage there is
physical: on burned media the writing laser darkens marks in a dye
layer, and on pressed media the data takes the form of actual pits.
And some readers will not accept media that were burned in other CD/DVD
writers. This is becoming rarer these days, but the problem still
exists.

> I know I'm preaching to the choir here, as you said before - but there
> may be others in the congregation.

Indeed, and people tend "not to care" until they burn their fingers. So
we can't stress this enough.

>>> And the second rule is "thou shalt make backups of your backups",
>>> followed by "thou shalt have backups of critical hardware". (That's
>>> another bonus of software raid - if your hardware raid card dies,
>>> you may have to replace it with exactly the same type of card to get
>>> your raid working again - with mdadm raid, you can use any PC.)
>>
>> Well, considering that my Big Machine has drained my piggy bank for
>> about 17'000 Euros worth of hardware, having a duplicate machine is
>> not really an option. The piggy bank's on a diet now. :-)
>>
>
> You don't need a duplicate machine - you just need duplicates of any
> parts that are important, specific, and may not always be easily
> available.

Well, just about everything in that machine is very expensive. On the
other hand, I do have another server here - which was malfunctioning
but which has since been repaired - so I might as well put that one to
use as a backup machine in the event that my main machine should fail
somehow - something which I am not looking forward to, of
course! ;-)

I also can't use the Xen live migration approach, because I intend to
set up my main machine with 64-bit software, while the other server is
a strictly 32-bit machine. But redundancy - i.e. a duplicate set-up of
the main servers - should be adequate for my purposes.

The other machine uses Ultra 320 SCSI drives, and I have a small stack
of those lying around, as well as a couple of Ultra 160s, which can
also be hooked up to the same RAID card.

> There is no need to buy a new machine, but as soon as your particular
> choice of hardware raid cards start going out of fashion, buy
> a spare. Better still, buy a spare /now/ before the manufacturer
> decides to update the firmware in new versions of the card and they
> become incompatible with your raid drives. Of course, you can always
> restore from backup in an emergency if the worst happens.

Well, considering that this is an entirely private project and that
there is no real risk involved in downtime - not that I don't care
about downtime - I think I've got it all sufficiently covered.

>> I'm not sure on the one on my SAS RAID adapter, but I think it's an
>> Intel RISC processor. It's not a MIPS or an Alpha, that much I am
>> certain of.
>
> Intel haven't made RISC processors for many years (discounting the
> Itanium, which is an unlikely choice for a raid processor).

The Itanium is not a RISC processor, but it isn't a CISC either - it's
an EPIC (VLIW-style) design. It's just not an x86. ;-)

> They used to have StrongArms, and long, long ago they had a few other
> designs, but I'm pretty certain you don't have an Intel RISC processor
> on the card. It also will not be an Alpha - they have not been made
> for years either (they were very nice chips until DEC, then HP+Compaq
> totally screwed them up, with plenty of encouragement from Intel).
> Realistic cores include MIPS in many flavours, PPC, and for more
> recent designs, perhaps an ARM of some kind. If the heavy lifting is
> being done by ASIC logic rather than the processor core, there is a
> wider choice of possible cores.

Apparently it's an Intel 80333 processor, clocked at 800 MHz. Hmm, I
don't know whether that's a RISC processor; I've never heard of it
before, actually.

This is my RAID adapter card...

http://www.adaptec.com/en-US/products/Controllers/Hardware/sas/value/SAS-31205/

>>>>>> This is quite a common distinction, mind you. There is even a
>>>>>> "live spare" solution, but to my knowledge this is specific to
>>>>>> Adaptec - they call it RAID 5E.
>>>>>>
>>>>>> In a "live spare" scenario, the spare disk is not used as such
>>>>>> but is part of the live array, and both data and parity blocks
>>>>>> are being written to it, but with the distinction that each disk
>>>>>> in the array will also have empty blocks for the total capacity
>>>>>> of a standard spare disk. These empty blocks are thus
>>>>>> distributed across all disks in the array and are used for array
>>>>>> reconstruction in the event of a disk failure.
>>>>>
>>>>> Is there any real advantage of such a setup compared to using raid
>>>>> 6 (in which case, the "empty" blocks are second parity blocks)?
>>>>> There would be a slightly greater write overhead (especially for
>>>>> small writes), but that would not be seen by the host if there is
>>>>> enough cache on the controller.
>>>>
>>>> Well, the advantage of this set-up is that you don't need to
>>>> replace a failing disk, since there is already sufficient diskspace
>>>> left blank on all disks in the array, and so the array can recreate
>>>> itself using that extra blank diskspace. This is of course all
>>>> nice in theory, but in practice one would eventually replace the
>>>> disk anyway.
>>>
>>> The same is true of raid6 - if one disk dies, the degraded raid6 is
>>> very similar to raid5 until you replace the disk.
>>>
>>> And I still don't see any significant advantage of spreading the
>>> holes around the drives rather than having them all on the one
>>> drive (i.e., a normal hot spare). The rebuild still has to do as
>>> many reads and writes, and takes as long. The rebuild writes will
>>> be spread over all the disks rather than just on the one disk, but I
>>> can't see any advantage in that.
>>
>> Well, the idea is simply to give the spare disk some exercise, i.e.
>> to use it as part of the live array while still offering the extra
>> redundancy of a spare. So in the event of a failure, the array can
>> be fully rebuilt without the need to replace the broken drive, as
>> opposed to the array staying in degraded mode until the broken
>> drive is replaced.
>
> The array will be in degraded mode while the rebuild is being done,
> just like if it were raid5 with a hot spare - and it will be equally
> slow during the rebuild. So no points there.

Well, it's not really something that - at least, in my impression - is
presented as "a particular RAID solution", but rather as "a nice
extension to RAID 5".

> In fact, according to wikipedia, the controller will "compact" the
> degraded raid set into a normal raid5, and when you replace the broken
> drive it will "uncompact" it into the raid 5E arrangement again. The
> "compact" and "uncompact" operations take much longer than a standard
> raid5 rebuild.
>
> So all you get here is a marginal increase in the parallelisation of
> multiple simultaneous small reads, which you could get anyway with
> raid6 rather than raid5 with a spare.

Well, yes, but the idea of RAID 5E is merely that you can have a RAID 5
with the extra disk being part of the array so as to spread the wear.
I know it's not of much use, but we began speaking of this with regard
to the terms "standby spare", "hot spare" and "live spare". ;-)

>>> If you want more redundancy, you can use double mirrors for 33% disk
>>> space and still have full speed.
>>
>> Yes, but that's a set-up which, due to understandable financial
>> considerations, would be reserved only for the corporate world. Many
>> people already consider me certifiably insane for having spent that
>> much money - 17'000 Euro, as I wrote higher up - on a privately owned
>> computer system. But then again, for the intended purposes, I need
>> fast and reliable hardware and a lot of horsepower. :-)
>
> I'm curious - what is the intended purpose? I think I would have a
> hard job spending more than about three or four thousand Euros on a
> single system.

Well, okay, here goes... It's intended to be a kind of "mainframe" -
which is what I call it on occasion when referring to that machine
among the other machines I own.

I have had this machine over at my place for two years already, but I
still needed a few extra hardware components - I want things pristine
before I begin my set-up so as to exclude nasty surprises with changes
to the hardware afterwards - and the person who was supposed to deliver
this hardware to me pulled a no-show on me. At first he kept on
stonewalling me - and, oh the irony, I've been there before with another
hardware vendor - and eventually he wouldn't even return my phone calls
(to his voicemail) or my e-mails.

So eventually I directly contacted the people who had actually built the
machine, and for whom the other person was the middleman. These people
also needed a lot of time to get all the extra components, but
eventually they did, and the machine was delivered back to my home two
days ago, so I can begin the installation over the weekend.

As for the hardware, it's a Tyan Thunder n6650W (S2915) motherboard -
the original one, not the revised one - which is a twin-socket ccNUMA
board for AMD Opterons. There are two 2218HE Opterons installed -
dual-core, 68 Watt, 2.6 GHz. The motherboard has eight DIMM sockets (as
two nodes of four DIMM sockets each), all of which are populated with
ATP 4 GB ECC registered DDR2 PC2-5300 modules, making for a total of 32
GB of RAM, or if you will, two 16 GB ccNUMA nodes.

I've already shown you what RAID adapter card is installed, and this
adapter connects to eight hard disks, four of which are 147 GB 15k
Hitachi disks mounted in a "hidden" drive cage and to be used for the
main system, and the four others being 1 TB 7.2k Western Digital RAID
Edition SATA-2 disks, mounted in an IcyDock hotswap backplane drive
cage. There is a Plextor PX810-SA SATA double layer DVD writer and no
floppy drive. The motherboard also has a non-RAID on-board SAS
controller (which I've disabled in the BIOS) and a Firewire controller.

The original PSU was a CoolerMaster EPS12V 800 Watt, but considering the
extra drives and certain negative reviews of that CoolerMaster PSU
under heavy load, I have had it replaced now with a Zippy 1200 Watt
EPS12V PSU. The chassis is a CoolerMaster CM832 Stacker, which is not
the more commonly known Stacker but a model that nowadays survives only
as the black-and-green "nVidia Edition" model. Mine is completely
black, however.

There are two video cards installed. One is an older GeCube PCI Radeon
9250 card (with 256 MB), connected to the second channel on one of my
two SGI 21" CRT monitors. The other one is an Asus PCIe GeForce 8800
GTS (with 640 MB), connected to the first channel on both SGI monitors.

There are also two keyboards and one mouse. One keyboard is connected
via PS/2, the other one (and the mouse) via USB. So much for the
hardware. ;-)

Now, as for my intended purposes, I am going to set up this machine with
Xen, as I have mentioned earlier. There will be three primary XenLinux
virtual machines running on this system, all of which will be Gentoo
installations.

The three main virtual machines will be set up as follows:

(1) The Xen dom0 virtual machine. For those not familiar with Xen, it
    is a hypervisor that itself normally runs on the bare metal
    (although it can be nested if the hardware has virtualization
    extensions). But unlike the more familiar virtual machine
    monitors, such as VMware Workstation/Player or VirtualBox, which
    are commonly used on desktops and laptops, Xen does not have
    a "host" system. Instead
Xen has a "privileged guest", and this is called "dom0", or "domain
0". This virtual machine is privileged because it is from there
that one starts and stops the other Xen guests. It is also the
system that has direct access to the hardware - i.e. "the driver
domain".

    On my machine, this is the virtual machine that will be using the
    PCI Radeon card for video output and the PS/2 keyboard for input.
    It will however not have full access to all the hardware, because
    - and Xen allows this - the PCIe GeForce card, the soundchip on the
    motherboard and all USB hubs will be hidden from Xen and from dom0
    (see the sketch following this list).

(2) A workstation virtual machine. This is an unprivileged guest -
which in a Xen context is called "domU" - but it will also be a
driver domain, i.e. it will have direct access to the GeForce, the
soundchip and the USB hubs. It'll boot up to runlevel 3, but it'll
have KDE 4.x installed, along with loads of applications. As it has
direct access to the USB hubs, it'll also be running a CUPS server
for my USB-connected multifunctional device, a Brother MFC-9880.
It'll also be running an NFS server for multimedia files.

(3) A server virtual machine which I intend to set up - if possible -
with an OpenVZ kernel. Again for those who are not familiar with
it, OpenVZ is a modified Linux kernel which offers operating system
level virtualization. This means that you have one common kernel
running multiple, virtualized userspaces, each with their own
filesystems and user accounts, and their own "init" set up. I am
not sure yet whether I will be hiding the second Gbit Ethernet
adapter on the motherboard from dom0 and have this server domU
access it directly, or whether I will simply have this domU connect
to the dom0's Ethernet bridge.

The OpenVZ system will be running several isolated userspaces -
which are called "zones", just as in (Open)Solaris - one of which
I intend to set up as the sole system from which /ssh/ login from
the internet is allowed, and doing nothing else. The idea is that
access to any other machine in the network - physical or virtual -
must pass through this one virtual machine, making it harder for
    a potential black hat to do any damage. Then, there will also be
a generic "zone" for a DNS server and one, possibly two websites,
and one, possibly two mailservers. Lastly, another "zone" will be
running an IRC server and an IRC services package, possibly also
with a few eggdrops.
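
As for the sketch I promised in (1): if I understand the Xen
documentation correctly, hiding those devices from dom0 and handing
them to the workstation domU boils down to something like this - the
PCI addresses, paths and sizes are hypothetical, of course:

    # On the dom0 kernel line in the bootloader, claim the devices for
    # the pciback driver so that dom0 itself never touches them:
    #   pciback.hide=(01:00.0)(00:1b.0)

    # /etc/xen/workstation -- a minimal domU config (values are examples)
    name   = "workstation"
    kernel = "/boot/vmlinuz-2.6-xen-domU"
    memory = 16384
    vcpus  = 2
    disk   = [ 'phy:/dev/vg0/workstation,xvda,w' ]
    vif    = [ 'bridge=xenbr0' ]
    pci    = [ '01:00.0', '00:1b.0' ]   # the GeForce and a USB controller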

Systems (1) and (2) will be installed on the SAS disks, which are
currently set up as a RAID 5, but which I am now going to set up as a
RAID 10. System (3) itself will be installed on the same array as well,
as far as the privileged userspace and the "ssh honeypot" are
concerned.
The other "zones" will be installed on the SATA-2 array - currently
also set up as RAID 5 but also to be converted to RAID 10 - together
with the NFS share exported by system (2) and an additional volume for
backups. These backups will then be backed up themselves to the other
physical server - i.e. the 32-bit dual Xeon machine - as well as to
DVDs and REV disks.
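
As a taste of what those "zones" amount to in practice, creating and
starting one is roughly this - the container ID, template name and
address below are of course only examples:

    # create a zone from an OS template, give it an address, start it
    vzctl create 101 --ostemplate gentoo-20100115 --config vps.basic
    vzctl set 101 --hostname sshgate.lan --ipadd 10.0.0.101 --save
    vzctl start 101
    vzctl enter 101    # drop into the zone's own userspace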

As for the IRC part, I'll try to cut a very, very long story short... A
number of years ago - in July 2002, to be precise - I was part of a
group of people who started a new and small IRC network. Actually, it
all started when we decided to take over an existing but dying IRC
network in order to save it, but that's a whole other story.

Over the years, people came and went in our team - and as our team was
quite large, there were also a number of intrigues and hidden agendas
going on, resulting in some people getting fired on the spot - and we
also experienced a number of difficulties with hosting solutions -
primarily, having to pay too much money for too poor a service - and so
a little over three years ago, the remaining team members decided
jointly that it would be more cost-effective if we started self-hosting
our domain. We obtained a few second-hand servers and regular
consumer-grade PCs via eBay, as well as some SCSI disks, and we set the
whole thing up on an entry-level professional ADSL connection, all
housed at the home of one guy on our team, who was and still is living
at his
parents' house. We also made up a contract that each of us would pay a
monthly contribution for the rent of the ADSL connection and
electricity, with a small margin for unexpected expenses.

So far so good, but right from the beginning one of us squirmed
his way out of having to pay monthly contributions, and then some ego
clashes occurred within the team - both the guy at whose home the
servers are set up and another team member who was his best buddy are
what you could consider "socially dysfunctional" - resulting in the
loss of virtually all our users. To cut /that/ story short as well,
the guy who was running the servers at his parents' home set up a
shadow network behind my back (and on a machine of his own to which I
had no /ssh/ access) and moved over all our users to that other domain.
I only found out about it because one of our users was confused over
the two different domains and came to ask me why we had two IRC
networks which were not linked to one another.

The guy who set up that shadow network did however stay true to the
contract and kept the servers up and running, contributed financially
to the costs for the domain, and even still offered some technical
support for when things went bad - it's old hardware, and every once in
a while something breaks down and needs to be replaced. He also
meticulously kept the accounting up to date in terms of contributions
and expenses.

Then, as our contract was drawn up for an effective term of three
years - since that was the minimum rental term for the "business-grade"
ADSL connection - and as this contract was about to end (on November
1st 2009), the guy sent an e-mail to our mailing list - sufficiently in
advance - that he had decided to step out of the IRC team at the expiry
date of the contract, but that he would help those who were still
interested in moving the domain over, and that he would still keep the
servers running until that day. So far he's still keeping the IRC
server up until I've set everything up myself, but the mail- and
webservers are down.

So at present, the IRC network that we had jointly started in 2002 is
now in suspended animation, with only one or two users (apart from the
other guy and myself) still regularly connecting, and a bunch of people
who seek to leech MP3s and pr0n - both of which are not to be found on
our network because for legal reasons we have decided to ban public
filesharing. The fines for copyright infringement or illegal "warez"
distribution over here are quite high, and I'm not prepared to go to
jail over something that stupid.

I'm not sure how I am going to revive the IRC network - and it
will be a network again (as opposed to a single server), because one of
our old users and a girl who was on my team have both offered to set up
a server and link it to my new server - but I feel that it would be a
shame to give up on something that I co-founded eight years ago and
of which I have been the chairman all that time. (I was elected
chairman from the start, and when someone challenged my position and
demanded re-elections one year later - as he wanted a shot at the
position - everyone except that one person voted to re-elect me as
chairman.)

So there will eventually be three servers on the new network (plus the
IRC services, which are considered a separate server by the IRCd
software). My now ex-colleague, at whose place the main server is at
present still running, did however overdo it a bit in terms of the
required set-up, hardware-wise. As I wrote higher up, it was an
entry-level business-grade ADSL connection with eight public IP
addresses. Way too much, but the guy's an IT maniac and even more so
than I am. He's also a lot younger and still lacks some wisdom in
terms of spending.

So I am simply going to convert my residential cable internet connection
to what they call an "Office Line" over here, i.e. a single static IP
address via cable, requiring no extra hardware (as the cable modem can
handle the higher speeds) and a larger threshold for the traffic
volume, with (non-guaranteed) down/up speeds of 20 Mb/sec and 10 Mb/sec
respectively. I have a simple Linksys WRT54GL router now with the
standard firmware - which is Linux, by the way ;-) - and it'll do well
enough to do port forwarding to the respective virtual machines.
Additional firewalling can be done via /iptables/ on the respective
virtual machines.
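
On each virtual machine, that additional firewalling would be little
more than this - a sketch, with the ssh and IRC ports as examples:

    # keep established traffic, allow ssh and IRC, drop everything else
    iptables -A INPUT -m state --state ESTABLISHED,RELATED -j ACCEPT
    iptables -A INPUT -p tcp --dport 22 -j ACCEPT
    iptables -A INPUT -p tcp --dport 6667 -j ACCEPT
    iptables -P INPUT DROP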

So there you have it. Not quite as short a description as I had
announced higher up, but then again, you wanted to know. :-)

>> In the case of the OP on the other hand, 45 SAS disks of 300 GB each
>> and three SAS RAID storage enclosures also doesn't seem like quite an
>> affordable buy, so I take it he intends to use it for a business.
>
> It also does not strike me as a high value-for-money system - I can't
> help feeling that this is way more bandwidth than you could actually
> make use of in the rest of the system, so it would be better to have
> fewer larger drives and less layers to reduce the latencies. Spent
> the cash saved on even more ram :-)

Well, what I personally find overkill in this is that he intends to use
the entire array only for the "/home" filesystem. That seems like an
awful waste of some great resources that I personally would put to use
more efficiently - e.g. you could have the entire "/var" tree on it,
and an additional "/srv" tree.

Of course, a lot depends on the software. As I have experienced
myself, lots of hosting software parks all the domains under "/home"
instead of under "/var" or "/srv". In fact, one could say that, on a
general note, the implementation of "/srv" in just about every
GNU/Linux distribution is abominable. Some distros create a "/srv"
directory at install time, but that's about as far as it goes. All the
packages are still configured to use "/var" for websites and FTP
repositories - which I suppose you could circumvent through symlinks,
as sketched below - but like I said, most hosting software typically
parks everything under "/home".
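
The symlink circumvention I mean is no more than this (with the usual
default directory names, so consider it a sketch):

    # relocate the web root to /srv and leave a compatibility symlink
    mv /var/www /srv/www
    ln -s /srv/www /var/www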

> 45 disks at a throughput of say 75 MBps each gives about 3.3 GBps -
> say 3 GBps since some are hot spares. Ultimately, being a server,
> this is going to be pumped out on Ethernet links. That's a lot of
> bandwidth - it would effectively saturate two or three 10 Gbit links.

Well, since he talks of a high performance computing set-up, I would
imagine that he has plenty of 10 Gbit links at his disposal, or
possibly something a lot faster still. ;-)

> I have absolutely no real-world experience with these sorts of
> systems, and could therefore be totally wrong, but my gut feeling is
> that the theoretical numbers will not scale with so many drives -
> something like 15 1 TB SATA drives would be similar in speed in
> practice.

No real-world experience with that sort of thing here either, but like I
said, using 45 disks - or perhaps 42 if he keeps three hot spares -
for a single "/home" filesystem does seem like overkill to me, and
yes, there is the bandwidth issue too.

>> I have been looking into NexentaOS (i.e. GNU/kOpenSolaris) for a
>> while, which uses ZFS, albeit that ZFS was not my reason for being
>> interested in the project. I was more interested in the fact that it
>> supports both Solaris Zones - of which the Linux equivalents are
>> OpenVZ and VServer - and running paravirtualized on top of Xen.
>>
>> [...]
>> The big problem with NexentaOS however is that it's based on Ubuntu
>> and that it uses binary .deb packages, whereas I would rather have a
>> Gentoo approach, where you can build the whole thing from sources
>> without having to go "the LFS way".
>
> Why is it always so hard to get /everything/ you want when building a
> system :-(

True... Putting a binary "one size fits all"-optimized distribution on
an unimportant PC or laptop is okay by me, but for a system as
specialized and geared for performance as the one I have, I want
everything to be optimized for the underlying hardware, and I also
don't need or want all those typical "Windoze-style desktop
optimizations" most distribution vendors now build into their systems.

Gentoo is far from ideal - given some issues over at the Gentoo
Foundation itself, the fact that the developers seem mostly occupied
with discussing how cool they think they are rather than actually doing
something sensible, and the fact that they've started to implement a
few defaults which they themselves admit are not the best choices, but
which they expect most users to opt for - but at least the basic
premise is still there, i.e. you do build it from sources, and as such
you have more control over how the resulting system will be set up,
both in terms of hardware optimizations and in terms of software
interoperability.

--
*Aragorn*
(registered GNU/Linux user #223157)
From: unruh on
On 2010-01-21, Aragorn <aragorn(a)chatfactory.invalid> wrote:
> On Wednesday 20 January 2010 23:59 in comp.os.linux.misc, somebody
> identifying as David Brown wrote...
>
>>
>> rsync copying is even cleaner - the backup copy is directly
>> accessible. And when combined with hard link copies in some way (such
>> as rsnapshot) you can get snapshots.
>
> I have seen this method being discussed before, but to be honest I've
> never even looked into "rsnapshot". I do intend to explore it for the
> future, since the ability to make incremental backups seems very
> interesting.

It is actually far better than that. Each of the backups is a complete
backup. I.e., you do not have to restore sequentially (full backup and
then each of the incrementals to get you back). What rsnapshot (rsync)
does is use hard links to store the stuff that does not change, so there
is only one copy of those files, and then put in new copies of the stuff
which has changed. That gives it the advantage of an incremental backup,
but also the advantage of a full backup, in that each backup really is a
full backup. It of course has the disadvantage of an incremental backup
in that there is only one copy of files which have not changed, and if
something alters that copy, all copies are altered.
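
You can see the sharing directly on the backup filesystem: the inode
number and the link count are identical (the numbers below are made up):

    $ ls -li daily.0/home/notes.txt daily.1/home/notes.txt
    123456 -rw-r--r-- 2 user user 4096 Jan 20 10:00 daily.0/home/notes.txt
    123456 -rw-r--r-- 2 user user 4096 Jan 20 10:00 daily.1/home/notes.txt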


>
> So far I have always made either data backups only - and on occasion,
> backups of important directories such as "/etc" - or complete
> filesystem backups, but never incremental backups. For IRC logs - I
> run an IRC server (which is currently inactive - see farther down) and
> I log the channels I'm in - I normally use "zip" every month, and then
> erase the logs themselves. This is not an incremental approach, of
> course.
>
> My reason for using "zip" rather than "tar" for IRC logs is that my
> colleagues run Windoze and so their options are limited. ;-)
>
>> Of course, .tar.bz2 is good too - /if/ you have it automated so that
>> it is actually done (or you are one of these rare people that can
>> regularly follow a manual procedure).
>
> To be honest, so far I've been doing that manually, but like I said, my
> approach is rather amateurish, in the sense that it's not a
> systematic approach. But then again, so far the risk was rather
> limited because I only needed to save my own files.

rsnapshot is easily automated. You ALWAYS find that if something goes
wrong, it does so just when you have forgotten to make backups for 3
months.
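
Automating it is a couple of cron entries along these lines (the
interval names must match the ones configured in rsnapshot.conf):

    # /etc/cron.d/rsnapshot (times are only an example)
    0 3 * * *    root    /usr/bin/rsnapshot daily
    0 2 * * 1    root    /usr/bin/rsnapshot weekly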


From: Aragorn on
On Thursday 21 January 2010 18:32 in comp.os.linux.misc, somebody
identifying as unruh wrote...

> On 2010-01-21, Aragorn <aragorn(a)chatfactory.invalid> wrote:
>
>> On Wednesday 20 January 2010 23:59 in comp.os.linux.misc, somebody
>> identifying as David Brown wrote...
>>
>>> rsync copying is even cleaner - the backup copy is directly
>>> accessible. And when combined with hard link copies in some way
>>> (such as rsnapshot) you can get snapshots.
>>
>> I have seen this method being discussed before, but to be honest I've
>> never even looked into "rsnapshot". I do intend to explore it for
>> the future, since the ability to make incremental backups seems very
>> interesting.
>
> It is actually far better than that. Each of the backups is a complete
> backup. I.e., you do not have to restore sequentially (full backup and
> then each of the incrementals to get you back). What rsnapshot (rsync)
> does is use hard links to store the stuff that does not change, so
> there is only one copy of those files, and then put in new copies of
> the stuff which has changed.

But I am curious... What exactly happens at the filesystem level if a
hard link is used for any given file when the filesystem holding the
backups is a different filesystem than the source? Wouldn't it make a
copy then anyway?

> That gives it the advantage of an incremental backup, but also the
> advantage of a full backup, in that each backup really is a full
> backup. It of course has the disadvantage of an incremental backup in
> that there is only one copy of files which have not changed, and if
> something alters that copy, all copies are altered.

Ah yes, I can see how that would of course not be desirable. Imagine
you terribly screw up a file - then its copies in the backups would be
screwed up as well. Hmm...

>>> Of course, .tar.bz2 is good too - /if/ you have it automated so that
>>> it is actually done (or you are one of these rare people that can
>>> regularly follow a manual procedure).
>>
>> To be honest, so far I've been doing that manually, but like I said,
>> my approach is rather amateurish, in the sense that it's not a
>> systematic approach. But then again, so far the risk was rather
>> limited because I only needed to save my own files.
>
> rsnapshot is easily automated. You ALWAYS find that if something goes
> wrong, it does so just when you have forgotten to make backups for 3
> months.

Well, I'll be looking into it more closely once I have my new machine
set up... which is a *lot* of work... 8-)

--
*Aragorn*
(registered GNU/Linux user #223157)
From: unruh on
On 2010-01-21, Aragorn <aragorn(a)chatfactory.invalid> wrote:
> On Thursday 21 January 2010 18:32 in comp.os.linux.misc, somebody
> identifying as unruh wrote...
>
>> On 2010-01-21, Aragorn <aragorn(a)chatfactory.invalid> wrote:
>>
>>> On Wednesday 20 January 2010 23:59 in comp.os.linux.misc, somebody
>>> identifying as David Brown wrote...
>>>
>>>> rsync copying is even cleaner - the backup copy is directly
>>>> accessible. And when combined with hard link copies in some way
>>>> (such as rsnapshot) you can get snapshots.
>>>
>>> I have seen this method being discussed before, but to be honest I've
>>> never even looked into "rsnapshot". I do intend to explore it for
>>> the future, since the ability to make incremental backups seems very
>>> interesting.
>>
>> It is actually far better than that. Each of the backups is a
>> complete backup. I.e., you do not have to restore sequentially (full
>> backup and then each of the incrementals to get you back). What
>> rsnapshot (rsync) does is use hard links to store the stuff that does
>> not change, so there is only one copy of those files, and then put in
>> new copies of the stuff which has changed.
>
> But I am curious... What exactly happens at the filesystem level if a
> hard link is used for any given file when the filesystem holding the
> backups is a different filesystem than the source? Wouldn't it make a
> copy then anyway?

The hard links are on the backups, not between the backups and the
source -- that would not be a backup at all. I.e., suppose you have 3
backups involving files f1, f2 and f3; file f2 changed between backup 1
and 2 and then stayed the same, and f3 was added after backup 2.

B1 will have f1 and f2. B2 will have f1 hard linked to the B1 version
of f1, and will have the different (new) f2'. B3 will have f1 and f2'
hard linked to the B2 versions, and will have the additional
file f3.

I.e., each backup has the full complement of files for its backup, and
you can restore completely just from that one backup.

rsync -av B2/ /home/tipple/
will restore all of the files to tipple that were there when backup B2
was made, and no reference to B1 or B3 need be made. In fact if you do
rm -r B1, this will make no difference to B2.
However the storage space requirement will be that for only one copy of
f1, two different copies of f2 (the old B1 and the new B2 version) and
one copy of f3, instead of three copies of f1, three of f2 and one of
f3.

You will of course at all times have a separate copy of the files in the
original.


>
>> That gives it the advantage of an incremental backup, but also the
>> advantage of a full backup, in that each backup really is a full
>> backup. It of course has the disadvantage of an incremental backup in
>> that there is only one copy of files which have not changed, and if
>> something alters that copy, all copies are altered.
>
> Ah yes, I can see how that would of course not be desirable. Imagine
> you terribly screw up a file - then its copies in the backups would be
> screwed up as well. Hmm...

NONONONONO. There is no link from the files to the backups, only amongst
the backups.
If you screw up the original, you just replace it.

The problem comes if you happen, for some reason, to go into say B2 and
edit the file f1 in B2 (what the hell you are doing editing a backup
file I do not know -- maybe you are a prime minister trying to alter the
emails you sent your mistress); then all three backups B1/f1, B2/f1 and
B3/f1 will be altered. However if you erase say B1/f1 then you will
still have the complete B2/f1 and B3/f1 (that is how hard links work).

>
>>>> Of course, .tar.bz2 is good too - /if/ you have it automated so that
>>>> it is actually done (or you are one of these rare people that can
>>>> regularly follow a manual procedure).
>>>
>>> To be honest, so far I've been doing that manually, but like I said,
>>> my approach is rather amateurish, in the sense that it's not a
>>> systematic approach. But then again, so far the risk was rather
>>> limited because I only needed to save my own files.
>>
>> rsnapshot is easily automated. You ALWAYS find that if something goes
>> wrong, it does so just when you have forgotten to make backups for 3
>> months.
>
> Well, I'll be looking into it more closely once I have my new machine
> set up... which is a *lot* of work... 8-)

So reduce the work of making backups.

From: Aragorn on
On Thursday 21 January 2010 19:36 in comp.os.linux.misc, somebody
identifying as unruh wrote...

> On 2010-01-21, Aragorn <aragorn(a)chatfactory.invalid> wrote:
>> On Thursday 21 January 2010 18:32 in comp.os.linux.misc, somebody
>> identifying as unruh wrote...
>>
>>> On 2010-01-21, Aragorn <aragorn(a)chatfactory.invalid> wrote:
>>>
>>>> On Wednesday 20 January 2010 23:59 in comp.os.linux.misc, somebody
>>>> identifying as David Brown wrote...
>>>>
>>>>> rsync copying is even cleaner - the backup copy is directly
>>>>> accessible. And when combined with hard link copies in some way
>>>>> (such as rsnapshot) you can get snapshots.
>>>>
>>>> I have seen this method being discussed before, but to be honest
>>>> I've never even looked into "rsnapshot". I do intend to explore it
>>>> for the future, since the ability to make incremental backups seems
>>>> very interesting.
>>>
>>> It is actually far better than that. Each of the backups is a
>>> complete backup. I.e., you do not have to restore sequentially (full
>>> backup and then each of the incrementals to get you back). What
>>> rsnapshot (rsync) does is use hard links to store the stuff that
>>> does not change, so there is only one copy of those files, and then
>>> put in new copies of the stuff which has changed.
>>
>> But I am curious... What exactly happens at the filesystem level if
>> a hard link is used for any given file when the filesystem holding
>> the backups is a different filesystem than the source? Wouldn't it
>> make a copy then anyway?
>
> The hard links are on the backups, not between the backups and the
> source -- that would not be a backup at all. I.e., suppose you have 3
> backups involving files f1, f2 and f3; file f2 changed between backup
> 1 and 2 and then stayed the same, and f3 was added after backup 2.
>
> B1 will have f1 and f2. B2 will have f1 hard linked to the B1 version
> of f1, and will have the different (new) f2'. B3 will have f1 and f2'
> hard linked to the B2 versions, and will have the
> additional file f3.

Ahh, okay, I get it now. ;-)

>>> That gives it the advantage of an incremental backup, but also the
>>> advantage of a full backup, in that each backup really is a full
>>> backup. It of course has the disadvantage of an incremental backup
>>> in that there is only one copy of files which have not changed, and
>>> if something alters that copy, all copies are altered.
>>
>> Ah yes, I can see how that would of course not be desirable. Imagine
>> you terribly screw up a file - then its copies in the backups would
>> be screwed up as well. Hmm...
>
> NONONONONO. There is no link from the files to the backups, only
> amongst the backups.
> If you screw up the original, you just replace it.

Okay, I see what you mean. ;-)

> The problem comes if you happen, for some reason, to go into say B2
> and edit the file f1 in B2 (what the hell you are doing editing a
> backup file I do not know -- maybe you are a prime minister trying to
> alter the emails you sent your mistress), [...

Well, that seems like a fashionable thing to do these days, but then
again, I've never been that fashionable. :p

> ...] then all three backups B1/f1, B2/f1 and B3/f1 will be altered.
> However if you erase say B1/f1 then you will still have the complete
> B2/f1 and B3/f1 (that is how hard links work).

Yes, I know what a hard link is. It just seemed weird to me - in my
original misunderstanding of your explanation - that one could create a
hard link across filesystem boundaries. ;-)

>>> rsnapshot is easily automated. You ALWAYS find that if something
>>> goes wrong, it does so just when you have forgotten to make backups
>>> for 3 months.
>>
>> Well, I'll be looking into it more closely once I have my new
>> machine set up... which is a *lot* of work... 8-)
>
> So reduce the work of making backups.

Well, you've convinced me in favor of /rsnapshot/ so I will give that a
closer look. ;-)


--
*Aragorn*
(registered GNU/Linux user #223157)