"Linux Gazette...making Linux just a little more fun!"
A Convenient and Practical Approach to Backing Up Your Data
July 19,1998
Every tool I have found for Linux and other UNIX environments seems to
be designed primarily to backup files to tape or any device that can
be used for streaming backups. Often this method of backing up is
infeasible, especially on small budgets. This led to the development
of bu, a tool for backing up by mirroring the files on another file
system. bu is not necessarily meant as a replacement for the other
tools (although I have set up our entire disaster recovery system
based on it for our development servers), but more commonly as a
supplement to a tape backup system. The approach I discuss below is a
way to manage your backups much more efficiently and stay better
backed up without spending so much money.
* Some problems I have found with streaming backups
-
1. The prices and storage capacities often make it infeasible.
The sizes of hard drives and the amount of data stored on an
average server or even workstation is growing faster than the
capacity of the lower end tape drives that are affordable to
the individual or small business. 5 and 8 gig hard drives are
cheap and common place now and the latest drives go up to at
least 11 gig. However, the most common tape drives are only a
few gig. Higher capacity/performance tape drives are
available but the costs are out of the range of all but the
larger companies.
For example:
Staying properly backing up with 30GB of data (which can be
just 3 or 4 hard drives) to a midrange tape drive, can cost
$15,000 to $25,000 or more inside of just 2 to 4 years. There
is a typical cost scenario on
http://www.exabyte.com/home/press.html.
This is just the cost for the drive and tapes. It does not
include the cost of time and labor to manage the backup system.
I discuss that more below. With that in mind, the comments I
make on reliability, etc, in the rest of this article are based
on my experience with lower end drives. I haven't had thousands
of extra dollars to throw around to try the higher end drives.
2. The cost of squandered sys admin time and the lost productivity
of users or developers waiting for lost files to be restored, can
get much more expensive than buying extra hard drives.
To backup or restore several gig of data to/from a tape can take
up to several hours. The same goes for trying to restore a
single file that is near the end of the tape. I can't tell you
how frustrating it is to wait a couple of hours to restore a
lost file only to discover you made some minor typo in the
filename or the path to the file so it didn't find it and you
have to start all over. Also, if you are backing up many gig of
data, and you want to be fully backed up every day, you either
have to keep a close eye on it and change tapes several times
throughout the day, every day, or do that periodically and do
incremental backups onto a single tape the rest of the days.
With tapes, the incremental approach has other problems, which
leads me to number 3.
3. Incremental backups to tape can be expensive, undependable
and time consuming to restore.
First, this kind of backup system can consume a lot of time
labeling, and tracking tapes to keep track of the dates and
which ones are incremental and which ones are full backups, etc.
Also, if you do incremental backups throughout a week, for
example, and then have to restore a crashed machine, you can
easily consume up to an entire day restoring from all the tapes
in sequence in order to restore all the data back the way it
was. Then you have Murphy to deal with. I'm sure everybody is
familiar with Murphy's laws. When you need it most, it will
fail. My experience with tapes has revealed a very high failure
rate. Probably 20 or 30% of the tapes I have tried to restore
on various types of tape drives have failed because of one
problem or another. This includes our current 2GB DAT drive.
Bad tape, dirty heads when it was recored, who knows. To
restore from a sequence of tapes of an incremental backup, you
are dependent on all the tapes in the sequence being good. Your
chances of a failure are very high. You can decrease your
chance of failure, of course, by verifying the tape after each
backup but then you double your backup time which is already to
long in many cases.
* A solution (The history of the bu utility)
-
With all the problems I described above, I found that, like most
other people I know, it was so inconvenient to back up that I
never stayed adequately backed up, and have payed the price a time
or two. So I set up file system space on one of our servers and
periodically backed up my file systems over nfs just using cp.
This way I would always be backed up to another machine if mine
went down and I could quickly backup just one or a few files
without having to mess with the time and cost of tapes. This
still wasn't enough. There were still times I was in a hurry and
didn't want to spend the time making sure my backup file system
was NFS mounted, verifying the pathname to it, etc, before doing
the copy. Manually dealing with symbolic links also was
cumbersome. If I specified a file to copy that was a symbolic
link, I didn't want it to follow the link and copy it to the same
location on the backup file system as the link. I wanted it to
copy the real file it points to with it's path so that the backup
file system was just like the original. I also wanted other
sophisticated features of an incremental backup system without
having to use tapes. So, I wrote bu. bu intelligently handles
symbolic links, can do incremental backups on a per directory
basis with the ability to configure what files or directories
should be included and excluded, has a verbose mode, and keeps log
files. Pretty much everything you would expect from a fairly
sophisticated tape backup tool (except a GUI interface :-) but is
a fairly small and straight forward shell script.
* Backup strategy
-
Using bu to backup to another machine may or may not be a good
replacement for a tape backup system for others as it has for us,
but it is an excellent supplement. When you have done a lot of
work and have to wait hours or even days until the next scheduled
tape backup, you are at the mercy of Murphy until that time, then
you cross your fingers and hope the tape is good. To me, it is a
great convenience and a big relief to just say "bu src" to do an
incremental backup of my whole src directory and know I
immediately have an extra copy of my work if something goes wrong.
It is much easier and faster to restore a whole file system over
NFS than it is from a tape. This includes root (at least with
Linux). And, it is vastly faster and easier to restore just one
file or directory just using the cp command.
So far as cost: You can get extra 6GB hard drives now for less
than $200 dollars. In fact I can buy a whole new computer with
extra hard drives to use as a backup server for $1000 or less now.
Much less than the cost of buying just a mid to high end tape
drive, not counting the cost of all the tapes and extra time spent
managing them. In fact, one of the beauties of Linux is, even
your old 386 or 486 boat anchors make nice file servers for such
things as backups.
For those individuals and small businesses who use zip
drives and jaz drives for backing up so they can have multiple
copies or take them off site, bu is also perfect, since
incremental backups can be done to any file system. I often use
it to back up to floppies to take my most critical data and recent
work off site.
Here is an interesting strategy we have come up with using bu that
is the least expensive way to stay backed up we could come up with
for our environment. It is the backup strategy we are setting up
for our development machines which house several GB of data. Use
bu to backup daily and right after doing work, to file systems
that are no more than 650 mb. Then, once or twice a month, cut
worm CD's from those file systems to take off site. WORM CD's are
only about a dollar each in quantities of 100, and CD WORM writers
have gotten cheap. This way your backups are on media that
doesn't decay like tapes and floppies tend to do. Re-writable
CD's are also an option if you don't mind spending a bit more
money. If you have just too much data for that to be practical,
hard drives are cheap enough now that it is feasible to have extra
hard drives and rotate them off site. It is nice to have one of
those drive bays that allow you to un-plug the drive from the
front of the machine if you take this approach. Where bu will
really shine with large amounts of data, is when we finally can
get re-writable DVD drives with cheap media. I think, in the
future, with re-writable DVD or other similar media on the
horizon, doing backups to non-random access devices such as tape
will become obsolete and other backup tools will likely follow the
bu approach anyway.
* Getting bu
-
bu is freely re-distributable under the GNU copyright.
http://www.hightek.org/bu/
ftp://www.hightek.org/pub/vstemen/bu/bu.tar.gz
Copyright © 1998, Vincent Stemen
Published in Issue 32 of Linux Gazette, September 1998