WPICS Backups and Restore procedures

This page answers:

  • What is backed up?
  • When are the machines backed up?
  • How do we back it up?
  • How does one do a restore?
  • WHO may request a restore?

FIRST THINGS FIRST -- Do you need to do a restore at all??

If a user has nuked a file from /csusers or their M: drive on a PC, a restore may not be necessary at all. Log in to a UNIX machine such as cs.wpi.edu (or even better, owl.cs.wpi.edu) and check for a ZFS snapshot copy of the deleted file:

 /csusers/SNAPSHOTS/[SNAPSHOT-DIRECTORY]/[username]

Snapshots of the user directories are taken once every hour. The hourly snapshots are kept for 48 hours. Dailies, weeklies and monthlies are also taken and are all kept for long periods of time. Check there for the files that the user mistakenly deleted, and you might not need to go to tape, or to the rdiff-backups, at all.
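
The snapshot check can be sketched as a small shell helper that walks every snapshot directory and prints each surviving copy of a file. This is a sketch only: the function name find_snapshot_copies and the SNAPSHOT_ROOT override are assumptions made for illustration, and the layout is taken from the path shown above.

```shell
#!/bin/sh
# Print every snapshot copy of one file in a user's home directory.
# Assumption: snapshots live under /csusers/SNAPSHOTS/<snapshot>/<username>,
# as described on this page. SNAPSHOT_ROOT is a hypothetical override that
# exists only so the helper can be tried out against a scratch directory.
find_snapshot_copies() {
    user=$1
    relpath=$2
    root=${SNAPSHOT_ROOT:-/csusers/SNAPSHOTS}
    for snap in "$root"/*/; do
        candidate="${snap}${user}/${relpath}"
        if [ -e "$candidate" ]; then
            printf '%s\n' "$candidate"
        fi
    done
}
```

For example, "find_snapshot_copies mvoorhis thesis/chapter1.tex" (a made-up path) would list one line per snapshot that still holds the file; running "ls -l" on those paths shows which copy is the newest.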

What is backed up?

WPICS backs up user files on the departmental fileserver and on certain research machines. A small number of faculty office machines get backed up as well.

The list of machines (and their disks) that get backed up is located on the backup server, ROUS. Machines backed up with AMANDA are listed in the “disklist” files, which are found under the operator user's home directory (see the “Daily2” and “Monthly” directories) on that machine. See the machine entry for ROUS for more information.

An exception is CSMAIL, which backs up its own /var/mail directory every TWO HOURS to a special area under /backups/csmail. In the future this snapshot directory may be moved to ROUS. If someone requests a /var/mail/[username] restore, it is best to do that restore on CSMAIL.

departmental infrastructure that gets backed up

research machines that get backed up

When are the machines backed up?

Nightly AMANDA backups start at roughly 23:30 each night. RDIFF-BACKUP snapshots happen twice a day, at noon and midnight. Monthlies happen at no fixed time near the end of each month, whenever the Lab Manager gets around to it. Monthly backups are not automated, so if the Lab Manager forgets, the Monthly will not happen.

How do we back it up?

Amanda

WPICS uses the AMANDA automated tape backup system. This system is used to back up all of the machines listed above to a central location, the CS Dept backup server, ROUS.

rdiff-backup

We are also starting to use RDIFF-BACKUP in the department, especially as some server filesystems grow larger.

RDIFF-BACKUP is set up according to the instructions here.

How do I do a restore?

Make sure you can do a restore before telling someone that you can save them from their disaster. Only the machines listed above are backed up on a nightly basis.

If you're a user and you're going to mistakenly nuke one of your files, the best time to do it is early in the morning, after perhaps 5AM. The worst time to lose a file is 23:25 or so, immediately before the backups start.

Procedure:

  • Log in to rous.wpi.edu
  • Become the operator:
      sudo su - operator
  • Find out whether the machine you're restoring from is backed up using AMANDA or RDIFF-BACKUP. If a machine is backed up with RDIFF-BACKUP, you will see in /home/operator on ROUS a symlink for the machine pointing to the /rdbackups directory; if there is no such symlink, the machine may be backed up using AMANDA. The restore technique is different for each one. Restoring from RDIFF-BACKUP is quite simple compared to an AMANDA restore. Fortunately, our most-used machines are backed up using RDIFF-BACKUP.
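
The symlink check in the last step could be scripted roughly as follows. This is a sketch under the convention described above (a per-machine symlink in /home/operator pointing into /rdbackups); the function name backup_method and the OPERATOR_HOME override are assumptions for illustration. An "unknown" answer still needs to be confirmed against the AMANDA disklist files by hand.

```shell
#!/bin/sh
# Report which backup system appears to cover a machine.
# Assumption: RDIFF-BACKUP machines have a symlink named after the machine
# in the operator's home directory, pointing at or below /rdbackups.
# OPERATOR_HOME is a hypothetical override used only for testing.
backup_method() {
    machine=$1
    home=${OPERATOR_HOME:-/home/operator}
    link="$home/$machine"
    if [ -L "$link" ]; then
        case "$(readlink "$link")" in
            /rdbackups|/rdbackups/*)
                echo "rdiff-backup"
                return 0
                ;;
        esac
    fi
    # No matching symlink: the machine may be on AMANDA, or not backed up.
    echo "unknown (possibly AMANDA)"
}
```

Run as the operator on ROUS, "backup_method OWL" would print "rdiff-backup" if the OWL symlink is in place.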

Restore using Rdiff-backup

RDIFF-BACKUP is more convenient than AMANDA in that you don't need to remember any seemingly exotic UNIX commands to get your restored file. Log in to ROUS, become the operator (sudo su - operator), and then cd into the directory for the machine you need to restore from:

 sudo su - operator
 cd OWL
 cd csusers/[some-username]
 ls

You'll see a listing of the files in the “some-username” home directory as they appeared the last time the backup happened. What you're looking at is a “snapshot” of the user's directory at the time of the last backup. It could be that the file you're looking for is there already. If this is the case, you can copy the file out of there (see below). Do NOT move the file or alter the backup directory in any way: you're working in the BACKUP AREA and we don't want to foul up what may be our only copy of it!

If the file was mistakenly deleted more than a day ago, you won't see the file when you do your directory listing–the user deleted it, so it isn't in the snapshot. That doesn't mean that we don't have a copy of it, though. The RDIFF-BACKUP snapshot also contains historical information going back 60 days.

Restoring from a certain amount of time ago .....

Sometimes people don't know the date from which they'd like their file restored. So you can grab a copy from some amount of time ago. Restoring a file from its version, say, three days ago can be done this way:

  rdiff-backup -r 3D ~operator/OWL/csusers/mvoorhis/filename /tmp/restored-filename-for-mvoorhis
  • The “-r” flag tells rdiff-backup that we want to do a restore.
  • The “3D” means that we want a file from 3 days ago. We could just as easily use “2W” for two weeks, etc.
  • The first filename, “~operator/OWL/csusers/…” is the name of the file (or directory!) that the user has requested be restored, and
  • The “/tmp/restored-file…” will be our copy of it.

If you want 18 hours ago instead of 3 days, use “18h” instead of the “3D” we used in our example. The available time increments are s, m, h, D, W, M, or Y (indicating seconds, minutes, hours, days, weeks, months, or years respectively). The abbreviations may be combined; for example, “3D12h13m12s” is 3 days, 12 hours, 13 minutes and 12 seconds. The time is measured backwards from the moment you issue the command.
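
Because the unit letters are case-sensitive (“m” is minutes, “M” is months), it can help to sanity-check an interval string before handing it to rdiff-backup. The helper below is a sketch that checks only the shape described above, one or more number-plus-unit pairs; the name is_valid_interval is an assumption, and rdiff-backup itself remains the final judge of what it accepts.

```shell
#!/bin/sh
# Succeed if the argument looks like an interval such as 3D or 3D12h13m12s:
# one or more <digits><unit> pairs, where unit is one of s m h D W M Y.
# This checks the form only; it does not run rdiff-backup.
is_valid_interval() {
    printf '%s\n' "$1" | grep -Eq '^([0-9]+[smhDWMY])+$'
}
```

A typical use would be "is_valid_interval 3D12h && rdiff-backup -r 3D12h SOURCE DEST", which runs the restore only when the interval is well formed. Note that “3d” is rejected, since lowercase “d” is not one of the listed units.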

Restoring a file from a specific date .....

If you do not know exactly which version to restore, you can find out what copies of the requested file exist on the backup server. For instance, suppose you've been asked to restore this file from the machine YUKON:

/home/claypool/proj/dragonfly/book/engine.tex

The request is for a version from “some time close to 11 July”. You can find the best version without repeatedly restoring many versions from the archive and comparing them (which is tedious). In our example here, find the available versions of yukon:/home/claypool/proj/dragonfly/book/engine.tex by changing into the YUKON backup area on our backup server:

cd /rdbackups/YUKON/home-claypool/proj/dragonfly/book

and asking for a listing of the existing increments:

rdiff-backup -l ./engine.tex

example output:

operator@backups 206> pwd
/rdbackups/YUKON/home-claypool/proj/dragonfly/book
operator@backups 207> rdiff-backup -l engine.tex
Found 6 increments:
    engine.tex.2015-05-28T12:36:32-04:00.diff.gz   Thu May 28 12:36:32 2015
    engine.tex.2015-05-29T00:45:44-04:00.diff.gz   Fri May 29 00:45:44 2015
    engine.tex.2015-07-01T00:28:40-04:00.diff.gz   Wed Jul  1 00:28:40 2015
    engine.tex.2015-07-11T12:20:56-04:00.diff.gz   Sat Jul 11 12:20:56 2015
    engine.tex.2015-07-13T00:24:30-04:00.diff.gz   Mon Jul 13 00:24:30 2015
    engine.tex.2015-07-16T12:24:23-04:00.diff.gz   Thu Jul 16 12:24:23 2015
Current mirror: Sun Jul 19 12:20:24 2015

Then ask for the version closest to the 11th as requested, using this command (still from the YUKON backup area on the backup server that you just CD'd to):

rdiff-backup -r 'Sat Jul 11 12:20:56 2015' ./engine.tex /tmp/restored-engine.tex-from-11-July

This will restore the version of the file from July 11th 12:20:56 and place the restored file in /tmp/restored-engine.tex[…]. The date after the “-r ” is simply copied from the last column of the increment list, above. Make sure not to forget the single quotes around the date. If you leave the quotes out, the shell splits the date into separate arguments and rdiff-backup will be confused by your request.
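
When the increment list is long, picking the right line by eye gets error-prone. The sketch below reads the output of “rdiff-backup -l” and prints the timestamp of the newest increment dated on or before a given day. The function name pick_increment is an assumption. It takes the timestamp from the increment file name (for example 2015-07-11T12:20:56-04:00), a datetime form that rdiff-backup's documented time formats should also accept after “-r”. Treat it as a starting point, not a tested tool.

```shell
#!/bin/sh
# Read `rdiff-backup -l FILE` output on stdin and print the timestamp of the
# newest increment dated on or before TARGET (given as YYYY-MM-DD).
# The timestamp is extracted from the increment file name in column one.
pick_increment() {
    awk -v target="$1" '
        match($1, /[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]T[0-9:]+[-+][0-9][0-9]:[0-9][0-9]/) {
            stamp = substr($1, RSTART, RLENGTH)
            day = substr(stamp, 1, 10)
            # ISO-style dates sort correctly when compared as plain strings.
            if ((day "") <= (target "") && (stamp "") > (best "")) best = stamp
        }
        END { if (best != "") print best }
    '
}
```

With the example listing above, “rdiff-backup -l ./engine.tex | pick_increment 2015-07-11” would print 2015-07-11T12:20:56-04:00, which can then be quoted and passed to “rdiff-backup -r”. If no increment is old enough, nothing is printed.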

When the restore is done, use SCP to copy the restored stuff over to the machine where the user wants the restore.

continue with rdiff-backup restore narrative here

Restore using Amanda

  • cd into the Daily backups directory:
      cd Daily2
  • Find out which tape the latest backup is on:
      amadmin Daily2 find [machine] [disk]
  • An example command:
      ROUS /home/operator> amadmin Daily2 find davis home1
      date                host  disk   lv tape or file file part status
      [...lots of output...]
      2009-04-19          davis /home1  3 Daily2-37      16   -- OK
      2009-04-19          davis /home1  3 Daily2-40      17   -- OK
      2009-04-20          davis /home1  3 Daily2-41      21   -- OK
      2009-04-21 11:11:27 davis /home1  3 Daily2-42      17  1/1 OK
      2009-04-21 23:30:01 davis /home1  3 Daily2-43      19  1/1 OK
      2009-04-22 23:30:02 davis /home1  3 Daily2-44      18  1/1 OK
      2009-04-23 23:30:01 davis /home1  3 Daily2-45      15  1/1 OK
  • From the output above, we see that the davis disk /home1 got its last backup on the 23rd of April, and that it was a level 3 backup and is on the tape Daily2-45, and is the 15th file on that tape.
  • To restore from the davis level three we found (above) use this command on ROUS:
    cd /oubliette/amhold/restore 
    dd if=`ls /oubliette/tapes/t45/data/*15.davis*` bs=32k skip=1 | gunzip | restore -ivf -
  • In the command above, “t45” means we want the 45th tape (i.e., Daily2-45), and “*15.davis*” means we want the 15th file from the tape, containing a dump from davis. You will be presented with a RESTORE prompt. If you get an error saying “restore: Tape is not a dump tape” then the backup is probably a TAR file, and you should use a different command to do the extraction:
    dd if=`ls /oubliette/tapes/t45/data/*15.davis*` bs=32k skip=1 | gunzip | tar tvf - > contents
  • which creates a table of contents for you. Look through the TOC, and find the name of the file or directory you want, and extract that with a second command:
    dd if=`ls /oubliette/tapes/t45/data/*15.davis*` bs=32k skip=1 | gunzip | tar xvf - ./dir/to/extract
  • If you're restoring from a level-zero backup, there can be a substantial wait involved while the extract happens. A full level-zero backup of tank1:/export/users1 for instance is a 250 gigabyte compressed tarfile, and the extract can take the better part of an hour to crunch all the way through the file.
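
Reading the tape label and file number off the amadmin listing and retyping them into the dd command is an easy place to slip. The sketch below derives the file glob from the listing instead. It assumes the /oubliette/tapes/tNN/data layout and the “Daily2-NN” label form used in the example commands above, and that the listing is in chronological order as shown; the function name latest_dump_glob is made up for illustration.

```shell
#!/bin/sh
# Read `amadmin Daily2 find HOST DISK` output on stdin and print the file
# glob for the last dump listed with status OK.
# Columns are counted from the right because the date column may or may
# not include a time of day.
latest_dump_glob() {
    awk -v host="$1" '
        $NF == "OK" && $0 ~ /Daily2-[0-9]+/ {
            tape = $(NF - 3)           # tape label, e.g. Daily2-45
            file = $(NF - 2)           # position of the dump on that tape
            sub(/^Daily2-/, "", tape)  # keep only the tape number
        }
        END {
            if (tape != "")
                printf "/oubliette/tapes/t%s/data/*%s.%s*\n", tape, file, host
        }
    '
}
```

For the davis example, “amadmin Daily2 find davis home1 | latest_dump_glob davis” would print /oubliette/tapes/t45/data/*15.davis*, the same pattern used in the dd commands above. Check the printed path against the listing before restoring, especially if a restore needs an earlier level as well.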

WHO may request a restore?

Thanks for reading all the way down to this point in the page, because this is a very important question. Clearly only the real owner of a file should be able to ask for a restore of that file. If user A were able to get a version of user B's file just by asking, that would defeat the whole purpose of file protection and a secure system. So don't perform restores casually for anyone.

From time to time though, there is an exception to this rule. Typically the exception is made for FACULTY requesting the files of a departed advisee who has become unreachable.

The other exception of course is for careless sysadmins who might have accidentally deleted something. :)

cs_backup_and_restore.txt · Last modified: 2015/07/19 22:27 by mvoorhis