Name

rsback -- Program to backup file trees in rotating archives on Unix-based hosts

Version

$Id: rsback,v 0.4.2 2002/10/26 00:04:02 hjb Exp $

Synopsis

To start one or more backup tasks:

        rsback [options] list-of-tasks

To get help:

        rsback -h

Description

rsback makes rotating backups using the common rsync program (http://rsync.samba.org) and some standard file utilities on Unix-based backup hosts. Its purpose is to mirror certain file trees from a remote host or from the local system and to store them as rotating archives in backup repositories on the local backup host. The file structure, permissions, ownerships and time stamps of the mirrored data are the same as in the original sources.

rsback is a kind of front end to rsync, written in Perl, which allows a system administrator to configure and execute backups of different file trees located on remote hosts or on the local system (e.g. tasks for hourly, daily, weekly, monthly, ... backups).

If rsback is executed at regular intervals (preferably scheduled by cron jobs), it maintains rotating backup archives. To restore files from the backup repository no special restore procedure is necessary. To recover files or directories, you just copy them from the archive tree back to the original location or wherever you want to place them.
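
A restore can be sketched with plain cp. The snippet below is only an illustration under assumptions: the file reports/plan.txt is hypothetical, and scratch directories stand in for an archive such as /backup/work/daily.1 and for the original location /var/projects. Using cp -a keeps permissions, ownership (as far as the invoking user is allowed to) and time stamps.

```shell
# Scratch directories stand in for the repository and the original location;
# on a real backup host these would be e.g. /backup/work/daily.1 and /var/projects.
scratch=$(mktemp -d)
archive="$scratch/backup/work/daily.1"
original="$scratch/var/projects"
mkdir -p "$archive/reports" "$original/reports"
echo "plan, version of yesterday" > "$archive/reports/plan.txt"

# Restore a single file: copy it back, preserving mode and time stamps.
cp -a "$archive/reports/plan.txt" "$original/reports/plan.txt"

# Restore a whole directory under a different name, so nothing is overwritten.
cp -a "$archive/reports" "$original/reports.restored"
```

Copying out of the archive (instead of moving or editing files there) leaves the archive itself unchanged.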

The combination of rsync's powerful capabilities and the extensive use of hard links for copying archives within the local file system results in a fast and disk space saving backup technique.

Prerequisites

rsback runs on Unix-based hosts. I tested it on some Linux boxes running RedHat 6.x and 7.x. It should also run on other Unix-based systems if the following programs and utilities are installed:

  • rsync: I recommend the most recent version [1]. If you want to mirror file trees from remote Windows boxes you also need Cygwin [2].
  • Perl 5.005 (or higher version) [3]
  • The common file utilities cp, rm, mv, and mkdir [4]
  • cron or similar program to execute scheduled commands (recommended)
  • You should have some knowledge about rsync [5]
  • You need root privileges to install and run rsback on a backup host

Motivation

I was looking for a backup solution suitable for a workgroup server (Linux box) where some project folders (10 to 20 Gigabytes) have to be mirrored daily.

For a while I tried several different backup techniques. But I was not really happy with any of them.

By accident I found rsync on my disk and tried to find out what it could be used for ... looks good ;)

Searching the net for a ready-to-use backup solution based on rsync, I found Mike Rubel's examples of rotating rsync snapshots [6]. It seemed to be a solution for my problem. To handle configurations of different backup tasks more comfortably, I finally made a kind of front end or wrapper based on Mike's sample scripts:

The result was rsback (RSync BACKup ... hmm)

How it works

Example

The explanations below will refer to a typical example like this:

We want to maintain a rotating backup repository of a file tree which resides on a remote host workbox. The remote host runs rsync in daemon mode on TCP port 873. The file tree on workbox consists of all subdirectories and files of /var/projects, which is accessible via workbox::work. The corresponding entry in the rsyncd configuration file workbox:/etc/rsyncd.conf may look like this (the simplest case):

        [work]
        path = /var/projects
        comment = project directories

Backup steps

The backup concept of rsback is based on two steps:

  1. Rotation
  2. Backup

The repetitive combination of rotation and backup results in backup archives which are comparable to classic combinations of full and incremental backups with respect to the content of the archives.

Task work-daily

All files and directories under workbox::work (or workbox:/var/projects respectively) should be saved every workday night to our local machine backbox. The five latest daily backup sets should be kept in the backup repository on backbox.

Task work-weekly

Additionally, a weekly backup of the most recent local archive should be made on Saturday. The four latest weekly backup sets should also be kept in the repository on backbox.

That way we should have the data of the last five working days and the weekly snapshots of the last four weeks (each reflecting the state saved on Friday night) in our backup repository.

Remark

Tasks are not restricted to be processed at daily or weekly intervals as in this example. It's up to you how often you perform backups and how many archives you keep in your repositories.

Backup repositories

Let us assume that our local host backbox has a large disk which is mounted to /backup. The directory /backup will hold our local backup repositories.

Archive structure

The backup repository on backbox in our example looks like this:

        /backup
         +--/work
         |   history.work-daily     history of task work-daily
         |   history.work-weekly    history of task work-weekly
         |   +--/daily.0            most recent daily archive tree
         |   +--/daily.1            \
         |   +--/daily.2             |
         |   +--/daily.3             |-previous daily archives
         |   +--/daily.4             |
         |   +--/daily.5            /
         |   +--/weekly.0           most recent weekly archive tree
         |   +--/weekly.1           \
         |   +--/weekly.2            |-previous weekly archives
         |   +--/weekly.3            |
         |   +--/weekly.4           /

The directories ../daily.0 to ../daily.5 contain copies of the original data of the most recent daily backup run (daily.0), of the backup run one day before (daily.1), ..., and of the backup run five days ago (daily.5) respectively. The directories ../weekly.0 to ../weekly.4 are the archives of the most recent weekly tasks and of the previous weekly tasks, respectively.

History file

A history file for each backup task keeps track of the time stamps of the archives. A history file consists of a table of two (tab separated) columns. For each consecutive backup run there is a row with the backup number in column one and the date and time in ISO format in column two:

        # rsback-0.4.0 (hjb -- 2002-07-16)
        0    2002-07-17 22:24:05
        1    2002-07-16 22:24:13
        2    2002-07-15 22:24:30
        3    2002-07-12 22:25:28
        4    2002-07-11 22:24:20
        5    2002-07-10 22:24:16
        6    2002-07-09 20:15:37

The history file is read before a backup task is processed. If no history file exists it will be created using the time stamps of the existing archive tree (if there is any). After the backup task has finished, the recent history will be written to the history file.
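
Because the history file is a plain two-column table, it can also be queried with standard tools. The following is a minimal sketch, not part of rsback: the helper name history_lookup and the scratch file are assumptions made for this example, and the sample rows are taken from the listing above.

```shell
# Write a sample history file (two tab-separated columns, as described above).
history_file=$(mktemp)
printf '# rsback-0.4.0 (hjb -- 2002-07-16)\n' > "$history_file"
printf '0\t2002-07-17 22:24:05\n1\t2002-07-16 22:24:13\n2\t2002-07-15 22:24:30\n' >> "$history_file"

# history_lookup FILE NUMBER
# Print the time stamp recorded for a given backup number, skipping comment lines.
history_lookup() {
    awk -F '\t' -v n="$2" '!/^#/ && $1 == n { print $2 }' "$1"
}

history_lookup "$history_file" 1    # prints: 2002-07-16 22:24:13
```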

Daily rotation

When a backup task is executed, the previous backup archives in the repository are rotated first: the oldest archive is removed and the remaining ones are shifted by one position. In our example:

        rm -rf daily.5
        mv daily.4 daily.5
        mv daily.3 daily.4
        mv daily.2 daily.3
        mv daily.1 daily.2

The backup set daily.1 is then recreated as hard links to the most recent backup set daily.0:

        cp -al daily.0 daily.1
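
The rotation step can be sketched as a small shell function. This is only an illustration of the technique, not rsback's own code (rsback is written in Perl). The function name rotate_archives and the scratch directory are assumptions made for this example. It renames the older archives with plain mv and creates the hard-linked copy with cp -al, which requires GNU cp.

```shell
# rotate_archives DIR NAME COUNT
# Drops NAME.COUNT, shifts NAME.1 .. NAME.(COUNT-1) up by one, then
# recreates NAME.1 as a hard-linked copy of NAME.0 (needs GNU cp for -l).
rotate_archives() {
    dir=$1
    name=$2
    count=$3
    rm -rf "$dir/$name.$count"
    i=$count
    while [ "$i" -gt 1 ]; do
        prev=$((i - 1))
        if [ -d "$dir/$name.$prev" ]; then
            mv "$dir/$name.$prev" "$dir/$name.$i"
        fi
        i=$prev
    done
    if [ -d "$dir/$name.0" ]; then
        cp -al "$dir/$name.0" "$dir/$name.1"
    fi
}

# Try it on a scratch repository instead of /backup/work.
repo=$(mktemp -d)
mkdir "$repo/daily.0" "$repo/daily.1"
echo "today" > "$repo/daily.0/notes.txt"
echo "yesterday" > "$repo/daily.1/notes.txt"
rotate_archives "$repo" daily 5
ls "$repo"
```

After the call, the former daily.1 has become daily.2, and the new daily.1 shares its files with daily.0.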

Daily backup

Using rsync, the source tree is mirrored from a remote or local file system to the local backup repository. By default, only those files and directories are copied which differ from their counterparts in the backup repository. Different means: the size, time stamp, or ownership of a file/directory has changed since the last backup to the same repository, or a file/directory doesn't (yet) exist in the repository. Items in the backup repository which do not exist in the source tree are removed from the backup repository.

This action is launched by invoking rsync like

        rsync -al --delete <source> <destination>

Weekly rotation

This is done in the same manner as the daily rotation, except that (in our example) the archives from weekly.0 to weekly.4 are rotated.

Weekly backup

We want to make a snapshot of the most recent daily backup archive in our backup repository. Both the source and the destination are local directories. Therefore this backup is executed by hard-linking daily.0 to weekly.0:

        rm -rf weekly.0
        cp -alf daily.0 weekly.0
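
The effect of such a hard-linked snapshot can be tried out on a scratch directory. The sketch below makes some assumptions: the file name report.txt is hypothetical, GNU cp is needed for -l, and the "write a new file, then rename it" step merely imitates a later backup run that replaces a changed file.

```shell
# Scratch repository with one file in the most recent daily archive.
snap=$(mktemp -d)
mkdir "$snap/daily.0"
echo "report" > "$snap/daily.0/report.txt"

# The weekly backup step from above.
rm -rf "$snap/weekly.0"
cp -alf "$snap/daily.0" "$snap/weekly.0"

# Both names now refer to the same inode, so the snapshot needs no extra data blocks.
ls -i "$snap/daily.0/report.txt" "$snap/weekly.0/report.txt"

# Replacing the file in daily.0 (write a new file, then rename it over the old one)
# leaves the snapshot in weekly.0 untouched.
echo "report, revised" > "$snap/daily.0/report.txt.new"
mv "$snap/daily.0/report.txt.new" "$snap/daily.0/report.txt"
cat "$snap/weekly.0/report.txt"
```

Editing a file in place inside an archive, on the other hand, would change it in every archive that links to it, so the archives are best treated as read-only.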

Installation

To install rsback on a backup host, login as root and proceed as follows.

Copy the downloaded archive rsback-x.y.z.tar.gz (x.y.z is the actual version) to an installation directory, e.g. /usr/local/src. Change to this directory and unpack the archive:

        # cd /usr/local/src
        # tar zxvf rsback-x.y.z.tar.gz

Copy rsback to a bin directory in root's path, e.g.

        # cp rsback-x.y.z/bin/rsback /root/bin

Be sure that rsback is executable only by root:

        # chmod 700 /root/bin/rsback

Create a configuration directory and copy the sample configuration files from ../rsback-x.y.z/etc into it:

        # mkdir /etc/rsback
        # cp rsback-x.y.z/etc/* /etc/rsback

Be sure that only root has access to rsback.conf:

        # chown root:root /etc/rsback/rsback.conf
        # chmod 600 /etc/rsback/rsback.conf

Now you may delete the archive:

        # rm rsback-x.y.z.tar.gz

Configuration

Some configuration parameters will just be passed as options to rsync. Therefore it is strongly recommended that you consult the rsync documentation [5] and the man pages (rsync(1), rsyncd.conf(5)) if you are not sure what rsback does. Before you run your configuration with production data, make some tests with dummy data first. Compare the results carefully with what you expected.

You should take some general precautions if your machines can be accessed by anyone other than yourself.

  • Don't allow data transfers to and from remote machines without authentication or other access restrictions.
  • Don't transfer clear text passwords.
  • Don't transfer unencrypted sensitive data.
  • Don't give write access to the backup repositories to anyone other than root@backupbox.
  • Don't give read access to backup repositories to anyone other than the owner of the original data.
  • Don't give read or even write access to rsback.conf to anybody other than root@backupbox:
            # chown root:root /etc/rsback/rsback.conf
            # chmod 600 /etc/rsback/rsback.conf
    

Configuration file

Edit rsback.conf to customize rsback and to define your backup tasks.

Default location

If you want to have the default configuration file somewhere other than /etc/rsback/rsback.conf, edit the variable $rsback_conf in rsback to match your preferences. Or use option -c to tell rsback where to find the configuration file (see Usage).

File format

The file format is similar to that of rsyncd.conf(5).

The file is line-based - that is, each newline-terminated line represents either a comment, a section name or a parameter. Any line beginning with a hash # or a semicolon ; is ignored, as are lines containing only whitespace. The file consists of sections and parameters. A section begins with the name of the section in square brackets and continues until the next section begins. Sections contain parameters of the form name = list-of-values, where list-of-values is a list of one or more strings.
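
To make the format concrete, here is a minimal sketch of how such a file could be read with awk. It is not rsback's own parser (rsback is written in Perl). The helper name conf_get and the scratch file are assumptions made for this example, and the section names and values are borrowed from the examples in this document.

```shell
# Sample file in the format described above.
conf=$(mktemp)
cat > "$conf" <<'EOF'
# comment line
; another comment

[global]
tasks = work-daily work-weekly

[work-daily]
rotate = daily 5
EOF

# conf_get FILE SECTION NAME
# Prints the list of values of parameter NAME in section SECTION.
conf_get() {
    awk -v section="$2" -v name="$3" '
        /^[ \t]*[#;]/ { next }                     # skip comment lines
        /^[ \t]*$/    { next }                     # skip blank lines
        /^[ \t]*\[.*\][ \t]*$/ {                   # section header
            gsub(/^[ \t]*\[|\][ \t]*$/, "")
            current = $0
            next
        }
        current == section {
            eq = index($0, "=")
            if (eq == 0) next
            key = substr($0, 1, eq - 1)
            val = substr($0, eq + 1)
            gsub(/^[ \t]+|[ \t]+$/, "", key)
            gsub(/^[ \t]+|[ \t]+$/, "", val)
            if (key == name) print val
        }
    ' "$1"
}

conf_get "$conf" global tasks         # prints: work-daily work-weekly
conf_get "$conf" work-daily rotate    # prints: daily 5
```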

Global section

In the section [global] some general configuration parameters are defined. If not noted explicitly as optional, all parameters are mandatory.

Paths to commands

rsback needs to know where to find some programs. Set the paths with the parameters rsync_cmd, cp_cmd, mv_cmd, rm_cmd, and mkdir_cmd according to your system. The default settings in the sample configuration file coming with rsback are

parameter: rsync_cmd
        rsync_cmd = /usr/bin/rsync
parameter: cp_cmd
        cp_cmd = /bin/cp
parameter: mv_cmd
        mv_cmd = /bin/mv
parameter: rm_cmd
        rm_cmd = /bin/rm
parameter: mkdir_cmd
        mkdir_cmd = /bin/mkdir

parameter: tasks

tasks is a list of all backup tasks you want to execute. A backup task in this context is just an arbitrary word to denote a certain backup job. The specific parameters of each backup task listed in tasks have to be defined in a separate task section.

parameter: exclude_file (optional)

exclude_file points to a file containing global exclude patterns for rsync. 'global' means: these patterns are applied to all backup tasks which are executed with mode=rsync (see Task sections). Please refer to the rsync documentation (look for ``exclude patterns'') or to the man page (rsync(1)). The value given here will be passed to rsync as it is with the command option --exclude-from.

parameter: rsync_options (optional)

The optional parameter rsync_options defines additional options which will be passed to rsync. For example you may choose

        rsync_options = --stats

to tell rsync to report some statistics on the file transfer. This parameter applies to all backup tasks. You can also define additional options which will only be applied to certain tasks within the task sections.

Task sections

Parameters specific to certain backup tasks are declared within corresponding task sections. There should be one task section for each backup task listed with the global parameter tasks (see global section). E.g., if you have declared

        [global]
        tasks = work-daily work-weekly misc

the task sections

        [work-daily]
        .
        .
        [work-weekly]
        .
        .
        [misc]
        .
        .

must be present.

parameter: mode

This parameter controls what backup mode will be used for execution of this task. Use mode=rsync if you want to back up the original source tree either from a remote host or from the local machine using rsync.

mode=link is intended to be used for local copies on the backup host. This makes sense only if both the source and the destination reside on the same physical partition, because hard links will be used.

parameter: source

source designates the location of the source data to be saved. The format depends on the backup mode and the location of the source files. This parameter will be passed as source to rsync if mode=rsync is selected or to cp if mode=link is selected. Please refer to the man pages rsync(1) and cp(1) to select the right one for your purpose.

E.g. if the source data resides on the remote host workbox which is running rsync in daemon mode (as in our example above) then source is something like this

        source = workbox::work/

If mode=link the parameter source designates the source directory on the local host. The task work-weekly in our example above needs a line like

        source = /backup/work/daily.0

in its task section.

parameter: destination

destination is the directory within the local backup repository. It is not a bad idea to use directory names in the destination path which can easily be related to a backup task (or vice versa). E.g. if we refer to the task work-daily of our example then it is something like

        destination = /backup/work

The definition for the task work-weekly of our example is also

        destination = /backup/work

This may be confusing, but consider that the final archive directory will always be a subdirectory of this path, named according to the first value of the rotate parameter (see below).

parameter: rotate

This parameter consists of a list of two values: the first value is an arbitrary name to designate the archive directory in the local repository. The second value is a positive integer which defines how many backup sets have to be kept in the repository.

Example:

        rotate = daily 5

parameter: rsync_options (optional)

Same as parameter rsync_options in the [global] section, but applies only to this task.

parameter: exclude_file (optional)

This parameter has the same purpose as in the global section. The only difference is that it is applied to this task only (see also below).

Example:

        exclude_file = /etc/rsback/work-daily.exclude
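
Putting the pieces together, a configuration for the two example tasks might look like the sketch below. It is assembled from the parameter examples given in this document, not copied from the sample file shipped with rsback. In particular, rotate = weekly 4 is an assumption derived from the archive structure shown earlier. The snippet writes the content to a scratch file; on a real backup host it would go to /etc/rsback/rsback.conf.

```shell
# Write the example configuration to a scratch file.
sample_conf=$(mktemp)
cat > "$sample_conf" <<'EOF'
[global]
rsync_cmd = /usr/bin/rsync
cp_cmd = /bin/cp
mv_cmd = /bin/mv
rm_cmd = /bin/rm
mkdir_cmd = /bin/mkdir
tasks = work-daily work-weekly
rsync_options = --stats

[work-daily]
mode = rsync
source = workbox::work/
destination = /backup/work
rotate = daily 5
exclude_file = /etc/rsback/work-daily.exclude

[work-weekly]
mode = link
source = /backup/work/daily.0
destination = /backup/work
rotate = weekly 4
EOF

# List the sections that were defined.
grep '^\[' "$sample_conf"
```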

Exclude files

Patterns to exclude files or directories from being rsync'd are collected in separate files, see parameter exclude_file above. Because these exclude files are passed directly to rsync with the option --exclude-from=FILE, they must be in the format rsync expects. Please consult the section ``EXCLUDE PATTERNS'' in rsync(1).

Global and task specific exclude files are cumulative: both the exclude patterns in the global exclude file and the patterns in the exclude file defined in a task section will be applied to the source tree when a backup task is processed.
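
As an illustration, the sketch below creates a global and a task specific exclude file and prints an rsync command line that applies both. The patterns and file locations are hypothetical. How rsback actually orders the options and names the destination directory is an assumption here, based on the rsync invocation and the archive structure described earlier. rsync itself accepts several --exclude-from options on one command line. Nothing is transferred; the command is only printed.

```shell
# Scratch files stand in for the global and the task specific exclude file.
global_excl=$(mktemp)
task_excl=$(mktemp)

cat > "$global_excl" <<'EOF'
# editor backup files and temporary files anywhere in the tree
*~
*.tmp
EOF

cat > "$task_excl" <<'EOF'
# a directory named cache directly below the top of the source tree
/cache/
EOF

# Collect one --exclude-from option per existing exclude file.
exclude_opts=""
for f in "$global_excl" "$task_excl"; do
    if [ -f "$f" ]; then
        exclude_opts="$exclude_opts --exclude-from=$f"
    fi
done

# Only print the resulting command line; nothing is transferred here.
cmdline="rsync -al --delete$exclude_opts workbox::work/ /backup/work/daily.0"
echo "$cmdline"
```

Adding the rsync option --dry-run to such a command is a convenient way to check what a set of patterns excludes before running a real backup.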

Usage

Starting backup tasks

To start a backup task invoke

        # rsback [options] task-list

where task-list is a list of one or more backup tasks as defined in the configuration file.

The possible options are

-h
Display a help message (usage)
-v
Be verbose
-d
Run rsync with option --dry-run (simulation mode). That means rsync does not copy anything; it just displays what it would do.
-i
Initialize the backup repositories to be used for the specified tasks. This isn't really necessary, because rsback will try to create the necessary directories when a backup task is processed, if a backup repository does not yet exist.
-c configuration-file
If you want to use a configuration file other than the default one, use this option to tell rsback where to find it.

Example:

        rsback -vc /etc/rsback/test.conf work-daily misc

Scheduling backup tasks

rsback is supposed to be executed by cron jobs at regular intervals. crontab entries in our example may look like

        0 22 * * 1-5 /root/bin/rsback -v work-daily >>/var/log/rsback/work-daily.log
        0 22 * * 6 /root/bin/rsback -v work-weekly >>/var/log/rsback/work-weekly.log

The daily backup task work-daily will be executed every workday night at 22:00. The weekly backup task work-weekly is processed on Saturday night at 22:00.

Changes

See the file CHANGELOG.

References

  1. rsync: http://rsync.samba.org
  2. Cygwin: http://cygwin.com
  3. Perl: http://www.perl.com
  4. file utilities: http://www.gnu.org/software/fileutils/fileutils.html
  5. rsync documentation: http://rsync.samba.org/documentation.html
  6. Mike Rubel's examples of ``rotating rsync snapshots'': http://www.mikerubel.org/computers/rsync_snapshots

Availability

http://www.pollux.franken.de/hjb/rsback

Author

Copyright (C) 2002 by Hans-Juergen Beie <hjb@pollux.franken.de>

This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.
